
Abstract

Character segmentation has long been a critical step in the OCR process, as shown by the higher recognition rates obtained for isolated characters than for words and connected character strings. Much of the recent progress in reading unconstrained printed and handwritten text can be ascribed to more insightful handling of segmentation. To handle the variability in writing style across individuals, in this paper we propose a robust scheme to segment unconstrained handwritten Bangla text into lines, words and characters. For line segmentation, we first divide the text into vertical stripes; the stripe width for a document is computed by statistical analysis of the text height in that document. Next, we compute the horizontal histogram of each stripe and use the relationship between the minima of these histograms to segment text lines. Lines are then segmented into words based on the vertical projection profile. Segmenting characters from handwritten words is difficult because the characters are seldom vertically separable. Segmentation of cursive handwriting is a challenging step of Optical Character Recognition (OCR), and recognition accuracy depends heavily on good segmentation. Segmentation can be performed at the level of zones, text lines, words within a line, and characters within a word, using horizontal and vertical methods. This paper reviews many basic and advanced techniques of handwritten word segmentation.
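The projection-based steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stripe_profiles` and `ink_runs`, the `min_gap` parameter, and the convention that ink pixels are 1 in a binarized NumPy array are all assumptions made for illustration. The stripe width is taken as a given parameter rather than estimated from text height, and the sketch does not link histogram minima across stripes as the proposed scheme does.

```python
import numpy as np


def stripe_profiles(binary, stripe_width):
    """Horizontal projection profile (ink count per row) of each vertical stripe.

    `binary` is a 2-D array with ink pixels equal to 1 (assumed convention).
    """
    width = binary.shape[1]
    return [binary[:, x:x + stripe_width].sum(axis=1)
            for x in range(0, width, stripe_width)]


def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs of ink runs in a 1-D projection profile.

    Runs separated by fewer than `min_gap` empty positions are merged.
    `end` is exclusive, so a run can be used directly as a slice.
    """
    runs = []
    start = None
    end = 0
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            end = i + 1
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, end))
                start = None
    if start is not None:
        runs.append((start, end))
    return runs


# Toy page: two text lines, the first containing two "words".
page = np.zeros((7, 10), dtype=np.uint8)
page[1:3, 0:3] = 1
page[1:3, 6:9] = 1
page[4:6, 1:8] = 1

# Line segmentation from the horizontal profile of the whole page.
lines = ink_runs(page.sum(axis=1))

# Word segmentation of the first line from its vertical projection profile.
top, bottom = lines[0]
words = ink_runs(page[top:bottom].sum(axis=0), min_gap=2)
```

For skewed or touching handwritten lines, the zero-valued gaps used here would rarely exist across the full page width, which is the motivation for computing the profile per stripe and reasoning about local minima instead.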
