South African Computer Journal - A texture-based method for document segmentation and classification
|Article Title||A texture-based method for document segmentation and classification|
|© Publisher:||South African Computer Society (SAICSIT)|
|Journal||South African Computer Journal|
|Author||M-W. Lin, J-R. Tapamo and B. Ndovie|
|Publication Date||Jun 2006|
|Pages||49 - 56|
|Keyword(s)||Document Image Analysis, Feature extraction, Grey Level Co-occurrence Matrix (GLCM), Information retrieval, K-Means Clustering and Texture segmentation|
In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of regions: Graphics, Text and Space. The image is subdivided into blocks, and five GLCM (Grey Level Co-occurrence Matrix) features are extracted from each block. Based on these features, the blocks are clustered into three groups using the K-Means algorithm, and connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the Media Team Document Database.
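The block-wise pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the five GLCM features, the block size, the pixel offset, or the heuristic rules, so the code below assumes contrast, energy, homogeneity, entropy and correlation as the five features, a horizontal one-pixel offset, and an image already quantised to a small number of grey levels. All function names are hypothetical. Merging of connected blocks and the rule-based labelling of groups are left out, and features are clustered unscaled for brevity.

```python
import math
import random

def split_blocks(image, size):
    """Cut a 2-D list of grey levels into non-overlapping size x size blocks."""
    h, w = len(image), len(image[0])
    return [[row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def glcm(block, levels=8, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for offset (dx, dy).
    Assumes pixel values are integers in [0, levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    h, w = len(block), len(block[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = block[y][x], block[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def glcm_features(p):
    """Five texture features (an assumed set, not taken from the paper)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    # The matrix is symmetric, so row and column means/variances coincide.
    mean = sum(i * v for i, _, v in cells)
    var = sum(v * (i - mean) ** 2 for i, _, v in cells)
    correlation = (sum(v * (i - mean) * (j - mean) for i, j, v in cells) / var
                   if var > 0 else 1.0)  # flat block: define correlation as 1
    return [contrast, energy, homogeneity, entropy, correlation]

def kmeans(points, k=3, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centres[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def segment(image, size=8, levels=8):
    """Return a cluster index (0..2) for each block, in row-major order."""
    feats = [glcm_features(glcm(b, levels)) for b in split_blocks(image, size)]
    return kmeans(feats, k=3)
```

Calling `segment(image)` on a quantised image returns one cluster index per block. The indices are arbitrary: deciding which cluster is Graphics, Text or Space would still need the classification rules, which the abstract does not spell out.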