UM E-Theses Collection (澳門大學電子學位論文庫)


Hybrid segmentation on slant & skewed deformation text in natural scene images

English Abstract

The work presented in this dissertation originates from our years of research on vision-to-text machine translation, during which our group developed a prototype system for camera-based images. In practice, however, we found text detection in scene images with complex backgrounds to be far more challenging than expected. The greatest difficulty stems from the fact that text lines vary in size, grey level, shape and color. Moreover, traditional learning-based classification of image blocks is ill-suited to detecting text embedded in complex-background scene images, and conventional grey-value-based OCR (Optical Character Recognition) software struggles to recognize text set against colorful backgrounds in scene images. This thesis therefore proposes a general framework for text detection based on a novel hybrid text-line detection paradigm. The detection process is divided into two stages. First, image pixels are grouped into regions by vector quantization, and candidate text lines are obtained through layout analysis of regions sharing the same color label. Second, wavelet histogram features of the candidate text lines are extracted, and an SVM (Support Vector Machine) classifier discriminates text patterns from non-text patterns. Each classified text pattern is then verified with OCR to confirm that it exhibits textual features. In a subsequent refinement step, a homography is applied to project distorted text from the image plane back to the real-world plane, with domain knowledge used to establish corresponding points between the two planes. The performance of the proposed text detection technique is evaluated on the international benchmark database ICDAR 2003.
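The first stage above groups pixels into color regions by vector quantization. A minimal sketch of that idea, using a crude k-means color quantizer (the function name, cluster count `k`, and iteration count are illustrative choices, not parameters taken from the dissertation):

```python
import numpy as np

def quantize_colors(pixels, k=4, iters=10, seed=0):
    """Toy k-means color quantization: assign each RGB pixel a color label.

    Stand-in for the thesis's vector-quantization step; pixels sharing a
    label would then feed the layout analysis that forms candidate text lines.
    """
    rng = np.random.default_rng(seed)
    # initialize cluster centers from randomly chosen pixels
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # assign each pixel to its nearest color center
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean color of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels

# toy "image": two well-separated color populations (dark and bright pixels)
pixels = np.vstack([np.full((50, 3), 10.0), np.full((50, 3), 200.0)])
labels = quantize_colors(pixels, k=2)
```

After this step, each connected run of same-label pixels becomes a candidate region for the layout analysis described above.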
On the scene images in this database, the proposed method achieves a precision of 0.60 and a recall of 0.63, outperforming previously published methods. Its running time, while not the best among competing approaches, remains acceptable; the cost is attributable to the two-stage hybrid detection strategy. In summary, the proposed method is robust against complex backgrounds in scene images and should contribute significantly to vision-to-text machine translation research and applications.
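ICDAR evaluations conventionally combine precision and recall into a single f-measure, their harmonic mean. A quick check of what the reported scores imply, assuming that standard formula:

```python
# f-measure (harmonic mean) of the reported precision and recall
precision, recall = 0.60, 0.63
f = 2 * precision * recall / (precision + recall)
print(round(f, 3))  # -> 0.615
```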

Issue date



Fei, Xiao Lei


Faculty of Science and Technology




Image processing -- Digital techniques

Optical character recognition

Optical pattern recognition

Software Engineering -- Department of Computer and Information Science


Dong, Ming Chui

Files In This Item

Full-text (Intranet)


1/F Zone C