Tuesday, July 22, 2008

Preprocessing Handwritten Text

For this activity, we try to extract handwritten text from scanned tables. From the whole scanned document, I cropped the portion shown below:
As can be seen, the text is barely discernible in the image, but I'll try my best to extract it :D.
The first thing I did was to get the Fourier transform of the image so that I could make an appropriate filter. Since I wanted to remove the table, I created a filter that blocks the frequencies produced by the horizontal and vertical lines (shown below).
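The filter here was drawn by hand as an image, but a similar mask can be sketched in code. The snippet below is only an illustration in plain Scilab: the function name `cross_mask`, the 64x64 synthetic test image, and the values of `w` (half-width of the blocked bands) and `r` (half-size of the low-frequency window that is kept) are all assumptions, not the filter actually used in this activity.

```scilab
// Hypothetical helper: cross-shaped mask laid out for a centered spectrum.
function mask = cross_mask(nr, nc, w, r)
    mask = ones(nr, nc);
    cr = floor(nr/2) + 1;            // row of the zero frequency after fftshift
    cc = floor(nc/2) + 1;            // column of the zero frequency after fftshift
    mask(cr-w:cr+w, :) = 0;          // band holding the energy of vertical image lines
    mask(:, cc-w:cc+w) = 0;          // band holding the energy of horizontal image lines
    mask(cr-r:cr+r, cc-r:cc+r) = 1;  // keep the low frequencies around the center
endfunction

// Synthetic test image (an assumption, not the scanned table)
img = zeros(64, 64);
img(:, 20) = 1;                      // a vertical "table line"
img(30:34, 40:44) = 1;               // a small blob standing in for handwriting

mask = cross_mask(64, 64, 1, 3);
// fftshift moves the centered mask to the layout used by fft (even sizes assumed)
filtered = real(ifft(fft(img) .* fftshift(mask)));
```

Vertical table lines put their energy on the horizontal axis of the centered spectrum, and horizontal lines on the vertical axis, which is why the mask is a cross. The small window around the center is kept so that the slowly varying brightness of the image survives the filtering.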
The image after filtering:
As can be seen, some of the lines still remain but are now white, so once the image is binarized most of them should no longer show up.
Since I have no program that can estimate the best threshold, I estimated it by trial and error. After thresholding:
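One common way to estimate a global threshold automatically is Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. This is not what was done here (the threshold was found by trial and error). The sketch below is a plain Scilab illustration that assumes integer gray levels from 0 to 255; `otsu_threshold` and the small test image are made up for the example.

```scilab
// Sketch of Otsu's method for gray levels 0..255 (assumed range).
function t = otsu_threshold(img)
    counts = zeros(1, 256);
    v = round(img(:)) + 1;               // gray level g goes into bin g + 1
    for k = 1:length(v)
        counts(v(k)) = counts(v(k)) + 1;
    end
    p = counts / sum(counts);            // probability of each gray level
    levels = 0:255;
    omega = cumsum(p);                   // weight of the class with levels <= g
    mu = cumsum(p .* levels);            // cumulative mean up to level g
    muT = mu(256);                       // mean of the whole image
    denom = max(omega .* (1 - omega), %eps);   // avoid dividing by zero
    sigma = (muT * omega - mu).^2 ./ denom;    // between-class variance
    [m, idx] = max(sigma);
    t = idx - 1;                         // bin index back to gray level
endfunction

// Synthetic example: bright paper with one dark stroke
img = 200 * ones(20, 20);
img(5:8, 3:15) = 50;
t = otsu_threshold(img);
bw = img > t;                            // %t for paper, %f for the stroke
```

Dividing `t` by 255 gives a value in the 0 to 1 range that `im2bw` takes in the code further down. For a scan as faint as this one, a single global value may still not be enough.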

Most of the table lines are now gone, but some lines still remain on the left side of the image :( Also, most of the text is now unreadable (it was quite unreadable to begin with), and the text in the fourth column is gone since it was written lightly.
To further enhance the image, I applied the opening and closing operators with a 2x2 structuring element. Then, to thin the letters toward a width of 1 pixel, I eroded the image with a 2x1 structuring element.
The last thing to do was to label the letters.
The total number of labeled blobs was 256, which is very high compared to the number of letters appearing in the text. This was expected since, as we can see in the image, there are many pixels that do not correspond to any text, and in other places a single letter is broken up into many pieces.
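One possible way to bring the count closer to the number of letters is to drop blobs that are too small to be a letter. This was not done in this activity; the sketch below is only an illustration in plain Scilab, where `drop_small_blobs` and the cutoff `min_pixels` are hypothetical, and the labeled matrix is assumed to use 0 for the background and 1 to n for the blobs.

```scilab
// Hypothetical helper: erase labeled blobs with fewer than min_pixels pixels.
function [cleaned, sizes] = drop_small_blobs(labeled, min_pixels)
    n = max(labeled);                    // labels are assumed to run from 1 to n
    sizes = zeros(1, n);
    cleaned = labeled;
    for k = 1:n
        sizes(k) = length(find(labeled == k));   // pixel count of blob k
        if sizes(k) < min_pixels then
            cleaned(labeled == k) = 0;           // erase blobs that are too small
        end
    end
endfunction

// Tiny hand-made label matrix with three blobs
labeled = [1 1 0 2; 1 0 0 0; 0 0 3 3];
[cleaned, sizes] = drop_small_blobs(labeled, 2);
// sizes is [3 1 2]; blob 2 has a single pixel and is set to 0 in cleaned
```

This only removes stray pixels. It does nothing for letters that are broken into several pieces, and the surviving labels are not renumbered.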

//Scilab code
image1 = imread("text-fft-filter3.bmp") - 1; // the filter mask
image2 = imread("text.bmp") - 1; // the cropped scan
se1 = imread("se1.bmp"); // a 2x1 structuring element
se2 = imread("se3.bmp"); // a 2x2 structuring element

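// fftshift rearranges the mask so that its center lines up with the zero frequency of fft2
fftimage1 = fftshift(image1);
fftimage2 = fft2(image2);

image1_2 = fftimage1.*fftimage2;
// A second forward FFT stands in for the inverse transform; up to a scale
// factor it returns the filtered image flipped in both directions
infft = fft2(image1_2);
fimage = real(infft);
fimage = (fimage/max(fimage)) * 255; // rescale so that the maximum is 255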
[x, y] = histogram(fimage);
scf(1);
imshow(fimage, []);
//bar(x,y)
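// Threshold: position of the histogram peak minus a hand-tuned offset of 30, scaled to 0..1
threshold = (find(y == max(y), 1) - 30)/255;
image = im2bw(fimage, threshold);
scf(2);
imshow(image, []);
image = abs(image - 1); // invert the binary image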

//Opening
image = dilate(erode(image, se2, [1,1]), se2, [1,1]);

//Closing
image = erode(dilate(image, se2, [1,1]), se2, [1,1]);

//Thinning with the 2x1 structuring element
image = erode(image, se1, [1,1]);

scf(3);
imshow(image, []);

labeled = bwlabel(image, 4);
scf(4);
imshow(labeled, []);

//end of code

For this activity, I give myself a grade of 9 since I couldn't find a proper threshold for the image.
Collaborators: None

1 comment:

Jing said...

Hello Ed,

Indeed it is hard to find a fixed global threshold. What is usually done is to select an optimum threshold per sub-block of the image. So then it becomes a local thresholding.
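A rough sketch of the per-block idea in plain Scilab follows. How the "optimum" threshold of each sub-block is chosen is not specified above, so the block mean minus a fixed `offset` is used here purely as an assumed stand-in; `block_threshold`, the block size `bs`, and the synthetic image are also made up for the example.

```scilab
// Hypothetical helper: threshold each bs x bs block against its own mean.
function bw = block_threshold(img, bs, offset)
    [nr, nc] = size(img);
    bw = zeros(nr, nc);
    for r0 = 1:bs:nr
        for c0 = 1:bs:nc
            r1 = min(r0 + bs - 1, nr);   // clip the last block at the border
            c1 = min(c0 + bs - 1, nc);
            blk = img(r0:r1, c0:c1);
            t = mean(blk) - offset;      // assumed rule for the local threshold
            bw(r0:r1, c0:c1) = bool2s(blk < t);   // 1 where darker than t
        end
    end
endfunction

// Synthetic page: paper gets brighter to the right, with one dark stroke
[X, Y] = meshgrid(1:32, 1:32);
img = 100 + 3 * X;
img(10:12, 4:28) = img(10:12, 4:28) - 60;
bw = block_threshold(img, 8, 20);
```

In this synthetic image the darkest paper (103) is darker than the brightest stroke pixel (124), so no single global threshold can separate the two, while the per-block version marks the stroke.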