I\'ve created an iPhone app that can scan an image of a page of graph paper and can then tell me which squares have been blacked out and which squares are blank.
I do th
I've been starting to do something similar using my GPUImage iOS framework, so that might be an alternative to doing all of this in OpenCV or something else. As it's name indicates, GPUImage is entirely GPU-based, so it can have some tremendous performance benefits over CPU-bound processing (up to 180X faster for doing things like processing live video).
As a first stage, I took your images and ran them through a simple luminance thresholding filter with a threshold of 0.5 and arrived at the following for your two images:

I just added an adaptive thresholding filter, which attempts to correct for local illumination variances, and works really well for picking out text. However, in your images it uses too small of an averaging radius to handle your blobs well:

and seems to bring out your grid lines, which it sounds like you wish to ignore.
Maurits provides a more comprehensive description of what you could do, but there might be a way to implement these processing operations as high-performance GPU-based filters instead of relying on slower OpenCV versions of the same calculations. If you could grab rotation and scaling information from this thresholded image, you could construct a transform that could also be applied as a filter to your thresholded image to produce your final aligned image, which could then be downsampled and read out by your application to determine which grid locations were filled in.
These GPU-based thresholding operations run in less than 2 ms for 640x480 frames on an iPhone 4, so it might be possible to chain filters together to analyze incoming video frames as fast as the device's video camera can provide them.