Since you are using Python, you might try UnicodeDammit. It is part of Beautiful Soup, which you may also find useful.
As the name suggests, UnicodeDammit will try to do whatever it takes to get proper Unicode out of the crap you may find in the world: it guesses the encoding of a byte string and decodes it for you.
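
Here is a rough sketch of how you might call it, assuming Beautiful Soup 4 is installed (`pip install beautifulsoup4`). The `to_unicode` helper and its `candidates` parameter are names I made up for illustration. The `except ImportError` branch is only a crude stand-in so the snippet still runs without Beautiful Soup, and it does not reproduce UnicodeDammit's detection logic.

```python
from typing import Optional, Sequence

try:
    from bs4 import UnicodeDammit  # ships with Beautiful Soup 4
except ImportError:  # keep the sketch importable without Beautiful Soup
    UnicodeDammit = None


def to_unicode(raw: bytes, candidates: Sequence[str] = ()) -> Optional[str]:
    """Decode bytes of unknown encoding; `candidates` are tried first, in order."""
    if UnicodeDammit is not None:
        # The second positional argument lists encodings to try before guessing.
        # unicode_markup is None if UnicodeDammit could not decode the data.
        return UnicodeDammit(raw, list(candidates)).unicode_markup
    # Hypothetical stand-in, NOT UnicodeDammit's algorithm: strict decode per candidate.
    for encoding in (*candidates, "utf-8"):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    return None


# UTF-8 bytes decode with the first candidate; the Latin-style bytes need the second.
print(to_unicode(b"Sacr\xc3\xa9 bleu!", ["utf-8", "windows-1252"]))
print(to_unicode(b"caf\xe9", ["utf-8", "windows-1252"]))
```

If you have no idea what the encoding might be, leave `candidates` empty and let UnicodeDammit guess. The object's `original_encoding` attribute tells you which encoding it settled on. Its guesses tend to be better if a detection library such as chardet is installed.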