I always work on Arabic text files and to avoid problems with encoding I transliterate Arabic characters into English according to Buckwalter\'s scheme (http://www.qamus.org/tra
Incidentally, someone already wrote a script that does this, so you might want to check that out before spending too much time on your own: buckwalter2unicode.py
It probably does more than what you need, but you don't have to use all of it: I copied just the two dictionaries and the transliterateString function (with a few tweaks, I think), and use that on my site.
Edit: The script above is what I have been using, but I'm just discovered that it is much slower than using replace, especially for a large corpus. This is the code I finally ended up with, that seems to be simpler and faster (this references a dictionary buck2uni):
def transString(string, reverse=0):
'''Given a Unicode string, transliterate into Buckwalter. To go from
Buckwalter back to Unicode, set reverse=1'''
for k, v in buck2uni.items():
if not reverse:
string = string.replace(v, k)
else:
string = string.replace(k, v)
return string