I have a script that parses the filenames of TV episodes (show.name.s01e02.avi, for example), grabs the episode name from the www.thetvdb.com API, and automatically renames
In Mastering Regular Expressions by Jeffrey Friedl (great book), it is mentioned that you can use \p{Letter}, which matches any Unicode character that is considered a letter.
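
For illustration, here is a minimal sketch of matching Unicode letters, assuming the script is written in Python 3 (the language is my assumption, not something stated above). Python's built-in `re` module does not accept `\p{Letter}`, so this uses the character class `[^\W\d_]` as an approximation: a word character that is neither a decimal digit nor an underscore. The function name `letter_runs` and the sample filename are hypothetical. Note that the class can also let through a few non-letter numeric characters (Roman numeral symbols, for instance), so it is not an exact equivalent.

```python
import re

# Approximation of \p{Letter} for Python's built-in re module:
# "word character" minus decimal digits and the underscore.
# For str patterns in Python 3, \w is Unicode-aware by default.
LETTERS = re.compile(r"[^\W\d_]+")


def letter_runs(text):
    """Return every run of consecutive letter-like characters in text."""
    return LETTERS.findall(text)


# Hypothetical filename containing an accented letter (\u00e9 is "é").
print(letter_runs("Am\u00e9lie.s01e02.avi"))
```

If a third-party dependency is acceptable, the `regex` package on PyPI supports Unicode property escapes such as `\p{L}` directly, which would avoid the approximation.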