Python 3.3 adds the casefold method to the str type, but in 2.x I don't have anything. What's the best way to work around this?
If PyICU is already installed, you could use it to define casefold(). Using the same example strings as in @Russ' answer:
>>> import icu
>>> casefold = lambda u: unicode(icu.UnicodeString(u).foldCase())
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςfiÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
>>> icu.UNICODE_VERSION
'6.3'
>>> import unicodedata
>>> unicodedata.unidata_version
'5.2.0'
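The result may depend on the version of the Unicode standard: in the session above, ICU implements Unicode 6.3, while the interpreter's own unicodedata module is at 5.2.0.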
Check out py2casefold.
>>> from py2casefold import casefold
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςfiÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
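If the same code has to run on both 2.x and 3.x, the options above can be combined into one fallback chain. The following is only a sketch: the names `text_type`, `_pick_casefold` and `caseless_equal` are made up for illustration, and the py2casefold import and the PyICU call are taken as-is from the snippets above. The final `lower()` fallback is an assumption of mine and is not real case folding (it leaves "ß" alone, for example).

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text_type = type("")  # unicode on Python 2, str on Python 3


def _pick_casefold():
    """Return the first case-folding implementation that is available."""
    if hasattr(text_type, "casefold"):
        # Python 3.3+: use the built-in method.
        return lambda u: u.casefold()
    try:
        # Third-party backport, imported as in the example above.
        from py2casefold import casefold
        return casefold
    except ImportError:
        pass
    try:
        # PyICU, used the same way as in the earlier answer.
        import icu
        return lambda u: text_type(icu.UnicodeString(u).foldCase())
    except ImportError:
        pass
    # Last resort: only an approximation of case folding.
    return lambda u: u.lower()


casefold = _pick_casefold()


def caseless_equal(a, b):
    """Compare two text strings, ignoring case."""
    return casefold(a) == casefold(b)
```

With that in place, `caseless_equal(u"Straße", u"STRASSE")` should hold whenever one of the three real implementations is picked up. Note that the implementations may be built against different Unicode versions, so edge cases can differ between them.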
There is a thread here that covers some of the issues (though it may not resolve all of them); you can judge whether it is suitable for what you need. If it is not, there are some useful tips for implementing case folding on the W3C site here.