I am fetching data from a web page using urllib2. The content of all the pages is in English, so there is no issue of dealing with non-English text. The pages ar
Or with Requests:
import requests

page_text = requests.get(url).text  # url: the address of the page being fetched
lowercase_text = page_text.lower()
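(Requests automatically decodes the response body to a Unicode string, using the charset declared in the response headers or a fallback when none is declared.)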
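With the standard library (urllib2 in Python 2, urllib.request in Python 3) the decoding has to be done by hand. A minimal Python 3 sketch, assuming the server declares a charset in its Content-Type header; the fallback to UTF-8 when it doesn't is an assumption, and fetch_lowercase is a hypothetical helper name:

```python
from urllib.request import urlopen


def fetch_lowercase(url: str) -> str:
    """Fetch url and return its body as lowercased text (hypothetical helper)."""
    with urlopen(url) as resp:
        # Charset from the Content-Type header, e.g. "text/html; charset=utf-8".
        # Falling back to UTF-8 is an assumption; HTTP does not guarantee it.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace").lower()
```

For example, fetch_lowercase("http://example.com/") returns the page text already decoded and lowercased.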
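As @tchrist says, .lower() will not do the job for Unicode text: lowercasing is not the same as case-insensitive comparison, because some characters fold to more than one character (the German ß, for example, should match "ss").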
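In Python 3.3+ the built-in str.casefold() applies full Unicode case folding. A minimal sketch (caseless_equal is a hypothetical helper) that also wraps the folding in NFD normalization, following the Unicode Standard's definition of canonical caseless matching, so precomposed and decomposed accents compare equal:

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Canonical caseless match: compare NFD(casefold(NFD(s))) for both strings."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return fold(a) == fold(b)


print("Straße".lower() == "STRASSE".lower())  # False: lower() keeps "ß"
print(caseless_equal("Straße", "STRASSE"))    # True: casefold() maps "ß" to "ss"
```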
You could check out this alternative regex implementation, which implements Unicode case folding for case-insensitive matching: http://code.google.com/p/mrab-regex-hg/
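The gap it fills is easy to see with the standard re module, which (in Python 3, assumed here) applies only simple one-to-one case folding under IGNORECASE. If a literal substring test is all you need, casefolding both sides first is one stdlib workaround; contains_caseless is a hypothetical helper, it does not handle regex patterns, and positions in the folded strings may not line up with the originals:

```python
import re

# Stdlib re uses simple (one-to-one) case folding under IGNORECASE,
# so the single character "ß" never matches the two characters "SS".
print(re.search("STRASSE", "Straße", re.IGNORECASE))  # None


def contains_caseless(haystack: str, needle: str) -> bool:
    """Hypothetical helper: literal substring test under full case folding."""
    return needle.casefold() in haystack.casefold()


print(contains_caseless("Die Straße", "STRASSE"))  # True
```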
Unicode also publishes the case-folding tables themselves: http://unicode.org/Public/UNIDATA/CaseFolding.txt
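Each data line in that file has the form "code; status; mapping; # name", where status C (common) plus F (full) gives full case folding, S gives the simple alternatives, and T marks Turkic-specific entries. A minimal sketch that turns those lines into a str.translate() table; parse_case_folding is a hypothetical helper, and the two sample lines are copied from the file's format rather than loaded from it:

```python
def parse_case_folding(lines, statuses=("C", "F")):
    """Build a str.translate() table from CaseFolding.txt lines.

    The default ("C", "F") selects full case folding; pass ("C", "S")
    for simple folding. Turkic (T) entries are skipped unless requested.
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code, status, mapping = (field.strip() for field in data.split(";")[:3])
        if status in statuses:
            table[int(code, 16)] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


SAMPLE = [
    "# CaseFolding excerpt",
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
]
folding = parse_case_folding(SAMPLE)
print("Aß".translate(folding))  # ass
```

With a locally downloaded copy, parse_case_folding(open("CaseFolding.txt", encoding="utf-8")) builds the full table; in Python 3.3+ str.casefold() already applies the same C and F mappings, so parsing the file yourself is mainly useful on older versions or for the S and T variants.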