Returning a lower case ASCII string from a (possibly encoded) string fetched using urllib2 or BeautifulSoup

甜味超标 2020-12-06 23:24

I am fetching data from a web page using urllib2. The content of all the pages is in the English language so there is no issue of dealing with non-English text. The pages ar

3 Answers
  •  温柔的废话
    2020-12-06 23:46

    Or with Requests:

    import requests
    # url is the address of the page to fetch
    page_text = requests.get(url).text
    lowercase_text = page_text.lower()
    

    (Requests will automatically decode the response.)
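If the result has to be plain ASCII as well as lower case, one possible approach is sketched below. It assumes Python 3; the helper name `to_lower_ascii` and the choice to silently drop characters that have no ASCII equivalent are illustrative assumptions, not something the question specifies.

```python
import unicodedata


def to_lower_ascii(raw, encoding="utf-8"):
    """Return a lower-case, ASCII-only str (hypothetical helper).

    `raw` may be bytes (e.g. response.content) or an already decoded str
    (e.g. response.text). `encoding` is an assumed default for bytes input.
    """
    if isinstance(raw, bytes):
        # Undecodable bytes become U+FFFD, which is dropped again below.
        text = raw.decode(encoding, errors="replace")
    else:
        text = raw
    # NFKD splits accented letters into a base letter plus combining marks,
    # so encoding with "ignore" keeps the base letter and drops the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_text.lower()
```

For example, `to_lower_ascii("Café MENU".encode("utf-8"))` gives `'cafe menu'`. Characters with no ASCII decomposition are removed entirely, which may or may not be acceptable for a given page.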

    As @tchrist says, .lower() is not enough for Unicode text: lowercasing is not the same as case folding, which is what case-insensitive comparison requires.

    You could try this alternative regex implementation, which supports case folding for case-insensitive matching of Unicode text: http://code.google.com/p/mrab-regex-hg/

    There are also casefolding tables available: http://unicode.org/Public/UNIDATA/CaseFolding.txt
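As a rough illustration of the difference between lowercasing and case folding, here is a minimal sketch that assumes Python 3.3 or later, where `str.casefold()` is built in (it is not available in the Python 2 that urllib2 belongs to). The function name `caseless_equal` is a hypothetical helper.

```python
import unicodedata


def caseless_equal(left, right):
    """Compare two strings case-insensitively using Unicode case folding."""
    # Normalize first so composed and decomposed forms compare equal,
    # then fold case; casefold() maps e.g. the German sharp s to "ss".
    fold = lambda s: unicodedata.normalize("NFC", s).casefold()
    return fold(left) == fold(right)


print(caseless_equal("Straße", "STRASSE"))    # True
print("Straße".lower() == "STRASSE".lower())  # False
```

This is simpler than working from the folding tables directly, at the cost of requiring Python 3.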
