Scraping a website whose encoding is iso-8859-1 instead of utf-8: how do I store the correct unicode in my database?
Question

I'd like to scrape a website using Python, but the site is full of horrible problems, one being the wrong encoding declared at the top:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This is wrong because the page is full of occurrences like the following: Nell’ambito instead of Nell'ambito (please notice that ’ replaces ').

If I understand correctly, this is happening because utf-8 bytes (probably the database encoding) are interpreted as iso-8859-1 bytes (forced by the charset in the