I\'m trying to download some content from a dictionary site like http://dictionary.reference.com/browse/apple?s=t
The problem I\'m having is that the original paragr
The given URL returns UTF-8 as the HTTP response clearly indicates:
wget -S http://dictionary.reference.com/browse/apple?s=t
--2013-01-02 08:43:40-- http://dictionary.reference.com/browse/apple?s=t
Resolving dictionary.reference.com (dictionary.reference.com)... 23.14.94.26, 23.14.94.11
Connecting to dictionary.reference.com (dictionary.reference.com)|23.14.94.26|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Apache
Cache-Control: private
Content-Type: text/html;charset=UTF-8
Date: Wed, 02 Jan 2013 07:43:40 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Connection: Transfer-Encoding
Set-Cookie: sid=UOPlLC7t-zl20-k7; Domain=reference.com; Expires=Wed, 02-Jan-2013 08:13:40 GMT; Path=/
Set-Cookie: cu.wz=0; Domain=.reference.com; Expires=Thu, 02-Jan-2014 07:43:40 GMT; Path=/
Set-Cookie: recsrch=apple; Domain=reference.com; Expires=Tue, 02-Apr-2013 07:43:40 GMT; Path=/
Set-Cookie: dcc=*~*~*~*~*~*~*~*~; Domain=reference.com; Expires=Thu, 02-Jan-2014 07:43:40 GMT; Path=/
Set-Cookie: iv_dic=1-0; Domain=reference.com; Expires=Thu, 03-Jan-2013 07:43:40 GMT; Path=/
Set-Cookie: accepting=1; Domain=.reference.com; Expires=Thu, 02-Jan-2014 07:43:40 GMT; Path=/
Set-Cookie: bid=UOPlLC7t-zlrHXne; Domain=reference.com; Expires=Fri, 02-Jan-2015 07:43:40 GMT; Path=/
Length: unspecified [text/html]
Investigating the saved file using vim also reveals that the data is correctly utf-8 encoded...the same is true fetching the URL using Python.