encoding

How to find out the correct encoding when using beautifulsoup?

懵懂的女人 submitted on 2021-01-28 20:15:48
Question: In Python 3 with beautifulsoup4, I want to get information from a website. After making the request, I did this:

    import requests
    from bs4 import BeautifulSoup

    req = requests.get('https://sisgvarmazenamento.blob.core.windows.net/prd/PublicacaoPortal/Arquivos/201901.htm').text
    soup = BeautifulSoup(req,'lxml')
    soup.find("h1").text
    '\r\n CÃ\x82MARA MUNICIPAL DE SÃ\x83O PAULO'

I do not know what the encoding is, but it's a site in Brazilian Portuguese, so it should be utf-8 or latin1. Please, is
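The garbled heading is consistent with UTF-8 bytes being decoded as Latin-1: `Â` (U+00C2) is `C3 82` in UTF-8, and Latin-1 turns those two bytes into `Ã` plus the control character `\x82`. The sketch below reproduces this with a hard-coded byte string standing in for the downloaded page (an assumption; nothing is fetched). With requests, the usual remedies are to pass `req.content` (raw bytes) to BeautifulSoup so it can detect the charset itself, or to set `req.encoding = 'utf-8'` before reading `req.text`. requests falls back to ISO-8859-1 when a `text/*` Content-Type header carries no charset, which would explain the symptom if this server sends such a header.

```python
# Stand-in for req.content: the heading as UTF-8 bytes (assumed page encoding).
raw = "CÂMARA MUNICIPAL DE SÃO PAULO".encode("utf-8")

# Decode the same bytes with both candidate encodings and compare.
as_latin1 = raw.decode("latin-1")  # reproduces the garbled text from the question
as_utf8 = raw.decode("utf-8")      # yields readable Portuguese

print(repr(as_latin1))  # 'CÃ\x82MARA MUNICIPAL DE SÃ\x83O PAULO'
print(as_utf8)          # CÂMARA MUNICIPAL DE SÃO PAULO

# Text that was already mis-decoded can be repaired by reversing the wrong step.
repaired = as_latin1.encode("latin-1").decode("utf-8")
assert repaired == as_utf8
```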

R: greater-than-or-equal-to character (≥) converts to equal sign

别来无恙 submitted on 2021-01-28 20:11:29
Question: I have imported an Excel (xlsx) file into R using the readxl package. One of the imported columns, labeldata, contains labels for other data in the file, so it consists of character data such as ≥65 years old. When I print labeldata to the console, the value for "greater than or equal to 65 years old" is properly displayed as ≥65 years old. However, when I try to combine this column with other columns (using cbind or other methods), the greater than or equal to
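The question is cut off before the details, so the cause is not shown. One hypothesis, an assumption here, is that the text passes through a single-byte native encoding such as Windows-1252 at some point: `≥` (U+2265) has no slot in Latin-1 or Windows-1252, so any conversion into those encodings has to substitute something. The Python sketch below only checks which encodings can represent the label; it does not reproduce R's behaviour, and Python's own codecs substitute `?` rather than `=`.

```python
import unicodedata

label = "≥65 years old"
symbol = label[0]
print(f"U+{ord(symbol):04X}", unicodedata.name(symbol))  # U+2265 GREATER-THAN OR EQUAL TO
print(label.encode("utf-8"))  # UTF-8 needs three bytes for the symbol

def representable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

results = {codec: representable(label, codec) for codec in ("utf-8", "latin-1", "cp1252")}
print(results)  # {'utf-8': True, 'latin-1': False, 'cp1252': False}
```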

Using Swift, how do you re-encode then decode a String like this short script in Python?

穿精又带淫゛_ submitted on 2021-01-28 19:36:34
Question: XKCD's API has some weird encoding issues (see "Minor encoding issue with xkcd alt texts in chat"). The solution in Python is to encode the string as latin1 and then decode it as utf8, but how do I do this in Swift?

Test string: "Be careful\u00e2\u0080\u0094it's breeding season"

Expected output: Be careful—it's breeding season

Python (from the link above):

    import json
    a = '''"Be careful\u00e2\u0080\u0094it's breeding season"'''
    print(json.loads(a).encode('latin1').decode('utf8'))

How is this done
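The Latin-1 round trip works because Latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF. The three characters `\u00e2\u0080\u0094` therefore become the bytes `E2 80 94`, which is the UTF-8 encoding of U+2014 EM DASH. The Python sketch below spells out those two steps without the Latin-1 codec so each one is visible; a Swift port would need the same two steps (turn each scalar value below 256 into one byte, then decode the bytes as UTF-8), but no Swift API is shown or claimed here.

```python
import json

raw_json = '"Be careful\\u00e2\\u0080\\u0094it\'s breeding season"'
text = json.loads(raw_json)  # the JSON escapes become U+00E2, U+0080, U+0094

# Step 1: every code point is below 256, so each one fits in a single byte.
assert all(ord(ch) < 256 for ch in text)
as_bytes = bytes(ord(ch) for ch in text)  # same result as text.encode("latin-1")

# Step 2: read those bytes as UTF-8; E2 80 94 collapses into one U+2014 character.
fixed = as_bytes.decode("utf-8")
print(fixed)  # prints the sentence with a real em dash (U+2014)
```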

How to fix 'UnicodeDecodeError: 'utf-8' codec can't decode byte' when using Python C Extensions?

a 夏天 submitted on 2021-01-28 11:51:39
Question: Given the following file bug.txt:

    event "øat" not handled

I wrote the following Python C extension in the file fastfilewrapper.cpp:

    #include <Python.h>
    #include <cstdio>
    #include <iostream>
    #include <sstream>
    #include <fstream>

    static PyObject* hello_world(PyObject *self, PyObject *args) {
        printf("Hello, world!\n");
        std::string retval;
        std::ifstream fileifstream;
        fileifstream.open("./bug.txt");
        std::getline( fileifstream, retval );
        fileifstream.close();
        std::cout << "retval " << retval <<
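The question is cut off before the traceback, so the offending byte is not shown. One common cause, assumed here, is that bug.txt is saved in a single-byte encoding such as Latin-1, where `ø` is the byte 0xF8, while the string is handed to Python through an API that decodes UTF-8 (for example `Py_BuildValue` with the `s` format). 0xF8 can never start a UTF-8 sequence. The Python sketch below reproduces that failure with plain bytes. On the C side, the matching options would be to save the file as UTF-8, decode explicitly with `PyUnicode_DecodeLatin1`, or return raw bytes (the `y#` format) and decode them in Python.

```python
TEXT = 'event "øat" not handled'

# Hypothetical: the same line saved as UTF-8 and as Latin-1.
utf8_bytes = TEXT.encode("utf-8")
latin1_bytes = TEXT.encode("latin-1")

def try_utf8(data: bytes) -> str:
    """Decode data as UTF-8, describing the failure instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"failed at byte 0x{data[exc.start]:02x}, position {exc.start}: {exc.reason}"

print(try_utf8(utf8_bytes))            # decodes cleanly
print(try_utf8(latin1_bytes))          # fails on the 0xf8 byte
print(latin1_bytes.decode("latin-1"))  # the matching codec recovers the text
```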

Incorrect encoding when serving a binary (PDF) file through ClassPathResource in Spring

旧城冷巷雨未停 submitted on 2021-01-28 11:23:06
Question: I have been struggling with the following problem for two days and cannot get my head around it. I am trying to serve a static PDF in a Spring Boot REST application. It should be very straightforward, but I just cannot get it to work. First I simply placed the PDF in the resource folder and tried to load it directly from the JavaScript code, like this:

    var newWindow = window.open('/pdf/test.pdf', '');

That resulted in a new window with a PDF not showing any content. Saving the PDF to disk from
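The question is truncated before any diagnosis. One possible cause of a PDF that opens blank, assumed here, is that its bytes were treated as text somewhere along the way (decoded with a charset, then re-encoded). The Python sketch below, which is not Spring code, shows why that corrupts binary data: bytes that are not valid UTF-8 are replaced with U+FFFD during decoding, so the re-encoded output no longer matches the input. The sample bytes only imitate the binary comment line many PDFs carry after the header; they are illustrative, not taken from test.pdf.

```python
# Illustrative PDF-like bytes: a header plus a binary comment line (not a real file).
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Treating the bytes as UTF-8 text: invalid sequences become U+FFFD.
as_text = pdf_bytes.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

print(len(pdf_bytes), len(round_tripped))  # the lengths differ
print(round_tripped == pdf_bytes)          # False: the binary content was altered
```

Serving the file as a byte stream end to end, without any decode/encode step, is what keeps it intact.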

How are smileys encoded in a MySQL utf8mb4 database?

廉价感情. submitted on 2021-01-28 11:01:03
Question: I changed my MySQL database to utf8mb4 so that users could enter smileys from their mobile phones / Macs. It works (users can enter smileys, and those smileys are shown in the web app on supported devices), but whenever I look at the table contents (via the terminal or MySQL Workbench), each smiley is shown as a single question mark. How exactly are those smileys saved? I assume they have a utf8mb4 code, but is there any way to look at them? Thanks!

Answer 1: I suspect Workbench is running in
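In utf8mb4 a character is stored as its UTF-8 byte sequence. Most emoji lie outside the Basic Multilingual Plane and take four bytes, which is why MySQL's older three-byte `utf8` charset cannot hold them. A lone `?` in a client often means the characters were converted to a client character set that cannot represent them, not that the stored bytes are wrong. To inspect the stored bytes, MySQL's `HEX()` function can be applied to the column, e.g. `SELECT HEX(comment) FROM posts`, where `comment` and `posts` are hypothetical names. The Python sketch below computes what those bytes should look like for a sample emoji.

```python
smiley = "\U0001F600"  # GRINNING FACE, used here as a sample emoji

utf8 = smiley.encode("utf-8")
print(f"U+{ord(smiley):X}", len(utf8), "bytes:", utf8.hex().upper())  # U+1F600 4 bytes: F09F9880

# Characters above U+FFFF need four UTF-8 bytes; BMP characters need at most three.
for ch in ("A", "é", "€", smiley):
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")
```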

lxml encoding error when parsing utf8 xml

假装没事ソ submitted on 2021-01-28 09:19:23
Question: I'm trying to iterate through an XML file (UTF-8 encoded, starting with an XML declaration) with lxml, but I get the following error on the character 丂:

    UnicodeEncodeError: 'cp932' codec can't encode character u'\u4e02' in position 0: illegal multibyte sequence

Other characters before this one are printed correctly. The code is:

    parser = etree.XMLParser(encoding='utf-8')
    tree = etree.parse("filename.xml", parser)
    root = tree.getroot()
    for elem in root:
        print elem[0].text

Does the error mean that it didn't parse
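The exception is a UnicodeEncodeError, raised while encoding output rather than while decoding input, so parsing most likely succeeded; the failure happens when `print` writes to a console that uses cp932 (the Japanese Windows code page), which, as the error says, has no mapping for 丂 (U+4E02). The Python 3 sketch below uses the standard library's ElementTree instead of lxml to stay self-contained, parses a tiny hypothetical document, and shows that the character survives parsing while only the cp932 step fails. Writing the output as UTF-8, or encoding with an error handler such as `errors="replace"`, avoids the crash; which is appropriate depends on where the output goes.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for filename.xml, encoded as UTF-8.
xml_text = '<?xml version="1.0" encoding="UTF-8"?><root><item><name>丂</name></item></root>'
root = ET.fromstring(xml_text.encode("utf-8"))
text = root[0][0].text  # mirrors elem[0].text from the question

print(hex(ord(text)))  # 0x4e02: the parser decoded the character correctly

try:
    text.encode("cp932")  # what a cp932 console must do before displaying it
    cp932_ok = True
except UnicodeEncodeError:
    cp932_ok = False

print("cp932 can encode it:", cp932_ok)
print(text.encode("cp932", errors="replace"))  # substitutes instead of raising
```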

Encoding a file to Base64 in Node.js

寵の児 submitted on 2021-01-28 08:30:37
Question: I used the code below to encode a file to base64:

    var bitmap = fs.readFileSync(file);
    return Buffer.from(bitmap).toString('base64');

I found that the file has issues with “” and ‘’ characters, but it's fine with ". When we have It’s, Node encodes the characters, but when I decode, I see it as It’s.

Here's the JavaScript I'm using to decode:

    fs.writeFile(reportPath, body.buffer, {encoding: 'base64'}

So, once the file is encoded and decoded, it becomes unusable with these funky
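Base64 itself is lossless: it round-trips any bytes exactly, so curly quotes usually get mangled at a text-decoding step instead. The question does not show the garbled form; as an assumed illustration, the Python sketch below shows what `It’s` looks like when its UTF-8 bytes are read as Windows-1252, a common source of this kind of mojibake. It uses Python's `base64` module, not Node's API.

```python
import base64

original = "It’s".encode("utf-8")  # the curly apostrophe is E2 80 99 in UTF-8

encoded = base64.b64encode(original)
decoded = base64.b64decode(encoded)
print(decoded == original)  # True: base64 round-trips the bytes exactly

print(decoded.decode("utf-8"))   # It’s
print(decoded.decode("cp1252"))  # Itâ€™s (UTF-8 bytes misread as Windows-1252)
```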

Pandas column of lists to separate columns

安稳与你 submitted on 2021-01-28 07:03:26
Question:

Problem

Incoming data is a list of 0+ categories:

    # input data frame
    df = pd.DataFrame({'categories':(list('ABC'), list('BC'), list('A'))})

      categories
    0  [A, B, C]
    1     [B, C]
    2        [A]

I would like to convert this to a DataFrame with one column per category and a 0/1 in each cell:

    # desired output
       A  B  C
    0  1  1  1
    1  0  1  1
    2  1  0  0

Attempt

OneHotEncoder with LabelEncoder gets stuck because they don't handle lists in cells. The desired result is currently achieved with nested for loops:

    # get unique
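A minimal sketch using pandas' string accessor instead of nested loops: `Series.str.join` turns each list into a `|`-separated string, and `Series.str.get_dummies` splits that back into one 0/1 column per category. This assumes category names never contain `|`.

```python
import pandas as pd

# Input from the question: each cell holds a list of categories.
df = pd.DataFrame({'categories': (list('ABC'), list('BC'), list('A'))})

# Join each list into "A|B|C", then split into one indicator column per category.
dummies = df['categories'].str.join('|').str.get_dummies(sep='|').astype(int)
print(dummies)
#    A  B  C
# 0  1  1  1
# 1  0  1  1
# 2  1  0  0
```

If an encoder object is preferred over a one-off transformation, scikit-learn's `MultiLabelBinarizer` is designed for this list-of-labels shape.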

XmlDocument with Kanji text content is not encoded correctly to ISO-8859-1 using XmlTextWriter

别来无恙 submitted on 2021-01-28 05:35:53
Question: I have an XmlDocument that includes Kanji in its text content, and I need to write it to a stream using ISO-8859-1 encoding. When I do, none of the Kanji characters are encoded properly; they are replaced with "??" instead. Here is sample code that demonstrates how the XML is written from the XmlDocument:

    MemoryStream mStream = new MemoryStream();
    Encoding enc = Encoding.GetEncoding("ISO-8859-1");
    XmlTextWriter writer = new XmlTextWriter(mStream, enc);
    doc.WriteTo(writer);
    writer.Flush();
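ISO-8859-1 has no code points for Kanji, so any writer that must produce ISO-8859-1 bytes has to substitute something, and `??` is that substitution. XML offers a lossless alternative: numeric character references such as `&#28450;`, which are plain ASCII and therefore valid in any ASCII-compatible encoding. The sketch below uses Python's ElementTree, which emits such references for characters outside the chosen encoding; it is not the .NET API from the question, and whether `XmlTextWriter` can be configured to do the same is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with Kanji text content.
root = ET.Element("label")
root.text = "漢字"

# ISO-8859-1 cannot hold Kanji, so ElementTree writes character references instead.
out = ET.tostring(root, encoding="iso-8859-1")
print(out)

# Any XML parser turns the references back into the original characters.
assert ET.fromstring(out).text == "漢字"
```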