encoding

DOM in PHP: Decoded entities and setting nodeValue

浪尽此生 submitted on 2019-12-23 17:08:28
Question: I want to perform certain manipulations on an XML document with PHP, using the DOM part of its standard library. As others have already discovered, one then has to deal with decoded entities. To illustrate what bothers me, here is a quick example. Suppose we have the following code:

    $doc = new DOMDocument();
    $doc->loadXML(<XML data>);
    $xpath = new DOMXPath($doc);
    $node_list = $xpath->query(<some XPath>);
    foreach ($node_list as $node) {
        // do something
    }

If the code in the loop is something like …
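
The excerpt stops before the problematic loop body, so the exact PHP behaviour is not shown. As a language-neutral illustration, the Python sketch below uses xml.dom.minidom (a stand-in, not PHP's DOM extension) to show the contract most DOM implementations follow: text read from a node is already entity-decoded, and text assigned to a text node should be plain, unescaped characters, because the serializer re-escapes them on output. The helper name roundtrip_text is hypothetical.

```python
from xml.dom.minidom import parseString


def roundtrip_text(xml, new_text):
    """Return the root's decoded text and the XML after replacing that text.

    Assumes (for this sketch) that the root element holds a single text node.
    """
    doc = parseString(xml)
    text_node = doc.documentElement.firstChild
    decoded = text_node.data          # entities such as &amp; are already decoded here
    text_node.data = new_text         # assign plain text; do not pre-escape it
    # The serializer re-escapes special characters such as & on output.
    return decoded, doc.documentElement.toxml()


decoded, xml = roundtrip_text("<a>Tom &amp; Jerry</a>", "Tom & Jerry & Co")
print(decoded)  # Tom & Jerry
print(xml)      # <a>Tom &amp; Jerry &amp; Co</a>
```

In PHP the analogous idea is usually to work with text nodes (for example via DOMDocument::createTextNode) rather than escaping strings by hand, but check the PHP manual for how the nodeValue setter treats entities in your version.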

UTF-8, PHP and XML Mysql

自作多情 submitted on 2019-12-23 17:06:02
Question: I am having great trouble solving this one. I have a MySQL database using latin1_swedish_ci and a table that stores names and addresses. I am trying to output a UTF-8 XML file, but I am having problems with the following string: Otivägen. It is being output as Otivägen when I open the file in vim. Also, when I open it in IE I get "An invalid character was found in text content. Error processing resource". I have the following code: function fixEncoding($in_str) { $cur_encoding = mb_detect …
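
The garbled form of Otivägen did not survive in this excerpt, but a very common version of this symptom is UTF-8 bytes being decoded as Latin-1 somewhere along the way (for example, UTF-8 data stored in a latin1 column and then converted again on output). The Python sketch below, with hypothetical helper names, reproduces that pattern and undoes one level of it; it is not the asker's PHP fixEncoding function.

```python
def utf8_misread_as_latin1(text):
    """Simulate UTF-8 bytes being decoded as Latin-1 (the classic mojibake)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Undo one level of UTF-8-read-as-Latin-1 mojibake."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = utf8_misread_as_latin1("Otivägen")
print(garbled)                   # OtivÃ¤gen  (ä = bytes C3 A4, read as 'Ã' and '¤')
print(repair_mojibake(garbled))  # Otivägen
```

If the repair step raises UnicodeDecodeError, the text was probably not double-encoded in this particular way, and the fix belongs at the database or connection charset rather than in post-processing.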

Python3 - Cannot read docx, odt file - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 10: invalid continuation byte

余生颓废 submitted on 2019-12-23 16:26:05
Question: I am trying to split a large docx file into smaller files. To do that, I read the file in Python 3.6 with the following code:

    with open('h.docx', 'r') as f:
        a = f.read()

It throws this error:

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 10: invalid continuation byte

h.docx …
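
A .docx file is a ZIP archive of XML parts, not a text file, so opening it in text mode ('r') makes Python try to decode the compressed bytes as UTF-8, which fails as in the traceback. A minimal sketch, assuming the first goal is simply to get at the archive's contents; the helper list_docx_parts is hypothetical, and actually splitting the document would need further work (for example with a library such as python-docx).

```python
import zipfile


def list_docx_parts(path):
    """Open a .docx as binary data (never as UTF-8 text) and list its ZIP members."""
    with open(path, "rb") as f:          # "rb": read raw bytes, no decoding
        header = f.read(4)
    if header != b"PK\x03\x04":          # local-file-header signature of a ZIP archive
        raise ValueError(f"{path} does not look like a ZIP/.docx file")
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Calling list_docx_parts("h.docx") should return names such as word/document.xml, where the document body lives as XML.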

How to encode special characters for a POST with Spring/Roo

穿精又带淫゛_ submitted on 2019-12-23 15:48:42
Question: I'm using Spring/Roo for an app server and need to be able to POST some special characters, specifically characters like the yen symbol or the euro symbol. When I receive these characters on my server and display them in the console, they appear as "?". How can they be properly encoded and received?
Answer 1: There are a couple of possible failure points here. First, I'd check whether the console supports the characters in question: if the default encoding used by the JVM does not support the …
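
To check the client side independently of the console, it helps to know what a correctly encoded POST body looks like. The Python sketch below (a stand-in for whatever client is actually used; encode_form and decode_form are hypothetical helpers) percent-encodes form fields as UTF-8, which is urlencode's default, and decodes them again with the same charset. If the server decodes the body with a different charset, or prints it to a console that cannot display ¥ or €, the characters typically show up as mojibake or "?".

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Percent-encode form fields as UTF-8 (urlencode's default encoding)."""
    return urlencode(fields)


def decode_form(body):
    """Decode a form body with the same charset it was encoded with."""
    return {key: values[0] for key, values in parse_qs(body, encoding="utf-8").items()}


body = encode_form({"price": "¥100 €"})
print(body)               # price=%C2%A5100+%E2%82%AC
print(decode_form(body))  # {'price': '¥100 €'}
```

On the Spring side the request body has to be decoded as UTF-8 as well; a servlet filter such as Spring's CharacterEncodingFilter is a commonly used way to enforce that, but verify it against your Spring/Roo configuration.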

How should I decode bytes (using ASCII) without losing any “junk” bytes if xmlcharrefreplace and backslashreplace don't work?

微笑、不失礼 submitted on 2019-12-23 15:33:10
Question: I have a network resource that returns data which should (according to the specs) be an ASCII-encoded string. But on some rare occasions, I get junk data. One resource, for example, returns b'\xd3PS-90AC', whereas another resource returns b'PS-90AC' for the same key. The first value contains a non-ASCII byte. Clearly a violation of the spec, but that's unfortunately out of my control. None of us is 100% certain whether this really is junk or data that should be kept. The application calling …
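
One way to keep the "junk" bytes without committing to a meaning is Python's surrogateescape error handler: each byte that is not valid ASCII is mapped to a lone surrogate code point and turns back into the identical byte on encoding. The sketch below assumes Python 3; the helper names are hypothetical. For comparison, backslashreplace has been accepted by decode since Python 3.5 and gives a readable form, but one that cannot be reliably reversed.

```python
def decode_keep_junk(raw):
    """Decode ASCII data, carrying any non-ASCII byte through as a lone surrogate."""
    return raw.decode("ascii", errors="surrogateescape")


def encode_back(text):
    """Recover the exact original bytes from decode_keep_junk()'s output."""
    return text.encode("ascii", errors="surrogateescape")


raw = b"\xd3PS-90AC"
text = decode_keep_junk(raw)
print(repr(text))                                      # '\udcd3PS-90AC'
print(encode_back(text) == raw)                        # True
print(raw.decode("ascii", errors="backslashreplace"))  # \xd3PS-90AC
```

Strings produced this way contain surrogates, so writing them to a UTF-8 file or terminal raises an error unless the same surrogateescape handler is used on output.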

Converting a string from utf8 to latin1 in NodeJS

流过昼夜 submitted on 2019-12-23 15:23:46
Question: I'm using a Latin-1 encoded DB and can't change it to UTF-8, which means I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract outputs UTF-8) and tried to use iconv-lite; however, it produces a buffer that I then have to convert back into a string, and, again, buffer-to-string conversion does not allow "latin1" encoding. I've read a bunch of questions and answers, but all I find is advice about setting the client encoding and the like. Any ideas?
Answer 1: You can …
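
Whatever the library, the conversion has two steps: decode the UTF-8 bytes into text, then encode that text as Latin-1 (ISO-8859-1), deciding what to do with characters Latin-1 cannot represent, such as €. A minimal Python sketch of those two steps follows; the helper name is hypothetical, and in Node.js the same idea applies to Buffers and strings.

```python
def utf8_to_latin1(data, errors="strict"):
    """Re-encode UTF-8 bytes as Latin-1 (ISO-8859-1) bytes."""
    text = data.decode("utf-8")            # step 1: UTF-8 bytes -> str
    return text.encode("latin-1", errors)  # step 2: str -> Latin-1 bytes


print(utf8_to_latin1("café".encode("utf-8")))                    # b'caf\xe9'
print(utf8_to_latin1("10 €".encode("utf-8"), errors="replace"))  # b'10 ?'
```

With the default errors="strict", text containing characters outside Latin-1 raises UnicodeEncodeError instead of being silently altered, which is often preferable when writing to a database.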

Java Apache FileUtils readFileToString and writeStringToFile problems

蹲街弑〆低调 submitted on 2019-12-23 15:21:58
Question: I need to read a Java File (actually a .pdf) into a String and then write it back to a file. In between, I'll apply some patches to the string, but that is not important here. I've developed the following JUnit test case:

    String f1String = FileUtils.readFileToString(f1);
    File temp = File.createTempFile("deleteme", "deleteme");
    FileUtils.writeStringToFile(temp, f1String);
    assertTrue(FileUtils.contentEquals(f1, temp));

This test converts a file to a string and writes it back. However …
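
The excerpt is cut off before the failure is described, but treating a PDF as text is a likely culprit: PDFs contain binary data, and decoding arbitrary bytes with a charset such as UTF-8 replaces invalid sequences with U+FFFD, so writing the string back produces different bytes. The Python sketch below (a hypothetical helper, not Commons IO) demonstrates the effect; Latin-1 round-trips because it maps every byte value to exactly one character. For binary files it is usually safer to stay with bytes altogether, for example with Commons IO's readFileToByteArray and writeByteArrayToFile.

```python
def roundtrip_through_text(data, encoding):
    """Decode bytes to str and encode back, like reading a file into a String and writing it out."""
    return data.decode(encoding, errors="replace").encode(encoding)


# A PDF header followed by the binary comment line many PDFs carry.
pdf_like = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

print(roundtrip_through_text(pdf_like, "utf-8") == pdf_like)    # False: invalid bytes became U+FFFD
print(roundtrip_through_text(pdf_like, "latin-1") == pdf_like)  # True: every byte maps to one character
```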

PHPWord: Creating an Arabic right to left word document

﹥>﹥吖頭↗ submitted on 2019-12-23 13:00:40
Question: I'm trying to use PHPWord to create a Word document that will include dynamic data pulled from a MySQL database. The database has MySQL charset UTF-8 Unicode (utf8) and MySQL connection collation utf8_unicode_ci, and so do the table fields. Data is stored and previewed fine in HTML; however, when the document is created with the Arabic variables, the output in Word looks like Ø£Ø­ÙØ¯ ÙØ¨Ø§Ø±Ù اÙÙØ±Ù .

    $PHPWord = new PHPWord();
    $document = $PHPWord->loadTemplate('templates/.../wtvr.docx');

…

How to properly set utf8 encoding with jdbc and MySQL?

左心房为你撑大大i submitted on 2019-12-23 12:51:55
Question: JDBC and MySQL work just fine in my project except when it comes to accented characters. This is the URL I use to access the database: jdbc:mysql://localhost:3306/dbname?useUnicode=yes&characterEncoding=UTF-8. Suppose resultSet = preparedStatement.executeQuery(), followed by System.out.println(resultSet.getString("text_with_accents"));. What's stored in the database is àèìòù (note that I've already set the right encoding in the database and all its tables), but what I get is ?????. Is …
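
A row of question marks usually means that, at some hop (the connection, the column charset, or the console that System.out writes to), text was encoded into a charset that lacks those letters, with unmappable characters replaced by '?'. The Python sketch below reproduces that replacement with a hypothetical helper; Java's String-to-bytes conversions also substitute '?' for unmappable characters by default.

```python
def lossy_encode(text, charset):
    """Encode text, replacing characters the charset cannot represent with '?'."""
    return text.encode(charset, errors="replace")


accented = "àèìòù"
print(lossy_encode(accented, "ascii"))    # b'?????'
print(lossy_encode(accented, "latin-1"))  # b'\xe0\xe8\xec\xf2\xf9'
print(lossy_encode(accented, "utf-8").decode("utf-8") == accented)  # True
```

Since both the database and the JDBC URL are said to be UTF-8, the console encoding is worth checking first, in line with the Spring/Roo answer above.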

How may I bypass LWP's URL encoding for a GET request?

南楼画角 submitted on 2019-12-23 12:28:46
Question: I'm talking to what seems to be a broken HTTP daemon, and I need to make a GET request that includes a pipe (|) character in the URL. LWP::UserAgent escapes the pipe character before the request is sent. For example, a URL passed in as https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01 is passed to the HTTP daemon as https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01. This is correct, but it doesn't work with this broken server. How can I override or bypass the …
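
For comparison, Python's urllib.parse exposes the kind of knob the asker is looking for: urlencode percent-encodes '|' by default, just as LWP::UserAgent does, but its safe parameter lists characters to leave literal. This is a Python analogue, not LWP's API; build_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_query(params, keep_pipe=False):
    """Build a query string, optionally leaving '|' unescaped for servers that reject %7C."""
    return urlencode(params, safe="|" if keep_pipe else "")


params = {"ss": "1234", "activities": "Lec1|01"}
print(build_query(params))                  # ss=1234&activities=Lec1%7C01
print(build_query(params, keep_pipe=True))  # ss=1234&activities=Lec1|01
```

Leaving reserved characters unescaped only makes sense as a workaround for a specific non-conforming server; the escaped form remains the correct default.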