xml-parsing

XML worker using itext

柔情痞子 submitted on 2019-12-04 05:24:55
Question:
import java.io.FileOutputStream;
import java.io.StringReader;
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class HtmlToPDF2 {
    // itextpdf-5.4.1.jar http://sourceforge.net/projects/itext/files/iText/
    // xmlworker-5.4.1.jar http://sourceforge.net/projects/xmlworker/files/
    public static void main(String[] args) {
        try {
            Document document = new Document(PageSize.LETTER);
            PdfWriter

How to get an attribute of an Element that is namespaced

大憨熊 submitted on 2019-12-04 05:21:25
I'm parsing an XML document that I receive from a vendor every day, and it uses namespaces heavily. I've reduced the problem to a minimal subset here: there are some elements I need to parse, all of which are children of an element with a specific attribute on it. I am able to use lxml.etree.Element.findall(TAG, root.nsmap) to find the candidate nodes whose attribute I need to check. I'm then trying to check the attribute of each of these Elements via the name I know it uses, which concretely here is ss:Name. If the value of that attribute is the desired value I'm going to dive deeper into
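A namespaced attribute is usually looked up by its fully qualified name, `{namespace-uri}local-name`, rather than by its prefixed form. The following is a minimal sketch using the standard library's `xml.etree.ElementTree` instead of lxml; the namespace URI, the sample document, and the helper name `find_by_namespaced_attr` are all hypothetical, since the vendor's real document is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document; the vendor's real ones are not shown.
NSMAP = {"ss": "urn:example:spreadsheet"}

XML = """<ss:Workbook xmlns:ss="urn:example:spreadsheet">
  <ss:Worksheet ss:Name="Summary"><ss:Row>skip me</ss:Row></ss:Worksheet>
  <ss:Worksheet ss:Name="Orders"><ss:Row>keep me</ss:Row></ss:Worksheet>
</ss:Workbook>"""


def find_by_namespaced_attr(root, tag, attr, value, nsmap):
    """Return children matching `tag` whose namespaced attribute equals `value`."""
    prefix, local = attr.split(":")
    # Attributes are stored under {uri}local, not under the prefixed name.
    qualified = f"{{{nsmap[prefix]}}}{local}"
    return [el for el in root.findall(tag, nsmap) if el.get(qualified) == value]


root = ET.fromstring(XML)
matches = find_by_namespaced_attr(root, "ss:Worksheet", "ss:Name", "Orders", NSMAP)
rows = [row.text for ws in matches for row in ws.findall("ss:Row", NSMAP)]

# Looking the attribute up by its prefixed name finds nothing.
prefixed_lookup = root[0].get("ss:Name")
```

The prefix mapping is accepted by `findall`, but `Element.get` takes the key literally, which is why the prefixed lookup returns `None` in this sketch.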

Suggestion to parse this XML in Java

拟墨画扇 submitted on 2019-12-04 05:04:27
Question: I'm not new to Java, but I am relatively new to XML parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I am also not an XML pro. My particular problem is this: I have been given an XML document which I cannot modify and from which I need to parse only random bits into Java objects. Sheer speed is not much of a factor so long as it's reasonable. Likewise, the memory footprint need not be absolutely optimal either, just not insane. I only need to read

XML Parsing too slow!

心不动则不痛 submitted on 2019-12-04 05:02:42
I wrote a Java app to communicate with a web application using XML. After deployment, I found that it takes too long to parse the XML generated by the web application. For example, it takes about 2 minutes to log in; the login information is included in the URL. The web application does its processing and tells the Java app whether the login was successful in the XML it returns. I used the standard Java DOM parsing. Is there a way I can optimize this process so that activities are faster? Using a standard XML parser, a short message should be parsed in about one millisecond. Using a custom

Unicode Encoding Errors Python - Parsing XML can't encode a character (Star)

谁都会走 submitted on 2019-12-04 04:59:02
Question: I am a beginner to Python and am currently parsing a web-based XML file from the eventful.com API. However, I am receiving some Unicode errors when retrieving certain elements of the data. I am able to retrieve the 5 data elements I want from the XML file without any problems, but then it terminates and produces the following error in the GAE error console:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2605' in position 0: ordinal not in range(128)
I know that the
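This error is raised whenever text containing a non-ASCII character is encoded with the `ascii` codec, which in Python 2 can happen implicitly when printing or writing out a `unicode` value. The sketch below reproduces the failure and shows two explicit alternatives. It is written for Python 3, whereas the `u'\u2605'` in the message suggests the question used Python 2; the event title and the helper name `to_ascii_or_none` are hypothetical.

```python
# Hypothetical event title starting with BLACK STAR (U+2605), the character from the error.
title = "\u2605 Live Music Night"


def to_ascii_or_none(text):
    """Try a strict ASCII encode; return None where the question's error would occur."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


ascii_attempt = to_ascii_or_none(title)

# Option 1: encode explicitly as UTF-8, which can represent every code point.
utf8_bytes = title.encode("utf-8")

# Option 2: stay in ASCII but replace unencodable characters with XML character references.
ascii_with_refs = title.encode("ascii", errors="xmlcharrefreplace")
```

Which option fits depends on what consumes the output: UTF-8 keeps the character itself, while the character-reference form only makes sense if the result is embedded in XML or HTML.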

Ignoring a particular attribute of a specific Node while comparing xml files using XMLUnit 2.X

柔情痞子 submitted on 2019-12-04 04:55:12
Question: I have two XML files:
<!------------------------File1--------------------------------->
<note id="ignoreThisAttribute_1">
  <to>Experts</to>
  <from>Matrix</from>
  <heading id="dontIgnoreThisAttribute_1">Reminder</heading>
  <body>Help me with this problem</body>
</note>
<!------------------------File2--------------------------------->
<note id="ignoreThisAttribute_2">
  <to>Experts</to>
  <from>Matrix</from>
  <heading id="dontIgnoreThisAttribute_2">Reminder</heading>
  <body>Help me with this problem<

Pugixml - parse namespace with prefix mapping and without prefix mapping

社会主义新天地 submitted on 2019-12-04 04:52:24
Question: I have a client application that parses XML responses sent from two different servers, which I call server A and server B. Server A responds to one of the requests with a response like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/T12.txt</D:href>
    <D:propstat>
      <D:prop>
        <local-modification-time xmlns="urn:abc.com:webdrive">1389692809</local-modification-time>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D
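The general idea behind handling both forms is to match elements by namespace URI plus local name, never by the literal prefix. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree`; it is not pugixml code. Server B's response is not shown above, so the unprefixed variant here is an assumption based on the title, and both documents are shortened, hypothetical samples.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"

# Shortened version of server A's response: elements carry the D: prefix.
PREFIXED = (
    '<D:multistatus xmlns:D="DAV:">'
    "<D:response><D:href>/T12.txt</D:href></D:response>"
    "</D:multistatus>"
)

# Assumed shape of server B's response: same namespace, declared as the default.
DEFAULTED = (
    '<multistatus xmlns="DAV:">'
    "<response><href>/T12.txt</href></response>"
    "</multistatus>"
)


def hrefs(xml_text):
    """Collect href values by {namespace-uri}local-name, ignoring prefixes."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DAV}}}href")]


from_prefixed = hrefs(PREFIXED)
from_defaulted = hrefs(DEFAULTED)
```

Because the parser resolves prefixes to URIs before the lookup, the same function handles both documents. A parser that exposes only raw names such as `D:href` would need an equivalent resolution step written by hand.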

Switching from FOR loops in plpgsql to set-based SQL commands

三世轮回 submitted on 2019-12-04 04:47:54
Question: I've got quite a heavy query with a FOR loop to rewrite, and I would like to make it simpler by using more SQL instead of plpgsql constructs. The query looks like this:
FOR big_xml IN SELECT unnest(xpath('//TAG1', my_xml))
LOOP
    str_xml = unnest(xpath('/TAG2/TYPE/text()', big_xml));
    FOR single_xml IN SELECT unnest(xpath('/TAG2/single', big_xml))
    LOOP
        CASE str_xml::INT
            WHEN 1 THEN INSERT INTO tab1(id, xml) VALUES (1, single_xml);
            WHEN 2 THEN INSERT INTO tab2(id, xml) VALUES (1, single_xml);
            WHEN 3 [...]

Java: How to prevent 'systemId' in EntityResolver#resolveEntity(String publicId, String systemId) from being absolutized to current working directory

做~自己de王妃 submitted on 2019-12-04 04:33:10
I want to parse the following XML document to resolve all entities in it:
<!DOCTYPE doc SYSTEM 'mydoc.dtd'>
<doc>&title;</doc>
My EntityResolver is supposed to fetch the external entity with the given system ID from the database and then do the resolution; see below for an illustration:
private static class MyEntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        // At this point, systemId is always absolutized to the current working directory,
        // even though the XML document specified it as relative.
        // E.g. "file:///H:/mydoc.dtd

Java Regex or XML parser?

会有一股神秘感。 submitted on 2019-12-04 04:24:24
Question: I want to remove tags so that <p>hello <namespace:tag : a>hello</namespace:tag></p> becomes <p> hello hello </p>. What is the best way to do this? If it is regex, for some reason this one is not working; can anyone help?
(<|</)[:]{1,2}[^</>]>
Edit: added
Answer 1: Definitely use an XML parser. Regex should not be used to parse *ML.
Answer 2: You should not use regex for these purposes; use a parser like lxml or BeautifulSoup.
>>> import lxml.html as lxht
>>> myString = '<p>hello <namespace:tag : a>hello<
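As a rough illustration of the parser-based approach the answers recommend, the sketch below uses Python's standard `xml.etree.ElementTree` rather than lxml or BeautifulSoup. It assumes a well-formed variant of the question's input (the original `<namespace:tag : a>` is not well-formed XML, so a namespace declaration and a normal attribute are substituted), and it flattens every child element, not only namespaced ones. The sample string and the helper name `flatten_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's input; the namespace URI is made up.
SOURCE = (
    '<p xmlns:namespace="urn:example">'
    'hello <namespace:tag a="1">hello</namespace:tag>'
    "</p>"
)


def flatten_text(xml_text):
    """Rebuild the root element with all descendant tags removed, text kept."""
    root = ET.fromstring(xml_text)
    flat = ET.Element(root.tag)
    # itertext() yields text and tail strings in document order.
    flat.text = "".join(root.itertext())
    return ET.tostring(flat, encoding="unicode")


result = flatten_text(SOURCE)
```

For markup that is not well-formed, a lenient HTML parser such as the ones named in the answers is the more realistic choice, since a strict XML parser rejects such input outright.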