xml-parsing | 易学教程

XML worker using itext

Question:

import java.io.FileOutputStream;
import java.io.StringReader;
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HtmlToPDF2 {
    // itextpdf-5.4.1.jar http://sourceforge.net/projects/itext/files/iText/
    // xmlworker-5.4.1.jar http://sourceforge.net/projects/xmlworker/files/
    public static void main(String[] args) {
        try {
            Document document = new Document(PageSize.LETTER);
            PdfWriter [...]

How to get an attribute of an Element that is namespaced

I'm parsing an XML document that I receive from a vendor every day, and it uses namespaces heavily. I've reduced the problem to a minimal example here: There are some elements I need to parse, all of which are children of an element with a specific attribute. I am able to use lxml.etree.Element.findall(TAG, root.nsmap) to find the candidate nodes whose attribute I need to check. I'm then trying to check the attribute of each of these elements via the name I know it uses, which concretely here is ss:Name. If the value of that attribute is the desired value I'm going to dive deeper into
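The underlying rule is that a namespaced attribute is identified by its namespace URI plus its local name, not by the prefixed string ss:Name. In lxml that means using the Clark-notation key, e.g. el.get('{%s}Name' % root.nsmap['ss']). The same rule in Java's DOM API, as a minimal sketch: the namespace URI and the Workbook/Worksheet element names are assumptions borrowed from SpreadsheetML 2003, since the question's sample document is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NsAttribute {
    // ASSUMPTION: the URI bound to "ss" in SpreadsheetML 2003 files; use whatever
    // URI the vendor's document actually declares for that prefix.
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Hypothetical sample document.
    static final String SAMPLE = "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'"
            + " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>"
            + "<Worksheet ss:Name='Sheet1'/><Worksheet ss:Name='Data'/></Workbook>";

    /** Returns the ss:Name attribute of every Worksheet element, looked up by namespace URI. */
    static List<String> worksheetNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below to match
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> names = new ArrayList<>();
        NodeList sheets = doc.getElementsByTagNameNS(SS, "Worksheet");
        for (int i = 0; i < sheets.getLength(); i++) {
            Element sheet = (Element) sheets.item(i);
            // Match on (namespace URI, local name); "ss:Name" is only the prefixed spelling.
            names.add(sheet.getAttributeNS(SS, "Name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(worksheetNames(SAMPLE));
    }
}
```

Because matching is by URI, a document that binds ss to a different URI (or puts Worksheet in no namespace) yields no matches, even though the prefix text looks the same.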

Suggestion to parse this XML in Java

Question: I'm not new to Java, but I'm relatively new to XML parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I am also not an XML pro. My particular problem is this: I have been given an XML document which I cannot modify, and from which I need to parse only random bits into Java objects. Sheer speed is not much of a factor so long as it's reasonable. Likewise, the memory footprint need not be absolutely optimal either, just not insane. I only need to read
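For pulling a few scattered values out of a document you can't change, when speed and memory only need to be reasonable, one low-ceremony option is to load it into a DOM and select values with the JDK's javax.xml.xpath API. A minimal sketch; the catalog/item/price layout and the Item class are hypothetical, since the question's document isn't shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PickBits {
    /** Hypothetical target object; the question does not show its real classes. */
    static final class Item {
        final String id;
        final double price;

        Item(String id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    /** Hypothetical sample document standing in for the unmodifiable input. */
    static final String SAMPLE = "<catalog>"
            + "<item id='a'><name>Bolt</name><price>1.0</price></item>"
            + "<item id='b'><name>Nut</name><price>2.5</price></item>"
            + "</catalog>";

    /** Loads the whole document into a DOM, then uses XPath to pick out only the values to map. */
    static Item readItem(String xml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The id is concatenated into the expression: acceptable for trusted values only.
        String price = xpath.evaluate("/catalog/item[@id='" + id + "']/price", doc);
        // evaluate returns "" when nothing matches, so parseDouble throws for a missing item.
        return new Item(id, Double.parseDouble(price));
    }

    public static void main(String[] args) throws Exception {
        Item item = readItem(SAMPLE, "b");
        System.out.println(item.id + " costs " + item.price);
    }
}
```

Each additional "bit" is one more XPath expression, so the mapping code stays proportional to what you actually need rather than to the size of the document.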

XML Parsing too slow!

I wrote a Java app that communicates with a web application using XML. After deployment, I found out it takes too long to parse the XML generated by the web application. For example, it takes about 2 minutes to log in; the login information is included in the URL. The web application does its processing and responds to the Java app with XML indicating whether the login was successful. I used the standard Java DOM parsing. Is there a way I can optimize this process so that these activities are faster?

Answer: Using a standard XML parser, a short message should be parsed in about one millisecond. Using a custom
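If a short message really parses in about a millisecond, a two-minute login is unlikely to be spent inside the DOM parser itself. One way to check is to time the parse of a response that is already in memory, separately from fetching it over HTTP. A minimal sketch; the login response format is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseTiming {
    /** Hypothetical login response; the web application's real format is not shown. */
    static final String RESPONSE = "<login><status>success</status></login>";

    /** Parses an XML string that is already in memory, so no network time is included. */
    static String parseStatus(DocumentBuilder builder, String xml) throws Exception {
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("status").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // One builder reused for sequential parses (DocumentBuilder is not thread-safe).
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (int run = 1; run <= 3; run++) {
            long start = System.nanoTime();
            String status = parseStatus(builder, RESPONSE);
            long micros = (System.nanoTime() - start) / 1_000;
            // The first run includes class loading and JIT warm-up; later runs are more representative.
            System.out.println("run " + run + ": " + status + " in " + micros + " us");
        }
    }
}
```

If these timings come out small, the delay is more likely in the HTTP request or in the web application's own processing than in parsing.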

Unicode Encoding Errors Python - Parsing XML can't encode a character (Star)

Question: I am a beginner to Python and am currently parsing a web-based XML file from the eventful.com API; however, I am receiving some Unicode errors when retrieving certain elements of the data. I am able to retrieve 5 of the data elements I want from the XML file without any problems, but then it terminates and produces the following error in the GAE error console:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2605' in position 0: ordinal not in range(128)

I know that the
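The error says the star character U+2605 is being encoded with the ASCII codec, which only covers code points below 128; in Python 2 this typically happens when a unicode string is converted to a byte string without an explicit encoding. Encoding explicitly as UTF-8 avoids it. The same distinction, sketched in Java as an analogue (the question's own code is Python 2 on GAE, and this is not a patch for it):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StarEncoding {
    static final String STAR = "\u2605"; // U+2605 BLACK STAR, the character in the error message

    /** True if every character of s is representable in US-ASCII. */
    static boolean asciiCanEncode(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    /** Encodes explicitly as UTF-8, which can represent any Unicode character. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asciiCanEncode(STAR));
        System.out.println(Arrays.toString(utf8(STAR)));
        // Unlike Python's encode('ascii'), getBytes does not throw: it substitutes '?'.
        System.out.println(Arrays.toString(STAR.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Note the behavioral difference: Java's String.getBytes silently replaces unmappable characters, so an ASCII mistake shows up as stray question marks rather than as an exception.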

Ignoring a particular attribute of a specific Node while comparing XML files using XMLUnit 2.x

Question: I have two XML files:

<note id="ignoreThisAttribute_1">
  <to>Experts</to>
  <from>Matrix</from>
  <heading id="dontIgnoreThisAttribute_1">Reminder</heading>
  <body>Help me with this problem</body>
</note>

<note id="ignoreThisAttribute_2">
  <to>Experts</to>
  <from>Matrix</from>
  <heading id="dontIgnoreThisAttribute_2">Reminder</heading>
  <body>Help me with this problem<
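XMLUnit 2.x's DiffBuilder accepts an attribute filter (withAttributeFilter, taking a Predicate<Attr>; check the DiffBuilder documentation for the version you use), which is the natural fit here. To show the idea without the XMLUnit dependency, here is a plain-DOM sketch that drops only note/@id before comparing. With the two files above, the result would still be "different", because the heading ids differ and are deliberately kept.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class IgnoreNoteId {
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Compares two documents, ignoring only the id attribute on the root <note> element. */
    static boolean equalIgnoringNoteId(String xmlA, String xmlB) throws Exception {
        Element a = root(xmlA);
        Element b = root(xmlB);
        for (Element note : new Element[] {a, b}) {
            if ("note".equals(note.getTagName())) {
                note.removeAttribute("id"); // heading/@id and any other ids stay in the comparison
            }
        }
        // isEqualNode compares names, attributes and child nodes (including whitespace text) recursively.
        return a.isEqualNode(b);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(equalIgnoringNoteId(
                "<note id='ignoreThisAttribute_1'><to>Experts</to></note>",
                "<note id='ignoreThisAttribute_2'><to>Experts</to></note>"));
    }
}
```

Unlike XMLUnit, isEqualNode gives only a yes/no answer with no description of the differences, and it treats whitespace-only text nodes as significant.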

Pugixml - parse namespace with prefix mapping and without prefix mapping

Question: I have a client application that parses XML responses sent from two different servers, which I call server A and server B. Server A responds to one of the requests with a response like this:

<?xml version="1.0" encoding="UTF-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/T12.txt</D:href>
    <D:propstat>
      <D:prop>
        <local-modification-time xmlns="urn:abc.com:webdrive">1389692809</local-modification-time>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D
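A namespace-aware parser identifies elements by namespace URI and local name, so it makes no difference whether a server writes <D:href> with xmlns:D="DAV:" or a plain <href> under a default xmlns="DAV:". The question is cut off before server B's response, so the default-namespace form used below is an assumption. The question uses pugixml (C++); this is the same idea as a minimal Java DOM sketch:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DavHrefs {
    static final String DAV = "DAV:"; // WebDAV namespace URI, as declared by server A

    /** Collects DAV href values whether the server uses a "D:" prefix or a default namespace. */
    static List<String> hrefs(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // elements are then matched by URI, not by prefix text
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagNameNS(DAV, "href");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Server A style (prefixed) and assumed server B style (default namespace).
        System.out.println(hrefs("<D:multistatus xmlns:D='DAV:'><D:response>"
                + "<D:href>/T12.txt</D:href></D:response></D:multistatus>"));
        System.out.println(hrefs("<multistatus xmlns='DAV:'><response>"
                + "<href>/T12.txt</href></response></multistatus>"));
    }
}
```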

Switching from FOR loops in plpgsql to set-based SQL commands

Question: I've got a fairly heavy query with a FOR loop that I'd like to rewrite more simply, using more SQL instead of plpgsql constructs. The query looks like:

FOR big_xml IN SELECT unnest(xpath('//TAG1', my_xml)) LOOP
    str_xml = unnest(xpath('/TAG2/TYPE/text()', big_xml));
    FOR single_xml IN SELECT unnest(xpath('/TAG2/single', big_xml)) LOOP
        CASE str_xml::INT
            WHEN 1 THEN INSERT INTO tab1(id, xml) VALUES (1, single_xml);
            WHEN 2 THEN INSERT INTO tab2(id, xml) VALUES (1, single_xml);
            WHEN 3 [...]

Java: How to prevent 'systemId' in EntityResolver#resolveEntity(String publicId, String systemId) from being absolutized to current working directory

I want to parse the following XML document to resolve all entities in it:

<!DOCTYPE doc SYSTEM 'mydoc.dtd'>
<doc>&title;</doc>

My EntityResolver is supposed to fetch the external entity with the given system ID from the database and then do the resolution; see below for an illustration:

private static class MyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // At this point, systemId is always absolutized to the current working directory,
        // even though the XML document specified it as relative.
        // E.g. "file:///H:/mydoc.dtd
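One approach is the SAX2 extension interface org.xml.sax.ext.EntityResolver2: its four-argument resolveEntity(name, publicId, baseURI, systemId) receives the system ID as written in the document, with the base URI passed separately, rather than an already-absolutized one. DefaultHandler2 implements it, and the JDK's SAX parser calls it when such a handler is installed as the entity resolver (controlled by the feature http://xml.org/sax/features/use-entity-resolver2, which defaults to true). A minimal sketch; the in-memory DTD_STORE map is a hypothetical stand-in for the database lookup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LiteralSystemIdDemo {
    static final String SAMPLE = "<!DOCTYPE doc SYSTEM 'mydoc.dtd'><doc>&title;</doc>";

    // Hypothetical stand-in for the database of external entities.
    static final Map<String, String> DTD_STORE = Map.of("mydoc.dtd", "<!ENTITY title \"Hello\">");

    /** Handler whose 4-argument resolveEntity sees the system ID exactly as written. */
    static class LiteralResolver extends DefaultHandler2 {
        final List<String> seenSystemIds = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId)
                throws SAXException, IOException {
            seenSystemIds.add(systemId); // relative value from the document, not a file:/// URL
            String body = DTD_STORE.get(systemId);
            // Returning null falls back to the parser's default resolution.
            return body == null ? null : new InputSource(new StringReader(body));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // collects the expanded &title; content
        }
    }

    static LiteralResolver parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LiteralResolver handler = new LiteralResolver();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler); // handler implements EntityResolver2
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        LiteralResolver h = parse(SAMPLE);
        System.out.println(h.seenSystemIds + " -> " + h.text);
    }
}
```

The baseURI argument is still available when you do want to resolve the relative ID yourself; here the input has no system ID of its own, so it is null.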

Java Regex or XML parser?

Question: I want to remove any tags, so that

hello <namespace:tag : a>hello</namespace:tag>

becomes

hello hello

What is the best way to do this? If it is regex, for some reason this is not working; can anyone help?

(<|</)[:]{1,2}[^</>]>

edit: added

Answer 1: Definitely use an XML parser. Regex should not be used to parse *ML.

Answer 2: You should not use regex for these purposes; use a parser like lxml or BeautifulSoup:

>>> import lxml.html as lxht
>>> myString = 'hello <namespace:tag : a>hello<
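Following the answers' advice, a parser-based version in Java can wrap the fragment in a root element and take its text content. This is a minimal sketch that assumes well-formed markup: the question's sample tag <namespace:tag : a> is not well-formed, so it would need cleaning up first, or a lenient HTML parser like the one in the second answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class StripTags {
    /** Returns the text content of an XML fragment with all tags dropped (a parser, not a regex). */
    static String textOf(String fragment) throws Exception {
        // Wrap the fragment so leading text and several top-level elements form one document.
        String doc = "<root>" + fragment + "</root>";
        // The default factory is not namespace-aware, so an undeclared prefix such as
        // "namespace:" is accepted as part of the element name.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(doc)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("hello <namespace:tag a='1'>hello</namespace:tag>"));
    }
}
```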