sax

PHP SAX parser for HTML?

不打扰是莪最后的温柔 submitted on 2019-12-01 21:48:16
Question: I need an HTML SAX (not DOM!) parser for PHP that can process even invalid HTML. I need it to filter user-entered HTML (removing all tags and attributes except the allowed ones) and to truncate HTML content to a specified length. Any ideas? Answer 1: SAX was made to process valid XML and fail on invalid markup. Processing invalid HTML markup requires keeping more state than SAX parsers typically keep. I'm not aware of any SAX-like parser for HTML. Your best shot is to pass the HTML

Skipping nodes with sax

风格不统一 submitted on 2019-12-01 21:16:48
Question: Is it possible to skip nodes when parsing, and if so, how? Does skippedEntity have anything to do with it? Consider this XML: <?xml version="1.0"?> <nutrition> <daily-values> <total-fat units="g">65</total-fat> <saturated-fat units="g">20</saturated-fat> <cholesterol units="mg">300</cholesterol> <sodium units="mg">2400</sodium> <carb units="g">300</carb> <fiber units="g">25</fiber> <protein units="g">50</protein> </daily-values> </nutrition> I want to skip the "sodium" element. Answer 1: You could do

PHP SAX parser for HTML?

↘锁芯ラ submitted on 2019-12-01 21:11:06
I need an HTML SAX (not DOM!) parser for PHP that can process even invalid HTML. I need it to filter user-entered HTML (removing all tags and attributes except the allowed ones) and to truncate HTML content to a specified length. Any ideas? SAX was made to process valid XML and fail on invalid markup. Processing invalid HTML markup requires keeping more state than SAX parsers typically keep. I'm not aware of any SAX-like parser for HTML. Your best shot is to pass the HTML through tidy first and then use an XML parser, but this may defeat your purpose of using a SAX parser in the
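The question is about PHP, but the event-driven filtering it describes can be illustrated in Python, whose standard-library `html.parser.HTMLParser` delivers SAX-style callbacks and tolerates malformed markup. The following is only a sketch under stated assumptions: the names `WhitelistFilter` and `filter_html` are hypothetical, void elements such as `<br>` are not treated specially, and the text of disallowed elements (including `<script>`) is kept as escaped text rather than removed.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistFilter(HTMLParser):
    """Hypothetical event-driven filter: keeps allowed tags, drops attributes."""

    def __init__(self, allowed_tags, max_chars):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.remaining = max_chars  # text characters still allowed in the output
        self.out = []               # output fragments
        self.open_tags = []         # allowed tags that are currently open

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose, so every attribute is dropped.
        if tag in self.allowed and self.remaining > 0:
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only close tags this filter opened; stray end tags are ignored.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        chunk = data[: self.remaining]
        self.remaining -= len(chunk)
        self.out.append(escape(chunk, quote=False))

    def result(self):
        # Close whatever is still open so the output stays balanced.
        closers = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closers


def filter_html(markup, allowed_tags=("p", "b", "i"), max_chars=100):
    parser = WhitelistFilter(allowed_tags, max_chars)
    parser.feed(markup)
    parser.close()
    return parser.result()
```

Calling `filter_html(user_input, ("p", "b"), 200)` would keep only `<p>` and `<b>` and cut the text after 200 characters. A production filter would need more care (void elements, dropping script content, URL checks), so treat this as a starting point only.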

Skipping nodes with sax

試著忘記壹切 submitted on 2019-12-01 19:31:44
Is it possible to skip nodes when parsing, and if so, how? Does skippedEntity have anything to do with it? Consider this XML: <?xml version="1.0"?> <nutrition> <daily-values> <total-fat units="g">65</total-fat> <saturated-fat units="g">20</saturated-fat> <cholesterol units="mg">300</cholesterol> <sodium units="mg">2400</sodium> <carb units="g">300</carb> <fiber units="g">25</fiber> <protein units="g">50</protein> </daily-values> </nutrition> I want to skip the "sodium" element. You could do something like the following: import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParserFactory;
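The Java answer is cut off in this excerpt, so here is one common way to skip an element with SAX, sketched in Python's `xml.sax` rather than Java. It is not the original answer's code. The idea is to keep a depth counter while inside the unwanted element and ignore every event until the counter returns to zero. The names `SkippingHandler` and `parse_skipping` are hypothetical. As far as the SAX API goes, `skippedEntity` reports entities the parser did not expand, so it is unrelated to skipping elements.

```python
import xml.sax


class SkippingHandler(xml.sax.ContentHandler):
    """Collects (element name, text) pairs, ignoring one element and its subtree."""

    def __init__(self, skip_name):
        super().__init__()
        self.skip_name = skip_name
        self.skip_depth = 0  # > 0 while inside the skipped element
        self.seen = []       # (name, text) for elements with non-blank text
        self._text = []

    def startElement(self, name, attrs):
        if self.skip_depth or name == self.skip_name:
            self.skip_depth += 1
            return
        self._text = []

    def characters(self, content):
        if not self.skip_depth:
            self._text.append(content)

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        text = "".join(self._text).strip()
        if text:
            self.seen.append((name, text))
        self._text = []


def parse_skipping(xml_bytes, skip_name):
    handler = SkippingHandler(skip_name)
    xml.sax.parseString(xml_bytes, handler)
    return handler.seen
```

With the nutrition document from the question, `parse_skipping(data, "sodium")` would report every leaf element except `sodium`. The counter also handles the case where the skipped element has children.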

Efficient XSLT pipeline, with params, in Java

你。 submitted on 2019-12-01 18:31:58
Question: The top answer to this question describes a technique to implement an efficient XSLT pipeline in Java: Efficient XSLT pipeline in Java (or redirecting Results to Sources). Unfortunately, while Transformer seems to expose an API for setting XSLT parameters, setting them does not seem to have any effect. For example, I have the following code: Transformer.java import javax.xml.transform.sax.SAXTransformerFactory; import javax.xml.transform.Templates; import javax.xml.transform.sax.TransformerHandler;

How can I get the text between tags using python SAX parser?

萝らか妹 submitted on 2019-12-01 16:47:34
What I need is just to get the text of the corresponding tag and persist it to a database. Since the XML file is big (4.5 GB), I'm using SAX. I used the characters method to get the text and put it in a dictionary. However, when I print the text in the endElement method, I get a newline instead of the text. Here is my code:

    def characters(self, content):
        text = unescape(content)
        self.map[self.tag] = text

    def startElement(self, name, attrs):
        self.tag = name

    def endElement(self, name):
        if name == "sometag":
            print(self.map[name])

Thanks in advance. The text in the tag is chunked by the SAX
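The truncated answer points at chunking: a SAX parser may deliver one element's text through several `characters` calls, and the whitespace after a closing tag arrives the same way, so overwriting the stored value on every call can leave only a trailing newline. A minimal sketch of the usual fix is to accumulate the chunks and join them in `endElement`. The names `TextCollector` and `collect_text` are hypothetical, and nested elements with the same name are not handled. The parser already resolves entities such as `&amp;`, so no `unescape` call is used here.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Gathers the full text of every element named `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.buffer = None  # a list while inside a wanted element, else None
        self.values = []

    def startElement(self, name, attrs):
        if name == self.wanted:
            self.buffer = []

    def characters(self, content):
        # May be called several times for one text node, so append each chunk.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.wanted and self.buffer is not None:
            self.values.append("".join(self.buffer))
            self.buffer = None


def collect_text(xml_bytes, wanted="sometag"):
    handler = TextCollector(wanted)
    xml.sax.parseString(xml_bytes, handler)
    return handler.values
```

For a multi-gigabyte file, the same handler can be passed to `xml.sax.parse` with a file name, and each joined value can be written to the database in `endElement` instead of being kept in a list.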

How can I get the text between tags using python SAX parser?

前提是你 submitted on 2019-12-01 15:38:38
Question: What I need is just to get the text of the corresponding tag and persist it to a database. Since the XML file is big (4.5 GB), I'm using SAX. I used the characters method to get the text and put it in a dictionary. However, when I print the text in the endElement method, I get a newline instead of the text. Here is my code:

    def characters(self, content):
        text = unescape(content)
        self.map[self.tag] = text

    def startElement(self, name, attrs):
        self.tag = name

    def endElement(self, name):
        if

XML SAX: Explain result in `qName` and `localName` in one example XML file

老子叫甜甜 submitted on 2019-12-01 13:16:51
I am testing how to use SAXParser and trying to understand its components. Here is the XML file that I used to test: <?xml-stylesheet href="/externalflash/NASA_Detail/NASA_Detail.xsl" type="text/xsl"?> <rss version="2.0"> <channel> <title>NASA Image of the Day</title> <link>http://www.nasa.gov/multimedia/imagegallery/index.html</link> <description>The latest NASA "Image of the Day" image.</description> <language>en-us</language> <docs>http://blogs.law.harvard.edu/tech/rss</docs> <managingEditor>yvette.smith-1@nasa.gov</managingEditor> <webMaster>brian.dunbar@nasa.gov</webMaster> <item xmlns:java_code=

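The difference between PULL parsing and SAX parsing of XML

眉间皱痕 submitted on 2019-12-01 12:37:28
If we need only the first part of the data in an XML document, SAX or DOM parsing still processes the entire document even though most of the later data is not needed, which wastes processing resources. PULL parsing fits this case well. Pull parsers and SAX parsers differ, but they also have similarities. The difference: a SAX parser automatically pushes events into the registered event handlers, so you cannot actively end the event processing; a Pull parser lets your application code actively fetch events from the parser, and because the events are fetched actively, the application can stop fetching them once its condition is met and end the parse. This is the main difference. The similarity lies in how they run: a Pull parser also provides SAX-like events, such as START_DOCUMENT and END_DOCUMENT for the start and end of the document, START_TAG and END_TAG for the start and end of an element, and TEXT for element content, but you must call the next() method to retrieve them (actively pulling events). On Android, the package related to Pull parsing is org.xmlpull.v1, which provides the Pull parser factory class XmlPullParserFactory and the Pull parser XmlPullParser; an XmlPullParserFactory instance calls the newPullParser method to create an XmlPullParser parser instance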
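The pull model described above can be sketched outside Android as well. The example below uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for `XmlPullParser`, which is an assumption made for illustration and not the Android API. The application asks the iterator for each event and simply stops asking once it has what it needs. The function name `first_titles` and the `title` element are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def first_titles(source, limit):
    """Pull events one at a time and stop as soon as `limit` titles are found."""
    titles = []
    # By default iterparse yields an "end" event when an element is complete.
    for event, elem in ET.iterparse(source):
        if elem.tag == "title":
            titles.append(elem.text)
            if len(titles) >= limit:
                break  # the application ends the parse; later events are never requested
    return titles
```

For example, `first_titles(io.BytesIO(data), 2)` would return the first two titles and ignore the rest of the events. With a push-style SAX handler, stopping early typically means raising an exception from a callback.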

Efficient Parser for large XMLs

最后都变了- submitted on 2019-12-01 11:11:52
I have very large XML files to process. I want to convert them to readable PDFs with colors, borders, images, tables and fonts. My machine does not have a lot of resources, so I need my application to be very efficient in its use of memory and CPU. I did some research to decide which technology to use, but I could not decide which programming language and API best fit my requirements. I believe DOM is not an option because it consumes a lot of memory, but would Java with a SAX parser fulfill my requirements? Some people also recommended Python for XML parsing. Is it