sgml

SGML parser in Java?

三世轮回 posted on 2019-11-28 23:11:43
I'm looking for a parser in Java that can parse a document formatted in SGML. For duplicate monitors: I'm aware of the two other threads that discuss this topic, "Parsing Java String with SGML" and "Java SGML to XML conversion?", but neither has a resolution, hence the new topic. For people who confuse XML with SGML: please read this: http://www.w3.org/TR/NOTE-sgml-xml-971215#null (in short, there are enough subtle differences to at least make it unusable in its vanilla form). For people who are fond of asking posters to Google it: I already did, and the closest I could come up with was the widely ...

Is HTML a context-free language?

一世执手 posted on 2019-11-28 03:39:52
Reading some related questions made me think about the theoretical nature of HTML. I'm not talking about XHTML-like code here. I'm talking about stuff like this crazy piece of markup, which is perfectly valid HTML(!) <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> <html<head> <title// <p ltr<span id=p></span</p> </> So given the enormous complexity that SGML injects here, is HTML a context-free language? Is it a formal language anyway? With a grammar? What about HTML5? I'm new to the concept of formal languages, so please bear with me. And yes, I have read the Wikipedia article ;) Context ...

Definition of HTML whitespace rules?

笑着哭i posted on 2019-11-28 01:05:53
I'm looking for this definition to make my HTML renderer conform a bit better. Currently it's guessing which whitespace to keep, which to collapse and which to discard. The SGML standard is hard to find, and the HTML standard doesn't seem to treat the subject in the depth I need. Currently my renderer parses the HTML into a tree and then does a recursive layout pass to position all the elements and their content. I'm experimenting with throwing some whitespace out in the parse stage, i.e. not emitting whitespace-only text chunks in certain circumstances. Which kinda works for the ...
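As a rough illustration of the "keep, collapse or discard" decision the question describes, here is a minimal sketch in Python. It approximates the default collapsing behaviour (what CSS calls `white-space: normal`) by folding each run of space, tab, line feed, carriage return and form feed into one space. The function names and the `preserve` flag are hypothetical, and the sketch deliberately ignores trimming at line edges and other layout-dependent rules.

```python
import re

# Characters treated as collapsible whitespace in this sketch:
# space, tab, line feed, carriage return, form feed.
_WS_RUN = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text, preserve=False):
    """Fold each run of collapsible whitespace into a single space.

    preserve=True stands in for contexts such as <pre>, where the
    text is returned unchanged (an assumption of this sketch).
    """
    if preserve:
        return text
    return _WS_RUN.sub(" ", text)


def is_whitespace_only(text):
    """Return True if a text chunk is non-empty and consists only of
    collapsible whitespace, i.e. a candidate for dropping at parse time."""
    return _WS_RUN.fullmatch(text) is not None
```

Whether a whitespace-only chunk can really be dropped depends on its neighbours (between two inline elements it still renders as a space), so `is_whitespace_only` only identifies candidates.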

SGML parser .NET recommendations [closed]

余生长醉 posted on 2019-11-27 23:01:39
Question: In my C# project, I have been given the task of parsing an SGML file and have tried, very naively, to use XmlReader, and this has led to some interesting revelations (i.e., the difference between SGML and well-formed XML, etc.). So I am thinking that I just need a good SGML parser that converts the file to XML, and go from there. In my search, I have found two SGML parsers that can integrate with my C# project: MSDN's SgmlReader, and James Clark's SP SGML parser. Any other ...

SGML parser in Java?

南楼画角 posted on 2019-11-27 14:35:31
Question: I'm looking for a parser in Java that can parse a document formatted in SGML. For duplicate monitors: I'm aware of the two other threads that discuss this topic, "Parsing Java String with SGML" and "Java SGML to XML conversion?", but neither has a resolution, hence the new topic. For people who confuse XML with SGML: please read this: http://www.w3.org/TR/NOTE-sgml-xml-971215#null (in short, there are enough subtle differences to at least make it unusable in its vanilla form). For people who are fond ...

Parse SGML with Open Arbitrary Tags in Python 3

六月ゝ 毕业季﹏ posted on 2019-11-27 14:04:16
I am trying to parse a file such as: http://www.sec.gov/Archives/edgar/data/1409896/000118143112051484/0001181431-12-051484.hdr.sgml I am using Python 3 and have been unable to find a solution with existing libraries to parse an SGML file with open tags. SGML allows implicitly closed tags. When attempting to parse the example file with lxml, xml, or Beautiful Soup, I end up with implicitly closed tags being closed at the end of the file instead of at the end of the line. For example: <COMPANY>Awesome Corp <FORM> 24-7 <ADDRESS> <STREET>101 PARSNIP LN <ZIP>31337 </ADDRESS> This ends up being ...
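The line-oriented behaviour the question asks for can be sketched without a general SGML parser. The following is a minimal sketch in Python 3, not a solution for the full SEC file: it assumes one tag per line, treats a tag followed by text as a leaf that is implicitly closed at the end of that line, and treats a tag alone on its line as a container that stays open until an explicit end tag. The function name `parse_line_sgml` and the nested-dict output are hypothetical choices for illustration.

```python
import re

# One tag at the start of a line, optionally followed by a value.
TAG_LINE = re.compile(r"^<(/?)([A-Za-z0-9_-]+)>(.*)$")


def parse_line_sgml(text):
    """Parse line-oriented SGML into nested dicts.

    Assumptions of this sketch: one tag per line; a tag with trailing
    text is a leaf closed at end of line; a bare tag opens a container.
    End-tag names are not checked against the open container, and a
    repeated tag name overwrites the earlier entry.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        match = TAG_LINE.match(raw.strip())
        if not match:
            continue  # skip blank lines and lines that do not start with a tag
        closing, name, value = match.groups()
        value = value.strip()
        if closing:
            if len(stack) > 1:
                stack.pop()  # explicit end tag closes the current container
        elif value:
            stack[-1][name] = value  # leaf: implicitly closed at end of line
        else:
            child = {}
            stack[-1][name] = child
            stack.append(child)  # container: stays open until its end tag
    return root


sample = """<COMPANY>Awesome Corp
<FORM> 24-7
<ADDRESS>
<STREET>101 PARSNIP LN
<ZIP>31337
</ADDRESS>
"""
parsed = parse_line_sgml(sample)
```

With the sample from the question laid out one tag per line, `STREET` and `ZIP` end up nested under `ADDRESS`, while `COMPANY` and `FORM` stay at the top level.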

How can I stop empty XML elements self-closing using XmlDocument in C#?

假装没事ソ posted on 2019-11-27 07:50:16
Question: Before I get jumped on by people saying the XML parser shouldn’t care if the elements are empty or self-closed, there is a reason why I can’t allow self-closed XML elements. The reason is that I’m actually working with SGML, not XML, and the SGML DTD I’m working with is very strict and doesn't allow it. What I have is several thousand SGML files which I’ve needed to run XSLT on. I’ve therefore had to convert the SGML to XML temporarily in order to apply the XSLT. I’ve then written a method that ...
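The question is about C#'s XmlDocument, but the requirement itself (serialize empty elements with an explicit end tag rather than in self-closed form) can be illustrated in Python's standard library. This is only a sketch of the same idea in another language, not the C# answer: `xml.etree.ElementTree.tostring` accepts a `short_empty_elements` flag, and the helper name `serialize_with_end_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize_with_end_tags(element):
    """Serialize an element tree so that empty elements are written as
    <tag></tag> instead of the self-closed <tag /> form."""
    return ET.tostring(element, encoding="unicode", short_empty_elements=False)


# Build a tiny document containing one empty element.
root = ET.Element("doc")
ET.SubElement(root, "empty")

expanded = serialize_with_end_tags(root)       # explicit end tags
default = ET.tostring(root, encoding="unicode")  # default: self-closed form
```

Here `expanded` contains `<empty></empty>`, whereas the default serialization uses the self-closed form.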

Is HTML a context-free language?

一个人想着一个人 posted on 2019-11-27 05:10:23
Question: Reading some related questions made me think about the theoretical nature of HTML. I'm not talking about XHTML-like code here. I'm talking about stuff like this crazy piece of markup, which is perfectly valid HTML(!) <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> <html<head> <title// <p ltr<span id=p></span</p> </> So given the enormous complexity that SGML injects here, is HTML a context-free language? Is it a formal language anyway? With a grammar? What about HTML5? I'm new to the ...

Where is the HTML5 Document Type Definition?

|▌冷眼眸甩不掉的悲伤 posted on 2019-11-26 17:33:05
The "old" HTML/XHTML standards have a DTD (Document Type Definition) defined for them: HTML 4.01 http://www.w3.org/TR/html401/sgml/dtd.html XHTML 1.0 http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict These DTDs specify the rules for nesting elements: "which types of elements may appear in which types of elements". I made a diagram for XHTML 1.0 here (sorry, I no longer have that resource). I would like to update that diagram with a new version which also includes the new HTML5 elements. However, there doesn't seem to be an HTML5 DTD. It seems that the nesting rules are defined by the ...

Where is the HTML5 Document Type Definition?

|▌冷眼眸甩不掉的悲伤 posted on 2019-11-26 06:35:26
Question: The "old" HTML/XHTML standards have a DTD (Document Type Definition) defined for them: HTML 4.01 http://www.w3.org/TR/html401/sgml/dtd.html XHTML 1.0 http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict These DTDs specify the rules for nesting elements: "which types of elements may appear in which types of elements". I made a diagram for XHTML 1.0 here (sorry, I no longer have that resource). I would like to update that diagram with a new version which also includes the new HTML5 ...