How to parse XML using the SAX parser

后悔当初 2020-11-22 11:02

I'm following this tutorial.

It works great but I would like it to return an array with all the strings instead of a single string with the last element.

An

3 answers
  •  北恋
    北恋 (original poster)
    2020-11-22 12:00

    So you want to build an XML parser to parse an RSS feed like this one.

    
    
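    <rss>
    <channel>
        <title>MyTitle</title>
        <link>http://myurl.com</link>
        <description>MyDescription</description>
        <lastBuildDate>SomeDate</lastBuildDate>
        <docs>http://someurl.com</docs>
        <language>SomeLanguage</language>

        <item>
            <title>TitleOne</title>
            <description></description>
            <link>http://linktoarticle.com</link>
        </item>

        <item>
            <title>TitleTwo</title>
            <description></description>
            <link>http://linktoanotherarticle.com</link>
        </item>

    </channel>
    </rss>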

    Now you have two SAX implementations you can work with: either the org.xml.sax or the android.sax implementation. I'm going to explain the pros and cons of both after posting a short handler example.

    android.sax Implementation

    Let's start with the android.sax implementation.

    You first have to define the XML structure using the RootElement and Element objects.

    In any case I would work with POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.

    Channel.java

    import java.io.Serializable;

    public class Channel implements Serializable {
    
        private Items items;
        private String title;
        private String link;
        private String description;
        private String lastBuildDate;
        private String docs;
        private String language;
    
        public Channel() {
            setItems(null);
            setTitle(null);
            // set every field to null in the constructor
        }
    
        public void setItems(Items items) {
            this.items = items;
        }
    
        public Items getItems() {
            return items;
        }
    
        public void setTitle(String title) {
            this.title = title;
        }
    
        public String getTitle() {
            return title;
        }
        // rest of the class looks similar so just setters and getters
    }
    

    This class implements the Serializable interface so you can put it into a Bundle and do something with it.
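    What Serializable buys you can be sketched on the plain JDK, without Android. The block below is only an illustration: FeedInfo is a hypothetical stand-in for Channel, and roundTrip is a made-up helper that writes an object to bytes and reads it back, which is roughly the kind of copy that has to succeed when a Serializable value is handed to Bundle.putSerializable and later restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableRoundTrip {

    // Hypothetical helper: serializes the value to a byte array and
    // deserializes it again, returning the reconstructed copy.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        FeedInfo copy = (FeedInfo) roundTrip(new FeedInfo("MyTitle"));
        System.out.println(copy.getTitle());
    }
}

// Hypothetical stand-in for Channel, reduced to a single field.
class FeedInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String title;

    FeedInfo(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }
}
```

    Every field of a serialized class must itself be serializable (or transient), which is why the item classes used by Channel need to be serializable as well.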

    Now we need a class to hold our items. In this case I'm just going to extend the ArrayList class.

    Items.java

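    import java.util.ArrayList;

    // The element type was lost in the listing; the container holds Item objects.
    public class Items extends ArrayList<Item> {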
    
        public Items() {
            super();
        }
    
    }
    

    That's it for our items container. We now need a class to hold the data of every single item.

    Item.java

    import java.io.Serializable;

    public class Item implements Serializable {
    
        private String title;
        private String description;
        private String link;
    
        public Item() {
            setTitle(null);
            setDescription(null);
            setLink(null);
        }
    
        public void setTitle(String title) {
            this.title = title;
        }
    
        public String getTitle() {
            return title;
        }
    
        // same as above.
    
    }
    

    Example:

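    import java.io.IOException;
    import java.io.InputStream;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    import android.sax.Element;
    import android.sax.EndElementListener;
    import android.sax.EndTextElementListener;
    import android.sax.RootElement;
    import android.sax.StartElementListener;
    import android.util.Xml;

    public class Example extends DefaultHandler {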
    
        private Channel channel;
        private Items items;
        private Item item;
    
        public Example() {
            items = new Items();
        }
    
        public Channel parse(InputStream is) {
            RootElement root = new RootElement("rss");
            Element chanElement = root.getChild("channel");
            Element chanTitle = chanElement.getChild("title");
            Element chanLink = chanElement.getChild("link");
            Element chanDescription = chanElement.getChild("description");
            Element chanLastBuildDate = chanElement.getChild("lastBuildDate");
            Element chanDocs = chanElement.getChild("docs");
            Element chanLanguage = chanElement.getChild("language");
    
            Element chanItem = chanElement.getChild("item");
            Element itemTitle = chanItem.getChild("title");
            Element itemDescription = chanItem.getChild("description");
            Element itemLink = chanItem.getChild("link");
    
            chanElement.setStartElementListener(new StartElementListener() {
                public void start(Attributes attributes) {
                    channel = new Channel();
                }
            });
    
            // Listen for the end of a text element and set the text as our
            // channel's title.
            chanTitle.setEndTextElementListener(new EndTextElementListener() {
                public void end(String body) {
                    channel.setTitle(body);
                }
            });
    
            // The same applies to the other channel elements
            // (link, description and so on).
    
            // On every <item> tag occurrence we create a new Item object.
            chanItem.setStartElementListener(new StartElementListener() {
                public void start(Attributes attributes) {
                    item = new Item();
                }
            });
    
            // On every </item> tag occurrence we add the current Item object
            // to the Items container.
            chanItem.setEndElementListener(new EndElementListener() {
                public void end() {
                    items.add(item);
                }
            });
    
            itemTitle.setEndTextElementListener(new EndTextElementListener() {
                public void end(String body) {
                    item.setTitle(body);
                }
            });
    
            // and so on
    
            // here we actually parse the InputStream and return the resulting
            // Channel object.
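            try {
                Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler());
                // Attach the collected items; otherwise getItems() on the
                // returned Channel would still be null.
                if (channel != null) {
                    channel.setItems(items);
                }
                return channel;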
            } catch (SAXException e) {
                // handle the exception
            } catch (IOException e) {
                // handle the exception
            }
    
            return null;
        }
    
    }
    

    As you can see, that was a very quick example. The major advantage of the android.sax implementation is that you can define the structure of the XML you have to parse and then just add event listeners to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.

    org.xml.sax Implementation

    The org.xml.sax SAX handler implementation is a bit different.

    Here you don't specify or declare your XML structure; you just listen for events. The most widely used events are the following:

    • Document Start
    • Document End
    • Element Start
    • Element End
    • Characters between Element Start and Element End
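    To see the order in which these events fire, here is a minimal sketch that runs on the plain JDK (javax.xml.parsers) rather than on Android. EventLogger and its log method are hypothetical names used only for this illustration; the handler simply records each callback as a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events in the order they are reported.
class EventLogger extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks;
        // whitespace-only chunks are skipped to keep the log short.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("chars:" + text);
        }
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> log(String xml) {
        try {
            EventLogger logger = new EventLogger();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), logger);
            return logger.events;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String event : log("<rss><channel><title>MyTitle</title></channel></rss>")) {
            System.out.println(event);
        }
    }
}
```

    The start and end events of an element always bracket the character events of its content, which is what makes buffering text between them possible.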

    An example handler implementation using the Channel object above looks like this.

    Example

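    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class ExampleHandler extends DefaultHandler {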
    
        private Channel channel;
        private Items items;
        private Item item;
        private boolean inItem = false;
    
        private StringBuilder content;
    
        public ExampleHandler() {
            items = new Items();
            content = new StringBuilder();
        }
    
        public void startElement(String uri, String localName, String qName, 
                Attributes atts) throws SAXException {
            content = new StringBuilder();
            if(localName.equalsIgnoreCase("channel")) {
                channel = new Channel();
            } else if(localName.equalsIgnoreCase("item")) {
                inItem = true;
                item = new Item();
            }
        }
    
        public void endElement(String uri, String localName, String qName) 
                throws SAXException {
            if(localName.equalsIgnoreCase("title")) {
                if(inItem) {
                    item.setTitle(content.toString());
                } else {
                    channel.setTitle(content.toString());
                }
            } else if(localName.equalsIgnoreCase("link")) {
                if(inItem) {
                    item.setLink(content.toString());
                } else {
                    channel.setLink(content.toString());
                }
            } else if(localName.equalsIgnoreCase("description")) {
                if(inItem) {
                    item.setDescription(content.toString());
                } else {
                    channel.setDescription(content.toString());
                }
            } else if(localName.equalsIgnoreCase("lastBuildDate")) {
                channel.setLastBuildDate(content.toString());
            } else if(localName.equalsIgnoreCase("docs")) {
                channel.setDocs(content.toString());
            } else if(localName.equalsIgnoreCase("language")) {
                channel.setLanguage(content.toString());
            } else if(localName.equalsIgnoreCase("item")) {
                inItem = false;
                items.add(item);
            } else if(localName.equalsIgnoreCase("channel")) {
                channel.setItems(items);
            }
        }
    
        public void characters(char[] ch, int start, int length) 
                throws SAXException {
            content.append(ch, start, length);
        }
    
        public void endDocument() throws SAXException {
            // you can do something here for example send
            // the Channel object somewhere or whatever.
        }
    
    }
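
    The handler above is never shown being invoked, so here is a hedged sketch of how such a handler could be driven. It assumes the plain JDK parser from javax.xml.parsers instead of Android, and it uses TitleCollector, a hypothetical trimmed-down handler that gathers every item title into a List. That is also what the question asks for: all strings instead of only the last one.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects only the <title> of every <item>.
class TitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // start buffering text for the new element
        if (localName.equals("item")) {
            inItem = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("title") && inItem) {
            // Add to the list instead of overwriting a single field.
            titles.add(content.toString().trim());
        } else if (localName.equals("item")) {
            inItem = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the stream and returns the titles of all items, in document order.
    static List<String> parseTitles(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // On the plain JDK, localName is only populated when the
            // parser is namespace-aware.
            factory.setNamespaceAware(true);
            TitleCollector handler = new TitleCollector();
            factory.newSAXParser().parse(in, handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseTitles(in)); // prints [TitleOne, TitleTwo]
    }
}
```

    The channel title is skipped because inItem is false while it is read; only titles nested in an item end up in the list.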
    

    To be honest, I can't point to any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if statement in the startElement method. Because the tags title, link and description occur in both the channel and its items, we have to track where in the XML structure we currently are. That is, when we encounter an <item> start tag we set the inItem flag to true to ensure that we map the correct data to the correct object, and in the endElement method we set that flag back to false when we encounter an </item> tag, to signal that we are done with that item.

    In this example that is pretty easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you would either have to use enums, for example, to hold your current state along with a lot of switch/case statements to check where you are, or, more elegantly, use some kind of tag tracker built on a tag stack.
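
    The tag stack idea might look roughly like the sketch below. It is an illustration under assumptions, not code from the answer: PathTrackingHandler and its parse method are hypothetical names, and it runs on the plain JDK parser. Each start tag is pushed onto a stack and each end tag pops it, so the current position can be read as a path instead of being tracked with boolean flags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tag tracker: keeps all currently open elements on a stack.
class PathTrackingHandler extends DefaultHandler {

    private final Deque<String> openTags = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();
    final List<String> channelTitles = new ArrayList<>();
    final List<String> itemTitles = new ArrayList<>();

    // Builds a path such as "/rss/channel/item/title" from the stack.
    private String currentPath() {
        StringBuilder path = new StringBuilder();
        // descendingIterator() walks from the outermost to the innermost tag.
        for (Iterator<String> it = openTags.descendingIterator(); it.hasNext();) {
            path.append('/').append(it.next());
        }
        return path.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openTags.push(qName);
        content.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The path still includes the element that is being closed.
        String path = currentPath();
        if (path.equals("/rss/channel/title")) {
            channelTitles.add(content.toString().trim());
        } else if (path.equals("/rss/channel/item/title")) {
            itemTitles.add(content.toString().trim());
        }
        openTags.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the given XML string and returns the populated handler.
    static PathTrackingHandler parse(String xml) {
        try {
            PathTrackingHandler handler = new PathTrackingHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        PathTrackingHandler handler = parse("<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item></channel></rss>");
        System.out.println(handler.channelTitles + " " + handler.itemTitles);
    }
}
```

    With this approach, a title at a new nesting level only needs one more path comparison rather than another flag.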