Word Frequency Counter in XSLT

Posted by 纵然是瞬间 on 2019-12-02 00:47:30

Your $stopwords variable is currently a single string, but you want it to be a sequence of strings (sequences require XSLT 2.0). You can do this in any of the following ways:

  • Change its declaration to

    <xsl:variable name="stopwords" 
      select="('a', 'about', 'an', 'are', 'as', 'at', 
               'be', 'by', 'for', 'from', 'how', 
               'I', 'in', 'is', 'it', 
               'of', 'on', 'or', 
               'that', 'the', 'this', 'to', 
               'was', 'what', 'when', 'where', 
               'who', 'will', 'with')"/>
    
  • Change its declaration to

    <xsl:variable name="stopwords" 
      select="tokenize('a about an are as at 
                        be by for from how I in is it 
                        of on or that the this to was 
                        what when where who will with',
                        '\s+')"/>
    
  • Read it from an external XML document named (e.g.) stoplist.xml, of the form

    <stop-list>
      <p>This is a sample stop list [further description ...]</p>
      <w>a</w>
      <w>about</w>
      ...
    </stop-list>
    

    and then load it, e.g. with

    <xsl:variable name="stopwords"
      select="document('stoplist.xml')//w/string()"/>
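
To show how a sequence-valued $stopwords is then used, here is a minimal XSLT 2.0 sketch of the whole word count. It rests on assumptions that are not in the question: the words to count are taken from the string value of the entire input document, they are split on runs of non-word characters with tokenize(..., '\W+'), and everything is lower-cased. Because of the lower-casing, the abbreviated stop list uses 'i' rather than 'I'. The output format (one `word: count` line, most frequent first) is an arbitrary choice for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Stop words as a sequence of strings (abbreviated, lower-case list). -->
  <xsl:variable name="stopwords"
    select="tokenize('a about an are as at be by for from how i in is it
                      of on or that the this to was what when where who will with',
                     '\s+')"/>

  <xsl:template match="/">
    <!-- Assumption: count every word in the document's text.
         Lower-case it, split on runs of non-word characters,
         then drop empty tokens and any token equal to a stop word. -->
    <xsl:variable name="words"
      select="tokenize(lower-case(string(.)), '\W+')[. != ''][not(. = $stopwords)]"/>

    <!-- One group per distinct word, most frequent first. -->
    <xsl:for-each-group select="$words" group-by=".">
      <xsl:sort select="count(current-group())" data-type="number" order="descending"/>
      <xsl:value-of select="current-grouping-key(), count(current-group())"
                    separator=": "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

With a sequence, no padding tricks are needed: the general comparison `. = $stopwords` is true only when the word equals one of the items, so 'abo' can never match 'about'.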
    

You are comparing the current word with the entire list of stop words at once. Instead, you should check whether the current word is contained in the list of stop words:

not(contains(concat(' ', $stopwords, ' '), concat(' ', ., ' ')))

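The surrounding spaces are needed to avoid partial matches. A space on the right prevents 'abo' from matching 'about', and a space on the left prevents a suffix such as 'bout' from matching it. This assumes the words in $stopwords are separated by single spaces; wrap it in normalize-space() if it may contain line breaks or runs of whitespace.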
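If you are limited to XSLT 1.0, where $stopwords has to stay a single string, the padded contains() test can be combined with Muenchian grouping to produce the counts. This is a minimal sketch under assumptions not in the question: the input has already been split into hypothetical <word> elements holding one word each, and the stop list is abbreviated.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Stop words as one space-separated string (XSLT 1.0 has no sequences). -->
  <xsl:variable name="stopwords" select="'a about an are as at the of to'"/>

  <!-- Index the hypothetical <word> elements by their text. -->
  <xsl:key name="by-text" match="word" use="."/>

  <xsl:template match="/">
    <!-- Muenchian grouping: visit only the first <word> with each text value,
         and skip it if ' word ' occurs inside ' stopwords '. -->
    <xsl:for-each select="//word[generate-id() = generate-id(key('by-text', .)[1])]
                                [not(contains(concat(' ', $stopwords, ' '),
                                              concat(' ', ., ' ')))]">
      <!-- Most frequent words first. -->
      <xsl:sort select="count(key('by-text', .))" data-type="number" order="descending"/>
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(key('by-text', .))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For an input such as `<text><word>about</word><word>bout</word><word>bout</word><word>the</word></text>`, 'about' and 'the' are dropped as stop words, while 'bout' is kept and reported as `bout: 2`. With the space padded only on the right, 'bout ' would be found inside 'about ' and 'bout' would be wrongly discarded.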
