How to determine field-type for SOLR indexing?

长发绾君心 2020-12-14 03:07

I have two fields in a MySQL table. One is a VARCHAR and is the "headline" for a classified (classifieds website). The other is a TEXT field which contains the "text" fo

1 answer
  • 2020-12-14 03:40

    1. Schema

    Your Solr schema is very much determined by your intended search behavior. In your schema.xml file, you'll see a bunch of choices like "text" and "string". They behave differently.

    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    

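    The string field type is a literal string match: the value is indexed as a single, unanalyzed token. It operates like = in a SQL statement, so the query must match the whole stored value exactly, including case.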

    <fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
    

    The text_ws field type only tokenizes on whitespace. The big difference in the text field type is its filters for stop words, delimiters, and lower-casing. In the stock schema.xml these filters are declared for both the Lucene index (analyzer type="index") and the Solr query (analyzer type="query"); the abbreviated excerpt here shows only the index side. So when you search a text field, Solr runs the query terms through the same kind of filters to help find a match.

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter ..... />
        <filter ..... />
        <filter ..... />
      </analyzer>
    </fieldtype>
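
    As a rough sketch of what "filters on both sides" can look like, here is a field type with matching index and query analyzers. The name text_example is hypothetical, and the filter list is only an illustration: it does not claim to be the filters elided in the excerpt above, and your Solr version's stock schema.xml may chain different ones.

    ```xml
    <!-- Hypothetical field type; filter choices are illustrative assumptions -->
    <fieldtype name="text_example" class="solr.TextField" positionIncrementGap="100">
      <!-- Applied to document text when it is written to the index -->
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <!-- Applied to the user's query terms before they are matched -->
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>
    ```

    Keeping the two chains the same means a query for "Processor" is lower-cased the same way the indexed text was, so the terms line up.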
    

    When indexing things like news stories, for example, you probably want to search for company names and headlines differently.

    <field name="headline" type="text" />
    <field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />
    

    The above example would allow you to run a query like coname:Intel AND headline:(processor specifications) and retrieve only stories whose company name is exactly Intel and whose headline matches the analyzed terms.
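
    Applied to the two MySQL columns in the question, a schema might look roughly like the sketch below. The field names ad_text and headline_exact are assumptions made for illustration; only headline comes from the question. The copyField keeps an unanalyzed copy of the headline in case you also want exact matching on it.

    ```xml
    <!-- VARCHAR headline: analyzed, so individual words are searchable -->
    <field name="headline" type="text" indexed="true" stored="true"/>
    <!-- TEXT body of the ad (hypothetical field name) -->
    <field name="ad_text" type="text" indexed="true" stored="true"/>
    <!-- Optional unanalyzed copy of the headline for exact matches (hypothetical field name) -->
    <field name="headline_exact" type="string" indexed="true" stored="false"/>
    <copyField source="headline" dest="headline_exact"/>
    ```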

    If you want to search a range, see the range syntax under Result Fields below.

    2. Result Fields

    You can define a standard set of return fields in your request handler:

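    <requestHandler name="mumble" class="solr.DisMaxRequestHandler">
        <!-- Default parameters, used when the request does not supply them -->
        <lst name="defaults">
            <!-- fl: comma-separated list of fields to return -->
            <str name="fl">category,coname,headline</str>
        </lst>
    </requestHandler>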
    

    You may also specify the desired fields in your query string, using the fl parameter:

    /select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
    

    You can also select ranges in your query terms using the field:[x TO *] syntax. If you wanted to select certain ads by their date, you might build a query with

    ad_date:[20100101 TO 20100201]
    

    in your query terms. (There are many ways to search ranges; I'm presenting a method that stores the date as an integer in yyyymmdd form instead of using a date field type.)
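
    For the integer approach to work, ad_date has to be indexed as a numeric field. A minimal sketch, assuming a Solr version that still ships solr.TrieIntField (it was introduced in 1.4; newer releases replace it with solr.IntPointField). The type name tint is an assumption taken from the convention in older example schemas.

    ```xml
    <!-- Numeric type suited to range queries; assumes solr.TrieIntField is available -->
    <fieldtype name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <!-- Date stored as a yyyymmdd integer, e.g. 20100115 -->
    <field name="ad_date" type="tint" indexed="true" stored="true"/>
    ```

    With a plain string type the same range would compare values lexicographically, which happens to work for fixed-width yyyymmdd values but is generally slower for ranges.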
