I have two fields in a MySQL table. One is a VARCHAR holding the "headline" of a classified ad (it's a classifieds website). The other is a TEXT field which contains the "text" fo
1. Schema
Your Solr schema should be driven by how you intend to search. In your schema.xml file you'll find several predefined field types, such as "text" and "string", and they behave very differently.
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
The string field type is indexed as a single, unanalyzed term, so it only matches the literal value, including case. It works like = in a SQL WHERE clause.
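To make the "literal match" point concrete, here is a toy Python model of a string field. It is not Solr code, just an illustration assuming the field keeps the whole value as one term; the function name string_field_matches is hypothetical.

```python
# Toy model (not Solr code): a string (StrField) field indexes the whole value
# as one unanalyzed term, so a query value matches only on exact equality.
def string_field_matches(indexed_value: str, query_value: str) -> bool:
    # No tokenizing and no lower-casing: compare raw values, like = in SQL.
    return indexed_value == query_value


print(string_field_matches("Intel", "Intel"))       # True
print(string_field_matches("Intel", "intel"))       # False: case differs
print(string_field_matches("Intel Corp", "Intel"))  # False: not the whole value
```

This is why a string field suits identifiers and company names you want to filter on exactly, but not free text.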
<fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldtype>
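The text_ws field type splits values into tokens on whitespace but does nothing else. The text field type differs mainly in its filters: it removes stop words, splits words on delimiters, and lower-cases terms. In the full definition in schema.xml, these filters are declared for both the index-time analyzer (applied when Lucene indexes the document) and the query-time analyzer (applied to your Solr query); only the index analyzer is excerpted below. So when you search a text field, your query terms pass through essentially the same filters as the indexed text, which helps them find a match.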
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter ..... />
<filter ..... />
<filter ..... />
</analyzer>
</fieldtype>
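A rough Python sketch of the difference between text_ws and text follows. The stopword list and the delimiter rule are illustrative assumptions, not Solr's actual filter configuration, and treating a query as an AND of its tokens is a simplification (Solr's default operator is configurable). The point it shows is that the same analysis runs on the indexed value and on the query.

```python
import re

# Illustrative stopword list (an assumption, not the contents of stopwords.txt).
STOPWORDS = {"a", "an", "and", "the", "for", "of"}


def analyze_text_ws(value: str) -> list[str]:
    # text_ws: split on whitespace only; case and punctuation are kept.
    return value.split()


def analyze_text(value: str) -> list[str]:
    # text: split on whitespace, break words on non-alphanumerics (a rough
    # stand-in for a word-delimiter filter), lower-case, drop stopwords.
    tokens = []
    for word in value.split():
        for part in re.split(r"[^0-9A-Za-z]+", word):
            part = part.lower()
            if part and part not in STOPWORDS:
                tokens.append(part)
    return tokens


def matches(analyzer, field_value: str, query: str) -> bool:
    # The same analyzer is applied at index time and at query time.
    # Simplification: every query token must appear among the indexed tokens.
    indexed = set(analyzer(field_value))
    query_tokens = analyzer(query)
    return bool(query_tokens) and all(t in indexed for t in query_tokens)


headline = "The New Intel Processor Specifications"
print(matches(analyze_text_ws, headline, "processor specifications"))  # False
print(matches(analyze_text, headline, "processor specifications"))     # True
print(analyze_text("The New Intel Processor-Specifications"))
# ['new', 'intel', 'processor', 'specifications']
```

The text_ws field misses because "Processor" and "processor" are different tokens to it; the text field lower-cases both sides before comparing.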
When indexing things like news stories, for example, you probably want to search for company names and headlines differently.
<field name="headline" type="text" />
<field name="coname" type="string" indexed="true" multiValued="false" omitNorms="true" />
The above example would let you search with a query such as q=coname:Intel AND headline:(processor specifications)
and retrieve only stories whose coname is exactly "Intel", while "processor" and "specifications" are matched through the text analyzer.
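If you build these requests from code, it helps to let a URL library do the encoding. Below is a minimal Python sketch; build_select_params is a hypothetical helper, and quoting the company value is an assumption meant to keep a multi-word name like "Intel Corp" together as one value for the string field (it does not escape quotes inside the name).

```python
from urllib.parse import parse_qs, urlencode


def build_select_params(company: str, headline_terms: str) -> str:
    # Hypothetical helper: exact match on the string field (coname),
    # analyzed match on the text field (headline).
    q = f'coname:"{company}" AND headline:({headline_terms})'
    return urlencode({"q": q})


params = build_select_params("Intel", "processor specifications")
print(params)
# q=coname%3A%22Intel%22+AND+headline%3A%28processor+specifications%29
print(parse_qs(params)["q"])
# ['coname:"Intel" AND headline:(processor specifications)']
```

You would append this to your /select URL along with any other parameters.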
(Range searches are covered at the end of section 2.)
2. Result Fields
You can define a default set of return fields for a request handler in solrconfig.xml, inside its defaults list:
<requestHandler name="mumble" class="solr.DisMaxRequestHandler" >
<lst name="defaults">
<str name="fl">
category,coname,headline
</str>
</lst>
</requestHandler>
You can also choose the returned fields per request in the query string, using the fl
parameter:
/select?indent=on&version=2.2&q=coname%3AIn*&start=0&rows=10&fl=coname%2Cid&qt=standard
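A small Python model of how the two approaches interact, assuming fl is declared under the handler's defaults list: a parameter sent with the request overrides the default. The names HANDLER_DEFAULTS, effective_params and field_list are hypothetical, and the comma split is a simplification of how Solr parses fl.

```python
from urllib.parse import parse_qs, urlencode

# Mirrors the handler config above (assumed to live under <lst name="defaults">).
HANDLER_DEFAULTS = {"fl": "category,coname,headline"}


def effective_params(request_params: dict) -> dict:
    # "defaults" apply only when the request does not supply the parameter.
    return {**HANDLER_DEFAULTS, **request_params}


def field_list(params: dict) -> list[str]:
    # Simplification: treat fl as a comma-separated list of field names.
    return [f.strip() for f in params["fl"].split(",") if f.strip()]


print(field_list(effective_params({"q": "coname:In*"})))
# ['category', 'coname', 'headline']
print(field_list(effective_params({"q": "coname:In*", "fl": "coname,id"})))
# ['coname', 'id']

# Encoding the parameters from the example URL above.
query = urlencode({"q": "coname:In*", "fl": "coname,id", "start": 0, "rows": 10})
print(parse_qs(query)["fl"])  # ['coname,id']
```

Keeping the common field list in the handler means most requests stay short, and you only pass fl when a page needs something different.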
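You can also select ranges in your query terms using the field:[x TO *]
syntax; square brackets make the range inclusive at both ends, and * leaves an end open. If you wanted to select ads by their date, you might include
ad_date:[20100101 TO 20100201]
in your query terms. (There are many ways to search ranges; this one stores dates as YYYYMMDD integers instead of using Solr's date field type. It assumes ad_date is indexed with a numeric field type that supports range queries, such as a Trie integer field.)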
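Here is a minimal Python sketch of the integer-date approach, assuming ad_date holds YYYYMMDD integers. The helper names date_to_int, range_query and in_range are hypothetical. Because YYYYMMDD integers sort in calendar order, a numeric range over them selects the right dates.

```python
from datetime import date
from typing import Optional


def date_to_int(d: date) -> int:
    # Encode a date as YYYYMMDD, e.g. 2010-01-01 -> 20100101.
    return d.year * 10000 + d.month * 100 + d.day


def range_query(field: str, start: date, end: Optional[date] = None) -> str:
    # Square brackets are inclusive; "*" leaves the upper end open,
    # as in the field:[x TO *] form above.
    upper = "*" if end is None else str(date_to_int(end))
    return f"{field}:[{date_to_int(start)} TO {upper}]"


def in_range(value: int, start: date, end: date) -> bool:
    # What the inclusive range selects, expressed in plain Python.
    return date_to_int(start) <= value <= date_to_int(end)


print(range_query("ad_date", date(2010, 1, 1), date(2010, 2, 1)))
# ad_date:[20100101 TO 20100201]
print(range_query("ad_date", date(2010, 1, 1)))
# ad_date:[20100101 TO *]
```

You would compute the same integer when exporting each ad from MySQL, so the indexed values and the query bounds use one encoding.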