lucene

solr not tokenizing protected words

半腔热情 submitted on 2019-12-22 10:28:36
Question: I have documents in Solr/Lucene (3.x) with a special copy field, facet_headline, so that I have an unstemmed field for faceting. Sometimes two or more words belong together and should be handled/counted as one word, for example "kim jong il". So the headline "Saturday: kim jong il had died" should be split into: "Saturday", "kim jong il", "had", "died". For this reason I decided to use protected words (protwords), where I added kim jong il. The schema.xml looks like this: <fieldType name="facet
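Assuming Solr's usual behavior, protected words only stop a stemmer from changing individual tokens. They are applied by a token filter, after the tokenizer has already split "kim jong il" into three tokens, so they cannot glue words back together. Joining a known multi-word phrase into one token has to be a separate step. The plain-Java sketch below (no Solr or Lucene involved; `PhraseMerger` and `mergePhrases` are hypothetical names) illustrates only that step: a greedy longest match against a dictionary of phrases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr/Lucene: merges known multi-word
// phrases into single tokens using greedy longest match.
class PhraseMerger {
    static List<String> mergePhrases(String text, Set<List<String>> phrases, int maxLen) {
        // Split on anything that is not a letter or digit (drops the ':' in "Saturday:").
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1; // default: emit the single token unchanged
            // Try the longest candidate phrase first, down to two tokens.
            for (int len = Math.min(maxLen, tokens.size() - i); len > 1; len--) {
                if (phrases.contains(tokens.subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            out.add(String.join(" ", tokens.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }
}
```

For example, `PhraseMerger.mergePhrases("Saturday: kim jong il had died", Set.of(List.of("kim", "jong", "il")), 3)` returns `[Saturday, kim jong il, had, died]`. Matching here is case-sensitive; inside Solr the same logic would have to live in a tokenizer or token filter, which this sketch does not attempt.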

Lucene search by URL

六眼飞鱼酱① submitted on 2019-12-22 10:10:29
Question: I'm storing a Document which has a URL field: Document doc = new Document(); doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED)); doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED)); doc.add(new Field("html", CompressionTools.compressString(html), Field.Store.YES)); I'd like to be able to find a Document by its URL, but I get 0 results: Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30); Query query = new QueryParser(LUCENE_VERSION, "url",
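A likely cause (hedged, since the question is cut off): the url field is indexed NOT_ANALYZED, so the whole URL is stored as one term, while QueryParser with StandardAnalyzer breaks the query text into several smaller terms (and treats characters such as ':' as query syntax), so none of them equals the stored term. In Lucene the usual way around this is to skip the parser and search with `new TermQuery(new Term("url", url))`. The plain-Java toy below models only the term-matching part; `ToyTermIndex` and its crude tokenizer are hypothetical stand-ins, not Lucene's API or StandardAnalyzer's real rules.

```java
import java.util.*;

// Toy model (not Lucene): an exact-term index for a single field.
class ToyTermIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // NOT_ANALYZED-style indexing: the whole value becomes one term.
    void addUnanalyzed(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> crudeAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Analyzed lookup: a doc matches if any analyzed query term is an indexed term.
    Set<Integer> searchAnalyzed(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : crudeAnalyze(query)) {
            hits.addAll(postings.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    // TermQuery-style lookup: the query string is used verbatim as one term.
    Set<Integer> searchExact(String term) {
        return new TreeSet<>(postings.getOrDefault(term, Set.of()));
    }
}
```

Indexing `http://example.com/page?id=1` with `addUnanalyzed` and then calling `searchAnalyzed` with the same string finds nothing, because the query is reduced to `http`, `example`, `com`, and so on; `searchExact` finds the document.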

Installing solr and indexing mysql

烈酒焚心 submitted on 2019-12-22 09:57:57
Question: Can anyone help me install Solr and configure it to index a MySQL table? I have tried almost all the tutorials, with Jetty and also with Tomcat, and I still get errors like "Data Handler not defined" or "could not find solr". I've been trying all day, every day, for a week. Answer 1: To get Solr running (assuming you've downloaded Solr and extracted it somewhere), just navigate to the jetty folder. Under it there should be a start.jar. Just type java -jar start.jar; this should start

Lucene.net and partial “starts with” phrase search

醉酒当歌 submitted on 2019-12-22 09:56:44
Question: I'm looking to build an auto-complete textbox over a large number of city names. The search should work as follows: I want a "starts with" search over a multi-word phrase. For example, if the user has typed "chicago he", only locations such as "Chicago Heights" should be returned. I'm trying to use Lucene for this, but I'm having trouble understanding how it should be implemented. I've tried what I think should work: I've indexed locations with KeywordAnalyzer (I've
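Indexing each whole location name as a single KeywordAnalyzer token turns "starts with" into a prefix lookup on the full name. Two points are worth checking in that setup: KeywordAnalyzer does not lowercase, so names and typed input need the same normalization; and the classic QueryParser splits on whitespace, so "chicago he*" is parsed as separate clauses rather than one prefix. The sketch below models the lookup in plain Java, with a sorted map standing in for the index; `CityPrefixSearch` is a hypothetical name, and in Lucene the analogous step would be a PrefixQuery built directly rather than through QueryParser.

```java
import java.util.*;

// Hypothetical sketch: whole-phrase "starts with" lookup over normalized names.
class CityPrefixSearch {
    // Key: lowercased full name (one "token" per city); value: display name.
    private final NavigableMap<String, String> byKey = new TreeMap<>();

    void add(String city) {
        byKey.put(city.toLowerCase(Locale.ROOT), city);
    }

    List<String> startsWith(String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // Keys are sorted, so all matches are contiguous from the first key >= prefix.
        for (Map.Entry<String, String> e : byKey.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the prefix range
            out.add(e.getValue());
        }
        return out;
    }
}
```

With "Chicago", "Chicago Heights", "Chicago Ridge" and "Cicero" added, `startsWith("chicago he")` returns only "Chicago Heights", because the space is part of the prefix rather than a clause separator.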

Is it possible to create Java classes from JRuby and use them in Java?

前提是你 submitted on 2019-12-22 08:20:05
Question: I'm trying to extend Lucene's Analyzer from JRuby and use it from Java. A simple analyzer would look like this: class MyAnalyzer < Java::OrgApacheLuceneAnalysis::Analyzer def TokenStream (file_name, reader) result = StandardTokenizer.new(Version::LUCENE_CURRENT, reader) result = LowerCaseFilter.new(result) result = LengthFilter.new(result, 3, 50) result = StopFilter.new(result, StandardAnalyzer.STOP_WORDS_SET) result = PorterStemFilter.new(result) result end end Then I compile it: jrubyc -c /home

How do I find empty Solr document fields with a Lucene query?

荒凉一梦 submitted on 2019-12-22 08:15:51
Question: I have some documents like this: <doc> <str name="navTitle"/> <str name="title">Word 1</str> </doc> <doc> <str name="navTitle">Word 2</str> <str name="title">Word 3</str> </doc> and I want to find all documents with an empty "navTitle" field. What is the Lucene query for this? I tried " navTitle:'' " and " navTitle:' ' ", but the Solr Admin Panel finds nothing. What's wrong with the query? Answer 1: The SolrQuerySyntax page says that you can use the following query to find all empty

Lucene 5 Sort problems (UninvertedReader and DocValues)

人走茶凉 submitted on 2019-12-22 08:09:56
Question: I am working on a search engine built with Lucene 5.2.1, but I am having trouble with the updated Sort option for search. I get an error when searching with the Sort option: Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'stars' (expected=NUMERIC). Use UninvertingReader or index with docvalues. at org.apache.lucene.index.DocValues.checkField(DocValues.java:208) at org.apache.lucene.index.DocValues.getNumeric(DocValues.java:227) at org.apache
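The exception names the two options: index the field with doc values, or wrap the reader in UninvertingReader. The underlying reason is that sorting needs a per-document value (doc ID to stars), while an indexed-only field provides the opposite mapping (term to doc IDs), which must be "uninverted" first. The plain-Java toy below illustrates that idea only; it is not Lucene's implementation, and `ToyDocValues` is a hypothetical name. In Lucene 5.x the doc-values route typically means adding a `NumericDocValuesField` for "stars" at index time, alongside any indexed field.

```java
import java.util.*;

// Toy model (not Lucene internals) of why sorting needs doc values.
class ToyDocValues {
    // postings: term value -> doc IDs containing it (what an indexed-only field gives).
    // Returns a column: column[docId] = value, or `missing` if the doc has no value.
    static int[] uninvert(Map<Integer, List<Integer>> postings, int maxDoc, int missing) {
        int[] column = new int[maxDoc];
        Arrays.fill(column, missing);
        for (Map.Entry<Integer, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                column[doc] = e.getKey();
            }
        }
        return column;
    }

    // Sorting by a field is a lookup per doc ID in the column (doc values' shape).
    static List<Integer> sortByDesc(int[] column) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < column.length; d++) docs.add(d);
        // List.sort is stable, so docs with equal values keep doc-ID order.
        docs.sort((a, b) -> Integer.compare(column[b], column[a]));
        return docs;
    }
}
```

Uninverting has to walk every term of the field, which is why indexing with doc values up front is generally the preferred fix, with UninvertingReader as the option when reindexing is not possible.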

Lucene Entity Extraction

江枫思渺然 submitted on 2019-12-22 08:07:02
Question: Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. So far I've been able to use Lucene for: (1) searching for complex phrases with some fuzziness, and (2) highlighting results. However, I don't know how to: (1) get accurate offsets of the matched phrases, or (2) make entity-specific annotations per match (not just tags for every single hit). I have tried using the explain() method, but this only gives the terms in the query which got
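One way to get exact offsets and a per-match entity type, shown here outside Lucene as a swapped-in technique, is to scan the text against the dictionary directly and record each match's character span together with the type stored in the dictionary. The plain-Java sketch below uses regular expressions; `DictionaryTagger`, `EntityMatch` and the sample entries are hypothetical. It does no fuzzy matching, and overlapping matches are all returned rather than resolved.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One match: character span [start, end), the matched text and its entity type.
final class EntityMatch {
    final int start, end;
    final String text, type;

    EntityMatch(int start, int end, String text, String type) {
        this.start = start; this.end = end; this.text = text; this.type = type;
    }

    @Override public String toString() { return type + "[" + start + "," + end + ")=" + text; }
}

// Hypothetical dictionary tagger (not a Lucene API).
class DictionaryTagger {
    // dictionary maps an entity term to its type, e.g. "kim jong il" -> "PERSON".
    static List<EntityMatch> tag(String text, Map<String, String> dictionary) {
        List<EntityMatch> out = new ArrayList<>();
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            // \b keeps matches on word boundaries; quote() escapes regex metacharacters.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            Matcher m = p.matcher(text);
            while (m.find()) {
                out.add(new EntityMatch(m.start(), m.end(), m.group(), e.getValue()));
            }
        }
        // Report matches in document order.
        out.sort(Comparator.comparingInt((EntityMatch a) -> a.start));
        return out;
    }
}
```

Each `EntityMatch` carries offsets that index directly into the original string, so `text.substring(match.start, match.end)` recovers the matched phrase as it appears in the document.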

Lucene QueryParser in multiple threads: synchronize or construct new each time?

无人久伴 submitted on 2019-12-22 06:49:05
Question: I have a web application where users submit queries to a Lucene index. The queries are parsed by a Lucene QueryParser. I learned the hard way that QueryParser is not thread-safe. Is it better to use a single QueryParser instance and synchronize on calls to its parse() method? Or is it better to construct a new instance for each query? (Or would I be better served by a pool of QueryParsers?) I know that in general questions like this depend on the particulars and require profiling, but maybe
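Constructing a new parser per query is the simplest thread-safe option and may well be cheap enough; profiling would confirm it. If construction does show up as a cost, a per-thread instance avoids both locking and repeated construction without the bookkeeping of a pool. The sketch below shows that pattern with `ThreadLocal`; `PerThreadInstance` is a hypothetical name, and `SimpleDateFormat` (a standard JDK class that is also not thread-safe) stands in for QueryParser so the example runs without Lucene.

```java
import java.text.SimpleDateFormat;
import java.util.function.Supplier;

// Hypothetical helper: one lazily created instance of a non-thread-safe object per thread.
class PerThreadInstance<T> {
    private final ThreadLocal<T> local;

    PerThreadInstance(Supplier<T> factory) {
        // The factory runs once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }
}

class PerThreadDemo {
    // Stand-in for a QueryParser: shared holder, but each thread gets its own formatter.
    static final PerThreadInstance<SimpleDateFormat> FORMAT =
            new PerThreadInstance<>(() -> new SimpleDateFormat("yyyy-MM-dd"));
}
```

With Lucene, the factory would construct a QueryParser with the field name and analyzer; the exact constructor arguments depend on the Lucene version. In a servlet container with pooled threads, each pool thread keeps its instance for the life of the thread.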

Elasticsearch: match every position only once

佐手、 submitted on 2019-12-22 06:34:15
Question: In my Elasticsearch index I have documents that have multiple tokens at the same position. I want to get a document back when I match at least one token at every position. The order of the tokens is not important. How can I accomplish that? I use Elasticsearch 0.90.5. Example: I index a document like this: { "field":"red car" } I use a synonym token filter that adds synonyms at the same positions as the original token. So now the field has 2 positions: Position 1: "red" Position 2:
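The matching rule being asked for can be stated precisely: a document qualifies when, for every position, at least one of the tokens stacked at that position appears in the query, regardless of order. The plain-Java sketch below checks only that rule on already-analyzed positions; it is not an Elasticsearch query, `PositionMatcher` is a hypothetical name, and the synonyms "crimson" and "automobile" used in the example are made up, not taken from the question.

```java
import java.util.*;

// Hypothetical checker for "at least one query token at every position".
class PositionMatcher {
    // positions.get(i) holds every token indexed at position i (original + synonyms).
    static boolean matchesEveryPosition(List<Set<String>> positions, Set<String> queryTokens) {
        for (Set<String> tokensAtPosition : positions) {
            // Fails as soon as one position shares no token with the query.
            if (Collections.disjoint(tokensAtPosition, queryTokens)) {
                return false;
            }
        }
        return true;
    }
}
```

For a document whose positions are {red, crimson} and {car, automobile}, the query tokens {automobile, crimson} match (every position is covered, in any order), while {red, bike} does not, because nothing covers the second position.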