lucene | 易学教程

Is using a load balancer with ElasticSearch unnecessary?

Question: I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed). From everything I've read about ElasticSearch, no one seems to recommend putting a load balancer in front of the cluster; instead, the recommendation seems to be to do one of two things: Point your client at the URL/IP of one node, let ES do the load balancing for you

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

Question: I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes, I was wondering if anyone can give me pros/cons (or link to an article).

Answer 1: I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both setups have worked well for me, but the move from 2005 to 2008 was actually an improvement. My situation was NOT like

Understanding Segments in Elasticsearch

Question: I was under the assumption that each shard in Elasticsearch is an index. But I read somewhere that each segment is a Lucene index. What exactly is a segment? How does it affect search performance? I have indices that reach around 450GB in size every day (I create a new one every day) with default Elasticsearch settings. When I execute

    curl -XPOST "http://localhost:9200/logstash-2013.03.0$i_optimize?max_num_segments=1"

I get num_committed_segments=11 and num_search_segments=11. Shouldn't the
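The quoted curl command interpolates a shell variable into the index name. A minimal sketch of building such per-day URLs explicitly, assuming the intent was one `logstash-2013.03.0<i>` index per day followed by the `/_optimize` endpoint. The helper name `optimize_url` and the day range are hypothetical and do not come from the question.

```python
from urllib.parse import urlencode


def optimize_url(host, index, max_num_segments=1):
    """Build the URL for the legacy _optimize endpoint of a single index."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{host}/{index}/_optimize?{query}"


# Hypothetical day range; the question does not show the surrounding loop.
urls = [
    optimize_url("http://localhost:9200", f"logstash-2013.03.0{i}")
    for i in range(1, 4)
]
for url in urls:
    print(url)
```

In a double-quoted shell string, `$i_optimize` is read as a variable named `i_optimize`, so writing `${i}/_optimize` keeps the index name and the endpoint separate. Newer Elasticsearch releases renamed `_optimize` to `_forcemerge`.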

I'm trying to index files in a document through SOLR and lucene..

Question: As I said in the title, I am using Java, but when I run the code in Eclipse, I get the following error:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/analysis/util/ResourceLoader
        at Indexer.getIndexWriter(Indexer.java:38)
        at Indexer.rebuildIndexes(Indexer.java:73)
        at SolrIndexer.main(SolrIndexer.java:23)
    Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.util.ResourceLoader
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java

Can we assign more than one template ID in templateId (like we do in relatedId), while searching with Lucene?

Question: I have five check boxes: Search All, Template 1, Template 2, Template 3, Template 4. If the user selects Search All, we can simply pass the index name and get the result. If the user selects one template-specific check box, we can again simply pass the template name. But what if two (or maybe three) of the template-specific check boxes are checked? Can we pipe-separate templateIDs?

Answer 1: You may need to change the method in the Advanced Database Crawler to handle the GUIDs of templates passed in. The

Updating Solr Schema

Question: I am new to Solr and I'm curious what the procedure is for changing/updating the schema. I noticed that I can ADD new fields easily without causing any issues, but any time I've had to UPDATE a field, it has caused issues. Due to the amount of data ingested into the system, I will not be able to retain the original data that was used to generate the add/doc queries to Solr, so I'll be unable to simply re-index everything when a change occurs. For instance, I am looking to change an

Neo4j: Lucene phrase matching using Cypher (fuzzy)

Question: In Lucene, a Phrase is a group of words surrounded by double quotes such as "hello dolly". I would like to be able to do the Cypher equivalent of this Lucene fuzzy query: "hello dolly"~0.1

This finds my "hello dolly" node:

    START n=node:node_auto_index("name:\"hello dolly\"~0.1") RETURN n

This doesn't:

    START n=node:node_auto_index("name:\"hella dolly\"~0.1") RETURN n

Splitting the search phrase by whitespace into Single Terms does work:

    START n=node:node_auto_index("name:hella~0.1 AND name
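The whitespace-splitting workaround described in the question can be sketched as a small query builder. This is only an illustration: `fuzzy_terms_query` is a hypothetical helper, it does no escaping of Lucene special characters, and the 0.1 similarity is taken from the question's examples.

```python
def fuzzy_terms_query(field, phrase, similarity=0.1):
    """Turn a phrase into one fuzzy term clause per word, joined with AND."""
    clauses = [f"{field}:{term}~{similarity}" for term in phrase.split()]
    return " AND ".join(clauses)


query = fuzzy_terms_query("name", "hella dolly")
print(query)
```

The resulting string would be passed as the index query inside `START n=node:node_auto_index(...)`. Note that it matches nodes containing both terms in any order, which is looser than a true phrase match.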

How to index pdf, ppt, xl files in lucene (java based or python or php any of these is fine)?

Question: I also want to know how to add metadata while indexing so that I can boost some parameters.

Answer 1: Lucene indexes text, not files; you'll need some other process for extracting the text out of the file and running Lucene over that.

Answer 2: There are several frameworks for extracting text suitable for Lucene indexing from rich text files (pdf, ppt etc.). One of them is Apache Tika, a sub-project of Lucene. Apache POI is a more general document handling project inside Apache. There are also some

In Lucene, why do my boosted and unboosted documents get the same score?

Question: At index time I am boosting certain documents in this way:

    if (myCondition)
    {
        document.SetBoost(1.2f);
    }

But at search time, documents that are identical in every other respect, some passing and some failing myCondition, all end up with the same score. And here is the search code:

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST);
    booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes
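One possible explanation, which the excerpt itself does not confirm: in Lucene versions that fold index-time boosts into the field norm, the norm is stored in a single byte, so small boosts can be rounded away. The sketch below re-implements that one-byte encoding in Python, modeled on Lucene's `SmallFloat.floatToByte315` as I understand it; treat it as an approximation rather than the library's exact code. The Python function names are my own.

```python
import struct


def float_to_byte315(f):
    """Encode a float into one byte: 3 mantissa bits, 5 exponent bits."""
    # Reinterpret the 32-bit float as a signed int, like Float.floatToRawIntBits.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    smallfloat = bits >> (24 - 3)
    if smallfloat <= ((63 - 15) << 3):
        return 0 if bits <= 0 else 1  # zero, negative, or underflow
    if smallfloat >= ((63 - 15) << 3) + 0x100:
        return 255  # overflow saturates at the largest byte value
    return smallfloat - ((63 - 15) << 3)


def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


for boost in (1.0, 1.2, 1.25, 1.5):
    stored = byte315_to_float(float_to_byte315(boost))
    print(boost, "->", stored)
```

Under this encoding, 1.0 and 1.2 map to the same byte, while 1.25 survives. In practice the stored value is the boost multiplied by the length normalization, so whether a given boost is lost depends on the field; fields indexed with norms omitted would carry no index-time boost at all.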

Does Elasticsearch keep an order of multi-value fields?

Question: Does Elasticsearch keep the order of multi-value fields? I.e., if I've put the following values into fields:

    { "values": ["one", "two", "three"], "values_original": ["1", "2", "3"] }

(given that the fields are not analyzed), can I be sure that the contents of the lists will always be returned in the same order I put them there? In the example above, I want to make sure that "one" in the first position in "values" will always correspond to "1" in "values_original", etc. I could also keep it as nested objects, i.e.
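The nested-objects alternative mentioned at the end of the question can be sketched as follows. Pairing each value with its original in one object removes the dependence on two parallel lists staying aligned. The key `pairs` and the field names `value` and `original` are assumptions for illustration, not from the question.

```python
import json

doc = {"values": ["one", "two", "three"], "values_original": ["1", "2", "3"]}


def to_pairs(doc):
    """Zip the two parallel lists into a single list of objects."""
    return [
        {"value": v, "original": o}
        for v, o in zip(doc["values"], doc["values_original"])
    ]


paired = {"pairs": to_pairs(doc)}
# A JSON round trip keeps array order, mirroring how a stored source document is returned.
restored = json.loads(json.dumps(paired))
print(restored["pairs"][0])
```

Arrays in the returned `_source` come back as they were sent, since `_source` is the original JSON. The order of values as seen through the index (sorting, aggregations, scripts reading indexed values) is a separate matter and is best not relied on.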