lucene

ElasticSearch stat size_in_bytes different for identical indices

Submitted by 萝らか妹 on 2019-12-24 08:48:54
Question: In Elasticsearch 5.6, I have created multiple indices of the same 15k documents; specifically, 4 that all share the same mapping, settings, and content. 3 of the 4 had index sizes of ~1.0 GB. index_1 of the 4 has a size of 52 MB. I've compared searches across the 4 indices, and index_1 returns fewer documents than the others for identical searches; I've seen anywhere from 1% to 80% fewer documents per query. At this point, I don't trust the docs.count or the store.size_in_bytes on index_1 or the…
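Before trusting either statistic, one way to pin down the discrepancy is to export the document IDs from a healthy index and from index_1 (e.g. with a scroll query) and diff the two sets. A minimal sketch, assuming the IDs have already been exported into plain lists; `missing_ids` and the sample lists are illustrative, not from the question:

```python
def missing_ids(reference_ids, suspect_ids):
    """Return IDs present in the reference index but absent from the suspect one."""
    return sorted(set(reference_ids) - set(suspect_ids))

# In-memory lists standing in for IDs exported via a scroll/scan query:
ref = ["doc1", "doc2", "doc3", "doc4"]
sus = ["doc1", "doc3"]
print(missing_ids(ref, sus))  # → ['doc2', 'doc4']
```

Once the missing IDs are known, fetching a few of them directly from index_1 shows whether they were never indexed or were silently dropped later.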

What's the point of Lucene NumericUtils.IntToPrefixCoded

Submitted by 帅比萌擦擦* on 2019-12-24 07:59:25
Question: I've been looking at Subtext's Lucene.Net implementation as a guide to doing something similar with our websites. When Subtext indexes or searches for a given post, it runs the ID through NumericUtils.IntToPrefixCoded. According to the Lucene docs, it does some shifting but doesn't lose precision. So what's the point? What does it do, and why? Answer 1: You need to look at the class documentation, which explains it in more detail: to quickly execute range queries in Apache Lucene, a range is divided…
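To illustrate the idea behind prefix-coded numeric terms: the value is indexed several times at successively coarser precisions (low bits shifted off), so a range query can match a handful of coarse terms instead of enumerating thousands of exact values. A language-neutral sketch, assuming a 32-bit value and a Lucene-default-style precision step of 8; `prefix_terms` is an illustrative name, not a Lucene API:

```python
def prefix_terms(value, precision_step=8, bits=32):
    """Emit (shift, prefix) pairs for the value at successively coarser
    precisions, mimicking Lucene's prefix-coded numeric terms."""
    terms = []
    for shift in range(0, bits, precision_step):
        terms.append((shift, value >> shift))
    return terms

# shift 0 keeps the full value; each later term drops 8 more low bits,
# so a range query can cover wide sub-ranges with single coarse terms.
terms = prefix_terms(0x12345678)
```

The "shifting without losing precision" in the docs refers to the shift amount being encoded into the term alongside the remaining bits, so the full value is still recoverable from the shift-0 term.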

how to get all docs with acts_as_solr

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-24 07:08:02
Question: I'm doing something like this: Item.find_by_solr('name:ab*') and it says it returns 297 results: => #<ActsAsSolr::SearchResults:0xb6516858 @total_pages=1, @solr_data={:docs=>[doc1, doc2, doc3...]}, :max_score=>1.6935261, :total_pages=>1, :total=>297}, @current_page=1> Item.count_by_solr('name:ab*') also returns 297. Yet when I iterate, it only shows 10 items: Item.find_by_solr('reference_name:ab*').each do |i| puts i end I tried adding {:per_page=>80} and :limit=>:all but it still shows those 10…
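acts_as_solr pages its results, so the returned collection holds only the first page (default 10) even though @total reports 297. Whatever pagination option a given plugin version honors, the general fix is to keep requesting pages until the reported total has been accumulated. A minimal sketch of that loop; `fetch_page`/`fake_fetch` are stand-ins, not acts_as_solr API:

```python
def fetch_all(fetch_page, per_page=10):
    """Accumulate results page by page until the reported total is reached.
    `fetch_page(page, per_page)` must return (docs, total)."""
    results, page, total = [], 1, None
    while total is None or len(results) < total:
        docs, total = fetch_page(page, per_page)
        if not docs:  # guard against a server that under-reports
            break
        results.extend(docs)
        page += 1
    return results

# Fake 27-item result set served 10 at a time:
data = list(range(27))
def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return data[start:start + per_page], len(data)

assert fetch_all(fake_fetch) == data
```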

Solr: How to distinguish between multiple entities imported through DIH

Submitted by 家住魔仙堡 on 2019-12-24 07:06:04
Question: When using DataImportHandler with SqlEntityProcessor, I want to have several entity definitions going into the same schema with different queries. How can I search across both types of entities while also distinguishing their source at the same time? Example: <document> <entity name="entity1" query="query1"> <field column="column1" name="column1" /> <field column="column2" name="column2" /> </entity> <entity name="entity2" query="query2"> <field column="column1" name="column1" /> <field column="column2" name=…
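One common approach is to stamp each entity with a constant marker field using DIH's TemplateTransformer, then filter on it at query time (e.g. fq=source:entity1). A sketch of the config, assuming a `source` field has been added to the schema; the field name is illustrative:

```xml
<document>
  <entity name="entity1" query="query1" transformer="TemplateTransformer">
    <field column="column1" name="column1"/>
    <field column="column2" name="column2"/>
    <!-- constant per-entity marker; "source" must exist in schema.xml -->
    <field column="source" template="entity1"/>
  </entity>
  <entity name="entity2" query="query2" transformer="TemplateTransformer">
    <field column="column1" name="column1"/>
    <field column="column2" name="column2"/>
    <field column="source" template="entity2"/>
  </entity>
</document>
```

Searching without the filter still hits both entity types; adding the fq restricts to one source, and faceting on `source` shows how many results came from each.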

UTF-8 characters not showing properly

Submitted by 天涯浪子 on 2019-12-24 07:03:56
Question: I am using Nutch 1.4 and Solr 3.3.0 to crawl and index my site, which is in French. My site used to be in iso8859-1. Currently I have 2 indexes under Solr: in the first one I store my old pages (in iso8859-1), and in the second one I store my new pages (in utf-8). I use the same Nutch configuration for both crawl jobs to fetch and index the old and the new pages on my site. I have not added any settings about character encodings on my own (I think). I am facing a problem when searching the…
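The usual culprit in this situation is bytes stored in one encoding being decoded as another somewhere in the crawl-to-index pipeline. A minimal Python sketch of the Latin-1 / UTF-8 mismatch with French text, showing both failure modes:

```python
text = "déjà"

latin1_bytes = text.encode("iso-8859-1")  # how the old pages are stored
utf8_bytes = text.encode("utf-8")         # how the new pages are stored

# Decoding Latin-1 bytes as UTF-8 fails outright (0xE9 alone is invalid UTF-8):
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert not decoded_ok

# Decoding each byte stream with its true charset round-trips cleanly:
assert latin1_bytes.decode("iso-8859-1") == text
assert utf8_bytes.decode("utf-8") == text

# Decoding UTF-8 bytes as Latin-1 "succeeds" but yields mojibake:
print(utf8_bytes.decode("iso-8859-1"))  # → "dÃ©jÃ" followed by a non-breaking space
```

So if accented characters render as "Ã©"-style sequences, the old pages were most likely read with the wrong charset before indexing; the fix belongs at fetch time (honoring the page's declared charset), not in Solr.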

Updating Solr Index when product data has changed

Submitted by 牧云@^-^@ on 2019-12-24 06:40:13
Question: We are working on implementing Solr on an e-commerce site. The site is continuously updated with new data, either through updates to existing product information or new products added altogether. We are using it in an ASP.NET MVC 3 application with SolrNet. We are facing an issue with indexing. We currently commit using the following: private static ISolrOperations<ProductSolr> solrWorker; public void ProductIndex() { //Check connection instance invoked or not if (solrWorker == null) { Startup…
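Independent of the SolrNet specifics, the usual advice for continuously updated data is to avoid committing after every document and instead commit per batch (or lean on Solr's autoCommit/commitWithin). A minimal sketch of the batching pattern with a fake client; `BatchingIndexer` and `FakeSolr` are illustrative, not SolrNet API:

```python
class BatchingIndexer:
    """Buffer documents and commit once per batch instead of per update.
    `solr` is assumed to expose add(docs) and commit()."""

    def __init__(self, solr, batch_size=100):
        self.solr = solr
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.solr.add(self.buffer)
            self.solr.commit()
            self.buffer = []

# Fake client showing one commit per batch of 3 instead of one per doc:
class FakeSolr:
    def __init__(self):
        self.commits, self.docs = 0, []
    def add(self, docs):
        self.docs.extend(docs)
    def commit(self):
        self.commits += 1

solr = FakeSolr()
indexer = BatchingIndexer(solr, batch_size=3)
for i in range(7):
    indexer.add({"id": i})
indexer.flush()  # push the final partial batch
print(solr.commits)  # → 3
```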

Return Elasticsearch highlight results in position order?

Submitted by ぃ、小莉子 on 2019-12-24 05:08:06
Question: I'm currently using the highlighting feature that Elasticsearch offers in my query. However, the one thing I'm not quite clear on is how the results are ordered. I would prefer they come back in the order that they appear in a paragraph instead of by importance/score. This is so I can concatenate them with "..." separators in the same order as they are in the original document (similar to Google results). However, they currently return in some weighted order based on best match. Is there a…
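If the highlighter cannot be made to return fragments in document order, one workaround is to re-sort them client-side by where each fragment's plain text occurs in the source field. A minimal sketch; `order_by_position` is illustrative and assumes `<em>` highlight tags and fragments that appear verbatim in the source:

```python
import re

def order_by_position(source_text, fragments, tag="em"):
    """Sort highlight fragments by their position in the original text.
    Strips the highlight tags before searching, since the source text
    does not contain them."""
    def position(frag):
        plain = re.sub(r"</?%s>" % tag, "", frag)
        pos = source_text.find(plain)
        return pos if pos >= 0 else len(source_text)  # unknown → sort last
    return sorted(fragments, key=position)

paragraph = "alpha beta gamma delta epsilon"
frags = ["<em>delta</em> epsilon", "alpha <em>beta</em>"]
print(" ... ".join(order_by_position(paragraph, frags)))
# → alpha <em>beta</em> ... <em>delta</em> epsilon
```

This requires having the source text available (stored field or _source), which is usually the case when highlighting is enabled.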

Fieldable.tokenStreamValue() returns null for tokenized field

Submitted by 北城以北 on 2019-12-24 04:48:04
Question: I use Lucene for N-gram matching. I set a field to be analyzed using an N-gram analyzer. I want to see what the tokens resulting from the analysis look like, to make sure the n-grams are being computed correctly. If I call the method Fieldable.tokenStreamValue() on the analyzed field of a document, I get null, while calling Fieldable.isTokenized() returns true. I must add that the results of querying are consistent with the n-grams being correctly generated. Any explanations for this? I am…
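In these Lucene versions, tokenStreamValue() only returns a stream when the field was constructed directly from a TokenStream; for ordinary text fields, analysis happens at index time and the resulting stream is not retained on the Fieldable, hence null. To inspect the tokens, run the text through the analyzer itself (Analyzer.tokenStream) and compare against the n-grams you expect. A sketch of the expected output for character n-grams; `char_ngrams` is illustrative, not a Lucene API:

```python
def char_ngrams(text, n=2):
    """Character n-grams, mirroring what an n-gram tokenizer should emit
    for a given field value."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("lucene", 3))  # → ['luc', 'uce', 'cen', 'ene']
```

Comparing the analyzer's actual token stream against this reference list confirms whether the n-grams are being generated as intended.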

Indexing failed. Rolled back all changes. (Solr DataImport)

Submitted by 岁酱吖の on 2019-12-24 03:39:11
Question: When I try to run domain.com:8080/solr/dataimport?command=full-import, I get the error "Indexing failed. Rolled back all changes." There's no additional error message to tell me what went wrong. Any suggestions? data-config.xml <dataConfig> <dataSource name="mysql" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/databasename" user="myusername" password="mypassword" /> <document> <entity name="posts" datasource="mysql" query="select id, title, description from posts" deltaQuery=…
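With DIH, the underlying exception usually only appears in the Solr server log, so that is the first place to look. One detail worth checking in the config above: DIH entity attributes are case-sensitive, and the data source reference should be `dataSource`, not `datasource`. A corrected fragment, assuming the rest of the config is unchanged:

```xml
<entity name="posts" dataSource="mysql"
        query="select id, title, description from posts">
```

If the attribute name is wrong, DIH may fall back to a default (or missing) data source and roll back without a useful message in the response.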

Sitecore + Lucene + QueryOccurance.Should not returning desired results

Submitted by 北慕城南 on 2019-12-24 03:31:30
Question: I am using Alex Shyba's Advanced DatabaseCrawler and it is working beautifully... almost... I am using it for a car-sales application in which you can search for a car using the following values: Model, Make, Fuel, Mileage, Price, Year (registration date). I have multiple NumericRange queries: -1000 to 0 (this is for those dealers that do not want the price online; they write the price as -1), and bottom to top, i.e. (10000 - 20000), which is what I want to sort by. They are both in the same…
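With Lucene's BooleanQuery, SHOULD clauses act as a boolean OR only when grouped on their own; mixed directly alongside MUST clauses, a SHOULD clause becomes optional and stops constraining the results. A common fix is to wrap the two range clauses in a sub-BooleanQuery that is added to the outer query with Occur.MUST. The intended matching logic, sketched in Python; `in_any_range` and the sample prices are illustrative:

```python
def in_any_range(value, ranges):
    """True if value falls in at least one inclusive (low, high) range,
    mirroring NumericRangeQuery clauses OR-ed together with Occur.SHOULD."""
    return any(low <= value <= high for low, high in ranges)

# The "price on request" sentinel range plus the user's selected range:
price_ranges = [(-1000, 0), (10000, 20000)]
prices = [-1, 500, 15000, 25000]
print([p for p in prices if in_any_range(p, price_ranges)])  # → [-1, 15000]
```

In Lucene terms: build one BooleanQuery holding both NumericRangeQueries with Occur.SHOULD, then add that whole sub-query to the main query with Occur.MUST so at least one range is required to match.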