lucene

How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError?

生来就可爱ヽ(ⅴ<●) Submitted on 2019-12-20 10:30:01
Question: I'm really puzzled why Solr keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory. Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters instead of just figuring out how much memory is available and limiting itself to that? No other program except Solr ever has this kind of problem. Yes, I can keep tweaking the JVM heap size every time such a crash happens, but this is all so backwards. Here's the stack trace of the
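A common mitigation is to give Solr an explicit, adequately sized heap and a heap dump on OOM so crashes can be diagnosed. A sketch of the relevant settings, assuming the Solr 5+ `bin/solr` layout with a `solr.in.sh` include file; the sizes shown are placeholders, not recommendations:

```
# solr.in.sh — pick a heap that fits your machine and leave room
# for the OS page cache that Lucene relies on.
SOLR_HEAP="4g"
# Equivalent explicit JVM flags (use one or the other, not both):
# SOLR_JAVA_MEM="-Xms4g -Xmx4g"
# Dump the heap on OOM so the crash can be analyzed rather than guessed at:
GC_TUNE="$GC_TUNE -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/logs"
```

Indexing-time memory is also bounded by `ramBufferSizeMB` in solrconfig.xml, which caps how much Lucene buffers before flushing a segment.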

What are segments in Lucene?

自闭症网瘾萝莉.ら Submitted on 2019-12-20 09:55:10
Question: What are segments in Lucene? What are the benefits of segments? Answer 1: The Lucene index is split into smaller chunks called segments. Each segment is its own index; Lucene searches all of them in sequence. A new segment is created when a new writer is opened and when a writer commits or is closed. The advantage of this design is that you never have to modify a segment's files once the segment is created. When you add new documents to your index, they are added to the next segment.
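The write-once property described in the answer can be sketched in plain Java. This is a toy model, not Lucene's actual classes: each "segment" is an immutable postings map frozen at commit time, and a search simply consults every segment in sequence.

```java
import java.util.*;

// Toy model of Lucene's segment design: each segment is an immutable
// mini-index, and a search consults every segment in sequence.
public class SegmentSketch {
    // One "segment": a write-once map from term -> doc ids.
    record Segment(Map<String, List<Integer>> postings) {}

    static final List<Segment> segments = new ArrayList<>();

    // A "commit" freezes the current in-memory buffer into a new
    // immutable segment; existing segments are never modified.
    static void commit(Map<String, List<Integer>> buffer) {
        segments.add(new Segment(Map.copyOf(buffer)));
    }

    // Searching never mutates anything; it unions hits from each segment.
    static List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (Segment s : segments)
            hits.addAll(s.postings().getOrDefault(term, List.of()));
        return hits;
    }

    public static void main(String[] args) {
        commit(Map.of("lucene", List.of(0, 1)));
        commit(Map.of("lucene", List.of(2), "solr", List.of(2)));
        System.out.println(search("lucene")); // [0, 1, 2]
    }
}
```

Because segments are immutable, real Lucene can cache and memory-map them safely; deletions are handled with side files and background merges rather than in-place edits.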

How to fix: Error CREATEing SolrCore 'gettingstarted': Unable to create core

ぃ、小莉子 Submitted on 2019-12-20 09:53:45
Question: I'm getting this error when I try to create a new core in Solr. root@ubuntu:/opt/solr# bin/solr create -c gettingstarted -n data_driven_schema_configs Setup new core instance directory: /var/solr/data/gettingstarted Creating new core 'gettingstarted' using command: http://localhost:8983/solr/admin/cores?action=CREATE&name=gettingstarted&instanceDir=gettingstarted Failed to create core 'gettingstarted' due to: Error CREATEing SolrCore 'gettingstarted': Unable to create core [gettingstarted]

Lucene Syntax

巧了我就是萌 Submitted on 2019-12-20 09:46:35
Fields: You can also search on the fields shown on the left side of the page. Field-scoped full-text search: field:value. Exact search: wrap the keyword in double quotes, field:"value". Example: http.code:404 finds documents whose HTTP status code is 404. Testing whether a field exists: _exists_:http requires the http field to be present in results; _missing_:http requires it to be absent.

Wildcards: ? matches a single character; * matches zero or more characters, e.g. kiba?a, el*search. Neither ? nor * may be used as the first character, so ?text and *text are invalid.

Regex: Elasticsearch supports a subset of regex features, with relatively poor performance, e.g. name:/joh?n(ath[oa]n)/

Fuzzy search: quikc~ brwn~ foks~. Appending ~ to a word enables fuzzy search, which finds some misspelled words; first~ can also match frist. You can also set an edit distance (an integer) to specify how much similarity is required, e.g. cromm~1 is meant to match from and chrome. The default is 2; the smaller the value, the closer a match must be to the original term, and a setting of 1 already catches roughly 80% of misspellings.

Proximity search: appending ~ to a phrase finds terms that are separated or out of order. "where select"~5 means up to 5 words may appear between select and where, so it can match select password from users.
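The fuzzy operator (~) above is defined in terms of edit distance. A minimal plain-Java sketch of the textbook Levenshtein distance — note that Lucene's fuzzy matching actually uses the Damerau-Levenshtein variant, which counts a transposition as a single edit:

```java
// Classic Levenshtein edit distance via dynamic programming: the cost
// of turning string a into string b using insertions, deletions, and
// substitutions, each costing 1.
public class EditDistance {
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // brwn~ matches "brown": one insertion away.
        System.out.println(levenshtein("brwn", "brown"));  // 1
        // first~ vs "frist": a transposition costs 2 here,
        // but only 1 under Damerau-Levenshtein.
        System.out.println(levenshtein("first", "frist")); // 2
    }
}
```

This is why the default distance of 2 is forgiving enough for most single-typo queries: one substitution, one missing letter, or one swapped pair all stay within the bound.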

Extract tf-idf vectors with lucene

心已入冬 Submitted on 2019-12-20 08:39:13
Question: I have indexed a set of documents using Lucene. I have also stored a DocumentTermVector for each document's content. I wrote a program and got the term frequency vector for each document, but how can I get the tf-idf vector of each document? Here is my code that outputs term frequencies in each document: Directory dir = FSDirectory.open(new File(indexDir)); IndexReader ir = IndexReader.open(dir); for (int docNum=0; docNum<ir.numDocs(); docNum++) { System.out.println(ir.document(docNum).getField(
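Given the per-document term frequencies the question already extracts, the missing step is the idf weighting. A self-contained sketch using the textbook formula idf = ln(N / df); Lucene's classic similarity uses a slightly different variant, 1 + ln(N / (df + 1)), but the structure of the computation is the same. The data in main is made up for illustration:

```java
import java.util.*;

// Toy tf-idf: weight each raw term frequency by the inverse document
// frequency, idf = ln(N / df), where N is the number of documents and
// df is how many of them contain the term.
public class TfIdf {
    static Map<String, Double> tfidf(Map<String, Integer> termFreqs,
                                     List<Map<String, Integer>> allDocs) {
        int n = allDocs.size();
        Map<String, Double> weights = new HashMap<>();
        for (var e : termFreqs.entrySet()) {
            long df = allDocs.stream()
                             .filter(d -> d.containsKey(e.getKey()))
                             .count();
            weights.put(e.getKey(), e.getValue() * Math.log((double) n / df));
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
            Map.of("lucene", 2, "index", 1),
            Map.of("lucene", 1, "query", 3));
        // "index" appears in 1 of 2 docs -> idf = ln(2);
        // "lucene" appears in both -> idf = ln(1) = 0.
        System.out.println(tfidf(docs.get(0), docs));
    }
}
```

In real Lucene code, df per term is available from the index itself (the term's document frequency), so the inner counting loop is unnecessary there.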

Is MongoDB a valid alternative to relational db + lucene?

ぃ、小莉子 Submitted on 2019-12-20 08:38:42
Question: On a new project I need to make heavy use of Lucene for a searcher implementation. This searcher will be a very important (and big) piece of the project. Is it valid or convenient to replace Relational Database + Lucene with MongoDB? edit: OK, I will clarify: I'm not asking about risk; I can pay that price on this project. My point is: Is MongoDB oriented to this kind of thing? Can I build a full search engine with the same performance as I can get with Lucene? A friend pointed me to MongoDB as an alternative,

Why is Solr so much faster than Postgres?

社会主义新天地 Submitted on 2019-12-20 07:58:12
Question: I recently switched from Postgres to Solr and saw a ~50x speed-up in our queries. The queries we run involve multiple ranges, and our data is vehicle listings. For example: "Find all vehicles with mileage < 50,000, $5,000 < price < $10,000, make=Mazda..." I created indices on all the relevant columns in Postgres, so it should be a pretty fair comparison. Looking at the query plan in Postgres, though, it was still just using a single index and then scanning (I assume because it couldn't make use

Lucene search match any word at phrase

放肆的年华 Submitted on 2019-12-20 06:48:32
Question: I want to search for a string with many words and retrieve documents that match any of them. My indexing method is the following: Document document = new Document(); document.add(new TextField("termos", text, Field.Store.YES)); document.add(new TextField("docNumber",fileNumber,Field.Store.YES)); config = new IndexWriterConfig(analyzer); Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer("standard") .addTokenFilter("lowercase") .addTokenFilter("stop") .addTokenFilter("porterstem"
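"Match any word" is OR semantics: a document qualifies if it contains at least one query token. Lucene's classic QueryParser behaves this way by default (its default operator is OR, yielding a BooleanQuery of SHOULD clauses). A plain-Java sketch of the idea, with hypothetical example strings:

```java
import java.util.*;

// OR semantics over tokens: a document matches if any query token
// appears in it. Lucene builds the equivalent as a BooleanQuery whose
// term clauses all use Occur.SHOULD.
public class MatchAny {
    static boolean matchesAny(String document, String query) {
        // Crude lowercasing tokenizer standing in for an Analyzer.
        Set<String> docTokens =
            new HashSet<>(List.of(document.toLowerCase().split("\\W+")));
        for (String token : query.toLowerCase().split("\\W+"))
            if (docTokens.contains(token)) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("Lucene indexes documents",
                                      "search documents"));    // true
        System.out.println(matchesAny("Lucene indexes documents",
                                      "mongodb postgres"));    // false
    }
}
```

One caveat for the asker's setup: the query must be analyzed with the same CustomAnalyzer used at index time (lowercase, stop, porterstem), otherwise stemmed index terms will not match unstemmed query terms.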

Duke Fast Deduplication: java.lang.UnsupportedOperationException: Operation not yet supported?

匆匆过客 Submitted on 2019-12-20 04:35:11
Question: I'm trying to use the Duke Fast Deduplication Engine to search for some duplicate records in the database at the company where I work. I run it from the command line like this: java -cp "C:\utils\duke-0.6\duke-0.6.jar;C:\utils\duke-0.6\lucene-core-3.6.1.jar" no.priv.garshol.duke.Duke --showmatches --verbose .\config.xml But I get an error: Exception in thread "main" java.lang.UnsupportedOperationException: Operation not yet supported at sun.jdbc.odbc.JdbcOdbcResultSet.isClosed(Unknown Source

hibernate search without database

喜欢而已 Submitted on 2019-12-20 04:20:03
Question: Is it possible to use hibernate-search only for its annotations (bean => document / document => bean mapping), without using a database at all? If so, are there any online samples that show how to set this up? I found the following: http://mojodna.net/2006/10/02/searchable-annotation-driven-indexing-and-searching-with-lucene.html, but I'd prefer hibernate-search if it supports my use case. Answer 1: I don't think that's possible, because when you enable Hibernate Search you are enabling