lucene

How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError?

生来就可爱ヽ(ⅴ<●) Submitted on 2019-12-20 10:30:01
Question: I'm really puzzled why Solr keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory. Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters instead of just figuring out how much memory is available and limiting itself to that? No other program except Solr ever has this kind of problem. Yes, I can keep tweaking the JVM heap size every time such a crash happens, but this is all so backwards. Here's the stack trace of the
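A common mitigation is to give Solr an explicit, adequately sized heap and a heap dump on OOM so crashes can be diagnosed. A sketch of the relevant settings, assuming the Solr 5+ `bin/solr` layout with a `solr.in.sh` include file; the sizes shown are placeholders, not recommendations:

```
# solr.in.sh — pick a heap that fits your machine and leave room
# for the OS page cache that Lucene relies on.
SOLR_HEAP="4g"
# Equivalent explicit JVM flags (use one or the other, not both):
# SOLR_JAVA_MEM="-Xms4g -Xmx4g"
# Dump the heap on OOM so the crash can be analyzed rather than guessed at:
GC_TUNE="$GC_TUNE -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/logs"
```

Indexing-time memory is also bounded by `ramBufferSizeMB` in solrconfig.xml, which caps how much Lucene buffers before flushing a segment.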

What are segments in Lucene?

自闭症网瘾萝莉.ら Submitted on 2019-12-20 09:55:10
Question: What are segments in Lucene? What are the benefits of segments? Answer 1: The Lucene index is split into smaller chunks called segments. Each segment is its own index; Lucene searches all of them in sequence. A new segment is created when a new writer is opened and when a writer commits or is closed. The advantage of this design is that you never have to modify a segment's files once the segment is created. When you add new documents to your index, they are added to the next segment.
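The write-once property described in the answer can be sketched in plain Java. This is a toy model, not Lucene's actual classes: each "segment" is an immutable postings map frozen at commit time, and a search simply consults every segment in sequence.

```java
import java.util.*;

// Toy model of Lucene's segment design: each segment is an immutable
// mini-index, and a search consults every segment in sequence.
public class SegmentSketch {
    // One "segment": a write-once map from term -> doc ids.
    record Segment(Map<String, List<Integer>> postings) {}

    static final List<Segment> segments = new ArrayList<>();

    // A "commit" freezes the current in-memory buffer into a new
    // immutable segment; existing segments are never modified.
    static void commit(Map<String, List<Integer>> buffer) {
        segments.add(new Segment(Map.copyOf(buffer)));
    }

    // Searching never mutates anything; it unions hits from each segment.
    static List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (Segment s : segments)
            hits.addAll(s.postings().getOrDefault(term, List.of()));
        return hits;
    }

    public static void main(String[] args) {
        commit(Map.of("lucene", List.of(0, 1)));
        commit(Map.of("lucene", List.of(2), "solr", List.of(2)));
        System.out.println(search("lucene")); // [0, 1, 2]
    }
}
```

Because segments are immutable, real Lucene can cache and memory-map them safely; deletions are handled with side files and background merges rather than in-place edits.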

How to fix: Error CREATEing SolrCore 'gettingstarted': Unable to create core

ぃ、小莉子 Submitted on 2019-12-20 09:53:45
Question: I'm getting this error when I try to create a new core in Solr. root@ubuntu:/opt/solr# bin/solr create -c gettingstarted -n data_driven_schema_configs Setup new core instance directory: /var/solr/data/gettingstarted Creating new core 'gettingstarted' using command: http://localhost:8983/solr/admin/cores?action=CREATE&name=gettingstarted&instanceDir=gettingstarted Failed to create core 'gettingstarted' due to: Error CREATEing SolrCore 'gettingstarted': Unable to create core [gettingstarted]

Lucene Syntax

巧了我就是萌 Submitted on 2019-12-20 09:46:35
Fields: You can also search on the fields shown on the left side of the page. Field-scoped full-text search: field:value. Exact search: wrap the keyword in double quotes, field:"value". Example: http.code:404 finds documents whose HTTP status code is 404. Testing whether a field exists: _exists_:http requires the http field to be present in results; _missing_:http requires it to be absent.

Wildcards: ? matches a single character; * matches zero or more characters, e.g. kiba?a, el*search. Neither ? nor * may be used as the first character, so ?text and *text are invalid.

Regex: Elasticsearch supports a subset of regex features, with relatively poor performance, e.g. name:/joh?n(ath[oa]n)/

Fuzzy search: quikc~ brwn~ foks~. Appending ~ to a word enables fuzzy search, which finds some misspelled words; first~ can also match frist. You can also set an edit distance (an integer) to specify how much similarity is required, e.g. cromm~1 is meant to match from and chrome. The default is 2; the smaller the value, the closer a match must be to the original term, and a setting of 1 already catches roughly 80% of misspellings.

Proximity search: appending ~ to a phrase finds terms that are separated or out of order. "where select"~5 means up to 5 words may appear between select and where, so it can match select password from users.
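The fuzzy operator (~) above is defined in terms of edit distance. A minimal plain-Java sketch of the textbook Levenshtein distance — note that Lucene's fuzzy matching actually uses the Damerau-Levenshtein variant, which counts a transposition as a single edit:

```java
// Classic Levenshtein edit distance via dynamic programming: the cost
// of turning string a into string b using insertions, deletions, and
// substitutions, each costing 1.
public class EditDistance {
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // brwn~ matches "brown": one insertion away.
        System.out.println(levenshtein("brwn", "brown"));  // 1
        // first~ vs "frist": a transposition costs 2 here,
        // but only 1 under Damerau-Levenshtein.
        System.out.println(levenshtein("first", "frist")); // 2
    }
}
```

This is why the default distance of 2 is forgiving enough for most single-typo queries: one substitution, one missing letter, or one swapped pair all stay within the bound.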

Extract tf-idf vectors with lucene

心已入冬 Submitted on 2019-12-20 08:39:13
Question: I have indexed a set of documents using Lucene. I have also stored a DocumentTermVector for each document's content. I wrote a program and got the term frequency vector for each document, but how can I get the tf-idf vector of each document? Here is my code that outputs term frequencies in each document: Directory dir = FSDirectory.open(new File(indexDir)); IndexReader ir = IndexReader.open(dir); for (int docNum=0; docNum<ir.numDocs(); docNum++) { System.out.println(ir.document(docNum).getField(
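Given the per-document term frequencies the question already extracts, the missing step is the idf weighting. A self-contained sketch using the textbook formula idf = ln(N / df); Lucene's classic similarity uses a slightly different variant, 1 + ln(N / (df + 1)), but the structure of the computation is the same. The data in main is made up for illustration:

```java
import java.util.*;

// Toy tf-idf: weight each raw term frequency by the inverse document
// frequency, idf = ln(N / df), where N is the number of documents and
// df is how many of them contain the term.
public class TfIdf {
    static Map<String, Double> tfidf(Map<String, Integer> termFreqs,
                                     List<Map<String, Integer>> allDocs) {
        int n = allDocs.size();
        Map<String, Double> weights = new HashMap<>();
        for (var e : termFreqs.entrySet()) {
            long df = allDocs.stream()
                             .filter(d -> d.containsKey(e.getKey()))
                             .count();
            weights.put(e.getKey(), e.getValue() * Math.log((double) n / df));
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
            Map.of("lucene", 2, "index", 1),
            Map.of("lucene", 1, "query", 3));
        // "index" appears in 1 of 2 docs -> idf = ln(2);
        // "lucene" appears in both -> idf = ln(1) = 0.
        System.out.println(tfidf(docs.get(0), docs));
    }
}
```

In real Lucene code, df per term is available from the index itself (the term's document frequency), so the inner counting loop is unnecessary there.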

Is MongoDB a valid alternative to relational db + lucene?

ぃ、小莉子 Submitted on 2019-12-20 08:38:42
Question: On a new project I need to make heavy use of Lucene for a searcher implementation. This searcher will be a very important (and big) piece of the project. Is it valid or convenient to replace Relational Database + Lucene with MongoDB? edit: OK, I will clarify: I'm not asking about risk; I can pay that price on this project. My point is: Is MongoDB oriented to this kind of thing? Can I build a full search engine with the same performance as I can get with Lucene? A friend pointed me to MongoDB as an alternative,

Why is Solr so much faster than Postgres?

社会主义新天地 Submitted on 2019-12-20 07:58:12
Question: I recently switched from Postgres to Solr and saw a ~50x speed-up in our queries. The queries we run involve multiple ranges, and our data is vehicle listings. For example: "Find all vehicles with mileage < 50,000, $5,000 < price < $10,000, make=Mazda..." I created indices on all the relevant columns in Postgres, so it should be a pretty fair comparison. Looking at the query plan in Postgres, though, it was still just using a single index and then scanning (I assume because it couldn't make use

Lucene search match any word at phrase

放肆的年华 Submitted on 2019-12-20 06:48:32
Question: I want to search for a string with many words and retrieve documents that match any of them. My indexing method is the following: Document document = new Document(); document.add(new TextField("termos", text, Field.Store.YES)); document.add(new TextField("docNumber",fileNumber,Field.Store.YES)); config = new IndexWriterConfig(analyzer); Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer("standard") .addTokenFilter("lowercase") .addTokenFilter("stop") .addTokenFilter("porterstem"
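"Match any word" is OR semantics: a document qualifies if it contains at least one query token. Lucene's classic QueryParser behaves this way by default (its default operator is OR, yielding a BooleanQuery of SHOULD clauses). A plain-Java sketch of the idea, with hypothetical example strings:

```java
import java.util.*;

// OR semantics over tokens: a document matches if any query token
// appears in it. Lucene builds the equivalent as a BooleanQuery whose
// term clauses all use Occur.SHOULD.
public class MatchAny {
    static boolean matchesAny(String document, String query) {
        // Crude lowercasing tokenizer standing in for an Analyzer.
        Set<String> docTokens =
            new HashSet<>(List.of(document.toLowerCase().split("\\W+")));
        for (String token : query.toLowerCase().split("\\W+"))
            if (docTokens.contains(token)) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("Lucene indexes documents",
                                      "search documents"));    // true
        System.out.println(matchesAny("Lucene indexes documents",
                                      "mongodb postgres"));    // false
    }
}
```

One caveat for the asker's setup: the query must be analyzed with the same CustomAnalyzer used at index time (lowercase, stop, porterstem), otherwise stemmed index terms will not match unstemmed query terms.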

Duke Fast Deduplication: java.lang.UnsupportedOperationException: Operation not yet supported?

匆匆过客 Submitted on 2019-12-20 04:35:11
Question: I'm trying to use the Duke Fast Deduplication Engine to search for some duplicate records in the database at the company where I work. I run it from the command line like this: java -cp "C:\utils\duke-0.6\duke-0.6.jar;C:\utils\duke-0.6\lucene-core-3.6.1.jar" no.priv.garshol.duke.Duke --showmatches --verbose .\config.xml But I get an error: Exception in thread "main" java.lang.UnsupportedOperationException: Operation not yet supported at sun.jdbc.odbc.JdbcOdbcResultSet.isClosed(Unknown Source

hibernate search without database

喜欢而已 Submitted on 2019-12-20 04:20:03
Question: Is it possible to use hibernate-search only for its annotations (bean => document / document => bean mapping), without using a database at all? If so, are there any online samples that show how to set this up? I found the following: http://mojodna.net/2006/10/02/searchable-annotation-driven-indexing-and-searching-with-lucene.html, but I'd prefer hibernate-search if it supports my use case. Answer 1: I don't think that's possible, because when you enable Hibernate Search you are enabling