lucene

Integrating Lucene Index and Amazon AWS

空扰寡人 posted on 2020-07-23 04:41:24
Question: I have an existing set of Lucene index files and the Java code to perform search functions on it. What I would like to do is perform the same thing on a server, so that users of an app can simply pass a query, which the Java program takes as an input parameter and runs against the existing index to return the document in which it occurs. The whole implementation has been tested on my local PC, but what I need to do is implement it in an Android app. So far I have read around and

Solr: Scores As Percentages

瘦欲@ posted on 2020-07-21 05:20:07
Question: First of all, I have already seen the Lucene documentation that tells us not to express scores as percentages: People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs a "50%" match. This is also sometimes called a "normalized score". Don't do this. Seriously. Stop trying to think about your problem this way, it's not going to end well. Because of these recommendations, I solved my problem another way. However, there are a few points of Lucene
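
To illustrate the warning quoted above, here is a minimal sketch of the usual "divide by the top score" normalization. The raw scores are hypothetical values chosen for illustration, not real Lucene or Solr output, and `as_percentages` is a made-up helper name.

```python
from typing import List


def as_percentages(scores: List[float]) -> List[int]:
    """Scale scores so the best hit in *this* result list becomes 100."""
    top = max(scores)
    return [round(s / top * 100) for s in scores]


# Hypothetical raw scores for two unrelated queries (not real Lucene output).
strong_matches = [8.0, 4.0, 2.0]   # assume the query terms matched well
weak_matches = [0.4, 0.2, 0.1]     # assume only marginal matches were found

print(as_percentages(strong_matches))
print(as_percentages(weak_matches))
```

Both lists normalize to the same percentages, so the best of the weak hits looks like a "100%" match. The percentage only describes a hit's position relative to the other hits for the same query, which is one reason the quoted advice discourages it.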

Change the search order based on word index

南楼画角 posted on 2020-07-09 12:20:34
Question: Is there any way to add weight to terms found at the beginning of a document? For example, I have 3 documents. Medicine XXX Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor. Medicine YYY contains: sulpher Not recommended by most physicians Medicine ZZZ This medicine works like sulpher but does not contain sulpher at all. The document XXX should be listed at the top for the search term "Sulpher" because that is the first word in that
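
The ranking the question asks for can be sketched outside Lucene as a score that decays with the position of the first match. This is only an illustration of the idea, not Lucene's scoring: `position_score` and `rank` are hypothetical helpers, the `1 / (1 + position)` formula is an arbitrary choice, and the split of the flattened excerpt into three document bodies is an assumption. Within Lucene, `SpanFirstQuery` (which restricts a span match to the first positions of a field) may be worth looking at.

```python
import re
from typing import Dict, List

# The three documents, split according to one reading of the flattened excerpt.
DOCS: Dict[str, str] = {
    "XXX": "Sulpher This medicine contains sulpher and should be taken "
           "only after consultation with your doctor.",
    "YYY": "contains: sulpher Not recommended by most physicians",
    "ZZZ": "This medicine works like sulpher but does not contain sulpher at all.",
}


def position_score(text: str, term: str) -> float:
    """Return 1 / (1 + index of the first matching token), or 0.0 if absent."""
    tokens = re.findall(r"\w+", text.lower())
    term = term.lower()
    if term not in tokens:
        return 0.0
    return 1.0 / (1 + tokens.index(term))


def rank(docs: Dict[str, str], term: str) -> List[str]:
    """Order document names by descending position score."""
    return sorted(docs, key=lambda name: position_score(docs[name], term),
                  reverse=True)


print(rank(DOCS, "Sulpher"))
```

With these inputs the term is the first token of XXX, the second of YYY and the fifth of ZZZ, so XXX ranks first.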

Different analyzers for each field

流过昼夜 posted on 2020-07-04 08:28:26
Question: How can I enable different analyzers for each field in a document I'm indexing with Lucene? Example: RAMDirectory dir = new RAMDirectory(); IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); Field field1 = new Field("field1", someText1, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); Field field2 = new Field("field2", someText2, Field
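
Lucene and Lucene.Net provide `PerFieldAnalyzerWrapper` for this purpose: it holds a default analyzer plus a map from field name to analyzer. The sketch below shows only that dispatch idea in plain Python. The two analyzer functions are rough stand-ins, not Lucene's analyzers, and the class name `PerFieldAnalyzer` is hypothetical; the field names `field1` and `field2` come from the question.

```python
import re
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]


def standard_like(text: str) -> List[str]:
    """Lowercase and split on non-word characters (rough stand-in)."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text: str) -> List[str]:
    """Keep the whole field value as a single token (rough stand-in)."""
    return [text]


class PerFieldAnalyzer:
    """Pick an analyzer by field name, falling back to a default."""

    def __init__(self, default: Analyzer, per_field: Dict[str, Analyzer]):
        self._default = default
        self._per_field = dict(per_field)

    def analyze(self, field: str, text: str) -> List[str]:
        return self._per_field.get(field, self._default)(text)


analyzer = PerFieldAnalyzer(standard_like, {"field2": keyword_like})
print(analyzer.analyze("field1", "Hello World"))  # uses the default
print(analyzer.analyze("field2", "ABC-123"))      # uses the override
```

In the real library, the wrapper is passed to the `IndexWriter` in place of the single `StandardAnalyzer`, and the same wrapper should normally be used at query time so that both sides tokenize each field the same way.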

Windows Solr post - Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

£可爱£侵袭症+ posted on 2020-05-17 08:47:05
Question: My code c2020 is running and available when I visit http://localhost:8983/solr/#/c2020/query . Locally, when I try to run: solr-7.7.2> java -jar -Dc=c2020 example\exampledocs\post.jar "C:\temp\path_to\a_doc.pdf" I get: SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/c2020/update using content-type application/xml... POSTing file A Half Century of Macro Momentum_vf.pdf to [base] SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http
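
The log shows the PDF being posted with content-type application/xml, so one plausible reading of the error is that an XML parser is trying to decode binary PDF bytes as UTF-8. The sketch below reproduces that failure in isolation. The byte string is an assumption: it mimics the header plus binary comment line that many PDF files begin with, and it is not taken from the file in the question. `first_invalid_utf8_offset` is a hypothetical helper.

```python
from typing import Optional

# Assumed opening bytes of a typical PDF: the version header followed by a
# comment line containing bytes above 0x7F (not read from the asker's file).
pdf_head = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where UTF-8 decoding fails, or None if it succeeds."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_utf8_offset(pdf_head))
print(first_invalid_utf8_offset(b"<add><doc/></add>"))
```

Under this assumption the ten ASCII bytes `%PDF-1.4`, newline and `%` decode fine; the byte 0xe2 then announces a multi-byte sequence and 0xe3 is not a valid continuation byte, which lines up with "Invalid UTF-8 middle byte 0xe3 (at char #10 ...)" in the title.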

SolrCloud error loading solr.VelocityResponseWriter

时光总嘲笑我的痴心妄想 posted on 2020-05-16 22:01:55
Question: I am getting this error in the logs. I did not configure any solr.VelocityResponseWriter in solrConfig. ERROR:[{ "update-queryresponsewriter":{ "startup":"lazy", "name":"velocity", "class":"solr.VelocityResponseWriter", "template.base.dir":"", "solr.resource.loader.enabled":"true", "params.resource.loader.enabled":"true"}, "errorMessages":["Error loading class 'solr.VelocityResponseWriter'"]}] I am using Solr version 8.4 as SolrCloud. Source: https://stackoverflow.com/questions/61346386/solrcloud

What is the best way to index documents which contain mathematical expressions in Elasticsearch?

£可爱£侵袭症+ posted on 2020-05-13 14:19:14
Question: The problem I am trying to solve is that I have a bunch of documents which contain mathematical expressions/formulas. I want to search the documents by formula or expression. So far, based on my research, I'm considering converting the mathematical expressions to LaTeX format and storing them as strings in the database (Elasticsearch). With this approach, will I be able to search for documents with the LaTeX string? Example: the LaTeX conversion of a² + b² = c² is a^{2} + b^{2} = c^{2} . Can this
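
Whether a LaTeX string can be found again depends largely on how the field is tokenized at index and query time. The sketch below contrasts two simplified tokenizers on the example from the question. They are stand-ins written for illustration, not Elasticsearch's actual analyzers, and the function names are hypothetical.

```python
import re
from typing import List


def word_tokens(text: str) -> List[str]:
    """Keep only runs of letters and digits, dropping all punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


def whitespace_tokens(text: str) -> List[str]:
    """Split on whitespace only, leaving LaTeX syntax intact."""
    return text.split()


latex = "a^{2} + b^{2} = c^{2}"

print(word_tokens(latex))
print(whitespace_tokens(latex))
```

The punctuation-dropping tokenizer reduces the formula to `a, 2, b, 2, c, 2`, so the operators and exponent structure are lost and many unrelated formulas would produce the same tokens. Splitting on whitespace keeps `a^{2}`, `+`, `b^{2}`, `=`, `c^{2}` as tokens. Storing the whole string unanalyzed (for example in a `keyword` field) would support exact matches only.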

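wwsearch full-text search engine

那年仲夏 posted on 2020-05-08 16:35:33
Link: https://github.com/Tencent/wwsearch/blob/master/doc/wwsearch-implement.md Background: As a typical enterprise service system, WeCom (企业微信) needs full-text search in many of its enterprise applications, including the employee directory, corporate email, approvals, reports, enterprise CRM, enterprise materials, and interconnected circles. The figure below shows a typical email search scenario. Because the business has grown rapidly over the past few years, the back-end search architecture faces several challenges: 1. With hundreds of millions of users and xxx万 (xxx × 10,000) enterprises, how to search, efficiently and in real time, both a user's own data within an enterprise and the enterprise's global data. 2. With many business models, how to meet diverse requirements for search conditions and features. 3. With a huge data volume (tens of TB of searchable text), how to reduce cost. The industry has widely used open-source full-text search engines such as Lucene and Sphinx. They suit site-search scenarios, but for real-time search with massive numbers of users and large data volumes they have clear drawbacks: 1. They cannot split indexes at a fine granularity; an index can only be built over the global data, so redundant data has to be filtered out during search. 2. They do not support real-time search; latency ranges from tens of seconds to minutes. 3. They place high demands on deployment hardware: large-memory machines are needed to support terabyte-scale data storage. To address the shortcomings of existing solutions, and taking enterprise application scenarios into account, we redesigned and implemented a general-purpose full-text search engine, wwsearch. Self-developed full-text search engine: wwsearch is designed for fast full-text search with massive numbers of users. Its underlying layer supports a pluggable LSM-tree storage engine, and it supports per-user table partitioning at the hundred-million scale, low latency, efficient updates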