lucene

Lucene in Android

Submitted by 99封情书 on 2020-01-09 10:28:02
Question: I'm new to Android and Lucene. Can I use Lucene to search in an Android ListView? I have tried importing the 2.3.2 package and also used the jar files in a library. However, there is an error in SearchFiles.java: "The type java.rmi.Remote cannot be resolved. It is indirectly referenced from .class files." There is a possibility that this class doesn't exist on Android. Is this the problem? Answer 1: You may want to use the native full-text search feature called FTS3 in SQLite instead, which…
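The SQLite full-text search suggested in the answer can be sketched as follows. This is an illustrative example only: it uses Python's standard `sqlite3` module so it is self-contained, but the same SQL statements run through `SQLiteDatabase` on Android. It assumes the linked SQLite was compiled with the FTS module enabled, which is true of stock Android and most desktop builds; the table and column names are made up for the example.

```python
import sqlite3

# Minimal sketch of SQLite full-text search with an FTS virtual table.
# On Android the identical SQL runs through SQLiteDatabase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts4(title, body)")
conn.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [("lucene", "a java search library"),
     ("sqlite", "an embedded database with full text search")],
)
# MATCH performs the full-text query; the result rows could back a
# ListView adapter instead of a Lucene hit list.
rows = conn.execute(
    "SELECT title FROM notes WHERE body MATCH 'search'"
).fetchall()
print(rows)
```

Both sample rows contain the token "search", so both titles come back; this avoids pulling the Lucene jars (and their `java.rmi` references) onto Android at all.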

Using StandardTokenizerFactory with currency

Submitted by 。_饼干妹妹 on 2020-01-07 02:48:11
Question: The fieldType config described in this question works for me to detect currency (e.g. docs containing "$30"). However, we wish to use the StandardTokenizerFactory rather than the WhitespaceTokenizerFactory, and this config returns false positives with the StandardTokenizerFactory (e.g. docs containing the digits 30 without the $ symbol). What is the solution? Thanks. How do I find documents containing digits and dollar signs in Solr? Answer 1: Solved via a post to the solr-user group http://lucene…
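The false positives come from how the two tokenizers treat the "$" symbol. A rough Python sketch of the difference (the regex is only a crude stand-in for Lucene's StandardTokenizer, not its actual grammar):

```python
import re

def whitespace_tokenize(text):
    # Splits on whitespace only, so punctuation stays attached:
    # "$30" survives as a single token.
    return text.split()

def standard_like_tokenize(text):
    # Crude approximation of StandardTokenizer: emits runs of word
    # characters and discards symbols such as "$".
    return re.findall(r"\w+", text)

doc = "tickets cost $30 each"
print(whitespace_tokenize(doc))     # ['tickets', 'cost', '$30', 'each']
print(standard_like_tokenize(doc))  # ['tickets', 'cost', '30', 'each']
```

Under standard-style tokenization a query for "$30" degrades to the term "30", so any document containing the bare digits matches, which is exactly the false positive described.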

Is it possible to generate elasticsearch reports from indexed content?

Submitted by 拟墨画扇 on 2020-01-07 00:54:18
Question: I'm just getting used to using Elasticsearch in our platform, and so far it's proven to be a superb move, but other than some built-in stats I haven't found any reference to creating a report of sorts. I guess the closest comparison would be facets, but it seems they need to be predefined in order to show stats for them. What I would like to know is: is it possible to run reports such as "what are the most popular phrases within the indexed content for the last 24 hours, week, etc."? This…
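Since this question was asked, Elasticsearch aggregations (which replaced facets in 1.0) have become the usual way to build this kind of report without predefining anything. A sketch of a request body for "most frequent terms in the last 24 hours"; the index and field names here are assumptions, only the shape of the request matters:

```python
import json

# Hypothetical field names ("timestamp", "content.keyword"). A terms
# aggregation over documents filtered to the last 24 hours returns the
# most frequent values with no facet predefined in the mapping.
report = {
    "size": 0,  # we only want the aggregation, not the hits
    "query": {"range": {"timestamp": {"gte": "now-24h"}}},
    "aggs": {
        "popular_terms": {
            "terms": {"field": "content.keyword", "size": 10}
        }
    },
}
print(json.dumps(report, indent=2))
```

Changing the range to `now-7d` or `now-1M` gives the weekly or monthly variant of the same report.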

Finding Solr documents that intersect with a defined Radius

Submitted by 回眸只為那壹抹淺笑 on 2020-01-07 00:36:15
Question: We are using Apache Solr 5.x, and we currently have a bunch of defined shapes: polygons, circles, etc. Each shape of coordinates corresponds to a document. What I want to know is: is it possible to provide a circle, that is, a (lat,lng) pair along with a radius for that circle, and then find all documents that intersect with that circle? I have tried a variety of options, most recently this one: solr_index_wkt:"IsWithin(CIRCLE((149.39999999999998 -34.92 d=0…
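One documented route for this is Solr's `{!geofilt}` filter, which matches documents whose spatial field falls within (intersects) a circle of radius `d` kilometres around point `pt`. A small sketch of building that filter string; the field name is taken from the question and the coordinates are illustrative:

```python
def circle_intersect_filter(sfield, lat, lng, radius_km):
    # Builds Solr's documented {!geofilt} filter query: matches documents
    # whose indexed shape intersects the circle of radius_km kilometres
    # centred at (lat, lng). Note pt is lat,lng order, unlike WKT.
    return f"{{!geofilt sfield={sfield} pt={lat},{lng} d={radius_km}}}"

fq = circle_intersect_filter("solr_index_wkt", -34.92, 149.4, 10)
print(fq)  # {!geofilt sfield=solr_index_wkt pt=-34.92,149.4 d=10}
```

Note that `IsWithin` in the question's attempt asks for indexed shapes entirely *contained* in the circle; for overlap, `Intersects` (or `{!geofilt}` as above) is the predicate to use.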

SpanFirstQuery not working in Lucene

Submitted by 笑着哭i on 2020-01-06 19:49:16
Question: I'm trying to use SpanFirstQuery to match the beginning of a field in Lucene, but it just doesn't seem to work. Here's the code I'm using:

Map<String, Analyzer> searchAnalyzers = new HashMap<String, Analyzer>();
searchAnalyzers.put(NAME, new KeywordAnalyzer());
searchAnalyzers.put(ORGANIZATION_NAME, new KeywordAnalyzer());
searchAnalyzers.put(ORGANIZATION_POSITION, new KeywordAnalyzer());
PerFieldAnalyzerWrapper perFieldAnalyzerWrapper = new PerFieldAnalyzerWrapper(new KeywordAnalyzer(), …
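SpanFirstQuery's semantics can be modelled in a few lines: a term matches only if it occurs among the first `end` token *positions* of the field. One thing worth noting about the snippet above: KeywordAnalyzer emits the entire field as a single token, so against such a field a span at position 0 only exists for the whole exact value, which is a common reason SpanFirstQuery appears not to work. A hedged Python model of the intended behaviour with a genuinely tokenized field:

```python
def span_first_match(field_text, term, end):
    # Models SpanFirstQuery(termQuery, end): the term must occur within
    # the first `end` token positions of the field. Assumes simple
    # whitespace tokenization, unlike KeywordAnalyzer which emits the
    # whole field as one token.
    tokens = field_text.lower().split()
    return term.lower() in tokens[:end]

print(span_first_match("John Smith Senior", "john", 1))  # True: position 0
print(span_first_match("Mr John Smith", "john", 1))      # False: position 1
```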

Exception thrown when try to add documents to the lucene index continuously inside the for loop

Submitted by 萝らか妹 on 2020-01-06 19:43:09
Question: I'm using compass-2.2.0 to create a Lucene index in a MySQL database table. This is part of my code to index documents; an exception is thrown when I try to add documents to the Lucene index continuously inside the for loop. Any workaround to overcome this error? My hosting servers are the WSO2 Stratos Tomcat-based server and the WSO2 Stratos data services server. My program works fine on local Tomcat/MySQL servers. This is the sample blog post that I have followed: http://mprabhat.wordpress.com…

Build Lucene Synonyms

Submitted by 为君一笑 on 2020-01-06 18:12:07
Question: I have the following code:

static class TaggerAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        builder.add(new CharsRef("al"), new CharsRef("americanleague"), true);
        builder.add(new CharsRef("al"), new CharsRef("a.l."), true);
        builder.add(new CharsRef("nba"), new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), …
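What the SynonymMap built above does at analysis time can be sketched with a toy model: each key expands to its synonyms, and multi-word synonyms are sequences of tokens (Lucene stores them joined by `SynonymMap.WORD_SEPARATOR` internally). This is a concept illustration only, not Lucene's actual data structure:

```python
# Toy model of the synonym map from the Java snippet: each entry maps a
# token to its expansions; a multi-word synonym is a list of tokens.
synonyms = {
    "al": [["americanleague"], ["a.l."]],
    "nba": [["national", "basketball", "association"]],
}

def expand(token):
    # At analysis time the original token is kept (builder's dedup/keep
    # flag was true) and every synonym expansion is emitted alongside it.
    return [[token]] + synonyms.get(token, [])

print(expand("nba"))  # [['nba'], ['national', 'basketball', 'association']]
```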

Calling Commit on an index that is currently been merged in Lucene

Submitted by 我只是一个虾纸丫 on 2020-01-06 18:06:26
Question: My question concerns Lucene.NET 2.9.2. Say I updated an index using IndexWriter, and that caused the scheduler to start merging segments in the background. What will happen if I call Commit before the merge has completed? Will the thread that called Commit be blocked until the merge finishes, or are the two threads independent? The answer is very important to my search implementation, since I rely on the FieldCache for performance reasons, and if Commit won't wait for the…

Excessive Elasticsearch memory usage

Submitted by 女生的网名这么多〃 on 2020-01-06 17:04:18
After a default install, Elasticsearch's heap is set to 1 GB, which is too small for any real-world workload. If you are running with this default heap configuration, your cluster is likely to run into problems quickly.

There are two ways to change Elasticsearch's heap size (from here on, simply "memory"). The simplest is to set the ES_HEAP_SIZE environment variable; the server process reads this variable at startup and sizes the heap accordingly:

export ES_HEAP_SIZE=10g

Alternatively, you can pass the heap size as command-line arguments when starting the process:

./bin/elasticsearch -Xmx10g -Xms10g

Note: make sure Xmx and Xms are set to the same value. This way, once the JVM's garbage collector has finished a cycle it never needs to recompute and resize the heap, avoiding the overhead of growing and shrinking it. In general, setting the ES_HEAP_SIZE environment variable is preferable to writing -Xmx10g -Xms10g directly.

Give half the memory to Lucene

A common mistake is configuring too large a heap. Suppose you have a machine with 64 GB of RAM; intuitively you might think giving all 64 GB to Elasticsearch is best, but is bigger really better? Memory certainly matters to Elasticsearch: more memory means more data cached for faster operations. But there is another major memory consumer to account for: Lucene.

Why tokenize texts in Lucene?

Submitted by 人盡茶涼 on 2020-01-06 15:11:15
Question: I'm a beginner with Lucene. Here's my source:

ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(false);
ft.setStored(true);
ftNA = new FieldType(StringField.TYPE_STORED);
ftNA.setTokenized(true);
ftNA.setStored(true);

Why tokenize in Lucene? For example, with the string value "my name is lee": tokenized, it becomes "my" "name" "is" "lee"; not tokenized, it stays "my name is lee". I don't understand why indexing is done on tokens. What is the difference between tokenized and not tokenized? Answer 1: Lucene…
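The practical difference can be shown in a few lines of Python. This is a concept sketch, not Lucene's implementation: a tokenized field is indexed as separate terms, while an untokenized (keyword-style) field is indexed as one exact term, so only the full string can match it.

```python
def index_field(value, tokenized):
    # Tokenized fields are split into individual terms; untokenized
    # fields are indexed as a single exact term, as with a keyword field.
    return value.lower().split() if tokenized else [value.lower()]

tokenized_terms = index_field("my name is lee", tokenized=True)
keyword_terms = index_field("my name is lee", tokenized=False)

print("lee" in tokenized_terms)  # True: a query for "lee" finds the doc
print("lee" in keyword_terms)    # False: only "my name is lee" matches
```

So you tokenize fields users will search by words (names, body text) and leave fields untokenized when only exact matches make sense (IDs, status codes).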