analyzer

Querying Lucene tokens without indexing

依然范特西╮ submitted on 2019-12-10 15:33:51
Question: I am using Lucene (or more specifically Compass) to log threads in a forum, and I need a way to extract the keywords behind the discussion. That said, I don't want to index every entry someone makes; rather, I'd have a list of 'keywords' that are relevant to a certain context, and if an entry matches a keyword and is above a threshold, I'd add the entry to the index. I want to be able to use the power of an analyzer to strip things out and do its magic, but then return the tokens from
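A minimal sketch of that last step, assuming plain Lucene 3.x-style APIs rather than anything Compass-specific (the field name "content" is arbitrary): run the analyzer over the text and collect its tokens without ever touching an IndexWriter.

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    // Returns the analyzed tokens of one forum entry without writing to any index.
    static List<String> analyzeOnly(String entryText) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream("content", new StringReader(entryText));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());   // lowercased, stop words removed, etc.
        }
        stream.end();
        stream.close();
        return tokens;   // match these against the keyword list, then decide whether to index the entry
    }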

How to analyse WebSphere core*.dmp and Snap*.trc files?

社会主义新天地 submitted on 2019-12-10 04:22:27
Question: All, I have my application running on WebSphere Application Server 7.0. I get core dumps and trace files such as core.20110909.164930.3828.0001.dmp and Snap.20110909.164930.3828.0003.trc. My question is: just as the thread dumps generated by WAS can be opened and analyzed with the IBM Thread Dump Analyzer tool, is there a tool (from IBM or anyone else) to open the files mentioned above? Thanks, Ayusman

Answer 1: The core dumps have to be processed by the jextract utility (of the JRE that produced the dump) from
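For reference, a hedged example of that jextract step (the JRE path below is purely illustrative; jextract ships with the IBM JRE that produced the dump and should be run with that same JRE):

    # Run jextract from the bin directory of the JRE that crashed (illustrative path):
    cd /opt/IBM/WebSphere/AppServer/java/jre/bin
    ./jextract /path/to/core.20110909.164930.3828.0001.dmp
    # This typically produces core.20110909.164930.3828.0001.dmp.zip, which
    # DTFJ-based IBM dump/memory analysis tools can then open.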

How to modify the standard analyzer to include #?

不羁的心 submitted on 2019-12-08 08:29:59
Question: Some characters, such as #, are treated as delimiters, so they never match in a query. What custom analyzer configuration, as close to the standard analyzer as possible, would allow these characters to be matched?

Answer 1: 1) The simplest way would be to use the whitespace tokenizer with a lowercase filter:

    curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase&pretty' -d 'new year #celebration vegas'

which would give you

    { "tokens" : [ { "token" : "new", "start_offset" : 0, "end_offset" : 3, "type"
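If that behaviour is what you want permanently, a hedged sketch of registering it as a custom analyzer at index creation (the index and analyzer names here are made up):

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "hashtag_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": ["lowercase"]
            }
          }
        }
      }
    }'

The fields that should keep the # characters would then reference this analyzer in their mapping.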

Customizing Analyzers in Solr

可紊 submitted on 2019-12-07 10:02:45
Question: In Solr I have a custom Analyzer that takes two parameters. I know how to specify this Analyzer in the schema.xml, but I'm wondering how I can pass the two arguments, either in the schema.xml or at runtime in code.

Answer 1: You cannot pass parameters to the schema.xml at run-time, as far as I know. But you can use the reload command. This can be useful when (backwards-compatible) changes have been made to your solrconfig.xml or schema.xml files (e.g. new declarations, changed default params for a
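As an illustration (not the asker's actual analyzer), parameters are normally passed declaratively as attributes on the tokenizer/filter factories of a field type in schema.xml, for example:

    <fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
      </analyzer>
    </fieldType>

For a fully custom Analyzer class, the usual route is to wrap it in a factory so its two parameters can be supplied as attributes like the ones above; changing those values still requires a core reload.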

Why does Lucene QueryParser need an Analyzer?

北城余情 submitted on 2019-12-06 23:51:53
Question: I'm new to Lucene and trying to parse a raw string into a Query using the QueryParser. I was wondering: why does the QueryParser.Parse() method need an Analyzer parameter at all? If analyzing is something that has to do with querying, then an Analyzer should be specified when dealing with regular Query objects as well (TermQuery, BooleanQuery, etc.), and if not, why does QueryParser require one?

Answer 1: When indexing, Lucene divides the text into atomic units (tokens). During this phase many
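A short hedged example of why the parser takes one (Lucene 3.x-style APIs; the field name "content" is assumed): the analyzer turns the raw query text into the same kind of tokens that were produced at index time.

    // The query text is analyzed exactly as the indexed documents were.
    QueryParser parser = new QueryParser(Version.LUCENE_35, "content",
                                         new StandardAnalyzer(Version.LUCENE_35));
    Query query = parser.parse("New Year celebration");  // throws ParseException
    // Hand-built TermQuery/BooleanQuery objects skip this step, so the caller must
    // already supply terms that exactly match what the index contains.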

How to add analyzer settings in ElasticSearch?

南笙酒味 submitted on 2019-12-06 03:49:13
I am using ElasticSearch 1.5.2 and I wish to have the following settings:

    "settings": {
      "analysis": {
        "filter": {
          "filter_shingle": {
            "type": "shingle",
            "max_shingle_size": 2,
            "min_shingle_size": 2,
            "output_unigrams": false
          },
          "filter_stemmer": {
            "type": "porter_stem",
            "language": "English"
          }
        },
        "tokenizer": {
          "my_ngram_tokenizer": {
            "type": "nGram",
            "min_gram": 1,
            "max_gram": 1
          }
        },
        "analyzer": {
          "ShingleAnalyzer": {
            "tokenizer": "my_ngram_tokenizer",
            "filter": [
              "standard",
              "lowercase",
              "filter_stemmer",
              "filter_shingle"
            ]
          }
        }
      }
    }

Where should I add them? I mean, before index creation or
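For illustration only (this is not from the original thread): analysis settings like these are normally supplied when the index is created, e.g. wrapped in the request body of an index-creation call (the filter definitions from the question would go in the same analysis block; the index name here is made up):

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "tokenizer": {
            "my_ngram_tokenizer": { "type": "nGram", "min_gram": 1, "max_gram": 1 }
          },
          "analyzer": {
            "ShingleAnalyzer": {
              "tokenizer": "my_ngram_tokenizer",
              "filter": ["standard", "lowercase"]
            }
          }
        }
      }
    }'

For an index that already exists, the usual route is to close it, update the analysis settings through the _settings endpoint, and reopen it; any field that should use the analyzer still needs it referenced in its mapping.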

Elasticsearch synonym analyzer not working

回眸只為那壹抹淺笑 submitted on 2019-12-05 19:56:06
Question: EDIT: To add on to this, the synonyms seem to be working with basic query_string queries.

    "query_string" : {
      "default_field" : "location.region.name.raw",
      "query" : "nh"
    }

This returns all of the results for New Hampshire, but a "match" query for "nh" returns no results. I'm trying to add synonyms to my location fields in my Elastic index, so that if I do a location search for "Mass," "Ma," or "Massachusetts" I'll get the same results each time. I added the synonym filter to my settings and
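For context, a hedged sketch of the kind of synonym setup involved (the names and synonym list here are illustrative, not the asker's actual settings); a match query only expands synonyms if the field's analyzer, at least at search time, actually includes the synonym filter:

    "analysis": {
      "filter": {
        "state_synonyms": {
          "type": "synonym",
          "synonyms": [
            "nh, new hampshire",
            "ma, mass, massachusetts"
          ]
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "state_synonyms"]
        }
      }
    }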

Obj-C: Instance variable used while 'self' is not set to the result of '[(super or self) init…]'

﹥>﹥吖頭↗ submitted on 2019-12-05 12:21:23
I asked a similar question to this already, but I still can't see the problem:

    -(id)initWithKeyPadType:(int)value {
        [self setKeyPadType:value];
        self = [self init];
        if (self != nil) {
            //self.intKeyPadType = value;
        }
        return self;
    }

    - (id)init {
        NSNumberFormatter *formatter = [[[NSNumberFormatter alloc] init] autorelease];
        decimalSymbol = [formatter decimalSeparator];
        ....

The warning comes from the line above: "Instance variable used while 'self' is not set to the result of '[(super or self) init...]'"

What you are trying to do is technically OK, but at some stage you need to invoke [super init]
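A minimal sketch of the conventional pattern (assuming intKeyPadType is an ivar of this class and manual reference counting, as in the question):

    - (id)initWithKeyPadType:(int)value
    {
        self = [super init];            // assign self before touching any instance variable
        if (self != nil) {
            intKeyPadType = value;
        }
        return self;
    }

The plain -init would likewise assign and check self = [super init] before touching decimalSymbol.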

Elasticsearch count terms ignoring spaces

牧云@^-^@ submitted on 2019-12-04 09:42:39
Question: Using ES 1.2.1. My aggregation:

    {
      "size": 0,
      "aggs": {
        "cities": {
          "terms": { "field": "city", "size": 300000 }
        }
      }
    }

The issue is that some city names have spaces in them and aggregate separately. For instance, Los Angeles:

    { "key": "Los", "doc_count": 2230 },
    { "key": "Angeles", "doc_count": 2230 },

I assume it has to do with the analyzer? Which one would I use to not split on spaces?

Answer 1: For fields that you want to perform aggregations on, I would recommend either the keyword analyzer or do not
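A hedged mapping sketch (the type name is made up; "city" comes from the question) that keeps each city as a single term for the aggregation:

    "mappings": {
      "my_type": {
        "properties": {
          "city": { "type": "string", "index": "not_analyzed" }
        }
      }
    }

A multi-field with a not_analyzed sub-field (e.g. city.raw) is another option if full-text search on city should keep working; the terms aggregation would then target city.raw instead.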

What is the best Lucene setup for ranking exact matches the highest?

≡放荡痞女 submitted on 2019-12-04 06:35:10
Which analyzers should be used for indexing and for searching when I want an exact match to rank higher than a "partial" match? Possibly by setting up custom scoring in a Similarity class? For example, when my index consists of "car parts", "car", and "car shop" (indexed with StandardAnalyzer on Lucene 3.5), a query for "car" results in:

    car parts
    car
    car shop

(basically returned in the order in which they were added, since they all get the same score). What I would like to see is "car" ranked first, then the other results (it doesn't really matter in which order; I assume the analyzer can influence that). All
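One common approach, shown here as an illustrative Lucene 3.x sketch rather than a definitive answer (the field names are made up): index the text twice, once analyzed and once as a single untokenized term, then combine both at query time with a boost on the exact field.

    // Index time: "name" is analyzed, "name_exact" holds the value as one untokenized term.
    Document doc = new Document();
    doc.add(new Field("name", "car parts", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("name_exact", "car parts", Field.Store.NO, Field.Index.NOT_ANALYZED));

    // Query time: exact hits get a large boost, analyzed hits still match.
    BooleanQuery query = new BooleanQuery();
    TermQuery exact = new TermQuery(new Term("name_exact", "car"));
    exact.setBoost(5.0f);                               // only the document whose whole value is "car" gets this
    query.add(exact, BooleanClause.Occur.SHOULD);
    query.add(new TermQuery(new Term("name", "car")), BooleanClause.Occur.SHOULD);

With this layout, "car" outscores "car parts" and "car shop" because it matches both clauses, while the others match only the analyzed field.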