analyzer

How to not-analyze in ElasticSearch?

Submitted by 我们两清 on 2019-11-27 07:04:31
I've got a field in an Elasticsearch index which I do not want to have analyzed, i.e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters. If I do not specify an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that. Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed? I only create the index, I don't do anything else. I can …
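A minimal mapping sketch for the pre-5.x Elasticsearch versions this question targets: setting "index": "not_analyzed" on the field makes it stored and compared verbatim, skipping tokenization entirely. The index, type, and field names below are placeholders. On Elasticsearch 5.x and later, the equivalent is the dedicated "keyword" field type.

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

Alternatively, the built-in keyword analyzer emits the entire input as a single token, which achieves a similar effect in situations where an analyzer must be named.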

Configure elasticsearch mapping with java api

Submitted by 纵饮孤独 on 2019-11-27 05:55:59
Question: I have a few Elasticsearch fields that I don't want to analyze before indexing. I have read that the right way to do this is by altering the index mapping. Right now my mapping looks like this: { "test" : { "general" : { "properties" : { "message" : { "type" : "string" }, "source" : { "type" : "string" } } } } } And I would like it to look like this: { "test" : { "general" : { "properties" : { "message" : { "type" : "string", "index" : "not_analyzed" }, "source" : { "type" : "string" } } } } …
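The question asks about the Java client, but the shape of the change itself is language-neutral. The helper below is a hypothetical illustration (not part of any Elasticsearch client library) that shows exactly which key is added to the existing mapping to get from the first JSON to the second:

```python
import copy
import json

def mark_not_analyzed(mapping, type_name, fields):
    """Return a copy of an ES 1.x/2.x-style mapping with the given
    string fields set to "index": "not_analyzed" (hypothetical helper)."""
    result = copy.deepcopy(mapping)
    props = result[type_name]["properties"]
    for field in fields:
        props[field]["index"] = "not_analyzed"
    return result

# The mapping from the question, minus the index name wrapper.
original = {
    "general": {
        "properties": {
            "message": {"type": "string"},
            "source": {"type": "string"},
        }
    }
}

updated = mark_not_analyzed(original, "general", ["message"])
print(json.dumps(updated, indent=2))
```

Whichever client is used, the resulting JSON body is what gets sent in the put-mapping request; the original mapping object is left untouched by the helper.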

Comparison of Lucene Analyzers

Submitted by 半城伤御伤魂 on 2019-11-27 04:09:03
Question: Can someone please explain the difference between the different analyzers within Lucene? I am getting a maxClauseCount exception and I understand that I can avoid this by using a KeywordAnalyzer, but I don't want to change from the StandardAnalyzer without understanding the issues surrounding analyzers. Thanks very much. Answer 1: In general, any analyzer in Lucene is tokenizer + stemmer + stop-words filter. The tokenizer splits your text into chunks, and since different analyzers may use different …
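The answer's decomposition (tokenizer + stemmer + stop-word filter) can be illustrated with a toy pipeline. This is not Lucene itself, just a conceptual sketch with a made-up stop list, contrasting a StandardAnalyzer-like chain with a KeywordAnalyzer-like one:

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "and"}  # hypothetical stop list

def standard_like(text):
    """Roughly what a StandardAnalyzer-style chain does:
    split on non-word characters, lowercase, drop stop words."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in STOP_WORDS]

def keyword_like(text):
    """Roughly what KeywordAnalyzer does: the whole input is one token."""
    return [text]

print(standard_like("The Quick Brown Fox"))  # ['quick', 'brown', 'fox']
print(keyword_like("The Quick Brown Fox"))   # ['The Quick Brown Fox']
```

Fewer, coarser tokens (as with KeywordAnalyzer) mean fewer clauses when multi-term or wildcard queries are expanded, which is why switching analyzers can make a maxClauseCount exception go away — at the cost of losing word-level matching.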

Turn Image into Text - Java [duplicate]

Submitted by 放肆的年华 on 2019-11-26 22:54:40
Question: This question already has an answer here: Java OCR implementation [closed] (5 answers). This is an interesting topic. Basically, I have an image that contains some text. How do I extract the text from the image? I have already tried many things, but everything I do is very tedious and usually does not work. I am simply wondering if there is a fairly easy way to do this. I have come upon this: http://sourceforge.net/projects/javaocr/. I have tried it for hours, but I cannot get it to take an …

Hibernate Search | ngram analyzer with minGramSize 1

Submitted by 强颜欢笑 on 2019-11-26 22:28:32
Question: I have some problems with my Hibernate Search analyzer configuration. One of my indexed entities ("Hospital") has a String field ("name") that can contain values with lengths from 1 to 40. I want to be able to find an entity by searching for just one character (because it is possible that a hospital has a single-character name). @Indexed(index = "HospitalIndex") @AnalyzerDef(name = "ngram", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef …
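What an n-gram token filter with minGramSize = 1 actually emits can be sketched in a few lines. This is a conceptual stand-in for Lucene's n-gram filtering, not the real implementation:

```python
def ngrams(token, min_gram, max_gram):
    """All character n-grams of `token` with lengths in
    [min_gram, max_gram], in order of increasing length."""
    out = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(token) - n + 1):
            out.append(token[i:i + n])
    return out

print(ngrams("st", 1, 3))  # ['s', 't', 'st']
print(ngrams("h", 1, 3))   # ['h'] — a one-character name is still indexed
```

With minGramSize 1, even a one-character hospital name produces a searchable token. The trade-off is that single-character grams match many unrelated documents, so a different analyzer (without n-gramming) is often applied at query time.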

Deep copy of dictionaries gives Analyze error in Xcode 4.2

Submitted by 馋奶兔 on 2019-11-26 20:57:13
I have the following method in an NSDictionary category, to do a deep copy, which works fine. I just upgraded from Xcode 4.1 to 4.2, and the Analyze function gives two analyzer warnings for this code, as indicated: - (id)deepCopy; { id dict = [[NSMutableDictionary alloc] init]; id copy; for (id key in self) { id object = [self objectForKey:key]; if ([object respondsToSelector:@selector(deepCopy)]) copy = [object deepCopy]; else copy = [object copy]; [dict setObject:copy forKey:key]; // Both -deepCopy and -copy retain the object, and so does -setObject:forKey:, so need to -release: [copy release …

How to use a Lucene Analyzer to tokenize a String?

Submitted by ﹥>﹥吖頭↗ on 2019-11-26 19:17:55
Question: Is there a simple way I could use any subclass of Lucene's Analyzer to parse/tokenize a String? Something like: String to_be_parsed = "car window seven"; Analyzer analyzer = new StandardAnalyzer(...); List<String> tokenized_string = analyzer.analyze(to_be_parsed); Answer 1: As far as I know, you have to write the loop yourself. Something like this (taken straight from my source tree): public final class LuceneUtils { public static List<String> parseKeywords(Analyzer analyzer, String field, String …

How to wisely combine shingles and edgeNgram to provide flexible full text search?

Submitted by ♀尐吖头ヾ on 2019-11-26 17:46:18
We have an OData-compliant API that delegates some of its full-text search needs to an Elasticsearch cluster. Since OData expressions can get quite complex, we decided to simply translate them into their equivalent Lucene query syntax and feed it into a query_string query. We do support some text-related OData filter expressions, such as: startswith(field,'bla') endswith(field,'bla') substringof('bla',field) name eq 'bla' The fields we're matching against can be analyzed, not_analyzed or both (i.e. via a multi-field). The searched text can be a single token (e.g. table), only a part thereof …
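A naive sketch of the OData-to-Lucene translation described above (the function name is made up for illustration; real code must also escape Lucene's special characters, and wildcards will not match across the separate tokens of an analyzed field):

```python
def odata_to_lucene(expr_type, field, value):
    """Translate a few OData text filter shapes into Lucene
    query_string syntax. Naive sketch: no escaping, single-token
    values only."""
    if expr_type == "startswith":
        return f"{field}:{value}*"
    if expr_type == "endswith":
        return f"{field}:*{value}"
    if expr_type == "substringof":
        return f"{field}:*{value}*"
    if expr_type == "eq":
        return f'{field}:"{value}"'
    raise ValueError(f"unsupported expression: {expr_type}")

print(odata_to_lucene("startswith", "name", "bla"))   # name:bla*
print(odata_to_lucene("substringof", "name", "bla"))  # name:*bla*
```

The leading-wildcard forms (endswith, substringof) are exactly the expensive cases that shingle and edgeNgram index-time analysis aims to replace with cheap exact-token matches.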
