analyzer | 易学教程

custom analyzer which breaks the tokens on special characters and lowercase/uppercase

Question: I am trying to write a custom analyzer that splits tokens on special characters and converts them to uppercase before indexing, and I should still get results when I search in lowercase. For example, given data@source, it should replace @ (or any other special character) with whitespace and give me a result like data source. Here is how I tried implementing it. PUT sound { "settings": { "analysis": { "analyzer": { "my_analyzer": { "tokenizer":
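A minimal sketch of one way to get this behaviour, assuming a `pattern_replace` character filter, the `whitespace` tokenizer and the `uppercase` token filter are combined. Only `my_analyzer` comes from the question; the filter name `special_to_space` and the regular expression are assumptions made for illustration. The `analyze` function is a local approximation of that chain, not a call to Elasticsearch.

```python
import re

# Hypothetical index settings: replace every non-alphanumeric character with a
# space, split on whitespace, then uppercase each token.
settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "special_to_space": {  # name is an assumption
                    "type": "pattern_replace",
                    "pattern": "[^A-Za-z0-9]",
                    "replacement": " ",
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "char_filter": ["special_to_space"],
                    "tokenizer": "whitespace",
                    "filter": ["uppercase"],
                }
            },
        }
    }
}


def analyze(text):
    """Local approximation of the analyzer chain sketched above."""
    spaced = re.sub(r"[^A-Za-z0-9]", " ", text)   # char filter step
    return [tok.upper() for tok in spaced.split()]  # tokenizer + uppercase


print(analyze("data@source"))
print(analyze("Data source"))
```

If the same analyzer runs at search time, a lowercase query goes through the same uppercase step and so produces the same tokens as the indexed text.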

How to add a custom code analyzer to a project without nuget or VSIX?

I want to write a custom code analyzer in Visual Studio 2015 for a C# ConsoleApplication. I don't want to create a separate "Analyzer with Code Fix" project from the template, because that requires adding the analyzer to my projects as a NuGet package. Is it possible to add an analyzer reference manually? I would like to reference the analyzer without NuGet. If you add an analyzer as a NuGet package and check the content of your project, you'll see that only an <Analyzer Include="..." /> item is added. You can do the same manually. Also, you can do this in the .csproj.user file as well, so
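The manual step can also be scripted. Below is a rough sketch that adds an `<Analyzer Include="..." />` item to project XML, assuming the classic MSBuild 2003 namespace used by Visual Studio 2015 project files. The DLL path and the minimal project string are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"


def add_analyzer_reference(csproj_xml, dll_path):
    """Return the project XML with an extra ItemGroup holding one Analyzer item."""
    ET.register_namespace("", MSBUILD_NS)  # keep the default namespace on output
    root = ET.fromstring(csproj_xml)
    group = ET.SubElement(root, f"{{{MSBUILD_NS}}}ItemGroup")
    ET.SubElement(group, f"{{{MSBUILD_NS}}}Analyzer", {"Include": dll_path})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, stripped-down project file and analyzer path.
example = (
    '<Project ToolsVersion="14.0" '
    'xmlns="http://schemas.microsoft.com/developer/msbuild/2003"></Project>'
)
patched = add_analyzer_reference(example, r"..\tools\MyAnalyzer.dll")
print(patched)
```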

Elasticsearch synonym analyzer not working

EDIT: To add on to this, the synonyms seem to be working with basic query_string queries. "query_string" : { "default_field" : "location.region.name.raw", "query" : "nh" } This returns all of the results for New Hampshire, but a "match" query for "nh" returns no results. I'm trying to add synonyms to the location fields in my Elasticsearch index, so that a location search for "Mass," "Ma," or "Massachusetts" returns the same results each time. I added the synonyms filter to my settings and changed the mapping for locations. Here are my settings: "analysis":{ "analyzer":{ "synonyms":{ "filter":
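As an illustration of what a synonym filter is meant to do here, the sketch below pairs a hypothetical settings fragment with a local approximation of explicit `=>` synonym rules. The filter name `location_synonyms`, the tokenizer choice and the rules themselves are assumptions, since the original settings are cut off. A match query analyzes its text with the analyzer of the field it targets, so the filter only helps if it is part of that field's analyzer.

```python
# Hypothetical settings: only the analyzer name "synonyms" comes from the post.
synonym_rules = [
    "mass, ma => massachusetts",
    "nh => new hampshire",
]
synonym_settings = {
    "analysis": {
        "filter": {
            "location_synonyms": {"type": "synonym", "synonyms": synonym_rules}
        },
        "analyzer": {
            "synonyms": {
                "tokenizer": "standard",
                "filter": ["lowercase", "location_synonyms"],
            }
        },
    }
}


def parse_rules(rules):
    """Turn 'a, b => c d' rules into {'a': ['c', 'd'], 'b': ['c', 'd']}."""
    mapping = {}
    for rule in rules:
        left, right = rule.split("=>")
        for term in left.split(","):
            mapping[term.strip()] = right.split()
    return mapping


def analyze(text, mapping):
    """Lowercase, split on whitespace, then rewrite single-token synonyms."""
    tokens = []
    for tok in text.lower().split():
        tokens.extend(mapping.get(tok, [tok]))
    return tokens


mapping = parse_rules(synonym_rules)
for query in ["Mass", "Ma", "Massachusetts", "nh", "New Hampshire"]:
    print(query, "->", analyze(query, mapping))
```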

Lucene in Neo4j has some misbehaviours in terms of reliable search queries - compared to OrientDB

I'm still evaluating Neo4j vs. OrientDB. Most importantly, I need Lucene as the full-text index engine. So I created the same schema with the same data (300 million lines) on both databases. I also have experience querying different things in both systems. I used the Standard Analyzer on both sides. The OrientDB test query results are all fine and really good in terms of reliability and speed. The speed of Neo4j is also OK, but the results are poor in most cases. So let's get to the different issues I have with Neo4j Lucene indexing. I always give you an example of how it

Elasticsearch count terms ignoring spaces

Using ES 1.2.1. My aggregation: { "size": 0, "aggs": { "cities": { "terms": { "field": "city","size": 300000 } } } } The issue is that some city names have spaces in them and are aggregated as separate terms. For instance, Los Angeles: { "key": "Los", "doc_count": 2230 }, { "key": "Angeles", "doc_count": 2230 }, I assume it has to do with the analyzer? Which one would I use to not split on spaces? For fields that you want to perform aggregations on, I would recommend either using the keyword analyzer or not analyzing the field at all. From the keyword analyzer documentation: An analyzer of type keyword that
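The difference between the two behaviours can be approximated locally. This sketch assumes an ES 1.x style mapping with `"index": "not_analyzed"` on the `city` field from the question, and it imitates a terms aggregation with a plain counter; the whitespace split stands in for whatever analyzer the field currently uses, and the sample documents are made up.

```python
from collections import Counter

# ES 1.x style mapping fragment that leaves the field unanalyzed.
not_analyzed_mapping = {
    "properties": {"city": {"type": "string", "index": "not_analyzed"}}
}


def terms_agg(values, analyzed):
    """Count documents per term, either on split tokens or on whole values."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Each distinct token of a document counts once toward doc_count.
            counts.update(set(value.split()))
        else:
            counts[value] += 1
    return dict(counts)


cities = ["Los Angeles", "Los Angeles", "Boston"]  # made-up sample data
print(terms_agg(cities, analyzed=True))
print(terms_agg(cities, analyzed=False))
```

With the analyzed variant, "Los" and "Angeles" show up as separate buckets with equal counts, which matches the symptom in the question; the unanalyzed variant keeps "Los Angeles" as a single bucket.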

False positive: Undefined or garbage value returned to caller

Question: The following code populates a result using inline assembly: uint64_t Foo::f() { uint64_t result; asm volatile ("vldmia %1, {q0-q1} \n" // q0-1 = *this ⋮ "vstmia %0, {d0} \n" // result = d0 :: "r"(&result), "r"(this) : "q0", "q1"); return result; } The result variable is unconditionally set in the assembly code, but Xcode's Analyzer seems to ignore this (the flow analysis skips straight from the declaration to the return statement) and complains: …/BitBoard.cpp:26:9: Undefined or garbage

how to analyze text in elasticsearch using java api?

Question: I use Elasticsearch 1.7.4 and its Java API. I want to find the 10 words that users search for most frequently. To do that, I have to record the words of the query text that users type, and before recording the words I have to analyze the query text. I found the RESTful way to analyze text, as the link says, but I can't find the corresponding API in TransportClient. Does anyone know how to analyze text in Elasticsearch using the Java API, or some other way that avoids calling the REST API? Answer 1: Using

Elasticsearch aggregation turns results to lowercase

I've been playing with Elasticsearch a little and found an issue when doing aggregations. I have two endpoints, /A and /B. The first one holds the parents of the objects in the second one, so one or many objects in B must belong to one object in A. Therefore, objects in B have an attribute "parentId" containing the parent ID generated by Elasticsearch. I want to filter parents in A by attributes of their children in B. To do that, I first filter children in B by attributes and collect their unique parent IDs, which I'll later use to get the parents. I send this request: POST http://localhost:9200/test/B/_search { "query": {

Elasticsearch custom analyzer with ngram and without word delimiter on hyphens

Question: I am trying to index strings that contain hyphens but do not contain spaces, periods or any other punctuation. I do not want to split the words on hyphens; instead, I would like the hyphens to be part of the indexed text. For example, my 6 text strings would be: magazineplayon, magazineofhorses, online-magazine, best-magazine, friend-of-magazines, magazineplaygames. I would like to be able to search these strings for text containing "play" or for text starting with "magazine".
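One possible approach, sketched under the assumption that a `keyword` tokenizer (which keeps the whole string, hyphens included, as one token) is followed by an n-gram token filter. The analyzer and filter names and the gram sizes are hypothetical, and some Elasticsearch versions restrict how far `min_gram` and `max_gram` may differ. The functions below imitate the idea locally with the six strings from the question; the query term is looked up as-is, which corresponds to not n-gramming the search text.

```python
# Hypothetical settings fragment; names and gram sizes are assumptions.
ngram_settings = {
    "analysis": {
        "filter": {
            "substrings": {"type": "ngram", "min_gram": 3, "max_gram": 8}
        },
        "analyzer": {
            "hyphen_ngram": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase", "substrings"],
            }
        },
    }
}


def ngrams(text, min_gram=3, max_gram=8):
    """All substrings of length min_gram..max_gram, hyphens included."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def edge_ngrams(text, min_gram=3, max_gram=8):
    """Prefixes of length min_gram..max_gram, for 'starts with' lookups."""
    text = text.lower()
    return {text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)}


docs = [
    "magazineplayon", "magazineofhorses", "online-magazine",
    "best-magazine", "friend-of-magazines", "magazineplaygames",
]
contains_index = {doc: ngrams(doc) for doc in docs}
prefix_index = {doc: edge_ngrams(doc) for doc in docs}


def contains(term):
    return [doc for doc in docs if term.lower() in contains_index[doc]]


def starts_with(term):
    return [doc for doc in docs if term.lower() in prefix_index[doc]]


print(contains("play"))
print(starts_with("magazine"))
```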