Indexing and full-text searching in Elasticsearch without diacritics using the C# client NEST

家住魔仙堡 submitted on 2019-12-30 12:02:12

Question


I'm building an in-site search engine with Elasticsearch, and I'm new to Elasticsearch. The sites that will use this engine are in Turkish and English.

Turkish has letters such as 'ğ', 'ü', 'ş', 'ı', 'ö', 'ç'. When searching, though, we usually type 'g', 'u', 's', 'i', 'o', 'c' instead. This is not a rule, just a habit we are used to.

Now, I have a document type called "product" and this type has several string properties and some are nested. For example:

public class Product {
    public string ProductName { get; set; }
    public Category Category { get; set; }
    //...
}
public class Category {
    public string CategoryName { get; set; }
    //...
}

My goal is this:

  • ProductName or Category.CategoryName may contain Turkish letters ("Eşarp") or some may be mistyped and written with English letters ("Esarp")
  • The query string may contain Turkish letters ("eşarp") or not ("esarp")
  • The query string may have multiple words
  • Every indexed string field should be searched against the query string (full-text search)

Now, what I did:

  • While creating the index, I also configured the mappings and defined a custom analyzer called "sanalyze", which uses the standard tokenizer with the "lowercase" and "asciifolding" filters, in place of the standard analyzer.
  • I used that custom analyzer in the mappings of the string fields.

Example code for mapping:

// some more mappings which uses the same mapping for all string fields.
.Map<Yaziylabir.Extensions.TagManagement.Models.TagModel>(m => m.AutoMap().Properties(p => p
    .String(s => s
        .Name(n => n.Tag).Analyzer("sanalyze")))))
.Settings(s => s
    .Analysis(ans => ans
        .Analyzers(anl => anl
            .Custom("sanalyze", c => c
                .Tokenizer("standard")
                .Filters("lowercase", "asciifolding")))));
  • I deleted and recreated the index, then reindexed my documents.
  • Now I'm trying to search in that index.

I tried two different queries to search the stored documents:

q &= Query<ProductModel>.QueryString(t => t.Query(Keyword).Analyzer("sanalyze"));

q &= Query<ProductModel>.QueryString(t => t.Query(Keyword));

The second one omits the Analyzer method because the Elasticsearch documentation says that the analyzer configured for a field is used at search time, so I assumed there was no need to specify it again while searching.

What I got as a result:

  • First query (with Analyzer("sanalyze")): searching for "eşarp" or "esarp" returns no results. Searching for "bordo" returns results.
  • Second query (without Analyzer("sanalyze")): searching for "eşarp" returns results. Searching for "esarp" returns no results. Searching for "bordo" returns results.

BTW:

  • The documents contain "Eşarp" as the ProductName value, and when I checked, Elasticsearch had created "esarp" as the field term.

  • The documents contain "Bordo" as a value and "bordo" as the field term.

I couldn't achieve what I want. What am I doing wrong?

  • Should I use another filter instead of asciifolding?
  • Should I use preserveOriginal with asciifolding? I'd rather not use that option, so as not to distort the scores.
  • Is there something different I should do?

Can you please help me?

If you think it is not clear what I'm asking, please tell me, I will try to make it clearer.

Thank you.


Answer 1:


Using the default settings for query_string means you are searching in the _all field. The _all field has its own analyzer - the standard one.
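To see why this produces the results reported in the question, here is a rough Python simulation, run outside Elasticsearch, of the two analysis chains. The functions `standard_like`, `sanalyze_like` and `matches` are hypothetical stand-ins, not Elasticsearch APIs: the first approximates the standard analyzer, the second approximates the custom "sanalyze" analyzer (the real asciifolding filter folds far more characters than this sketch), and the term sets only mimic what would be indexed for a document whose ProductName is "Eşarp".

```python
import re
import unicodedata

# NFKD does not decompose the Turkish dotless "ı", so map it by hand.
# (The real asciifolding filter folds many more characters than this sketch.)
DOTLESS_I = str.maketrans({"ı": "i"})


def standard_like(text):
    """Approximate the standard analyzer: split into words, lowercase."""
    return [word.lower() for word in re.findall(r"\w+", text)]


def sanalyze_like(text):
    """Approximate "sanalyze": standard tokenizer + lowercase + asciifolding."""
    folded = []
    for token in standard_like(text):
        decomposed = unicodedata.normalize("NFKD", token)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        folded.append(stripped.translate(DOTLESS_I))
    return folded


# Terms indexed for a document whose ProductName is "Eşarp".
field_terms = set(sanalyze_like("Eşarp"))  # the field mapped with "sanalyze"
all_terms = set(standard_like("Eşarp"))    # _all, analyzed by the standard analyzer


def matches(query, analyzer, index_terms):
    """True if any analyzed query term is present among the indexed terms."""
    return any(term in index_terms for term in analyzer(query))


for query in ("eşarp", "esarp"):
    print(query,
          "| _all + sanalyze:", matches(query, sanalyze_like, all_terms),
          "| _all + standard:", matches(query, standard_like, all_terms),
          "| field + sanalyze:", matches(query, sanalyze_like, field_terms))
```

In this simulation, querying _all with "sanalyze" matches neither spelling (the folded query term "esarp" is not among the unfolded _all terms), querying _all with the standard analyzer matches only "eşarp", and querying the field mapped with "sanalyze" matches both. That is the same pattern as the two queries in the question.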

You need to specify which field you want query_string to act on:

  "query": {
    "query_string": {
      "query": "your_field_name:esarp"
    }
  }

or

  "query": {
    "query_string": {
      "query": "esarp",
      "default_field": "your_field_name"
    }
  }
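
Since the goal in the question is to search several string fields at once, note that query_string also accepts a "fields" list. The sketch below builds such a request body in Python. The helper name `build_query` and the field names `productName` and `category.categoryName` are assumptions: NEST camel-cases property names by default, and this presumes Category is mapped as a plain object rather than as a `nested` type, so check the names against your actual mapping.

```python
import json


def build_query(keyword, fields):
    """Build a query_string request body restricted to the given fields."""
    return {
        "query": {
            "query_string": {
                "query": keyword,
                "fields": list(fields),
            }
        }
    }


# Hypothetical field names; adjust them to match the index mapping.
body = build_query("esarp", ["productName", "category.categoryName"])
print(json.dumps(body, indent=2))
```

When the fields are named explicitly and no analyzer is set on the query, each field's own analyzer is applied to the query text, so there should be no need to pass "sanalyze" again at search time.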


Source: https://stackoverflow.com/questions/37524988/indexing-and-full-text-searching-in-elasticsearch-without-dialitics-using-c-shar
