analyzed or not_analyzed, what to choose

长情又很酷 2020-12-11 02:56

I'm using only Kibana to search Elasticsearch, and I have several fields that can only take a few values (worst case: servername, with 30 different values).

I do understa

1 Answer
  • 2020-12-11 03:09

    I will try to keep it simple. If you need more clarification, just let me know and I'll elaborate.

    An "analyzed" field is tokenized by the analyzer that you defined for that specific field in your mapping. Since you are referring to values without special characters (let's say server[1-9]), take the default analyzer: it basically breaks text into alphanumeric words and lowercases them (in Elasticsearch this is the "standard" analyzer). It is going to tokenize:

    this -> HelloWorld123
    into -> token1:helloworld123
    
    OR
    
    this -> Hello World 123
    into -> token1:hello && token2:world && token3:123
    

    In this case, if you search for HeLlO, the query is analyzed the same way and becomes "hello", so it matches this document because the token "hello" is there.
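    To make the analyzed case concrete, here is a minimal Python sketch. It assumes a simplified analyzer (split on anything that is not an ASCII letter or digit, then lowercase), which only approximates Elasticsearch's "standard" analyzer; simple_analyze and analyzed_match are hypothetical names made up for illustration. The query goes through the same analyzer as the stored value, which is why HeLlO still matches.

```python
import re


def simple_analyze(text):
    """Approximate the default analyzer: split on runs of characters that are
    not ASCII letters or digits, drop empty pieces, lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def analyzed_match(query, field_value):
    """Analyze both sides the same way; match if any query token was indexed."""
    indexed_tokens = set(simple_analyze(field_value))
    return any(tok in indexed_tokens for tok in simple_analyze(query))


print(simple_analyze("HelloWorld123"))             # ['helloworld123']
print(simple_analyze("Hello World 123"))           # ['hello', 'world', '123']
print(analyzed_match("HeLlO", "Hello World 123"))  # True
```

    Here a document counts as a match when any query token is in the field's token set; a real Elasticsearch query also scores the hits.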

    For not_analyzed fields, no tokenizer is applied at all: the whole value is your token (your keyword). So:

    this -> Hello World 123
    into -> token1:(Hello World 123)
    

    If you search that field for "hello world 123", it is not going to match, because the match is exact and case sensitive. (You can still use wildcards, e.g. Hello*, but let's address that another time.)
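    The not_analyzed case can be sketched the same way: the whole value is stored as one token, and a search has to equal it exactly. This is a hypothetical illustration, not Elasticsearch code; the helper names are made up, and Python's fnmatchcase stands in for a wildcard query (unlike Elasticsearch wildcards, it also accepts [...] character sets).

```python
from fnmatch import fnmatchcase


def keyword_tokens(value):
    """not_analyzed: no tokenizer, no lowercasing; the value is one token."""
    return [value]


def exact_match(query, field_value):
    # The query must be identical to the stored token, including case.
    return query in keyword_tokens(field_value)


def wildcard_match(pattern, field_value):
    # Case-sensitive * / ? matching against the single stored token.
    return any(fnmatchcase(tok, pattern) for tok in keyword_tokens(field_value))


value = "Hello World 123"
print(exact_match("Hello World 123", value))  # True
print(exact_match("hello world 123", value))  # False (case differs)
print(wildcard_match("Hello*", value))        # True
```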

    In a nutshell:

    Use "analyzed" for fields that you are going to search as text and that you want Elasticsearch to score. Example: titles that contain the word "jobs". Query: "title:jobs".

    doc1 : title:developer jobs in montreal
    doc2 : title:java coder jobs in vancouver
    doc3 : title:unix designer jobs in toronto
    doc4 : title:database manager vacancies in montreal
    

    This is going to retrieve doc1, doc2 and doc3 (doc4 says "vacancies", not "jobs").
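    A rough sketch of why the "jobs" query returns those three documents: analyze each title, build an inverted index (token -> documents containing it), then look up the analyzed query terms. The analyzer here is a simplified approximation (split on non-alphanumerics, lowercase), not Elasticsearch's exact tokenizer, and scoring is left out entirely; search is a hypothetical helper.

```python
import re
from collections import defaultdict


def simple_analyze(text):
    # Approximation of the default analyzer: split on non-alphanumerics, lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


titles = {
    "doc1": "developer jobs in montreal",
    "doc2": "java coder jobs in vancouver",
    "doc3": "unix designer jobs in toronto",
    "doc4": "database manager vacancies in montreal",
}

# Inverted index: token -> set of doc ids whose title contains that token.
index = defaultdict(set)
for doc_id, title in titles.items():
    for token in simple_analyze(title):
        index[token].add(doc_id)


def search(query):
    """Union of the docs for every analyzed query token (no scoring)."""
    hits = set()
    for token in simple_analyze(query):
        hits |= index.get(token, set())
    return sorted(hits)


print(search("jobs"))      # ['doc1', 'doc2', 'doc3']
print(search("Montreal"))  # ['doc1', 'doc4']
```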

    In cases like this, "analyzed" fields are what you want.

    If you know in advance what kind of data will be in that field, and you are going to query for exactly the values you want, then "not_analyzed" is what you want.

    Example:

    Get all the logs from server123.

    Query: "server:server123".

    doc1 :server:server123,log:randomstring,date:01-jan
    doc2 :server:server986,log:randomstring,date:01-jan
    doc3 :server:server777,log:randomstring,date:01-jan
    doc4 :server:server666,log:randomstring,date:01-jan
    doc5 :server:server123,log:randomstring,date:02-jan
    

    Results: only doc1 and doc5.
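    For a not_analyzed field like server, the lookup is just an exact comparison against the stored value. A minimal sketch using the documents above, with a hypothetical term_filter helper (plain Python, not an Elasticsearch API):

```python
docs = [
    {"id": "doc1", "server": "server123", "log": "randomstring", "date": "01-jan"},
    {"id": "doc2", "server": "server986", "log": "randomstring", "date": "01-jan"},
    {"id": "doc3", "server": "server777", "log": "randomstring", "date": "01-jan"},
    {"id": "doc4", "server": "server666", "log": "randomstring", "date": "01-jan"},
    {"id": "doc5", "server": "server123", "log": "randomstring", "date": "02-jan"},
]


def term_filter(documents, field, value):
    """Exact, case-sensitive comparison, as on a not_analyzed field."""
    return [d["id"] for d in documents if d[field] == value]


print(term_filter(docs, "server", "server123"))  # ['doc1', 'doc5']
print(term_filter(docs, "server", "Server123"))  # []
```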

    I hope you get the point. As I said, keep it simple: it is about what you need.

    analyzed -> more space on disk (a LOT more if the analyzed fields are big).
    analyzed -> more time for indexing.
    analyzed -> better for matching documents.

    not_analyzed -> less space on disk.
    not_analyzed -> less time for indexing.
    not_analyzed -> exact matches on the field (or wildcard matching).
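    For the question's servername field, this choice ends up in the index mapping. Below is a sketch of only the properties part, written as Python dicts. The legacy form sets "index": "not_analyzed" on a string field (Elasticsearch 1.x/2.x); from 5.x on, the same intent is expressed with the keyword type, and text for analyzed fields. The message field is a hypothetical free-text field added for contrast, and the surrounding index/type wrapper of the actual request is left out.

```python
import json

# Legacy (Elasticsearch 1.x/2.x) string mapping. "servername" comes from the
# question; "message" is a hypothetical free-text field for contrast.
legacy_mapping = {
    "properties": {
        "servername": {"type": "string", "index": "not_analyzed"},
        "message": {"type": "string", "index": "analyzed"},
    }
}

# From Elasticsearch 5.x on: "keyword" is not analyzed, "text" is analyzed.
modern_mapping = {
    "properties": {
        "servername": {"type": "keyword"},
        "message": {"type": "text"},
    }
}

print(json.dumps(legacy_mapping, indent=2))
print(json.dumps(modern_mapping, indent=2))
```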

    Regards,

    Daniel
