Solr exact search: ignore duplicate phrase

Submitted by 你说的曾经没有我的故事 on 2019-12-25 16:42:24

Question


I'm using a Solr query to search for keywords in documents. I want documents containing the exact phrase to come out on top, but if the same phrase is repeated many times in a document, it should be counted only once. At the moment, documents that contain the phrase several times rank higher because they get a higher score.

Please see the result given below. I am searching for "php developer"; the two results shown both match, but they have different scores.

For our needs, both should have the same score. I want to ignore repeated occurrences of the phrase within a document.

Please also check the schema fields. I am searching the "job_search" field, which is a combination of "job_title, key_skills, key_skills_admin, job_detail":

        <copyField source="job_title" dest="job_search"/>
        <copyField source="key_skills" dest="job_search"/>
        <copyField source="key_skills_admin" dest="job_search"/>   
        <copyField source="job_detail" dest="job_search"/> 

        {
        "responseHeader":{
        "status":0,
        "QTime":7,
        "params":{
          "lowercaseOperators":"true",
          "mm":"2",
          "debugQuery":"true",
          "fl":"job_slno,job_title,job_detail,key_skills,key_skills_admin,display_date,score",
          "indent":"true",
          "q":"\"php developer\"",
          "stopwords":"true",
          "wt":"json",
          "defType":"edismax"}},
        "response":{"numFound":110,"start":0,"maxScore":2.518858,"docs":[
          {
            "job_slno":"243681",
            "job_title":"php developer",
            "job_detail":"sdf sdfs df",
            "key_skills":"php developer",
            "key_skills_admin":"php developer",
            "display_date":"2016-11-11T00:00:00Z",
            "score":2.518858},
          {
            "job_slno":"243340",
            "job_title":"sfsdfs",
            "job_detail":"dfsdfsdfsd",
            "key_skills":"PHP Developer",
            "key_skills_admin":"PHP Developer",
            "display_date":"2016-11-13T00:00:00Z",
            "score":2.399412},
          ]
        }

Answer 1:


As long as you don't depend on the positions of the tokens (that is, you're not doing phrase boosting or something similar), you can set omitTermFreqAndPositions to true for the field. Keep in mind that a quoted phrase query such as "php developer" does rely on positions, so this option only fits if matching on the individual terms is acceptable.

That avoids indexing any term frequency information, which makes the scores identical as long as the term frequency is the only factor that differs between the documents.
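The caveat "as long as the term frequency is the only differing factor" can be illustrated with a small plain-Java sketch. It does not use Lucene or Solr; the class TfScoreSketch, its methods, and the toy formula (a term-frequency factor multiplied by a length norm) are hypothetical stand-ins, loosely modelled on classic TF-IDF scoring, and the document lengths are made up.

```java
// Simplified stand-in for one scoring term; this is NOT Lucene's real formula.
final class TfScoreSketch {

    // Classic TF-IDF style factor: grows with the square root of the match count.
    static double tfWithFrequencies(int freq) {
        return Math.sqrt(freq);
    }

    // With frequencies omitted from the index, a match only records presence.
    static double tfWithoutFrequencies(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // Length normalisation: shorter fields score higher (1 / sqrt(number of terms)).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical toy score: tf factor multiplied by the length norm.
    static double score(int freq, int numTerms, boolean frequenciesIndexed) {
        double tf = frequenciesIndexed ? tfWithFrequencies(freq) : tfWithoutFrequencies(freq);
        return tf * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Two made-up documents of equal length, matching 3 times and 2 times.
        System.out.println(score(3, 16, true) > score(2, 16, true));    // true
        System.out.println(score(3, 16, false) == score(2, 16, false)); // true
        // Same match count but different field lengths: the scores still differ.
        System.out.println(score(1, 16, false) == score(1, 25, false)); // false
    }
}
```

In this sketch, dropping the frequency makes the two equal-length documents tie, while a difference in field length alone is still enough to separate the scores.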




Answer 2:


You can create your own custom Similarity class that extends DefaultSimilarity and override the tf method to suit your use case.

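// DefaultSimilarity is the name used in the answer; newer Lucene versions call this class ClassicSimilarity.
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class CustomSimilarity extends DefaultSimilarity {

        // Multiple occurrences of a term do not affect relevancy:
        // every matching document gets the same term-frequency factor.
        @Override
        public float tf(float freq) {
                return 1.0f;
        }
}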


Source: https://stackoverflow.com/questions/41972082/solr-exact-search-ignore-duplicate-phrase
