How to boost hibernate-search query with field values?

Submitted by 回眸只為那壹抹淺笑 on 2020-02-25 05:40:09

Question


I have two fields in an entity class:

  1. establishmentName
  2. contactType

contactType has values like PBX, GSM, TEL and FAX.

I want a scoring mechanism that returns the best-matching data first, and then orders the results by contact type: PBX, TEL, GSM and finally FAX.

Scoring:

  • On establishmentName, to get the best-matching data first
  • On contactType, to get PBX first, then TEL, and so on

My final query is:

(+establishmentName:kamran~1^2.5 +(contactType:PBX^2.0 contactType:TEL^1.8 contactType:GSM^1.6 contactType:FAX^1.4))

But it does not return any results.

My question is: how can I boost a specific field differently depending on its value?

We can use the following query for two different fields:

Query query = qb.keyword()
    .onField( field_one).boostedTo(2.0f)
    .andField( field_two)
    .matching( searchTerm)
    .createQuery();

But I need to boost a single field based on its values; in my case that field is contactType.

My dataset:

  • (establishmentName : Concert Decoration, contactType : GSM)
  • (establishmentName : Elissa Concert, contactType : TEL)
  • (establishmentName : Yara Concert, contactType : FAX)
  • (establishmentName : E Concept, contactType : TEL)
  • (establishmentName : Infinity Concept, contactType : FAX)
  • (establishmentName : SD Concept, contactType : PBX)
  • (establishmentName : Broadcom Technical Concept, contactType : GSM)
  • (establishmentName : Concept Businessmen, contactType : PBX)

Searching for term=concert (fuzzy query on establishmentName) should return the list below:

(establishmentName : Elissa Concert, contactType : TEL)

[term=concert, exact matching so it will be on top by keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : Concert Decoration, contactType : GSM)

[term=concert, exact matching and by keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : Yara Concert, contactType : FAX)

[term=concert, exact matching and by keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : Concept Businessmen, contactType : PBX)

[term=concert, partial matching and keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : SD Concept, contactType : PBX)

[term=concert, partial matching and keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : E Concept, contactType : TEL)

[term=concert, partial matching and keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : Broadcom Technical Concept, contactType : GSM)

[term=concert, partial matching and keeping the order as PBX, TEL, GSM and FAX]

(establishmentName : Infinity Concept, contactType : FAX)

[term=concert, partial matching and keeping the order as PBX, TEL, GSM and FAX]


Answer 1:


From what I understand you basically want a two-phase sort:

  1. Put exact matches before other (fuzzy) matches.
  2. Sort by contact type.

The second sort is trivial, but the first one will require a bit of work. You can actually rely on scoring to implement it.

Essentially the idea would be to run a disjunction of multiple queries, and to assign a constant score to each query.

Instead of doing this:

Query query = qb.keyword()
    .fuzzy().withEditDistanceUpTo(1)
    .boostedTo(2.5f)
    .onField("establishmentName")
    .matching(searchTerm)
    .createQuery();

Do this:

Query query = qb.bool()
    .should(qb.keyword()
        .withConstantScore().boostedTo(100.0f) // Higher score, sort first
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .should(qb.keyword()
        .fuzzy().withEditDistanceUpTo(1)
        .withConstantScore().boostedTo(1.0f) // Lower score, sort last
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .createQuery();

The matched documents will be the same, but now the query will assign predictable scores: 1.0 for fuzzy-only matches, and 101.0 (1 from the fuzzy query and 100 from the exact query) for exact matches.
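To make the score arithmetic concrete, here is a minimal plain-Java sketch of how two constant-score clauses add up. It is not Hibernate Search code: the class `ConstantScoreSketch`, its methods, and the whitespace tokenization are assumptions made for illustration, and it uses plain Levenshtein distance, which may differ slightly from what Lucene's fuzzy query computes.

```java
import java.util.Locale;

// Illustration only: mimics the sum of two constant-score "should" clauses.
final class ConstantScoreSketch {

    static final float EXACT_SCORE = 100.0f; // boost of the exact clause
    static final float FUZZY_SCORE = 1.0f;   // boost of the fuzzy clause

    // Plain Levenshtein distance (insertions, deletions, substitutions).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // Adds the constant score of each clause that matches at least one token.
    static float score(String establishmentName, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        boolean exact = false;
        boolean fuzzy = false;
        for (String token : establishmentName.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(needle)) {
                exact = true;
            }
            if (editDistance(token, needle) <= 1) {
                fuzzy = true;
            }
        }
        return (exact ? EXACT_SCORE : 0.0f) + (fuzzy ? FUZZY_SCORE : 0.0f);
    }

    public static void main(String[] args) {
        System.out.println(score("Elissa Concert", "concert")); // prints 101.0
        System.out.println(score("SD Concept", "concert"));     // prints 1.0
    }
}
```

An exact match always satisfies the fuzzy clause too (edit distance 0), which is why exact matches end up at 101.0 rather than 100.0.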

This way, you can define the sort as follows:

fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType")
    .createSort());

This may not be a very elegant or optimized solution, but I think it will work.
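One caveat with sorting on the raw contactType value: assuming the field is indexed as a string, the sort is alphabetical, which is not the PBX, TEL, GSM, FAX order asked for in the question. The small plain-Java sketch below (the class name `AlphabeticalContactTypeSketch` is hypothetical) shows what alphabetical ordering does to the four values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: natural (alphabetical) ordering of the contact type strings.
final class AlphabeticalContactTypeSketch {

    // Returns a sorted copy, leaving the input list untouched.
    static List<String> sortedAlphabetically(List<String> contactTypes) {
        List<String> copy = new ArrayList<>(contactTypes);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> desired = Arrays.asList("PBX", "TEL", "GSM", "FAX");
        System.out.println(sortedAlphabetically(desired)); // prints [FAX, GSM, PBX, TEL]
    }
}
```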

To customize the relative order of contact types, I would suggest a different approach: use a custom bridge to index numbers instead of "PBX"/"TEL"/etc., assigning to each contact type the ordinal you expect. Essentially something like this:

public class Establishment {

    @Field(name = "contactType_sort", bridge = @FieldBridge(impl = ContactTypeOrdinalBridge.class))
    private ContactType contactType;

}

public class ContactTypeOrdinalBridge implements MetadataProvidingFieldBridge {

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if ( value != null ) {
          int ordinal = getOrdinal((ContactType) value);
          luceneOptions.addNumericFieldToDocument(name, ordinal, document);
          luceneOptions.addNumericDocValuesFieldToDocument(name, ordinal, document);
        }
    }


    @Override
    public void configureFieldMetadata(String name, FieldMetadataBuilder builder) {
        builder.field(name, FieldType.INTEGER).sortable(true);
    }

    // Maps each contact type to its position in the desired sort order.
    private int getOrdinal(ContactType value) {
        switch( value ) {
            case PBX: return 0;
            case TEL: return 1;
            case GSM: return 2;
            case FAX: return 3;
            default: return 4;
        }
    }
}

Then reindex, and sort like this:

fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType_sort")
    .createSort());
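As a sanity check on the combined ordering, the following plain-Java sketch applies the same two keys (score descending, then contact type ordinal ascending) to the dataset from the question. Nothing here touches Hibernate Search: `TwoPhaseSortSketch`, `Hit`, and the hard-coded scores (101 for "Concert" names, 1 for "Concept" names) are assumptions standing in for what the query and the bridge would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: in-memory version of "sort by score, then by contactType_sort".
final class TwoPhaseSortSketch {

    enum ContactType { PBX, TEL, GSM, FAX }

    static final class Hit {
        final String establishmentName;
        final ContactType contactType;
        final float score; // assumed: 101 for exact matches, 1 for fuzzy-only matches

        Hit(String establishmentName, ContactType contactType, float score) {
            this.establishmentName = establishmentName;
            this.contactType = contactType;
            this.score = score;
        }
    }

    // Same mapping as the bridge: position in the desired order.
    static int ordinalOf(ContactType type) {
        switch (type) {
            case PBX: return 0;
            case TEL: return 1;
            case GSM: return 2;
            case FAX: return 3;
            default: return 4;
        }
    }

    // The dataset from the question, with the scores the bool query would assign.
    static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit("Concert Decoration", ContactType.GSM, 101.0f),
            new Hit("Elissa Concert", ContactType.TEL, 101.0f),
            new Hit("Yara Concert", ContactType.FAX, 101.0f),
            new Hit("E Concept", ContactType.TEL, 1.0f),
            new Hit("Infinity Concept", ContactType.FAX, 1.0f),
            new Hit("SD Concept", ContactType.PBX, 1.0f),
            new Hit("Broadcom Technical Concept", ContactType.GSM, 1.0f),
            new Hit("Concept Businessmen", ContactType.PBX, 1.0f));
    }

    // Sorts by score (highest first), then by contact type ordinal (lowest first).
    static List<String> sortedNames(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            if (byScore != 0) {
                return byScore;
            }
            return Integer.compare(ordinalOf(a.contactType), ordinalOf(b.contactType));
        });
        List<String> names = new ArrayList<>();
        for (Hit hit : copy) {
            names.add(hit.establishmentName);
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : sortedNames(sampleHits())) {
            System.out.println(name);
        }
    }
}
```

Entries with the same score and the same contact type (the two PBX "Concept" entries here) keep their input order in this sketch, because the sort is stable. In a real query, the relative order of such ties is not determined by these two sort keys alone.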


Source: https://stackoverflow.com/questions/59658935/how-to-boost-hibernate-search-query-with-field-values
