What is the best way to index documents that contain mathematical expressions in Elasticsearch?

£可爱£侵袭症+ submitted on 2020-05-13 14:19:14

Question


The problem I am trying to solve is that I have a bunch of documents which contain mathematical expressions/formulas. I want to search the documents by formula or expression.

So far, based on my research, I'm considering converting the mathematical expressions to LaTeX format and storing them as strings in the database (Elasticsearch).

With this approach, will I be able to search for documents by the LaTeX string?

Example: the LaTeX conversion of a² + b² = c² is a^{2} + b^{2} = c^{2}. Can this string be searched in Elasticsearch?


Answer 1:


I agree with user @Lue E, with some modifications. I first tried a simple keyword approach, but it gave me some issues, so I switched to using the keyword tokenizer in my own custom analyzer, which should cover most of your use cases.

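Index definition with a custom analyzer. The keyword tokenizer keeps the whole formula as a single token so it can be searched as one unit, the lowercase filter makes the search case-insensitive, and the trim filter strips leading and trailing whitespace (it does not touch spaces inside the formula).

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "trim"
                    ]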
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "mathformula": {
                "type": "text",
                "analyzer": "my_custom_analyzer"
            }
        }
    }
}
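
To make the effect of this analyzer concrete, here is a minimal Python sketch that only emulates the three steps (keyword tokenizer, lowercase, trim) on plain strings. It does not call Elasticsearch; the function name `analyze` is hypothetical and the emulation is an approximation of the real analysis chain.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Rough emulation of my_custom_analyzer (assumption: not the real ES code)."""
    token = text           # keyword tokenizer: the whole input is one token
    token = token.lower()  # lowercase filter
    token = token.strip()  # trim filter: leading/trailing whitespace only
    return [token]


# Case and surrounding whitespace are normalized away.
print(analyze("  A^{2} + B^{2} = C^{2} "))  # ['a^{2} + b^{2} = c^{2}']

# Spaces inside the formula are kept, so these two produce different tokens.
print(analyze("a2+b2 = c2") == analyze("a2 + b2 = c2"))  # False
```

Because internal spacing survives, it may be worth normalizing whitespace before indexing if formulas are typed inconsistently.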

Index sample docs

{
    "mathformula" : "(a+b)^2 = a^2 + b^2 + 2ab"
}

{
    "mathformula" : "a2+b2 = c2"
}
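
If the formulas are stored as LaTeX, as the question proposes, backslashes have to be escaped in the JSON body. The sketch below builds document bodies with Python's standard `json` module, which handles that escaping. The helper name `build_doc` and the `\frac` formula are illustrative assumptions, not part of the original answer.

```python
import json


def build_doc(formula: str) -> str:
    """Serialize one document body for the mathformula field."""
    return json.dumps({"mathformula": formula})


# The LaTeX string from the question: braces and carets need no escaping.
print(build_doc("a^{2} + b^{2} = c^{2}"))
# {"mathformula": "a^{2} + b^{2} = c^{2}"}

# Illustrative formula with a backslash: json.dumps escapes it as \\.
latex = r"\frac{a}{b} = c^{2}"
body = build_doc(latex)
print(body)
# {"mathformula": "\\frac{a}{b} = c^{2}"}

# Decoding the body gives back the original LaTeX string unchanged.
print(json.loads(body)["mathformula"] == latex)  # True
```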

Search query (a match query, which uses the same analyzer at search time as at index time)

{
    "query": {
        "match" : {
            "mathformula" : {
                "query" : "a2+b2 = c2"
            }
        }
    }
}

The search result contains only the document whose formula matches the query

"hits": [
    {
        "_index": "so_math",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931471,
        "_source": {
            "mathformula": "a2+b2 = c2"
        }
    }
]
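
The reason only one document comes back can be sketched in plain Python: with a keyword tokenizer, each field value and each query becomes a single token, so a match requires the whole normalized strings to be equal. This is an emulation under that assumption, not Elasticsearch code. The names `analyze` and `search` are hypothetical, and the ID "2" for the other document is assumed (the source only shows ID "1").

```python
from typing import Dict, List


def analyze(text: str) -> str:
    """Emulate keyword tokenizer + lowercase + trim as a single token."""
    return text.lower().strip()


def search(query: str, docs: Dict[str, str]) -> List[str]:
    """Return IDs of documents whose single token equals the query token."""
    wanted = analyze(query)
    return [doc_id for doc_id, formula in docs.items()
            if analyze(formula) == wanted]


docs = {
    "1": "a2+b2 = c2",
    "2": "(a+b)^2 = a^2 + b^2 + 2ab",  # ID assumed for illustration
}

print(search("a2+b2 = c2", docs))    # ['1']
print(search(" A2+B2 = C2 ", docs))  # ['1'] (case and outer spaces ignored)
print(search("a2+b2", docs))         # [] (partial formulas do not match)
```

The last call shows the trade-off of this design: searching by a fragment of a formula finds nothing, since the analyzer never splits the formula into smaller tokens.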


Source: https://stackoverflow.com/questions/60960265/what-is-the-best-way-to-index-documents-which-contain-mathematical-expression-in
