Is it possible in Elasticsearch to write a query that preserves the order of the terms?
A simple example would be having these documents indexed using standar
Phrase matching doesn't guarantee order ;-). If you specify a large enough slop (2, for example), "hello world" will match "world hello". This is not necessarily a bad thing, because results are usually more relevant when two terms are close to each other, whatever their order. And I don't think the authors of this feature had in mind matching words that are 1000 positions apart.
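To see why a slop of 2 is enough to match the reversed phrase, here is a simplified Python model of how sloppy phrase matching measures distance. It is only a sketch: it assumes each query term occurs exactly once in the document, uses whitespace splitting in place of the real analyzer, and `required_slop` is a hypothetical helper name, not an Elasticsearch API.

```python
def required_slop(phrase, text):
    """Smallest slop at which `phrase` would match `text` in this simplified model.

    Assumes every phrase term occurs exactly once in `text`.
    """
    tokens = text.lower().split()
    # Shift each term's document position by its offset inside the phrase.
    shifted = [tokens.index(term) - offset
               for offset, term in enumerate(phrase.lower().split())]
    # An exact phrase gives identical shifted positions, so the spread is 0.
    return max(shifted) - min(shifted)


for doc in ["hello world", "world hello", "hello term1 term2 term3 term4 world"]:
    print(doc, "->", required_slop("hello world", doc))
```

For these three titles the model gives 0, 2 and 4, so any slop of 2 or more lets the reversed title through.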
I did find a way to enforce the order, though it is not a simple one: use a script filter. Here's an example:
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "hello world" }
{ "index": { "_id": 2 }}
{ "title": "world hello" }
{ "index": { "_id": 3 }}
{ "title": "hello term1 term2 term3 term4 world" }
POST /my_index/_search
{
"query": {
"filtered": {
"query": {
"match": {
"title": {
"query": "hello world",
"slop": 5,
"type": "phrase"
}
}
},
"filter": {
"script": {
"script": "term1Pos=0;term2Pos=0;term1Info = _index['title'].get('hello',_POSITIONS);term2Info = _index['title'].get('world',_POSITIONS); for(pos in term1Info){term1Pos=pos.position;}; for(pos in term2Info){term2Pos=pos.position;}; return term1Pos < term2Pos;"
}
}
}
}
}
To make the script itself more readable, here it is again with indentation:
term1Pos = 0;
term2Pos = 0;
term1Info = _index['title'].get('hello',_POSITIONS);
term2Info = _index['title'].get('world',_POSITIONS);
for(pos in term1Info) {
    term1Pos = pos.position;
};
for(pos in term2Info) {
    term2Pos = pos.position;
};
return term1Pos < term2Pos;
The query above searches for "hello world" with a slop of 5, which matches all three of the documents indexed above. The script filter then ensures that the position of the word "hello" in the document is lower than the position of the word "world". This way, no matter how large a slop we set in the query, the position check guarantees the order.
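The check the script performs can be simulated outside Elasticsearch. The Python sketch below is an illustration only, not the script's actual runtime: it assumes whitespace tokenization stands in for the analyzer, and `last_position` and `in_order` are hypothetical helper names. Like the script, it keeps the last position seen for each term and defaults to 0.

```python
def last_position(tokens, term):
    """Position of the last occurrence of `term`, or 0 if it is absent."""
    pos = 0
    for i, tok in enumerate(tokens):
        if tok == term:
            pos = i
    return pos


def in_order(title, first="hello", second="world"):
    """True when the last `first` comes before the last `second`."""
    tokens = title.lower().split()
    return last_position(tokens, first) < last_position(tokens, second)


docs = {
    1: "hello world",
    2: "world hello",
    3: "hello term1 term2 term3 term4 world",
}
# Keep only the documents that pass the order check.
kept = [doc_id for doc_id, title in docs.items() if in_order(title)]
print(kept)  # [1, 3]
```

Because only the last occurrence of each term is compared, a title such as "hello world hello" would be rejected in this simulation even though it contains the two words in order, so treat it as a sketch of the idea rather than a complete order test.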
This is the section in the documentation that sheds some light on the things used in the script above.