MongoDB - Multiple text index fields: "Index key pattern too large" error code 67

Anonymous (unverified), submitted 2019-12-03 00:57:01

Question:

I have the following Mongodb database structure:

{
    "_id" : "519817e508a16b447c00020e",
    "keyword" : "Just an example query",
    "rankings" :
    {
        "results" :
        {
            "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/"},
            "2" : { "domain" : "example2.com", "href" : "http://www.example2.com/"},
            "3" : { "domain" : "example3.com", "href" : "http://www.example3.com/"},
            "4" : { "domain" : "example4.com", "href" : "http://www.example4.com/"},
            "5" : { "domain" : "example5.com", "href" : "http://www.example5.com/"},
            ...
            ...
            "99" : { "domain" : "example99.com", "href" : "http://www.example99.com/"},
            "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/"}
        },
        "plus":"many",
        "other":"not",
        "interesting" : "stuff",
        "for": "this question"
    }
}

In a previous question, I asked how to index the text so that I could search for the keyword and domain using for example:

db.ranking.find({ $text: { $search: "\"example9.com\" \"Just an example query\""}})   

The awesome answer by John Petrone was:

db.ranking.ensureIndex({
    "keyword": "text",
    "rankings.results.1.domain" : "text",
    "rankings.results.2.domain" : "text",
    ...
    ...
    "rankings.results.99.domain" : "text",
    "rankings.results.100.domain" : "text"
})

However, while that works great when I have 10 results, I get an "Index key pattern too large" error (code 67) from the Mongo shell when I try to index 100 results.
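To see how quickly that key pattern grows, here is a small sketch in plain JavaScript (runnable in Node.js) that generates the same spec as the answer above for `n` result slots. `buildTextIndexSpec` is a hypothetical helper, not part of MongoDB, and the sketch only counts fields; it does not talk to a database.

```javascript
// Hypothetical helper: generates the text-index key pattern from the answer
// above for `n` result slots instead of typing every path by hand.
function buildTextIndexSpec(n) {
  const spec = { keyword: "text" };
  for (let i = 1; i <= n; i++) {
    // One separate field path per rank position: rankings.results.1.domain, ...
    spec[`rankings.results.${i}.domain`] = "text";
  }
  return spec;
}

// 10 results -> 11 indexed paths (works); 100 results -> 101 paths (error 67).
console.log(Object.keys(buildTextIndexSpec(10)).length); // 11
console.log(Object.keys(buildTextIndexSpec(100)).length); // 101
```

Every extra result position adds another path to the key pattern, so the spec grows linearly with the number of ranks stored. Storing the results in an array instead lets a single path such as `rankings.domain` cover every element.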

So the big question is:

How (the hell) can I resolve that "index key pattern too large" error?


EDIT (18/08/2014): clarified the document structure

{
    "_id" : "519817e508a16b447c00020e",  // assigned by MongoDB
    "keyword" : "Just an example query",
    "date" : "2014-03-28",
    "rankings" :
    {
        "1" : { "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1"},
        ...
        "100" : {"domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100"}
    },
    "plus":"many",
    "other":"not",
    "interesting" : "stuff",
    "for": "this question"
}

Answer 1:

The problem with your suggested structure:

{
    "keyword" : "Just an example query",
    "rankings" :
    [
        {"rank" : 1, "domain" : "example1.com", "href" : "example1.com"},
        ...
        {"rank" : 99, "domain" : "example99.com", "href" : "example99.com"}
    ]
}

is that, although you can now do

db.ranking.ensureIndex({"rankings.href":"text", "rankings.domain":"text"})  

and then run queries like:

db.ranking.find({$text:{$search:"example1"}}); 

this will return the whole document, including the entire rankings array, whenever any one array element matches.

You might want to consider referencing instead, so that each ranking result is a separate document and the keyword and other metadata are referenced rather than repeated many times.

So, you have a keyword/metadata document like:

{_id:1, "keyword":"example query", "querydate": date, "other stuff":"other meta data"}
{_id:2, "keyword":"example query 2", "querydate": date, "other stuff":"other meta data 2"}

and then result documents like:

{keyword_id:1, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"}
...
{keyword_id:1, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"}
{keyword_id:2, "rank" : 1, "domain" : "example1.com", "href" : "example1.com"}
...
{keyword_id:2, "rank" : 99, "domain" : "example99.com", "href" : "example99.com"}

where keyword_id links back to (references) the keyword/metadata collection. In practice the _ids will of course look like "_id" : "519817e508a16b447c00020e"; the small integers here are just for readability. You can now index keyword_id, domain and href, either together or separately depending on your query types. You will not hit the "index key pattern too large" error, and a query will return only the single matching result document rather than a whole array.
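A minimal sketch of that split in plain JavaScript (Node.js), assuming the document shape from the question's 18/08/2014 edit. `splitIntoReferencedDocs` is a hypothetical name, and reusing the original `_id` as the `keyword_id` is an assumption: the `rankings` object is removed from the metadata document and each entry becomes its own result document pointing back to it.

```javascript
// Hypothetical helper: splits one embedded ranking document into a
// keyword/metadata document and one result document per rank.
function splitIntoReferencedDocs(doc) {
  // Everything except `rankings` stays in the keyword/metadata document.
  const { rankings, ...keywordDoc } = doc;
  const resultDocs = Object.entries(rankings).map(([rank, result]) => ({
    keyword_id: doc._id, // assumption: reuse the original _id as the reference
    rank: Number(rank),  // the former object key becomes an ordinary field
    ...result,
  }));
  return { keywordDoc, resultDocs };
}

const example = {
  _id: "519817e508a16b447c00020e",
  keyword: "Just an example query",
  date: "2014-03-28",
  rankings: {
    "1": { domain: "example1.com", href: "http://www.example1.com/" },
    "2": { domain: "example2.com", href: "http://www.example2.com/" },
  },
};

const { keywordDoc, resultDocs } = splitIntoReferencedDocs(example);
console.log(keywordDoc);    // metadata only, no rankings field
console.log(resultDocs[1]); // the rank-2 result, carrying keyword_id
```

Because each result is now a top-level document, `domain` and `href` are ordinary fields that a single index path can cover, whatever the number of ranks.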

I am not entirely clear on where you need fuzzy/regex-style searches, or whether you will be searching the metadata or just href and domain, but I think this structure is a cleaner starting point for indexing, without maxing out on indexes as before. It will also let you combine finds on normal indexes with text indexes, depending on your query pattern.

You might find the answer to "MongoDB relationships: embed or reference?" useful when considering your document structure.



Answer 2:

So, here is my solution: I decided to stick with the embedded document, with one very simple modification. I replaced the object keys holding the rank with an array whose elements carry the rank as a field, and that's it:

{
    "_id" : "519817e508a16b447c00020e",  // assigned by MongoDB
    "keyword" : "Just an example query",
    "date" : "2014-03-28",
    "rankings" :
    [
        {
            "domain" : "example1.com", "href" : "http://www.example1.com/", "plus" : "stuff1", "rank" : 1
        },
        ...
        {
            "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
        }
    ],
    "plus":"many",
    "more":"uninteresting",
    "stuff" : "for",
    "this": "question"
}
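The migration from the keyed layout to this array layout can be sketched in plain JavaScript (Node.js). `rankingsToArray` is a hypothetical helper, and the input is assumed to use the keyed layout from the question's edit: each `"1"`…`"100"` key becomes a `rank` field inside an array element. It uses arrow functions and object spread, which older `mongo` shells may not support, so treat it as an illustration of the transformation rather than a drop-in shell script.

```javascript
// Hypothetical migration helper: turns
//   { "1": {...}, "2": {...} }   into   [{ ..., rank: 1 }, { ..., rank: 2 }]
// while copying every other top-level field unchanged.
function rankingsToArray(doc) {
  const rankings = Object.entries(doc.rankings)
    .map(([rank, result]) => ({ ...result, rank: Number(rank) }))
    .sort((a, b) => a.rank - b.rank); // keep the array in rank order
  return { ...doc, rankings };
}

const before = {
  keyword: "Just an example query",
  date: "2014-03-28",
  rankings: {
    "2": { domain: "example2.com", href: "http://www.example2.com/" },
    "1": { domain: "example1.com", href: "http://www.example1.com/", plus: "stuff1" },
  },
};

const after = rankingsToArray(before);
console.log(after.rankings.map((r) => r.rank)); // [ 1, 2 ]
```

The function returns a new object and leaves the input untouched, which makes it easy to compare the old and new shapes before writing anything back.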

Then, I can select an entire document using for example:

> db.ranking.find({"keyword":"how are you doing", "rank_date" : "2014-08-27"})

Or select a single result by using a positional projection, which is just awesome and has been available since MongoDB 2.2 :-D

> db.collection.find({ "rank_date" : "2014-04-09", "rankings.href": "http://www.example100.com/" }, { "rankings.$": 1 })
[
    {
        "domain" : "example100.com", "href" : "http://www.example100.com/", "plus" : "stuff100", "rank" : 100
    }
]

And even get one single url rank directly:

> db.collection.find({"rank_date" : "2014-04-09", "rankings.href": "http://www.example5.com/"}, { "rankings.$": 1 })[0]['rankings'][0]['rank']
5
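To make clear what the positional projection `{ "rankings.$": 1 }` hands back, here is a plain-JavaScript emulation for a single document. It is not MongoDB code: `firstMatchingRanking` is a hypothetical name and only handles an exact `href` match. Only the first array element that satisfied the query condition is kept, alongside the `_id` that MongoDB includes by default.

```javascript
// Plain-JavaScript emulation of
//   find({ "rankings.href": href }, { "rankings.$": 1 })
// for a single document: keep _id plus the first matching array element.
function firstMatchingRanking(doc, href) {
  const match = doc.rankings.find((r) => r.href === href);
  if (match === undefined) return null; // the document would not match the query at all
  return { _id: doc._id, rankings: [match] };
}

const doc = {
  _id: "519817e508a16b447c00020e",
  keyword: "Just an example query",
  rankings: [
    { domain: "example4.com", href: "http://www.example4.com/", rank: 4 },
    { domain: "example5.com", href: "http://www.example5.com/", rank: 5 },
    { domain: "example6.com", href: "http://www.example6.com/", rank: 6 },
  ],
};

// Same access path as the shell one-liner above: ['rankings'][0]['rank'].
console.log(firstMatchingRanking(doc, "http://www.example5.com/")["rankings"][0]["rank"]); // 5
```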

And finally, I'm also creating an index based on the url:

> db.collection.ensureIndex( {"rankings.href" : "text"} ) 

With the index, I can search for a single URL, a partial URL, a subdomain or the entire domain, so that's just great:

> db.collection.find({ $text: { $search: "example5.com"}}) 


