Question
Let's say I have a collection:
var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' },
  ...
];
Let's assume the fields are not uniform across elements, but I know the number of unique fields in advance, and the collection is not dynamic.
I want to filter it with something like _.findWhere. That is simple enough, but what if I want to prioritize speed over ease of use? Is there a better data structure that minimizes the number of elements that have to be checked? Perhaps some kind of tree?
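For reference, the baseline being asked about is a linear scan: every element is tested against every key-value pair of the query, so each query costs O(n). The sketch below is a hand-rolled stand-in for `_.where` (which returns all matches; `_.findWhere` stops at the first one), not Underscore's actual implementation. The function name `whereLinear` and the `checks` counter are hypothetical, added only to make the cost visible.

```javascript
// Hypothetical baseline, similar in spirit to _.where:
// every element is visited, so the cost is O(n) per query.
function whereLinear(collection, props) {
  var keys = Object.keys(props);
  var results = [];
  var checks = 0; // number of elements examined
  for (var i = 0; i < collection.length; i++) {
    checks++;
    var item = collection[i];
    // An element matches only if it has every queried key with an equal value.
    var matches = keys.every(function (k) {
      return item[k] === props[k];
    });
    if (matches) results.push(item);
  }
  return { results: results, checks: checks };
}

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' }
];

var r = whereLinear(data, { fieldB: 'string' });
console.log(r.results.length, r.checks); // 2 matches, 4 elements checked
```

Even though only two records match, all four had to be inspected; that per-query scan is what an index avoids.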
Answer 1:
Yes, there is something faster if your queries are of the form "give me all records where fieldX = valueY". However, it comes with overhead: the index costs extra memory and time to build.
For each field, build an inverted index that maps each value to the list of record IDs (that is, row positions in the original data) that have that value:
var indexForEachField = {
  fieldA: { "5": [0], "142": [1], "1324": [2] },
  ...
};
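A minimal sketch of building such an index from the sample data in the question. It assumes values are compared by their string form, since plain-object keys in JavaScript are strings (so `5` and `"5"` land in the same bucket); `buildIndex` is a hypothetical name. Because rows are scanned in order and each ID is appended as it is seen, every list of IDs comes out sorted in ascending order.

```javascript
// Hypothetical helper: build one inverted index per field.
// index[field][value] -> ascending array of row positions in `rows`.
function buildIndex(rows) {
  // Object.create(null) avoids clashes with inherited keys like "constructor".
  var index = Object.create(null);
  rows.forEach(function (row, id) {
    Object.keys(row).forEach(function (field) {
      // Values become object keys, i.e. strings: 5 and "5" share a bucket.
      var value = String(row[field]);
      var byValue = index[field] || (index[field] = Object.create(null));
      (byValue[value] || (byValue[value] = [])).push(id);
    });
  });
  return index;
}

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' }
];

var indexForEachField = buildIndex(data);
console.log(indexForEachField.fieldA['142']);    // → [1]
console.log(indexForEachField.fieldB['string']); // → [1, 3]
```

Building the index is a single O(total number of field values) pass, paid once since the collection does not change.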
When someone asks for "records where fieldX = valueY", you return:
indexForEachField["fieldX"]["valueY"] || []; // row positions of all matching records ([] if none)
Finding the result array therefore takes constant expected time (just two hash-table lookups), no matter how large the collection is; reading out the matches is then proportional to their number. The price is that you must keep the index up to date whenever the data changes.
This generalizes the strategy search engines use to find web pages that contain given terms; in that setting, the structure is called an inverted index.
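A sketch of the two-lookup query together with an append-only update, assuming records are only ever added at the end (the question says the collection is static, so this matters only if that changes). `lookup` and `addRecord` are hypothetical names. New records always receive the largest row position so far, so appending keeps every ID list sorted; edits and deletions would need extra work (such as removing IDs from lists) that is not shown here.

```javascript
// Hypothetical layout, as above: index[field][value] -> ascending row positions.
var data = [];
var index = Object.create(null);

// Append a record and register its row position under each of its field values.
// New IDs are always the largest so far, so each list stays sorted.
function addRecord(record) {
  var id = data.push(record) - 1; // push returns the new length
  Object.keys(record).forEach(function (field) {
    var value = String(record[field]);
    var byValue = index[field] || (index[field] = Object.create(null));
    (byValue[value] || (byValue[value] = [])).push(id);
  });
  return id;
}

// Two hash lookups; returns [] instead of undefined when nothing matches.
function lookup(field, value) {
  var byValue = index[field];
  return (byValue && byValue[String(value)]) || [];
}

addRecord({ fieldA: 5 });
addRecord({ fieldA: 142, fieldB: 'string' });
addRecord({ fieldB: 'string', fieldD: 111 });

console.log(lookup('fieldB', 'string')); // → [1, 2]
console.log(lookup('fieldA', 999));      // → []
```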
Edit: what if you want to find all records with fieldX=valueX and fieldY=valueY?
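To answer that, intersect the two ID lists with a merge-style walk. This requires both input arrays to be sorted in ascending order, which they are if the index was built by scanning the data from the first row to the last:
var a = indexForEachField["fieldX"]["valueX"] || []; // sorted IDs with fieldX = valueX
var b = indexForEachField["fieldY"]["valueY"] || []; // sorted IDs with fieldY = valueY
var c = []; // result array: all IDs in a AND in b
// Walk both sorted arrays together, always advancing the pointer at the smaller ID
for (var i = 0, j = 0; i < a.length && j < b.length; /* advanced in the body */) {
  if (a[i] < b[j]) {
    i++;          // a[i] cannot occur in b: skip it
  } else if (a[i] > b[j]) {
    j++;          // b[j] cannot occur in a: skip it
  } else {
    c.push(a[i]); // common ID: the record matches both conditions
    i++; j++;
  }
}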
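The same two-pointer walk can be wrapped in a reusable function. This sketch (hypothetical names `intersectSorted` and `sortIds`) also counts loop iterations so the cost can be checked against the analysis, and shows the one thing to watch when the input lists are not already sorted: `Array.prototype.sort()` compares elements as strings by default, so record IDs need a numeric comparator.

```javascript
// Hypothetical helper: intersect two ascending arrays of record IDs
// with the two-pointer walk, reporting how many iterations it took.
function intersectSorted(a, b) {
  var ids = [];
  var steps = 0;
  var i = 0, j = 0;
  while (i < a.length && j < b.length) {
    steps++;
    if (a[i] < b[j]) {
      i++;
    } else if (a[i] > b[j]) {
      j++;
    } else {
      ids.push(a[i]);
      i++; j++;
    }
  }
  return { ids: ids, steps: steps };
}

// Numeric sort; the default sort() would put 10 before 9.
function sortIds(ids) {
  return ids.slice().sort(function (x, y) { return x - y; });
}

var a = sortIds([9, 2, 40, 5]); // [2, 5, 9, 40]
var b = sortIds([5, 40, 3]);    // [3, 5, 40]
var r = intersectSorted(a, b);
console.log(r.ids, r.steps);    // ids [5, 40], steps 5
```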
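Each iteration advances at least one pointer, so the loop always runs fewer than a.length + b.length times; when the two arrays are identical it takes exactly half that, and it can stop after as few as min(a.length, b.length) iterations if one list runs out early. You can implement OR with something very similar.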
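One way the "very similar" OR could look, as a sketch rather than the answer's own code: a merge of two ascending ID arrays (hypothetical name `unionSorted`). Unlike AND, the walk continues until both arrays are exhausted, and an ID present in both lists is emitted only once.

```javascript
// Hypothetical helper: OR query as a merge of two ascending ID arrays.
// Runs until BOTH arrays are exhausted; shared IDs are emitted once.
function unionSorted(a, b) {
  var result = [];
  var i = 0, j = 0;
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i] < b[j])) {
      result.push(a[i++]); // b is used up, or a holds the smaller ID
    } else if (i >= a.length || a[i] > b[j]) {
      result.push(b[j++]); // a is used up, or b holds the smaller ID
    } else {
      result.push(a[i]);   // same ID in both lists: emit it once
      i++; j++;
    }
  }
  return result;
}

console.log(unionSorted([0, 2, 5], [1, 2, 6])); // → [0, 1, 2, 5, 6]
```

Since the output is itself sorted, it can be fed straight into further AND or OR merges. Note that an OR always touches every ID in both lists, so it has no early exit.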
Source: https://stackoverflow.com/questions/29135542/fastest-datastructure-for-filtering-schema-less-collections