Fast, in-memory range lookup against +5M record table


Question


I have a database table with 5M+ static records in place. Simple structure: (starting int, ending int, result int). So I have a certain INT and I need to find its corresponding result (int). Currently the look-up table is in the DB, but it needs to reside in memory, most likely in an environment without database access.

My solution needs to perform this logic without database access, in memory, and super fast, as I need to process thousands of transactions per second. The data set is a little over 50MB, so I could load the whole thing into memory and run range look-ups against it, per this post: Doing a range lookup in C# - how to implement. But I don't know how it will perform at such a scale.

  • Do I pre-load the table "on startup"? That might take a while.
  • Is there any way to load the table into some .dat file and have a super efficient look-up at run time? (A sketch of one possible approach follows below.)
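For the second bullet, here is a minimal sketch of one way the .dat idea could work, assuming a fixed 12-byte (start, end, result) layout written in sorted order; the Record struct, TableFile class, and file path are illustrative, not from the original question:

using System.IO;

// Illustrative record layout matching the table: (start, end, result), 12 bytes each.
struct Record
{
    public int Start;
    public int End;
    public int Result;
}

static class TableFile
{
    // Write all records (pre-sorted by Start) to a flat binary .dat file.
    public static void Save(string path, Record[] records)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            foreach (var r in records)
            {
                writer.Write(r.Start);
                writer.Write(r.End);
                writer.Write(r.Result);
            }
        }
    }

    // Read the whole file back into memory at startup; ~50MB loads in well under a second.
    public static Record[] Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = (int)(reader.BaseStream.Length / 12);
            var records = new Record[count];
            for (int i = 0; i < count; i++)
            {
                records[i].Start = reader.ReadInt32();
                records[i].End = reader.ReadInt32();
                records[i].Result = reader.ReadInt32();
            }
            return records;
        }
    }
}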

BTW, I am on Azure; not sure if using Storage Tables would help with lookups...


Answer 1:


Binary searches are super fast. A binary search over 5M records takes only about 23 comparisons to find the answer. Just load it into memory and use the Range lookup you linked.
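The linked post's approach is, roughly, a sorted List<Range> searched with a custom comparer; a hedged sketch of that idea (the Range and RangeComparer names are assumptions, not the linked code verbatim):

using System.Collections.Generic;

// Illustrative Range type; the linked post's exact shape may differ.
class Range
{
    public int From;
    public int To;
    public int Result;
}

// Returns 0 when two ranges overlap, so a single-point probe [v, v] matches the
// stored range containing v. Requires the list to be sorted and non-overlapping.
class RangeComparer : IComparer<Range>
{
    public int Compare(Range x, Range y)
    {
        if (x.To < y.From) return -1;
        if (x.From > y.To) return 1;
        return 0;
    }
}

// Usage:
// int index = ranges.BinarySearch(new Range { From = v, To = v }, new RangeComparer());
// int result = index >= 0 ? ranges[index].Result : -1; // -1 here means "no matching range"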

If you find it is slow, start optimizing:

  • Change the Range object to a struct instead of a class.
  • Hand-code your own binary search algorithm that (a) implements the comparison directly instead of calling out to an IComparer and (b) uses pointers and other unsafe tricks to disable array bounds checking while doing the search. (A sketch of the safe version follows below.)
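A sketch of what the second bullet's hand-coded search might look like over a pre-sorted array of structs laid out like the Record sketched in the question (the Lookup name and the -1 sentinel are assumptions; the unsafe/pointer variant would wrap the same loop in a fixed block):

// Hand-rolled binary search comparing ints directly instead of calling an IComparer.
// Returns the Result of the range containing value, or -1 if no range matches.
static int Lookup(Record[] records, int value)
{
    int lo = 0, hi = records.Length - 1;
    while (lo <= hi)
    {
        int mid = lo + ((hi - lo) >> 1);
        if (value < records[mid].Start)
            hi = mid - 1;
        else if (value > records[mid].End)
            lo = mid + 1;
        else
            return records[mid].Result; // value falls inside [Start, End]
    }
    return -1; // no range contains value
}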



Answer 2:


The range lookup code you link to performs a binary search, so performance will be O(log n). I don't think you can do better than that for a range lookup. A HashSet<T>'s lookup is O(1), but you cannot use that structure for a range lookup.

5 million records is not really a huge amount. I suggest you implement a proof of concept with the code you link to on the hardware you will use in production, and measure performance.




Answer 3:


You could certainly put that in a sequential file and load it at startup. 50 MB will come off the disk in less than a second. And even if you had to parse it as a text file, you should be able to create the table in another second. 5 million records just isn't that large when you're processing them with a 2 GHz (or faster) processor.
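For the text-file case, a minimal parse-at-startup sketch, reusing the illustrative Range type from the first answer; the comma-separated "start,end,result" format and the path are assumptions:

using System.Collections.Generic;
using System.IO;

// Parse one "start,end,result" line per record; the file should already be sorted by start.
static List<Range> LoadFromText(string path)
{
    var ranges = new List<Range>(5_000_000); // pre-size to avoid repeated list growth
    foreach (var line in File.ReadLines(path))
    {
        var parts = line.Split(',');
        ranges.Add(new Range
        {
            From = int.Parse(parts[0]),
            To = int.Parse(parts[1]),
            Result = int.Parse(parts[2])
        });
    }
    return ranges;
}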

Binary search of the list is O(log n), so the maximum number of probes you'll do per search is about 23. That's gonna be pretty darned quick.

It should be easy enough to load test something like this. Just spin it up and then see how long it takes to do, say, 1,000,000 lookups. Something like:

var clock = Stopwatch.StartNew();
for (int i = 0; i < NumIterations; ++i)
{
    int val = GetRandomValueToSearchFor(); // however you do that
    // Probe with a single-point range so the comparer finds the containing range.
    Ranges.BinarySearch(new Range { From = val, To = val }, RangeComparer);
}
clock.Stop();
// time per iteration is clock.Elapsed.TotalMilliseconds / NumIterations

That'll let you figure out the absolute fastest you can query the thing. I suspect you'll be fine with thousands of transactions per second.



Source: https://stackoverflow.com/questions/15262525/fast-in-memory-range-lookup-against-5m-record-table
