Memory-constrained external sorting of strings, with duplicates combined and counted, on a critical server (billions of filenames)

梦谈多话 2020-11-28 13:15

Our server produces files like {c521c143-2a23-42ef-89d1-557915e2323a}-sign.xml in its log folder. The first part is a GUID; the second part is a name template.

4 answers
  •  慢半拍i (OP)
    2020-11-28 13:47

    Your problem is a very good candidate for Map-Reduce. The good news is that you don't need to move from C# to Java (Hadoop): Map-Reduce is possible in the .NET Framework.

    With LINQ you already have the basic building blocks for performing Map-Reduce in C#. This might be one advantage over going for an external sort, though there is no question about the reasoning behind external sort. This link has the 'Hello World!' of Map-Reduce implemented in C# using LINQ and should get you started.
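As a rough illustration of the LINQ approach, here is a minimal sketch that counts duplicate names and returns them in sorted order. The class and method names (`NameCounter`, `CountSorted`) are made up for this example, and it assumes that "combined and counted" means exact ordinal string matches. Note that `GroupBy` holds every distinct name in memory, so on its own this does not meet a hard RAM limit when the number of distinct names is very large.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not taken from the linked article.
public static class NameCounter
{
    // Map: emit (name, 1) per input string. Reduce: sum the 1s for each name.
    // The result is ordered by name using ordinal comparison.
    public static IEnumerable<KeyValuePair<string, long>> CountSorted(IEnumerable<string> names)
    {
        return names
            .Select(name => new KeyValuePair<string, int>(name, 1))   // map
            .GroupBy(pair => pair.Key, StringComparer.Ordinal)        // group by key (buffers distinct keys)
            .Select(g => new KeyValuePair<string, long>(
                g.Key, g.Sum(p => (long)p.Value)))                    // reduce
            .OrderBy(pair => pair.Key, StringComparer.Ordinal);       // sorted output
    }
}
```

It could be fed lazily from a file, for example `NameCounter.CountSorted(File.ReadLines("names.txt"))`, where `names.txt` is a placeholder path with one filename per line.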


    If you do move to Java, one of the most comprehensive tutorials on it is here. Search for Hadoop and Map-Reduce and you will find plenty of information and numerous good online video tutorials.

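    Further, if you do move to Java, your requirements of:

    • Sorted results
    • Critical RAM usage

    should be met, as a Map-Reduce job in Hadoop provides both out of the box: keys are sorted before they reach each reducer, and intermediate data is spilled to disk rather than held entirely in RAM.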
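Broadly, Hadoop gets sorted output under bounded memory by sorting limited-size chunks, spilling them to disk, and then merging the sorted runs. The same merge step can be sketched in C# without Hadoop. This is only an illustration under assumptions: `RunMerger` and `MergeAndCount` are hypothetical names, each run is assumed to be already sorted with ordinal comparison, and `PriorityQueue` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class RunMerger
{
    // Merges already-sorted runs and yields (name, count) in ordinal order.
    // Only the current element of each run is held in memory at any time.
    public static IEnumerable<KeyValuePair<string, long>> MergeAndCount(
        IEnumerable<IEnumerable<string>> sortedRuns)
    {
        var heap = new PriorityQueue<IEnumerator<string>, string>(StringComparer.Ordinal);
        foreach (var run in sortedRuns)
        {
            var it = run.GetEnumerator();
            if (it.MoveNext()) heap.Enqueue(it, it.Current);
            else it.Dispose();
        }

        string current = "";
        long count = 0;
        while (heap.Count > 0)
        {
            var it = heap.Dequeue();          // run whose head is the smallest name
            var name = it.Current;
            if (count > 0 && !string.Equals(name, current, StringComparison.Ordinal))
            {
                // The name changed, so the previous name's total is final.
                yield return new KeyValuePair<string, long>(current, count);
                count = 0;
            }
            current = name;
            count++;
            if (it.MoveNext()) heap.Enqueue(it, it.Current);
            else it.Dispose();
        }
        if (count > 0) yield return new KeyValuePair<string, long>(current, count);
    }
}
```

In practice each run would be something like `File.ReadLines(chunkPath)` over a chunk file that was sorted in memory and written out earlier, so RAM usage is governed by the chunk size and the number of runs rather than the total input size.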
