Retrieving an entire data collection from RavenDB

Submitted by 余生颓废 on 2019-12-02 19:14:29

Use paging, and read the collection 1024 items at a time.

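var allUsers = new List<User>();
int start = 0;
while (true)
{
    // Fetch the next page of up to 1024 items, skipping what was already read.
    var current = session.Query<User>().Take(1024).Skip(start).ToList();
    if (current.Count == 0)
        break; // an empty page means everything has been read

    start += current.Count;
    allUsers.AddRange(current);
}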
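The termination logic of that loop can be checked without a server. The following is a minimal sketch, not RavenDB API: `Pager`, `ReadAll`, and the `fetchPage` delegate are hypothetical names introduced here, with `fetchPage` standing in for a call such as `session.Query<User>().Take(take).Skip(skip).ToList()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pager
{
    // Hypothetical helper: asks fetchPage(skip, take) for consecutive pages
    // and stops at the first empty page. fetchPage is an assumption that
    // stands in for a real paged query against the database.
    public static List<T> ReadAll<T>(Func<int, int, IList<T>> fetchPage, int pageSize = 1024)
    {
        var all = new List<T>();
        int start = 0;
        while (true)
        {
            IList<T> current = fetchPage(start, pageSize);
            if (current.Count == 0)
                break; // nothing left to read

            start += current.Count; // advance by what was actually returned
            all.AddRange(current);
        }
        return all;
    }
}
```

Against an in-memory list, a call might look like `Pager.ReadAll<int>((skip, take) => source.Skip(skip).Take(take).ToList())`. Note that a sketch like this does nothing about the per-session request limit; it only isolates the paging loop.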

This question was posted before this feature was available in RavenDB, but in case anyone else stumbles upon this now...

The encouraged way to do this is via the Streaming API. The RavenDB client batches the stream so it can automatically 'page' the requests/responses to/from the server. If you opt in to using the Streaming API, the client assumes you "know what you're doing" and does not enforce the limits that apply to regular queries (128 results by default, 1024 results per page at most, and 30 requests per session).

var allUsers = new List<User>();
var query = session.Query<User>();

// Stream the results instead of paging manually.
using (var enumerator = session.Advanced.Stream(query)) {
    while (enumerator.MoveNext()) {
        allUsers.Add(enumerator.Current.Document);
    }
}

var count = allUsers.Count;

Tip: Although this is the encouraged way to solve the problem, as a general rule it is best to avoid the situation in the first place. What if there are a million records? That allUsers list is going to get huge. Could an index or transform be applied first to filter the data down to what you actually need to display to the user or process? Is this for reporting purposes? If so, maybe RavenDB should be automatically exporting to a SQL server with reporting services on it.

Building on Ayende's answer, here is a complete method that overcomes the limit of 30 queries per session and returns all documents of the supplied class:

    public static List<T> getAll<T>(DocumentStore docDB) {
        return getAllFrom(0, new List<T>(), docDB);
    }

    public static List<T> getAllFrom<T>(int startFrom, List<T> list, DocumentStore docDB ) {
        var allUsers = list;

        using (var session = docDB.OpenSession())
        {
            int queryCount = 0;
            int start = startFrom;
            while (true)
            {
                var current = session.Query<T>().Take(1024).Skip(start).ToList();
                queryCount += 1;
                if (current.Count == 0)
                    break;

                start += current.Count;
                allUsers.AddRange(current);

                if (queryCount >= 30)
                {
                    return getAllFrom(start, allUsers, docDB);
                }
            }
        }
        return allUsers;
    }

I hope it is not too hacky to do it like this.

I honestly prefer the following function:

    public IEnumerable<T> GetAll<T>()
    {
        List<T> list = new List<T>();

        RavenQueryStatistics statistics = new RavenQueryStatistics();

        list.AddRange(_session.Query<T>().Statistics(out statistics));
        if (statistics.TotalResults > 128)
        {
            int toTake = statistics.TotalResults - 128;
            int taken = 128;
            while (toTake > 0)
            {
                list.AddRange(_session.Query<T>().Skip(taken).Take(toTake > 1024 ? 1024 : toTake));
                toTake -= 1024;
                taken += 1024;
            }
        }

        return list;
    }
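To make the arithmetic in that loop easier to follow, here is a minimal sketch that computes only the (skip, take) pairs issued after the first 128-item request. `PagePlan` and `Remaining` are hypothetical names introduced for illustration; nothing here touches RavenDB.

```csharp
using System;
using System.Collections.Generic;

public static class PagePlan
{
    // Hypothetical helper: lists the follow-up requests needed to cover
    // `total` results when the first request already returned `firstPage`
    // items and each later request returns at most `pageSize` items.
    public static List<(int Skip, int Take)> Remaining(int total, int firstPage = 128, int pageSize = 1024)
    {
        var plan = new List<(int Skip, int Take)>();
        int taken = firstPage;          // items already read
        int toTake = total - firstPage; // items still missing
        while (toTake > 0)
        {
            int take = Math.Min(toTake, pageSize);
            plan.Add((taken, take));
            toTake -= take;
            taken += take;
        }
        return plan;
    }
}
```

For 2500 results this yields three follow-up requests: skip 128 take 1024, skip 1152 take 1024, and skip 2176 take 324. Each follow-up is still a separate request, so a large enough collection would presumably run into the per-session request limit mentioned in the other answers.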

Al Dass

With a slight twist on @capaj's post, here is a generic way of getting all the document IDs as a list of strings. Note the use of Advanced.LuceneQuery<T>(), SelectFields<T>(idPropertyName) and GetProperty(idPropertyName) to make things generic. The default assumes "Id" is a valid property on the given <T> (which should be the case 99.999% of the time). If some other property serves as your Id, you can pass its name in instead.

public static List<string> getAllIds<T>(DocumentStore docDB, string idPropertyName = "Id") {
   return getAllIdsFrom<T>(0, new List<string>(), docDB, idPropertyName);
}

public static List<string> getAllIdsFrom<T>(int startFrom, List<string> list, DocumentStore docDB, string idPropertyName ) {
    var allUsers = list;

    using (var session = docDB.OpenSession())
    {
        int queryCount = 0;
        int start = startFrom;
        while (true)
        {
            var current = session.Advanced.LuceneQuery<T>().Take(1024).Skip(start).SelectFields<T>(idPropertyName).ToList();
            queryCount += 1;
            if (current.Count == 0)
                break;

            start += current.Count;
            allUsers.AddRange(current.Select(t => (t.GetType().GetProperty(idPropertyName).GetValue(t, null)).ToString()));

            if (queryCount >= 28)
            {
                return getAllIdsFrom<T>(start, allUsers, docDB, idPropertyName);
            }
        }
    }
    return allUsers;
}

An example of where I use this is when making a PatchRequest in RavenDB using the BulkInsert session. In some cases I have hundreds of thousands of documents and can't afford to load them all into memory just to iterate over them again for the patch operation, so I load only their string IDs and pass those to the Patch command.

void PatchRavenDocs()
{
    var store = new DocumentStore
    {
        Url = "http://localhost:8080",
        DefaultDatabase = "SoMeDaTaBaSeNaMe"
    };

    store.Initialize();

    // >>>here is where I get all the doc IDs for a given type<<<
    var allIds = getAllIds<SoMeDoCuMeNtTyPe>(store);    

    // create a new patch to ADD a new int property to my documents
    var patches = new[]{ new PatchRequest { Type = PatchCommandType.Set, Name = "SoMeNeWPrOpeRtY" ,Value = 0 }};

    using (var s = store.BulkInsert()){
        int cntr = 0;
        Console.WriteLine("ID Count " + allIds.Count);
        foreach(string id in allIds)
        {
            // apply the patch to my document
            s.DatabaseCommands.Patch(id, patches);

            // spit out a record every 2048 rows as a basic sanity check
            if ((cntr++ % 2048) == 0)
                Console.WriteLine(cntr + " " + id);
        }
    }
}

Hope it helps. :)

I like Al Dass's solution of getting the IDs to operate on instead of the complete large objects, and of getting the IDs directly from the index. However, the recursion scares me a bit (even though I think it might be OK), so I replaced it with a loop that opens a fresh session after 30 queries, and I removed the reflection.

public List<string> GetAllIds<T>()
{
    var allIds = new List<string>();
    IDocumentSession session = null;

    try
    {
        session = documentStore.OpenSession();
        int queryCount = 0;
        int start = 0;
        while (true)
        {
            var current = session.Advanced.DocumentQuery<T>()
                .Take(1024)
                .Skip(start)
                .SelectFields<string>("__document_id")
                .AddOrder("__document_id")
                .ToList();

            if (current.Count == 0)
                break;
            allIds.AddRange(current);

            queryCount += 1;
            start += current.Count;

            if (queryCount == 30)
            {
                queryCount = 0;
                session.Dispose();
                session = documentStore.OpenSession();
            }
        }
    }
    finally
    {
        if (session != null)
        {
            session.Dispose();
        }
    }

    return allIds;
}

Also, this is updated for RavenDB 3.
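The session rotation can also be written with a nested `using` and a bounded inner loop instead of try/finally. The following is only a sketch under stated assumptions: `FakeSession` is a hypothetical in-memory stand-in that throws once its request budget of 30 is exceeded, and `IdReader` is an illustrative name, so none of this is RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a session with a limited request budget.
public sealed class FakeSession : IDisposable
{
    public const int MaxRequests = 30;
    public static int Opened; // counts how many sessions were created

    private readonly IReadOnlyList<string> _ids;
    private int _requests;

    public FakeSession(IReadOnlyList<string> ids) { _ids = ids; Opened++; }

    public List<string> Page(int skip, int take)
    {
        if (++_requests > MaxRequests)
            throw new InvalidOperationException("Request budget exceeded for this session.");
        return _ids.Skip(skip).Take(take).ToList();
    }

    public void Dispose() { }
}

public static class IdReader
{
    // Reads all IDs page by page, opening a new session before the
    // current one runs out of requests.
    public static List<string> ReadAll(IReadOnlyList<string> ids, int pageSize)
    {
        var all = new List<string>();
        int start = 0;
        while (true)
        {
            using (var session = new FakeSession(ids))
            {
                for (int q = 0; q < FakeSession.MaxRequests; q++)
                {
                    var current = session.Page(start, pageSize);
                    if (current.Count == 0)
                        return all; // leaving the using block disposes the session
                    all.AddRange(current);
                    start += current.Count;
                }
            }
        }
    }
}
```

With a real store, `new FakeSession(ids)` would be replaced by opening a document session and `Page` by the paged query shown above; the rotation logic itself stays the same.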
