What kind of queries in Core Data can profit from R-Tree index on attributes?

Submitted by 大憨熊 on 2020-05-15 12:22:45

Question


After reading the SQLite documentation on the R*Tree module (https://www.sqlite.org/rtree.html), I am experimenting with a two-dimensional R-Tree in a Core Data model. In particular, I expected (maybe somewhat naively) to see some kind of SELECT statement on the index table, but none appears in the SQLite debug trace when I execute a fetch on the Region entity with indexed attributes (see predicateBoundaryIdx in the code below).

My question is: what must a Core Data model (entities, attributes) and the NSPredicate look like in order to benefit from the R-Tree index?

[Xcode v11.4, iOS v13.1, Swift. com.apple.CoreData.SQLDebug set to 4]

Model

Index

Corresponding database schema

CREATE TABLE ZPERSON ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLOCATION INTEGER, Z1CONTACTS INTEGER, ZNAME VARCHAR );
CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZMAXLATITUDE FLOAT, ZMAXLATITUDEIDX FLOAT, ZMAXLONGITUDE FLOAT, ZMAXLONGITUDEIDX FLOAT, ZMINLATITUDE FLOAT, ZMINLATITUDEIDX FLOAT, ZMINLONGITUDE FLOAT, ZMINLONGITUDEIDX FLOAT, ZNAME VARCHAR );
CREATE INDEX ZPERSON_ZLOCATION_INDEX ON ZPERSON (ZLOCATION);
CREATE INDEX ZPERSON_Z1CONTACTS_INDEX ON ZPERSON (Z1CONTACTS);
CREATE VIRTUAL TABLE Z_Region_RegionIndex USING RTREE (Z_PK INTEGER PRIMARY KEY, ZMINLATITUDEIDX_MIN, ZMINLATITUDEIDX_MAX, ZMAXLATITUDEIDX_MIN, ZMAXLATITUDEIDX_MAX, ZMINLONGITUDEIDX_MIN, ZMINLONGITUDEIDX_MAX, ZMAXLONGITUDEIDX_MIN, ZMAXLONGITUDEIDX_MAX)
/* Z_Region_RegionIndex(Z_PK,ZMINLATITUDEIDX_MIN,ZMINLATITUDEIDX_MAX,ZMAXLATITUDEIDX_MIN,ZMAXLATITUDEIDX_MAX,ZMINLONGITUDEIDX_MIN,ZMINLONGITUDEIDX_MAX,ZMAXLONGITUDEIDX_MIN,ZMAXLONGITUDEIDX_MAX) */;
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_rowid"(rowid INTEGER PRIMARY KEY,nodeno);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_node"(nodeno INTEGER PRIMARY KEY,data);
CREATE TABLE IF NOT EXISTS "Z_Region_RegionIndex_parent"(nodeno INTEGER PRIMARY KEY,parentnode);

Code for testing

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {

    let mainContext: NSManagedObjectContext
    mainContext = persistentContainer.viewContext
    mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
    mainContext.undoManager = nil
    mainContext.shouldDeleteInaccessibleFaults = true
    mainContext.automaticallyMergesChangesFromParent = true


    var personObj: Person
    var locationObj: Region

    let n = 1000000

    let personNr = stride(from: 1, through: n+1, by: 1).map(String.init).shuffled()

    for i in 1...n
    {
        personObj = Person(context: mainContext)
        locationObj = Region(context: mainContext)
        locationObj.name = "Region \(i)"
        locationObj.minlatitude = 40.000000 - Float.random(in: 0 ..< 5)
        locationObj.minlongitude = 9.000000 - Float.random(in: 0 ..< 5)
        locationObj.maxlatitude = 40.000000 + Float.random(in: 0 ..< 5)
        locationObj.maxlongitude = 9.000000 + Float.random(in: 0 ..< 5)
        locationObj.minlatitudeidx = locationObj.minlatitude
        locationObj.minlongitudeidx = locationObj.minlongitude
        locationObj.maxlatitudeidx = locationObj.maxlatitude
        locationObj.maxlongitudeidx = locationObj.maxlongitude
        personObj.name = "Person \(personNr[i])"
        personObj.location = locationObj
        if i % 1000 == 0 {
            saveContext()
        }
    }

    saveContext()


    let request: NSFetchRequest<Region> = Region.fetchRequest()
    let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()

    let eps : Float = 1.0
    let predicateBoundaryIdx = NSPredicate(format: "(minlatitudeidx >= %lf and maxlatitudeidx =< %lf) and (minlongitudeidx >= %lf and maxlongitudeidx =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)
    let predicateBoundary = NSPredicate(format: "(minlatitude >= %lf and maxlatitude =< %lf) and (minlongitude >= %lf and maxlongitude =< %lf)",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)

    requestIdx.predicate = predicateBoundaryIdx;
    request.predicate = predicateBoundary;

    print("fetch index:")
    do {
        let result = try mainContext.count(for:requestIdx)
        print("Count = \(result)")
    } catch {
        print("Error: \(error)")
    }
    print("fetch no index:")
    do {
        let result = try mainContext.count(for:request)
        print("Count = \(result)")
    } catch {
        print("Error: \(error)")
    }

    for store in (persistentContainer.persistentStoreCoordinator.persistentStores) {
        os_log("Store URL: %@", log: Debug.coredata_log, type: .info, store.url?.absoluteString ?? "No Store")
    }

    return true
}

Core Data SQL Trace

CoreData: sql: SELECT COUNT( DISTINCT t0.Z_PK) FROM ZREGION t0 WHERE ( t0.ZMINLATITUDEIDX >= ? AND t0.ZMAXLATITUDEIDX <= ? AND t0.ZMINLONGITUDEIDX >= ? AND t0.ZMAXLONGITUDEIDX <= ?) 

Answer 1:


Core Data support for R-Tree indexes was introduced in 2017. WWDC 2017 session 210 covers it and provides an example. As you will see, the key is that you need to use a function in the predicate format string to indicate that the index should be used. There's another example in WWDC 2018 session 224.
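For orientation, the function in question is written `indexed:by:(attribute, 'indexName')` inside the format string, as the example further down uses it. Here is a minimal sketch of a helper that assembles such a format string for several attributes. The name `indexedRangeFormat` is hypothetical and is not part of Core Data:

```swift
import Foundation

/// Hypothetical helper (not a Core Data API): builds a predicate format string
/// that applies `indexed:by:` with a BETWEEN range to each attribute.
/// The caller still supplies two numeric arguments per attribute.
func indexedRangeFormat(attributes: [String], indexName: String) -> String {
    return attributes
        .map { "indexed:by:(\($0), '\(indexName)') between { %lf, %lf }" }
        .joined(separator: " AND ")
}

let format = indexedRangeFormat(attributes: ["latitude", "longitude"],
                                indexName: "bylocation")
print(format)
```

The resulting string would then be handed to `NSPredicate(format:)` together with the lower and upper bound of each range, in attribute order.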

Take a slightly simpler variation of your example: an entity with location (latitude and longitude) attributes and a name attribute:

Add a Fetch Index named "bylocation", specify its type as "R-Tree" and add Fetch Index Elements for latitude and longitude:

Modify your code slightly, to reflect the different attributes etc. Prepare two separate predicates, one using the index, the other without, and run them both to compare:

    let mainContext: NSManagedObjectContext
    mainContext = persistentContainer.viewContext
    mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
    mainContext.undoManager = nil
    mainContext.shouldDeleteInaccessibleFaults = true
    mainContext.automaticallyMergesChangesFromParent = true

    var locationObj: Region

    let n = 10 // Just for demo purposes

    for i in 1...n
    {
        locationObj = Region(context: mainContext)
        locationObj.name = "Region \(i)"
        locationObj.latitude = 40.000000 + 5.0 - Float.random(in: 0 ..< 10)
        locationObj.longitude = 9.000000 + 5.0 - Float.random(in: 0 ..< 10)
        if i % 1000 == 0 {
            saveContext()
        }
    }

    saveContext()
    mainContext.reset()

    let request: NSFetchRequest<Region> = Region.fetchRequest()
    let requestIdx: NSFetchRequest<Region> = Region.fetchRequest()

    let eps : Float = 1.0
    let predicateBoundaryIdx = NSPredicate(format: "indexed:by:(latitude, 'bylocation') between { %lf, %lf } AND indexed:by:(longitude, 'bylocation') between { %lf, %lf }", 40.0-eps, 40.0+eps, 9.0-eps, 9.0+eps)
    let predicateBoundary = NSPredicate(format: "latitude between { %lf, %lf } AND longitude between { %lf, %lf} ",40.000000-eps,40.000000+eps,9.000000-eps,9.000000+eps)

    requestIdx.predicate = predicateBoundaryIdx;
    request.predicate = predicateBoundary;

    print("fetch index:")
    do {
        let result = try mainContext.fetch(requestIdx)
        print("Count = \(result.count)")
    } catch {
        print("Error: \(error)")
    }
    mainContext.reset()
    print("fetch no index:")
    do {
        let result = try mainContext.fetch(request)
        print("Count = \(result.count)")
    } catch {
        print("Error: \(error)")
    }

Run that with SQLDebug = 4, and you can then see a bit of what's going on in the logs. First, the database is created and the Region table is added, followed by the RTree index. Triggers are created to add the relevant data to the index whenever the Region table is amended:

CoreData: sql: CREATE TABLE ZREGION ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER, Z_OPT INTEGER, ZLATITUDE FLOAT, ZLONGITUDE FLOAT, ZNAME VARCHAR )
CoreData: sql: CREATE VIRTUAL TABLE IF NOT EXISTS Z_Region_bylocation USING RTREE (Z_PK INTEGER PRIMARY KEY, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX)
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_INSERT AFTER INSERT ON ZREGION FOR EACH ROW BEGIN INSERT OR REPLACE INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_UPDATE AFTER UPDATE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = NEW.Z_PK ; INSERT INTO Z_Region_bylocation (Z_PK, ZLATITUDE_MIN, ZLATITUDE_MAX, ZLONGITUDE_MIN, ZLONGITUDE_MAX) VALUES (NEW.Z_PK, NEW.ZLATITUDE, NEW.ZLATITUDE, NEW.ZLONGITUDE, NEW.ZLONGITUDE) ; END
CoreData: sql: CREATE TRIGGER IF NOT EXISTS Z_Region_bylocation_DELETE AFTER DELETE ON ZREGION FOR EACH ROW BEGIN DELETE FROM Z_Region_bylocation WHERE Z_PK = OLD.Z_PK ; END
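Note that the INSERT and UPDATE triggers above write the same attribute value into both the _MIN and the _MAX column, so each point is stored as a degenerate bounding box. A minimal sketch of that mapping in plain Swift follows. The `RTreeEntry` type and the function names are made up for illustration and do not correspond to anything in Core Data:

```swift
/// Illustrative stand-in for one row of the Z_Region_bylocation virtual table.
struct RTreeEntry: Equatable {
    let pk: Int
    let latitudeMin: Float
    let latitudeMax: Float
    let longitudeMin: Float
    let longitudeMax: Float
}

/// Mirrors what the triggers do: the same value goes into both _MIN and _MAX.
func makeEntry(pk: Int, latitude: Float, longitude: Float) -> RTreeEntry {
    return RTreeEntry(pk: pk,
                      latitudeMin: latitude, latitudeMax: latitude,
                      longitudeMin: longitude, longitudeMax: longitude)
}

/// A range test on such an entry: the box must lie inside the query range.
/// For a degenerate box this reduces to an ordinary BETWEEN check.
func isInside(_ e: RTreeEntry,
              latitude: ClosedRange<Float>,
              longitude: ClosedRange<Float>) -> Bool {
    return latitude.lowerBound <= e.latitudeMin && e.latitudeMax <= latitude.upperBound
        && longitude.lowerBound <= e.longitudeMin && e.longitudeMax <= longitude.upperBound
}

let entry = makeEntry(pk: 1, latitude: 40.5, longitude: 9.25)
print(isInside(entry, latitude: 39...41, longitude: 8...10))
```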

Then when it comes to the fetches, you can see the two different queries being sent to SQLite:

With the index:

CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE ( t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLATITUDE_MIN AND n1_t0.ZLATITUDE_MAX <= ?)) AND  t0.Z_PK IN (SELECT n1_t0.Z_PK FROM Z_Region_bylocation n1_t0 WHERE (? <= n1_t0.ZLONGITUDE_MIN AND n1_t0.ZLONGITUDE_MAX <= ?)))

and the logs even include the query plan used by SQLite:

 2 0 0 SEARCH TABLE ZREGION AS t0 USING INTEGER PRIMARY KEY (rowid=?)
 6 0 0 LIST SUBQUERY 1
 8 6 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D0B1
 26 0 0 LIST SUBQUERY 2
 28 26 0 SCAN TABLE Z_Region_bylocation AS n1_t0 VIRTUAL TABLE INDEX 2:D2B3

Without the index:

CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZLATITUDE, t0.ZLONGITUDE, t0.ZNAME FROM ZREGION t0 WHERE (( t0.ZLATITUDE BETWEEN ? AND ?) AND ( t0.ZLONGITUDE BETWEEN ? AND ?))

 2 0 0 SCAN TABLE ZREGION AS t0

What you can see from this is that using the index involves some pretty messy subselects. I found the result was that for small datasets, the index actually slows things down. Likewise if the result set is large. But if the dataset is large and the result set is small, there is an advantage. I leave it to you to play and work out whether the game is worth the candle. One thing I can't quite fathom is that using the index requires two separate subselects, one for the longitude and one for the latitude. That seems to me (though maybe I'm missing something) to undermine the whole point of R-Trees, namely their multidimensionality.
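On that last point, a toy model (plain Swift, no SQLite involved) shows why two independent one-dimensional subselects can be costly: each one may return a whole strip of candidates even when the two-dimensional result is tiny. The grid size and the query ranges are arbitrary assumptions chosen for illustration, and the sketch says nothing about how SQLite actually executes the plan:

```swift
struct Point {
    let lat: Int
    let lon: Int
}

// A 100 x 100 grid of points, standing in for the indexed rows.
let points: [Point] = (0..<100).flatMap { lat in
    (0..<100).map { lon in Point(lat: lat, lon: lon) }
}

let latRange = 40...41
let lonRange = 9...10

// What each single-dimension subselect would have to return.
let latStrip = points.filter { latRange.contains($0.lat) }.count
let lonStrip = points.filter { lonRange.contains($0.lon) }.count

// What a joint two-dimensional test returns.
let both = points.filter { latRange.contains($0.lat) && lonRange.contains($0.lon) }.count

print("latitude strip: \(latStrip), longitude strip: \(lonStrip), both: \(both)")
```

In this toy case each strip holds 200 candidates, while only 4 points satisfy both ranges.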




Answer 2:


I've slightly modified the database from the OP for testing the (recently learned) indexed:by: statement and for doing some time measurements:

Database:

Index:

Use Case:

Count people who visited a region.

Here for Region R42 the result should be 2 (Person 1 and 3):

Code:

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {

    let mainContext: NSManagedObjectContext
    mainContext = persistentContainer.viewContext
    mainContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
    mainContext.undoManager = nil
    mainContext.shouldDeleteInaccessibleFaults = true
    mainContext.automaticallyMergesChangesFromParent = true

    var bounds: Bounds
    var location: Bounds
    var person: Person
    var region: Region

    let longstep = 2
    let latstep = 2
    let minlong = 0
    let maxlong = 20
    let minlat = 20
    let maxlat = 55

    let createSomeData: Bool = false

    if(createSomeData) {

        // create some regions


        var hotsptLvl : Dictionary<String,Int> = [:]
        var regionNr: Int = 0

        for long in stride(from: minlong, to: maxlong, by: longstep)
        {
            for lat in stride(from: minlat, to: maxlat, by: latstep) {
                regionNr += 1
                region = Region(context: mainContext)
                bounds = Bounds(context: mainContext)
                bounds.minlongitude = Float(long)
                bounds.maxlongitude = Float(min(long + longstep,maxlong))
                bounds.minlatitude = Float(lat)
                bounds.maxlatitude = Float(min(lat + latstep,maxlat))
                region.bounds = bounds
                region.name = "Region \(regionNr)"
                // hotsptLvl["Region \(regionNr)"] = Int.random(in: 0 ... 100)
                print("region.name = \(String(describing: region.name))")
                if regionNr % 1000 == 0 {
                    saveContext()
                }
            }
        }

        saveContext()

        // create persons and visited locations

        var k = 0
        let n = 100000

        let personNr = stride(from: 1, through: n+1, by: 1).map(String.init).shuffled()

        for i in 1...n
        {
            person = Person(context: mainContext)
            person.name = "Person \(personNr[i])"
            let isInfected = Float.random(in: 0 ..< 1000)
            // roughly 1 in 1000 persons is marked as infected
            person.infected = isInfected < 1
            // create locations

            let m = 10
            for _ in 1...m
            {
                k += 1
                location = Bounds(context: mainContext)
                location.minlatitude = Float.random(in: Float(minlat + 3 * latstep) ... Float(maxlat)) - Float.random(in: 0 ... Float(3 * latstep))
                location.minlongitude = Float.random(in: Float(minlong + 3 * longstep) ... Float(maxlong)) - Float.random(in: 0 ... Float(3 * longstep))
                location.maxlatitude = min(location.minlatitude + Float.random(in: 0 ... Float(3 * latstep)),Float(maxlat))
                location.maxlongitude = min(location.minlongitude + Float.random(in: 0 ... Float(3 * longstep)),Float(maxlong))
                person.addToLocations(location)
                if k % 1000 == 0 {
                    saveContext()
                }
            }
        }

        saveContext()
    }

    let start = Date()
    for regionName in ["Region 1","Region 13","Region 43","Region 101","Region 113","Region 145"] {
        print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Region: \(regionName)")
        let requestOnRegion: NSFetchRequest<Region> = Region.fetchRequest()
        let someRegion = NSPredicate(format: "(name = %@)",regionName)
        requestOnRegion.predicate = someRegion

        do {
            let regionResA: [Region] = try mainContext.fetch(requestOnRegion)
            // Skip a region that is not in the store instead of crashing on an empty result.
            guard let regionRes = regionResA.first else { continue }
            print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Region: L1 = (\(regionRes.bounds!.minlongitude),\(regionRes.bounds!.minlatitude)) R1 = (\(regionRes.bounds!.maxlongitude),\(regionRes.bounds!.maxlatitude))")

           let someBounds1 = NSPredicate(format: "(region = nil) && (minlongitude <= %lf && maxlongitude >= %lf && minlatitude <= %lf && maxlatitude >= %lf)",
                                         regionRes.bounds!.maxlongitude,
                                         regionRes.bounds!.minlongitude,
                                         regionRes.bounds!.maxlatitude,
                                         regionRes.bounds!.minlatitude)

           let someBounds2 = NSPredicate(format: "(region = nil) && (indexed:by:(minlongitude, 'BoundsIndex') between { %lf, %lf } && " +
                                                   "indexed:by:(maxlongitude, 'BoundsIndex') between { %lf, %lf } && " +
                                                   "indexed:by:(minlatitude, 'BoundsIndex') between { %lf, %lf } && " +
                                                   "indexed:by:(maxlatitude, 'BoundsIndex') between { %lf, %lf} )",
                                         Float(minlong),
                                         regionRes.bounds!.maxlongitude,
                                         regionRes.bounds!.minlongitude,
                                         Float(maxlong),
                                         Float(minlat),
                                         regionRes.bounds!.maxlatitude,
                                         regionRes.bounds!.minlatitude,
                                         Float(maxlat))

            let requestOnBounds: NSFetchRequest<NSDictionary> = NSFetchRequest<NSDictionary>(entityName:"Bounds")
            requestOnBounds.resultType = NSFetchRequestResultType.dictionaryResultType
            requestOnBounds.propertiesToFetch = ["person.name"]
            requestOnBounds.returnsDistinctResults = true
            requestOnBounds.predicate = someBounds1
            print("\n")
            print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Start - Fetch (no index):")
            var boundsRes = try mainContext.fetch(requestOnBounds)
            var uniquePersons : [String] = boundsRes.compactMap { $0.value(forKey: "person.name") as? String };
            print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Number of Persons in this Region: \(uniquePersons.count)")
            print("\n")
            requestOnBounds.predicate = someBounds2
            print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Start - Fetch (with index):")
            boundsRes = try mainContext.fetch(requestOnBounds)
            uniquePersons = boundsRes.compactMap { $0.value(forKey: "person.name") as? String };
            print("\(Calendar.current.dateComponents([Calendar.Component.second], from:start, to:Date()).second!) Number of Persons in this Region: \(uniquePersons.count)")
            print("\n")
       } catch {
           print("Error: \(error)")
       }
    }



    for store in (persistentContainer.persistentStoreCoordinator.persistentStores) {
        os_log("Store URL: %@", log: Debug.coredata_log, type: .info, store.url?.absoluteString ?? "No Store")
    }

    return true
}
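The two predicates in the code above, someBounds1 and someBounds2, are meant to express the same rectangle-overlap test. The indexed variant rewrites each one-sided comparison as a closed range by borrowing the overall data extent (minlong, maxlong, minlat, maxlat) as the missing bound. The following sketch shows that rewrite in plain Swift. The `Box` type and function names are hypothetical, and the equivalence only holds under the assumption that every stored box lies inside the overall extent:

```swift
struct Box {
    var minLon: Float, maxLon: Float, minLat: Float, maxLat: Float
}

/// Plain overlap test, the shape of the non-indexed predicate.
func overlaps(_ b: Box, _ r: Box) -> Bool {
    return b.minLon <= r.maxLon && b.maxLon >= r.minLon
        && b.minLat <= r.maxLat && b.maxLat >= r.minLat
}

/// The same test written as four closed ranges, the shape of the indexed predicate.
/// `world` is the overall extent of the data; `r` must lie inside it,
/// otherwise the ranges below cannot be formed.
func overlapsAsRanges(_ b: Box, _ r: Box, world: Box) -> Bool {
    return (world.minLon...r.maxLon).contains(b.minLon)
        && (r.minLon...world.maxLon).contains(b.maxLon)
        && (world.minLat...r.maxLat).contains(b.minLat)
        && (r.minLat...world.maxLat).contains(b.maxLat)
}

let world = Box(minLon: 0, maxLon: 20, minLat: 20, maxLat: 55)
let region = Box(minLon: 4, maxLon: 6, minLat: 32, maxLat: 34)
let visit = Box(minLon: 5, maxLon: 9, minLat: 30, maxLat: 33)
let elsewhere = Box(minLon: 10, maxLon: 12, minLat: 40, maxLat: 42)

print(overlaps(visit, region), overlapsAsRanges(visit, region, world: world))
print(overlaps(elsewhere, region), overlapsAsRanges(elsewhere, region, world: world))
```

Because each of the four ranges is wide (it runs all the way to the edge of the data extent), each single-attribute range on its own excludes few rows.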

Output:

Leading number is time in seconds.

0 Region: Region 1
0 Region: L1 = (0.0,20.0) R1 = (2.0,22.0)


0 Start - Fetch (no index):
2 Number of Persons in this Region: 267


2 Start - Fetch (with index):
10 Number of Persons in this Region: 267


10 Region: Region 13
10 Region: L1 = (0.0,44.0) R1 = (2.0,46.0)


10 Start - Fetch (no index):
11 Number of Persons in this Region: 4049


11 Start - Fetch (with index):
13 Number of Persons in this Region: 4049


13 Region: Region 43
13 Region: L1 = (4.0,32.0) R1 = (6.0,34.0)


13 Start - Fetch (no index):
14 Number of Persons in this Region: 28798


14 Start - Fetch (with index):
17 Number of Persons in this Region: 28798


17 Region: Region 101
17 Region: L1 = (10.0,40.0) R1 = (12.0,42.0)


17 Start - Fetch (no index):
18 Number of Persons in this Region: 46753


18 Start - Fetch (with index):
22 Number of Persons in this Region: 46753


22 Region: Region 113
22 Region: L1 = (12.0,28.0) R1 = (14.0,30.0)


22 Start - Fetch (no index):
22 Number of Persons in this Region: 45312


22 Start - Fetch (with index):
28 Number of Persons in this Region: 45312


28 Region: Region 145
28 Region: L1 = (16.0,20.0) R1 = (18.0,22.0)


28 Start - Fetch (no index):
28 Number of Persons in this Region: 3023


28 Start - Fetch (with index):
34 Number of Persons in this Region: 3023

Result:

  1. indexed:by: causes Core Data to use the R*Tree index.
  2. In these tests, using the R*Tree index made the queries markedly slower (for example, about 8 s versus about 2 s for Region 1).
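The timings in the output above have whole-second resolution. Here is a minimal sketch of a helper with finer resolution that could wrap each fetch. `timed` is a hypothetical name, and `ProcessInfo.processInfo.systemUptime` is used simply as an elapsed-time source:

```swift
import Foundation

/// Runs `work` once and returns its result together with the elapsed time in seconds.
func timed<T>(_ work: () throws -> T) rethrows -> (result: T, seconds: TimeInterval) {
    let start = ProcessInfo.processInfo.systemUptime
    let result = try work()
    let elapsed = ProcessInfo.processInfo.systemUptime - start
    return (result, elapsed)
}

// Stand-in workload; in the test code above this would be a call such as
// `try mainContext.fetch(requestOnBounds)`.
let (sum, seconds) = timed { (1...1_000_000).reduce(0, +) }
print("sum = \(sum), took \(seconds) s")
```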

Open question:

What type of query and Core Data model takes advantage of an R*Tree index?



Source: https://stackoverflow.com/questions/61627719/what-kind-of-queries-in-core-data-can-profit-from-r-tree-index-on-attributes
