问题
I am working on a project in Swift, using a dictionary.
This dictionary is of the type [String : [Posting]]. I have around 200k different "terms" (keys) to insert in it, and for each term I have around 500 to 1000 objects to append in a list. I know it's a weird practice but I don't have the choice and I must deal with all those elements.
The issue is that this is very very slow, as the dictionary gets bigger. I tried switching to a NSMutableDictionary, no luck.
My addTerm function is called everytime I need to insert an element :
func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
if self.map[term] == nil {
self.map[term] = [Posting]()
}
if self.map[term]!.last?.documentId == id {
self.map[term]!.last?.addPosition(position)
}
else {
self.map[term]!.append(Posting(withId: id, atPosition: position, forTerm: term))
}
}
EDIT: I realized that its not the dictionary that causes all this lag, but its actually the arrays it contains. Arrays re-allocate way too much when adding new elements, and the best I could was to replace them with ContiguousArray.
回答1:
This is fairly common performance trap, as also observed in:
- Dictionary in Swift with Mutable Array as value is performing very slow? How to optimize or construct properly?
- Swift semantics regarding dictionary access
The issue stems from the fact that the array you're mutating in the expression self.map[term]!.append(...) is a temporary mutable copy of the underlying array in the dictionary's storage. This means that the array is never uniquely referenced and so always has its buffer re-allocated.
This situation will fixed in Swift 5 with the unofficial introduction of generalised accessors, but until then, one solution (as mentioned in both the above Q&As) is to use Dictionary's subscript(_:default:) which from Swift 4.1 can mutate the value directly in storage.
Although your case isn't quite a straightforward case of applying a single mutation, so you need some kind of wrapper function in order to allow you to have scoped access to your mutable array.
For example, this could look like:
class X {
private var map: [String: [Posting]] = [:]
private func withPostings<R>(
forTerm term: String, mutations: (inout [Posting]) throws -> R
) rethrows -> R {
return try mutations(&map[term, default: []])
}
func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
withPostings(forTerm: term) { postings in
if let posting = postings.last, posting.documentId == id {
posting.addPosition(position)
} else {
postings.append(Posting(withId: id, atPosition: position, forTerm: term))
}
}
}
// ...
}
回答2:
The general approach when your code is too slow is to profile it in Instruments to figure out, which lines actually take the longest and to go from there. There could be bottlenecks somewhere else, etc.. Running your app directly from within Xcode also creates a debug build, which sacrifices performance for debuggability. A release build may perform much better.
Also, if your program takes up large amounts of memory, the system may struggle to make this memory available to your app. On non-iOS platforms, this will lead to swapping out memory to disk, which will impact the performance of your app significantly, as the system cannot anticipate, which elements of the dictionary will be accessed next.
If the memory requirements are not responsible for the slowdown, here are a few approaches that I would try:
If you can estimate the number of items that you want to insert into a dictionary, you can use
dictionary.reserveCapacity(numberOfItems). As the dictionary grows, it may need to be resized, which may require rebuilding the hash table, which the dictionary type uses internally. This approach also works for arrays.Swift provides methods to automatically group items into a dictionary using a common key:
Dictionary(grouping: collection, by: { item in item.property }). This approach may be computationally more efficient, as everything could be processed in one batch.Another approach may be to use other data types, such as a tree map, which would not require frequent reallocations. Swift does however not provide such a type in the standard library.
回答3:
I had the same problem. It was crazy slow for 200K entries... so I made a class and put the array in there...
class MyIndex
{
var entries: [Posting]
}
var map = [String: MyIndex]()
seems to work pretty fast now
来源:https://stackoverflow.com/questions/52637092/swift-enormous-dictionary-of-arrays-very-slow