Filtering a large NSArray with NSPredicate

Submitted by 允我心安 on 2020-01-03 06:40:10

Question


I have an array containing 170k strings (words in a dictionary) and a string such as "glapplega". I'm trying to extract the word "apple" from the string (with "apple" being a word in the array). The extracted word also needs to be at least 3 characters long. Here is the code I have right now:

NSPredicate *wordPredicate = [NSPredicate predicateWithFormat:@"'%@' contains[cd] SELF", string];
NSPredicate *lengthPredicate = [NSPredicate predicateWithFormat:@"SELF.length > 2"];
NSPredicate *predicate = [NSCompoundPredicate andPredicateWithSubpredicates:@[wordPredicate, lengthPredicate]];
return [_words filteredArrayUsingPredicate:predicate];

The length predicate works on its own, but the word predicate does not: it returns an empty array, even though "apple" is a word in the array.

I suspect that there might be a problem with using SELF as the right expression in the predicate, as all the examples I found have it as the left expression, although I have no way of confirming this.

Edit: I'm aware that this can likely be accomplished with regexes (as described here), but I was hoping there would be a way around that, since regexes can be slow on such a large dataset.


Answer 1:


Solving this problem is easy if you iterate the array yourself using a block predicate. At some point a formatted NSPredicate would have to boil down to this, so there shouldn't be much of a performance hit. -[NSString rangeOfString:] can be used to test for inclusion of the string.

return [_words filteredArrayUsingPredicate:[NSPredicate predicateWithBlock:^BOOL (id evaluatedString, NSDictionary *bindings) {
    return [evaluatedString length] > 2 && [string rangeOfString:evaluatedString].location != NSNotFound;
}]];
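If the single-threaded scan is still too slow, one variation (a swapped-in technique, not part of this answer) is to let Foundation test the elements concurrently with `-[NSArray indexesOfObjectsWithOptions:passingTest:]` and `NSEnumerationConcurrent`. The sketch below keeps the same case-sensitive `rangeOfString:` test; the function name `WordsContainedInString` is hypothetical, and whether it actually beats the plain block predicate depends on the device and array size, so measure before adopting it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every word in `words` that is at least
// `minLength` characters long and occurs (case-sensitively) inside `text`.
// NSEnumerationConcurrent lets Foundation test elements on several threads;
// the block only reads its captured values, so running it concurrently is safe.
static NSArray *WordsContainedInString(NSArray *words, NSString *text, NSUInteger minLength)
{
    NSIndexSet *matches = [words indexesOfObjectsWithOptions:NSEnumerationConcurrent
                                                 passingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
        NSString *word = obj;
        return word.length >= minLength
            && [text rangeOfString:word].location != NSNotFound;
    }];
    // objectsAtIndexes: returns the matches in their original array order.
    return [words objectsAtIndexes:matches];
}
```

Called as `WordsContainedInString(_words, string, 3)`, it returns the same words as the block predicate above.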



Answer 2:


Your approach and your predicate are perfectly valid: SELF works fine as the right-hand expression. The only thing you have been doing wrong is the quoting. Rewrite your predicate like this:

  NSArray * array = @[@"Apple", @"lega", @"foo", @"bar"];
  NSString *string = @"glapplega";
  NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%@ contains[cd] SELF and SELF.length > 2", string];
  NSLog(@"%@",[array filteredArrayUsingPredicate:predicate]);

(
    Apple,
    lega
)

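When you put %@ in the format string and pass the string as an argument, the predicate adds the quotes itself. Wrapping %@ in quotes yourself, as in '%@', turns it into a literal, so no substitution happens and the predicate compares each word against the text "%@" instead of your string. That was the mistake.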
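The sketch below runs the question's quoted form next to the unquoted form from this answer, using the same kind of word array; both helper names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the question's form. The quoted '%@' is a literal
// two-character string, so `text` is never substituted and each word is
// tested against "%@" instead.
static NSArray *FilterWithQuotedPlaceholder(NSArray *words, NSString *text)
{
    NSPredicate *p = [NSPredicate predicateWithFormat:@"'%@' contains[cd] SELF", text];
    return [words filteredArrayUsingPredicate:p];
}

// Hypothetical helper: the corrected form. The unquoted %@ is replaced by
// `text` as a string constant, and SELF is the word being evaluated.
static NSArray *FilterWithUnquotedPlaceholder(NSArray *words, NSString *text)
{
    NSPredicate *p = [NSPredicate predicateWithFormat:@"%@ contains[cd] SELF AND SELF.length > 2", text];
    return [words filteredArrayUsingPredicate:p];
}
```

With `@[@"Apple", @"lega", @"foo", @"bar"]` and `@"glapplega"`, the quoted version returns an empty array and the unquoted one returns Apple and lega.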




Answer 3:


#define rchar (rand() % ('z'-'a') + 'a') // random lowercase letter from 'a' to 'y'; 'z' is never produced

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSMutableArray * mar = [NSMutableArray new];
    for (int i = 0; i<170000; i++)
    {
        NSString * str = [NSString stringWithFormat:@"%c%c%c%c",rchar, rchar, rchar, rchar];
        [mar addObject:str];
    }
    NSString * bigStr = @"asdfghjkl;loiuytrdcvcdrtgvfrtghvcftyghvfghcfdtyjghvncdfjtygmvcnfhjghjkgfhdgsxgrecrvtbkunhlmnhubkujvytchrtxgrecdjvbyhnkbjgcfhvyjhbghnkbjchgdfvbghnukbytvjycterwxrzewxcevfbjnkmjohgytreytwexkutckhtdtcfhvjgkjmhgcjhewwzsserdp9dlkuydssqwsxdchvggjhmgbj";
    NSDate *start = [NSDate date];
    NSArray * marFiltered = [mar filteredArrayUsingPredicate:[NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        return [bigStr rangeOfString:evaluatedObject].length>2;
    }]];
    NSLog(@"found %lu items in %f seconds", (unsigned long)[marFiltered count], -[start timeIntervalSinceNow]);
}

output:

2014-05-11 09:09:53.048 170k[89396:303] found 85 items in 0.542431 seconds
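The block above still scans every dictionary word for each query. Because the input string is short, a different approach (not taken by any of these answers) is to build an `NSSet` from the dictionary once and look up each substring of the input. For "glapplega" with a minimum length of 3 that is 28 lookups instead of 170k `rangeOfString:` calls. This is a sketch under stated assumptions: `DictionaryWordsInString` is a hypothetical name, matching is exact and case-sensitive (normalize both sides first, e.g. with `lowercaseString` or `stringByFoldingWithOptions:locale:`, to imitate `[cd]`), and ranges are in UTF-16 units, which is fine for plain ASCII words.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: enumerate every substring of `text` that is at least
// `minLength` characters long and keep the ones present in `dictionary`.
// Build `dictionary` once, e.g. [NSSet setWithArray:_words], and reuse it.
static NSSet *DictionaryWordsInString(NSSet *dictionary, NSString *text, NSUInteger minLength)
{
    NSMutableSet *found = [NSMutableSet set];
    NSUInteger length = text.length;
    for (NSUInteger start = 0; start < length; start++) {
        for (NSUInteger len = minLength; start + len <= length; len++) {
            NSString *candidate = [text substringWithRange:NSMakeRange(start, len)];
            // Set membership is a hash lookup, not a scan of the dictionary.
            if ([dictionary containsObject:candidate]) {
                [found addObject:candidate];
            }
        }
    }
    return found;
}
```

The cost of building the set is paid once, so this pays off mainly when many strings are checked against the same dictionary.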



Answer 4:


There are two ways to define the predicate: with a format string or with a block. Here is a bit of code that demonstrates both. I've experimented with each, and in my runs their performance was the same. I only had the patience to run it with up to INT32_MAX/2 items (a lot of items).

Here goes. Hope this clarifies and helps:

    NSString* searchString = @"AB0";
    NSUInteger capacity = 1000000;
    NSMutableArray* array  = [NSMutableArray array];

    NSLog(@"Filling array with %lu UUIDs. Be patient.", (unsigned long)capacity);
    NSUInteger batch = 0;
    for ( NSUInteger i = 0; i < capacity; i++ ) {
        [array setObject:[[NSUUID UUID] UUIDString] atIndexedSubscript:i];
        if (i != 0 && i % (capacity / 10) == 0 ) {
            NSLog(@"Completed %lu%%", (unsigned long)++batch * 10);
        }
    }

    NSLog(@"Done.");

    NSPredicate* formatPredicate = [NSPredicate predicateWithFormat:@"SELF contains[cd] %@ AND SELF.length > 3", searchString];
    NSLog(@"Filtering with predicate: %@", formatPredicate);
    NSArray* formatArray = [array filteredArrayUsingPredicate:formatPredicate];
    NSLog(@"Got %lu results.", (unsigned long)formatArray.count);

    NSPredicate* blockPredicate = [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        NSString* theString = evaluatedObject;
        return theString.length > 3 && [theString rangeOfString:searchString].location != NSNotFound;
    }];

    NSLog(@"Filtering with predicate: %@", blockPredicate);
    NSArray* blockArray = [array filteredArrayUsingPredicate:blockPredicate];
    NSLog(@"Got %lu results.", (unsigned long)blockArray.count);

PS: I wouldn't run this on a phone if you are using big numbers like INT32_MAX :)
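One caveat when comparing the two predicates above: `contains[cd]` is case- and diacritic-insensitive, while the block's plain `rangeOfString:` is case-sensitive. They agree here only because `UUIDString` produces uppercase hex and the search string is uppercase. A sketch of a block closer to `[cd]`, plus a small timing helper, follows; both function names are hypothetical, and single timings are noisy, so repeat runs before comparing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a block predicate meant to behave like
// "SELF contains[cd] %@". The [c] and [d] modifiers roughly correspond to
// NSCaseInsensitiveSearch and NSDiacriticInsensitiveSearch.
static NSPredicate *ContainsCDBlockPredicate(NSString *needle)
{
    return [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        NSRange range = [(NSString *)evaluatedObject rangeOfString:needle
                                                           options:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch];
        return range.location != NSNotFound;
    }];
}

// Hypothetical helper: filters `array` once, returns the elapsed wall-clock
// seconds, and stores the number of matches in *outCount when it is non-NULL.
static NSTimeInterval TimeFilter(NSArray *array, NSPredicate *predicate, NSUInteger *outCount)
{
    NSDate *start = [NSDate date];
    NSArray *result = [array filteredArrayUsingPredicate:predicate];
    NSTimeInterval elapsed = -[start timeIntervalSinceNow];
    if (outCount != NULL) {
        *outCount = result.count;
    }
    return elapsed;
}
```

For example, `TimeFilter(array, formatPredicate, &count)` and `TimeFilter(array, ContainsCDBlockPredicate(searchString), &count)` time the two variants on the same data and report matching counts.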



Source: https://stackoverflow.com/questions/23593869/filtering-a-large-nsarray-with-nspredicate
