NSString tokenize in Objective-C

Anonymous (unverified), posted 2019-12-03 02:11:02

Question:

What is the best way to tokenize/split a NSString in Objective-C?

Answer 1:

Found this at http://borkware.com/quickies/one?topic=NSString (useful link):

NSString *string = @"oop:ack:bork:greeble:ponies";
NSArray *chunks = [string componentsSeparatedByString: @":"];

Hope this helps!

Adam



Answer 2:

Everyone has mentioned componentsSeparatedByString:, but you can also use CFStringTokenizer (remember that NSString and CFString are toll-free bridged, so they are interchangeable). It also tokenizes natural languages such as Chinese and Japanese, which do not separate words with spaces.
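
A minimal sketch of word tokenization with CFStringTokenizer, assuming an Apple platform where CoreFoundation provides it. The helper name WordTokens is hypothetical and chosen only for illustration; the CF calls are the ones the answer refers to.

```objc
#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Hypothetical helper (name is an assumption): returns the word tokens of
// `text` as reported by CFStringTokenizer for the current locale.
static NSArray *WordTokens(NSString *text) {
    NSMutableArray *tokens = [NSMutableArray array];
    CFStringRef cfText = (__bridge CFStringRef)text; // toll-free bridge, no copy
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(
        kCFAllocatorDefault, cfText,
        CFRangeMake(0, CFStringGetLength(cfText)),
        kCFStringTokenizerUnitWord, locale);

    // Advance until the tokenizer reports that no token is left.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSRange nsRange = NSMakeRange((NSUInteger)range.location, (NSUInteger)range.length);
        [tokens addObject:[text substringWithRange:nsRange]];
    }

    // Objects obtained from Create/Copy functions must be released by the caller.
    CFRelease(tokenizer);
    CFRelease(locale);
    return tokens;
}
```

Calling WordTokens(@"hello world") should yield the two words as separate strings. How text without spaces is segmented depends on the tokenizer's language support, so results for Chinese or Japanese input are best checked on the target system.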



Answer 3:

If you just want to split a string, use -[NSString componentsSeparatedByString:]. For more complex tokenization, use the NSScanner class.
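
As an illustration of the NSScanner route, here is a small sketch that parses "key=value" pairs separated by semicolons. The input format and the helper name ParseKeyValuePairs are assumptions made for this example, not something the answer specifies.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name and input format are assumptions): parses text
// such as @"width=10; height=20" into a dictionary of strings.
static NSDictionary *ParseKeyValuePairs(NSString *text) {
    NSMutableDictionary *pairs = [NSMutableDictionary dictionary];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    // By default the scanner skips whitespace and newlines before each scan.

    while (![scanner isAtEnd]) {
        NSString *key = nil;
        NSString *value = nil;
        BOOL ok = [scanner scanUpToString:@"=" intoString:&key]
               && [scanner scanString:@"=" intoString:NULL]
               && [scanner scanUpToString:@";" intoString:&value];
        if (!ok) {
            break; // malformed input: stop instead of looping forever
        }
        [scanner scanString:@";" intoString:NULL]; // separator is optional after the last pair
        [pairs setObject:value forKey:key];
    }
    return pairs;
}
```

Whitespace directly before "=" or ";" ends up inside the scanned key or value, so real input may need an extra stringByTrimmingCharactersInSet: call.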



Answer 4:

If your tokenization needs are more complex, check out my open source Cocoa String tokenizing/parsing toolkit: ParseKit:

http://parsekit.com

For simple splitting of strings using a delimiter char (like ':'), ParseKit would definitely be overkill. But again, for complex tokenization needs, ParseKit is extremely powerful/flexible.

Also see the ParseKit Tokenization documentation.



Answer 5:

If you want to tokenize on multiple characters, you can use NSString's componentsSeparatedByCharactersInSet:. NSCharacterSet has some handy pre-made sets such as whitespaceCharacterSet and illegalCharacterSet, and it also has initializers for Unicode ranges.

You can also combine character sets and use them to tokenize, like this:

// Tokenize sSourceEntityName on both whitespace and punctuation.
NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
[mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
[mcharsetWhitePunc release]; // manual reference counting only; omit under ARC

Be aware that componentsSeparatedByCharactersInSet: produces empty strings when it encounters two or more members of the character set in a row, so you may want to skip components whose length is less than 1.
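
One way to handle those empty components is to filter them out after splitting. This is a minimal sketch; the helper name NonEmptyTokens is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): splits on whitespace and
// punctuation, then drops the empty strings produced by adjacent separators.
static NSArray *NonEmptyTokens(NSString *text) {
    NSMutableCharacterSet *separators =
        [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

    NSArray *components = [text componentsSeparatedByCharactersInSet:separators];
    NSMutableArray *tokens = [NSMutableArray arrayWithCapacity:[components count]];
    for (NSString *component in components) {
        if ([component length] > 0) { // skip @"" entries
            [tokens addObject:component];
        }
    }
    return tokens;
}
```

For example, NonEmptyTokens(@"Hello, world!  ok") keeps only "Hello", "world" and "ok"; the unfiltered split of the same string also contains three empty strings.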



Answer 6:

If you're looking to tokenise a string into search terms while preserving "quoted phrases", here's an NSString category that respects various types of quote pairs: "" '' ‘’ “”

Usage:

NSArray *terms = [@"This is my \"search phrase\" I want to split" searchTerms];
// results in: ["This", "is", "my", "search phrase", "I", "want", "to", "split"]

Code:

@interface NSString (Search)
- (NSArray *)searchTerms;
@end

@implementation NSString (Search)

- (NSArray *)searchTerms {

    // Strip whitespace and setup scanner
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *searchString = [self stringByTrimmingCharactersInSet:whitespace];
    NSScanner *scanner = [NSScanner scannerWithString:searchString];
    [scanner setCharactersToBeSkipped:nil]; // we'll handle whitespace ourselves

    // A few types of quote pairs to check
    NSDictionary *quotePairs = @{@"\"": @"\"",
                                 @"'": @"'",
                                 @"\u2018": @"\u2019",
                                 @"\u201C": @"\u201D"};

    // Scan
    NSMutableArray *results = [[NSMutableArray alloc] init];
    NSString *substring = nil;
    while (scanner.scanLocation < searchString.length) {
        // Check for quote at beginning of string.
        // Index into searchString (the trimmed copy the scanner works on), not self,
        // so the index stays correct when self has leading whitespace.
        unichar unicharacter = [searchString characterAtIndex:scanner.scanLocation];
        NSString *startQuote = [NSString stringWithFormat:@"%C", unicharacter];
        NSString *endQuote = [quotePairs objectForKey:startQuote];
        if (endQuote != nil) { // if it's a valid start quote we'll have an end quote
            // Scan quoted phrase into substring (skipping start & end quotes)
            [scanner scanString:startQuote intoString:nil];
            [scanner scanUpToString:endQuote intoString:&substring];
            [scanner scanString:endQuote intoString:nil];
        } else {
            // Single word that is non-quoted
            [scanner scanUpToCharactersFromSet:whitespace intoString:&substring];
        }
        // Process and add the substring to results
        if (substring) {
            substring = [substring stringByTri
[listing truncated in the source]
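
The listing above is cut off in the source, so the author's remaining lines are unknown. The following is a self-contained sketch of the same idea, written as a plain function rather than a category. The name SearchTermsFromString and everything past the cut-off point are assumptions, not the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not the original author's code): splits `input` into
// search terms and keeps quoted phrases together as single terms.
static NSArray *SearchTermsFromString(NSString *input) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil]; // whitespace is handled explicitly

    // Opening quote mapped to its matching closing quote.
    NSDictionary *quotePairs = @{@"\"": @"\"",
                                 @"'": @"'",
                                 @"\u2018": @"\u2019",
                                 @"\u201C": @"\u201D"};

    NSMutableArray *results = [NSMutableArray array];
    while (![scanner isAtEnd]) {
        // Skip whitespace between terms; stop if nothing else is left.
        [scanner scanCharactersFromSet:whitespace intoString:NULL];
        if ([scanner isAtEnd]) {
            break;
        }

        NSString *term = nil;
        unichar first = [input characterAtIndex:[scanner scanLocation]];
        NSString *startQuote = [NSString stringWithCharacters:&first length:1];
        NSString *endQuote = [quotePairs objectForKey:startQuote];
        if (endQuote != nil) {
            // Quoted phrase: consume the quotes, keep what is between them.
            [scanner scanString:startQuote intoString:NULL];
            [scanner scanUpToString:endQuote intoString:&term];
            [scanner scanString:endQuote intoString:NULL];
        } else {
            // Unquoted word: read up to the next whitespace character.
            [scanner scanUpToCharactersFromSet:whitespace intoString:&term];
        }

        term = [term stringByTrimmingCharactersInSet:whitespace];
        if ([term length] > 0) {
            [results addObject:term];
        }
    }
    return results;
}
```

Each pass through the loop consumes at least one character, either an opening quote or a non-whitespace word, so the scan always terminates. An unterminated quote simply takes the rest of the string as one term.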