What is the best way to tokenize/split an NSString in Objective-C?
Question:
Answer 1:
Found this at http://borkware.com/quickies/one?topic=NSString (useful link):
    NSString *string = @"oop:ack:bork:greeble:ponies";
    NSArray *chunks = [string componentsSeparatedByString: @":"];

Hope this helps!
Adam
Answer 2:
Everyone has mentioned componentsSeparatedByString:, but you can also use CFStringTokenizer (NSString and CFStringRef are toll-free bridged, so an NSString can be passed to it directly). It also tokenizes natural languages such as Chinese and Japanese, which do not separate words with spaces.
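As a rough illustration of the CFStringTokenizer approach, here is a minimal sketch that collects word tokens into an array. The helper name WordsInString is hypothetical, the code assumes ARC (hence the __bridge cast), and using the current locale is just one reasonable choice, not something the answer prescribes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any framework): returns the word tokens
// that CFStringTokenizer finds in `text`. Assumes ARC.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    CFStringRef cfText = (__bridge CFStringRef)text;   // toll-free bridged, no copy
    CFLocaleRef locale = CFLocaleCopyCurrent();        // assumption: current locale

    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault,
                                cfText,
                                CFRangeMake(0, CFStringGetLength(cfText)),
                                kCFStringTokenizerUnitWord,
                                locale);

    // Advance until the tokenizer reports that no tokens remain.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        [words addObject:[text substringWithRange:NSMakeRange(range.location, range.length)]];
    }

    // Core Foundation objects created with Create/Copy must be released manually.
    CFRelease(tokenizer);
    CFRelease(locale);
    return words;
}
```

Calling WordsInString(@"oop ack bork") should yield the three words; for text without spaces, the word boundaries depend on the tokenizer's language analysis.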
Answer 3:
If you just want to split a string, use -[NSString componentsSeparatedByString:]. For more complex tokenization, use the NSScanner class.
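To give a feel for what "more complex tokenization" with NSScanner can look like, here is a small sketch that parses a string of the form "width=640; height=480" into a dictionary. The input format and the helper name ParsePairs are assumptions made up for this example; the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses "name=number; name=number" pairs.
// The input format is an assumption chosen only to illustrate NSScanner.
static NSDictionary<NSString *, NSNumber *> *ParsePairs(NSString *input) {
    NSMutableDictionary<NSString *, NSNumber *> *pairs = [NSMutableDictionary dictionary];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // By default the scanner skips whitespace and newlines before each scan.

    while (![scanner isAtEnd]) {
        NSString *key = nil;
        NSInteger value = 0;
        BOOL ok = [scanner scanUpToString:@"=" intoString:&key] &&
                  [scanner scanString:@"=" intoString:NULL] &&
                  [scanner scanInteger:&value];
        if (!ok) {
            break;  // malformed input: stop rather than loop forever
        }
        pairs[key] = @(value);
        // The separator is optional after the last pair.
        [scanner scanString:@";" intoString:NULL];
    }
    return pairs;
}
```

Note that scanUpToString: keeps any whitespace that sits directly before the "=", so a key written as "width =" would need trimming.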
Answer 4:
If your tokenization needs are more complex, check out my open source Cocoa string tokenizing/parsing toolkit, ParseKit.
For simple splitting of strings using a delimiter char (like ':'), ParseKit would definitely be overkill. But again, for complex tokenization needs, ParseKit is extremely powerful/flexible.
Also see the ParseKit Tokenization documentation.
Answer 5:
If you want to tokenize on multiple characters, you can use NSString's componentsSeparatedByCharactersInSet:. NSCharacterSet has some handy pre-made sets such as whitespaceCharacterSet and illegalCharacterSet, and it can also build sets from Unicode ranges (characterSetWithRange:).
You can also combine character sets and use them to tokenize, like this:
    // Tokenize sSourceEntityName on both whitespace and punctuation.
    NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
    [mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
    NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
    [mcharsetWhitePunc release]; // manual reference counting only; omit under ARC

Be aware that componentsSeparatedByCharactersInSet: produces empty strings whenever two or more members of the character set appear in a row, so you may want to discard components whose length is zero.
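The empty-string caveat can be handled by filtering the result after splitting. The following is a minimal sketch of that idea; the helper name NonEmptyTokens is hypothetical, and the code assumes ARC, so there is no release call.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on whitespace and punctuation, then drops
// the empty strings produced by consecutive separator characters.
static NSArray<NSString *> *NonEmptyTokens(NSString *source) {
    NSMutableCharacterSet *separators =
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
    [separators formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

    NSArray<NSString *> *rawTokens = [source componentsSeparatedByCharactersInSet:separators];

    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    for (NSString *token in rawTokens) {
        if (token.length > 0) {   // skip the blanks between adjacent separators
            [tokens addObject:token];
        }
    }
    return tokens;
}
```

For example, NonEmptyTokens(@"Hello, world!  Bye") returns the three words without the blanks that the comma-space and double-space runs would otherwise leave behind.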
Answer 6:
If you're looking to tokenise a string into search terms while preserving "quoted phrases", here's an NSString category that respects various types of quote pairs: "" '' ‘’ “”
Usage:
    NSArray *terms = [@"This is my \"search phrase\" I want to split" searchTerms];
    // results in: ["This", "is", "my", "search phrase", "I", "want", "to", "split"]

Code:
    @interface NSString (Search)
    - (NSArray *)searchTerms;
    @end

    @implementation NSString (Search)

    - (NSArray *)searchTerms {
        // Strip whitespace and set up scanner
        NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        NSString *searchString = [self stringByTrimmingCharactersInSet:whitespace];
        NSScanner *scanner = [NSScanner scannerWithString:searchString];
        [scanner setCharactersToBeSkipped:nil]; // we'll handle whitespace ourselves

        // A few types of quote pairs to check
        NSDictionary *quotePairs = @{@"\"": @"\"", @"'": @"'", @"\u2018": @"\u2019", @"\u201C": @"\u201D"};

        // Scan
        NSMutableArray *results = [[NSMutableArray alloc] init];
        NSString *substring = nil;
        while (scanner.scanLocation < searchString.length) {
            // Check for quote at beginning of string.
            // Index into searchString (the trimmed string the scanner walks), not self,
            // because trimming shifts the indices when self has leading whitespace.
            unichar unicharacter = [searchString characterAtIndex:scanner.scanLocation];
            NSString *startQuote = [NSString stringWithFormat:@"%C", unicharacter];
            NSString *endQuote = [quotePairs objectForKey:startQuote];
            if (endQuote != nil) { // if it's a valid start quote we'll have an end quote
                // Scan quoted phrase into substring (skipping start & end quotes)
                [scanner scanString:startQuote intoString:nil];
                [scanner scanUpToString:endQuote intoString:&substring];
                [scanner scanString:endQuote intoString:nil];
            } else {
                // Single word that is non-quoted
                [scanner scanUpToCharactersFromSet:whitespace intoString:&substring];
            }
            // Process and add the substring to results
            if (substring) {
                substring = [substring stringByTri