Converting HTML text into plain text using Objective-C

Anonymous (unverified), submitted 2019-12-03 02:13:02

Question:

I have a huge NSString with HTML text inside. The string is more than 3,500,000 characters long. How can I convert this HTML text to an NSString containing plain text? I was using NSScanner, but it is too slow. Any ideas?

Answer 1:

It depends on which iOS version you are targeting. Since iOS 7 there has been a built-in method that not only strips the HTML tags but also applies the formatting to the string:

Xcode 9/Swift 4

if let htmlStringData = htmlString.data(using: .utf8),
   let attributedString = try? NSAttributedString(data: htmlStringData,
                                                  options: [.documentType: NSAttributedString.DocumentType.html],
                                                  documentAttributes: nil) {
    print(attributedString)
}

You can even create an extension like this:

extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }

        do {
            return try NSAttributedString(data: data,
                                          options: [.documentType: NSAttributedString.DocumentType.html,
                                                    .characterEncoding: String.Encoding.utf8.rawValue],
                                          documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}

Note that this sample code uses UTF-8 encoding. You can also write a function instead of a computed property and take the encoding as a parameter. To get plain text from the result, read the attributed string's string property.
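
The function variant mentioned above could look roughly like the sketch below. The name htmlToPlainText(encoding:) is hypothetical and not part of any SDK. The sketch assumes an Apple platform where UIKit or AppKit provides the HTML importer, and that it is called on the main thread, since Apple documents that the HTML importer should not be used from a background thread. It may well be slow on multi-megabyte input, because it performs a full HTML import rather than just removing tags.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

#if canImport(UIKit) || canImport(AppKit)
extension String {
    /// Hypothetical helper (not an SDK API): converts HTML to plain text,
    /// using the given encoding both to produce the data and to tell the
    /// importer how to read it.
    func htmlToPlainText(encoding: String.Encoding = .utf8) -> String? {
        guard let data = self.data(using: encoding) else {
            return nil
        }
        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: encoding.rawValue
        ]
        // The attributed string keeps the formatting; `string` drops it.
        let attributed = try? NSAttributedString(data: data,
                                                 options: options,
                                                 documentAttributes: nil)
        return attributed?.string
    }
}
#endif
```

Returning nil on failure keeps the call site simple; a throwing function would be the alternative if the caller needs the underlying error.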

Swift 3

let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)

Objective-C

NSAttributedString *attributedString =
    [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding]
                                     options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                               NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                          documentAttributes:nil
                                       error:nil];
// The plain text is available through the string property.
NSString *plainText = attributedString.string;

If you just need to remove everything between < and > (the quick and dirty way), which can misfire if those characters appear in the text itself, use this:

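// Category method on NSString.
- (NSString *)stringByStrippingHTML {
    NSRange r;
    NSString *s = [self copy]; // under ARC; with manual reference counting use [[self copy] autorelease]
    // Remove one tag per iteration until no match is left.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}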
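
For a string of several million characters, the loop above is likely to be slow: every iteration searches again from the start and builds a new string, so the cost grows with both the length of the text and the number of tags. A single pass over the characters avoids that. The following is a minimal sketch in Swift; the function name strippingTags(from:) is hypothetical. It differs slightly from the regular expression: everything after an unclosed < is dropped and an empty <> is removed, while a > that appears outside a tag is kept.

```swift
/// Hypothetical helper: removes everything between '<' and '>' in one pass.
func strippingTags(from html: String) -> String {
    var output = String()
    // The result can never be longer than the input.
    output.reserveCapacity(html.utf8.count)
    var insideTag = false
    for character in html {
        if character == "<" {
            insideTag = true            // a tag starts; stop copying
        } else if character == ">" && insideTag {
            insideTag = false           // the tag ends; resume copying
        } else if !insideTag {
            output.append(character)    // ordinary text, including a stray '>'
        }
    }
    return output
}
```

Like the regular expression, this knows nothing about comments, script blocks, or a > inside an attribute value, so it is only suitable for simple markup.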


Answer 2:

I solved my problem with a scanner, but I do not run it over the whole text at once. I run it on each part of about 10,000 characters and then concatenate the parts. My code is below:

- (NSString *)convertHTML:(NSString *)html {
    NSScanner *myScanner = [NSScanner scannerWithString:html];

    while (![myScanner isAtEnd]) {
        NSString *text = nil;
        // Skip ahead to the next tag, then capture it up to (not including) ">".
        [myScanner scanUpToString:@"<" intoString:NULL];
        [myScanner scanUpToString:@">" intoString:&text];
        // Remove every occurrence of that tag; skip when no tag was found.
        if (text.length > 0) {
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }
    // Trim leading and trailing whitespace.
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    return html;
}
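
The answer does not show how the text is split into parts. One way to do it, sketched here in Swift with hypothetical names (tagSafeChunks, plainText(fromHTML:chunkSize:)), is to cut roughly every chunkSize characters but extend each cut to the next > so that no tag is split across two parts. This assumes simple markup in which > does not occur inside attribute values. Chunking presumably helps because each replacement pass then works on a short string instead of the whole document.

```swift
import Foundation

/// Hypothetical helper: splits `html` into pieces of roughly `chunkSize`
/// characters, extending each piece through the next '>' so that no tag
/// is cut in half.
func tagSafeChunks(of html: String, chunkSize: Int) -> [Substring] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [Substring] = []
    var start = html.startIndex
    while start < html.endIndex {
        // Advance by chunkSize characters, clamped to the end of the string.
        var end = html.index(start, offsetBy: chunkSize, limitedBy: html.endIndex) ?? html.endIndex
        if let close = html[end...].firstIndex(of: ">") {
            end = html.index(after: close)   // include the closing '>'
        } else {
            end = html.endIndex              // no more tags; take the rest
        }
        chunks.append(html[start..<end])
        start = end
    }
    return chunks
}

/// Hypothetical helper: strips tags chunk by chunk and joins the results.
func plainText(fromHTML html: String, chunkSize: Int = 10_000) -> String {
    return tagSafeChunks(of: html, chunkSize: chunkSize)
        .map { $0.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression) }
        .joined()
}
```

The default of 10,000 mirrors the part size mentioned in the answer; it is not a tuned value.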


Answer 3:

For Swift (this snippet uses pre-Swift 2 syntax, which still takes an explicit error parameter):

NSAttributedString(data: (htmlString as! String).dataUsingEncoding(NSUTF8StringEncoding, allowLossyConversion: true)!,
                   options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                             NSCharacterEncodingDocumentAttribute: NSNumber(unsignedLong: NSUTF8StringEncoding)],
                   documentAttributes: nil,
                   error: nil)!


Answer 4:

- (NSString *)stringByStrippingHTML:(NSString *)inputString {
    NSMutableString *outString = nil; // stays nil when inputString is nil

    if (inputString)
    {
        outString = [[NSMutableString alloc] initWithString:inputString];

        if ([inputString length] > 0)
        {
            NSRange r;

            // Delete tags and &nbsp; entities one match at a time.
            while ((r = [outString rangeOfString:@"<[^>]+>|&nbsp;" options:NSRegularExpressionSearch]).location != NSNotFound)
            {
                [outString deleteCharactersInRange:r];
            }
        }
    }

    return outString;
}
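
Note that the pattern above deletes &nbsp; outright, so "1.7&nbsp;km" becomes "1.7km". If a space should remain, entities can be replaced instead of removed. Below is a small Swift sketch with a hypothetical function name and a deliberately short entity table; using a plain space for &nbsp; (rather than U+00A0) is an assumption about the desired output.

```swift
import Foundation

/// Hypothetical helper: replaces a handful of common named entities.
func decodingBasicEntities(in text: String) -> String {
    // "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
    let replacements: [(entity: String, character: String)] = [
        ("&nbsp;", " "),
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&quot;", "\""),
        ("&amp;", "&"),
    ]
    var result = text
    for (entity, character) in replacements {
        result = result.replacingOccurrences(of: entity, with: character)
    }
    return result
}
```

Entities should be decoded after the tags are stripped; otherwise a decoded &lt; could be mistaken for the start of a tag.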


Answer 5:

Did you try something like the code below? I am not sure whether it will be faster than your scanner approach, so please check:

// String that contains HTML tags
NSString *htmlString = [NSString stringWithFormat:@"%@", @"<b>right</b> onto <b>Kennington Park Rd/A3</b>Continue to follow A3</div><div >Entering toll zone in 1.7&nbsp;km at Newington Causeway/A3</div><divGo through 2 roundabouts</div>"];

NSMutableString *mutStr = [NSMutableString string];
// Split on the characters '<', '/' and '>'. Note that tag names such as b and div stay in the pieces.
NSArray *arra = [htmlString componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"</>"]];
NSLog(@"%@", arra);
for (NSString *s in arra)
{
    [mutStr appendString:@" "];
    [mutStr appendString:s];
}
NSLog(@"%@", mutStr); // Print the output


Answer 6:

Swift 4:

do {
    let attributedString = try NSAttributedString(data: htmlContent.data(using: String.Encoding.utf8)!,
                                                  options: [.documentType: NSAttributedString.DocumentType.html],
                                                  documentAttributes: nil)
    let cleanString = attributedString.string // plain text without markup
    print(cleanString)
} catch {
    print("Something went wrong: \(error)")
}

