Removing everything between a certain set of characters with Swift

Posted by 邮差的信 on 2019-12-17 14:36:13

Question


I'm quite new to Swift and native programming. For a small personal project I'm fetching the full HTML of a Twitter search, and I'm trying to extract just the text of the first tweet. I've got to the point where I can isolate the first tweet, including all the tags inside it, but I'm a bit clueless about how to pull out just the text and remove the HTML elements.

For example, it's pretty easy to take a single tweet and filter out the possible <a href=""> and <span> tags, etc. But as soon as the tweet or the search changes, such a specific filter no longer works. What I'm really looking for is a way to remove everything in a string that starts with < and ends with >. That way I can filter out all the stuff I don't need in my string. I'm using "string.componentsSeparatedByString()" to grab the one tweet I need out of all the HTML, but I can't use that method to filter everything else out of my string.

Please bear with me, since I'm quite new at this. I'm aware that I may not be doing this right at all and that there may be a much easier way to pull a single tweet without all this hassle. If so, please let me know as well.


Answer 1:


You can create a function to do it for you, as follows:

func html2String(html:String) -> String {
    return NSAttributedString(data: html.dataUsingEncoding(NSUTF8StringEncoding)!, options:[NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
}

Or as a String extension:

extension String {
    var html2String:String {
        return NSAttributedString(data: dataUsingEncoding(NSUTF8StringEncoding)!, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
    }
    var html2NSAttributedString:NSAttributedString {
        return NSAttributedString(data: dataUsingEncoding(NSUTF8StringEncoding)!, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!
    }
}

Or, if you prefer, as an NSData extension:

extension NSData{
    var htmlString:String {
        return  NSAttributedString(data: self, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
    }
}

Or as a function that takes NSData:

func html2String(html:NSData)-> String {
    return  NSAttributedString(data: html, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil, error: nil)!.string
}

Usage:

"<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>".html2String  //  "Testing\n Hello World !!!"

let result = html2String("<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>")  //  "Testing\n Hello World !!!"

// Let's load this HTML page and print it as a String

import UIKit

class ViewController: UIViewController {
    let questionLink = "http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573"
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
        if let questionUrl = NSURL(string: questionLink) {
            println("LOADING URL")
            if let myHtmlDataFromUrl = NSData(contentsOfURL: questionUrl){
                println(myHtmlDataFromUrl.htmlString)
            }
        }
    }
    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}



Answer 2:


Quite a lot has changed in Swift over the last few years, so I just wanted to post a version of Leo Dabus' answer updated to current Swift syntax.

extension String {

    func removeHTMLEncoding() throws -> String? {
        guard let data = self.data(using: .utf8) else { return nil }
        let attr = try NSAttributedString(
            data: data,
            options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: NSNumber(value: String.Encoding.utf8.rawValue)
            ],
            documentAttributes: nil
        )
        return attr.string
    }

}

Kinda annoying that you still need to convert the string encoding value to an NSNumber - NSAttributedString is pretty out of date!
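
For the literal request in the question (dropping everything that starts with < and ends with >), a regular expression is another option. The following is only a minimal sketch: the helper name strippingTags is made up for this example and is not part of either answer. It assumes that no attribute value contains a > character, and it leaves entities such as &nbsp; untouched, so it is not a general HTML parser.

```swift
import Foundation

extension String {
    /// Hypothetical helper (not from the answers above): removes every
    /// run of text that starts with "<" and ends at the next ">".
    func strippingTags() -> String {
        // "<[^>]+>" matches "<", then one or more characters that are
        // not ">", then the closing ">".
        return replacingOccurrences(of: "<[^>]+>",
                                    with: "",
                                    options: .regularExpression)
    }
}

// Made-up sample markup, only for illustration.
let tweet = "<p class=\"tweet-text\">Hello <a href=\"#\">@swift</a> world</p>"
print(tweet.strippingTags())  // Hello @swift world
```

Unlike the NSAttributedString approach, this does not interpret the markup at all: <br> does not become a newline and entities are not decoded. It may be enough, though, when only the tags need to go.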



Source: https://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift
