Parse HTML with SwiftSoup (Swift)?

Submitted by 痞子三分冷 on 2021-02-10 06:41:53

Question


I'm trying to parse some websites with SwiftSoup; say one of them is a Medium article. How can I extract the body of the page and load it into another UIViewController, the way Instapaper does?

Here is the code I use to extract the title:

import UIKit
import SwiftSoup

class WebViewController: UIViewController, UIWebViewDelegate {

...

    override func viewDidLoad() {
        super.viewDidLoad()

        // Validate the URL before using it anywhere
        guard let myURL = URL(string: "https://medium.com/@timjwise/stop-lying-to-yourself-when-you-snub-panhandlers-its-not-for-their-own-good-199d0aa7a513") else {
            print("Error: the string doesn't seem to be a valid URL")
            return
        }
        webView.loadRequest(URLRequest(url: myURL))

        do {
            // Download the page source (this blocks the calling thread)
            let html = try String(contentsOf: myURL, encoding: .utf8)
            let doc: Document = try SwiftSoup.parseBodyFragment(html)
            let headerTitle = try doc.title()
            print("Header title: \(headerTitle)")
        } catch Exception.Error(_, let message) {
            print("Message: \(message)")
        } catch {
            print("error")
        }
    }

}

But I've had no luck extracting the body of this page or of any other website. Is there a way to get it working? Do I need CSS or JavaScript (I know nothing about either)?


Answer 1:


Use the body() function (see https://github.com/scinfu/SwiftSoup#parsing-a-body-fragment). Try this:

do {
    // Download the page source and parse it
    let html = try String(contentsOf: myURL, encoding: .utf8)
    let doc: Document = try SwiftSoup.parseBodyFragment(html)
    let headerTitle = try doc.title()

    // The <body> element of the parsed document
    let body = doc.body()
    // Elements to remove, in this case images
    let undesiredElements: Elements? = try body?.select("img[src]")
    // Remove them from the document (remove() can throw, so it needs try)
    try undesiredElements?.remove()

    print("Header title: \(headerTitle)")
} catch Exception.Error(_, let message) {
    print("Message: \(message)")
} catch {
    print("error")
}
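
As a rough illustration of the same idea, the sketch below parses an HTML string, strips elements that are usually unwanted in a reader view, and returns both the remaining body markup and its plain text. The helper name extractArticle, the sample markup, and the choice of selectors ("script, style, img") are assumptions made for this example and are not part of the original answer. A real Medium page would likely need more specific selectors.

```swift
import SwiftSoup

/// Hypothetical helper (not from the original answer): returns the page
/// title plus the cleaned body as HTML and as plain text.
func extractArticle(from html: String) throws -> (title: String, bodyHTML: String, bodyText: String) {
    // The input is a complete document, so use the full-document parser
    let doc: Document = try SwiftSoup.parse(html)
    let title = try doc.title()

    // body() returns nil when the document has no <body> element
    guard let body = doc.body() else {
        return (title, "", "")
    }

    // Drop elements that are not wanted in the reader view (assumed list)
    try body.select("script, style, img").remove()

    let bodyHTML = try body.html()   // remaining markup inside <body>
    let bodyText = try body.text()   // visible text only, tags stripped
    return (title, bodyHTML, bodyText)
}

// Made-up sample page standing in for the downloaded HTML
let sample = """
<html><head><title>Sample article</title></head>
<body><h1>Heading</h1><img src="a.png"><p>First paragraph.</p><script>track()</script></body></html>
"""

let article = try extractArticle(from: sample)
print("Title: \(article.title)")
print("Text: \(article.bodyText)")
```

One possible way to show the result in a second view controller is to pass bodyHTML to a WKWebView via loadHTMLString(_:baseURL:), or to put bodyText into a UITextView; which fits better depends on how much of the original formatting should survive.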


Source: https://stackoverflow.com/questions/48963919/parse-html-with-swiftsoup-swift
