Reading HTML contents of a URL in OCaml

Submitted by 試著忘記壹切 on 2019-12-09 17:35:10

Question


I would like to write an OCaml function which takes a URL and returns a string made up of the contents of the HTML file at that location. Any ideas?

Thanks a lot!

Best, Surikator.


Answer 1:


I've done both of those things (fetching the page and parsing it) using ocurl and nethtml.

ocurl to read the contents of the URL (it has tons of options; this is the minimum):

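(* Raised with the offending URI when the transfer fails. *)
exception IO_ERROR of string

let string_of_uri uri =
    let connection = Curl.init () and write_buff = Buffer.create 1763 in
    (* Append each chunk libcurl delivers and report how many bytes were consumed. *)
    Curl.set_writefunction connection
            (fun x -> Buffer.add_string write_buff x; String.length x);
    Curl.set_url connection uri;
    (* Release this handle on both paths; global_cleanup would tear down libcurl itself. *)
    match Curl.perform connection with
    | () -> Curl.cleanup connection; Buffer.contents write_buff
    | exception _ -> Curl.cleanup connection; raise (IO_ERROR uri)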
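
As a variation on the function above, here is a minimal sketch that returns a result instead of raising and uses Fun.protect so the handle is released on every path. It assumes the ocurl library (findlib package "curl") and OCaml 4.08 or later. The names fetch and report are made up for this illustration, and following redirects is an extra choice that the original answer does not make.

```ocaml
(* Sketch only: assumes ocurl (findlib package "curl") and OCaml >= 4.08. *)

(* Fetch [url]; return the body, or the error message reported by libcurl.
   [fetch] is a hypothetical name chosen for this example. *)
let fetch (url : string) : (string, string) result =
  let buf = Buffer.create 4096 in
  let conn = Curl.init () in
  (* Fun.protect runs the cleanup whether or not the transfer raises. *)
  Fun.protect
    ~finally:(fun () -> Curl.cleanup conn)
    (fun () ->
      Curl.set_url conn url;
      (* Assumption: we want to follow HTTP redirects. *)
      Curl.set_followlocation conn true;
      Curl.set_writefunction conn (fun chunk ->
          Buffer.add_string buf chunk;
          String.length chunk);
      match Curl.perform conn with
      | () -> Ok (Buffer.contents buf)
      | exception Curl.CurlException (_, _, msg) -> Error msg)

(* Hypothetical helper: print the size of the page at [url]. *)
let report (url : string) : unit =
  match fetch url with
  | Ok body -> Printf.printf "fetched %d bytes\n" (String.length body)
  | Error msg -> prerr_endline ("fetch failed: " ^ msg)
```

Calling report with any reachable URL prints the body size. Note that this sketch does not inspect the HTTP status code, so an error page returned by the server still counts as a successful fetch.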

and nethtml to parse the result (you might need to set up a DTD for Nethtml.parse):

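let parse_html_string uri =
    (* Wrap the downloaded page in an input channel that Nethtml can read. *)
    let ch = new Netchannels.input_string (string_of_uri uri) in
    (* Drop processing instructions from the resulting document list. *)
    let docs = Nethtml.parse ~return_pis:false ch in
    ch # close_in ();
    docs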
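
To show what can be done with the parsed result, here is a small sketch that walks the Nethtml.document list and collects link targets. It assumes ocamlnet (findlib package "netstring") and OCaml 4.10 or later for List.concat_map. The function names hrefs and parse_string are made up for this example, and matching on the lowercase name "a" relies on Nethtml's default of normalising element names to lowercase, which is worth checking against your ocamlnet version.

```ocaml
(* Sketch only: assumes ocamlnet (findlib package "netstring"), OCaml >= 4.10. *)

(* Collect the href attribute of every <a> element, in document order.
   [hrefs] is a hypothetical name chosen for this example. *)
let rec hrefs (docs : Nethtml.document list) : string list =
  List.concat_map
    (function
      | Nethtml.Data _ -> []  (* text nodes carry no links *)
      | Nethtml.Element (name, attrs, children) ->
          let here =
            if name = "a" then Option.to_list (List.assoc_opt "href" attrs)
            else []
          in
          here @ hrefs children)
    docs

(* Hypothetical helper: parse an in-memory HTML string with default options. *)
let parse_string (html : string) : Nethtml.document list =
  let ch = new Netchannels.input_string html in
  let docs = Nethtml.parse ch in
  ch#close_in ();
  docs
```

With these in place, hrefs (parse_string page) lists the link targets of a page that has already been downloaded as a string.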

Cheers!



Source: https://stackoverflow.com/questions/4621454/reading-html-contents-of-a-url-in-ocaml
