Need to get HTML source as string CEFPython

Submitted by 孤街浪徒 on 2019-12-12 07:02:44

Question


I am trying to get the HTML source of a web page as a string using CEFPython. I want the main frame's source content to be crawled and returned as a string in the following function:

def save_screenshot(browser):    
    # Browser object provides GetUserData/SetUserData methods
    # for storing custom data associated with browser. The
    # "OnPaint.buffer_string" data is set in RenderHandler.OnPaint.
    buffer_string = browser.GetUserData("OnPaint.buffer_string")
    if not buffer_string:
        raise Exception("buffer_string is empty, OnPaint never called?")
    mainFrame = browser.GetMainFrame()
    print("Main frame is ", mainFrame)
    # print("buffer string" ,buffer_string)

    # visitor object
    visitorObj = cef_string()
    temp = mainFrame.GetSource(visitorObj).GetString()
    print("temp : ", temp)

    visitorText = mainFrame.GetText(temp)
    siteHTML = mainFrame.GetSource(visitorText)
    print("siteHTML is ", siteHTML)

Problem: the code returns nothing for siteHTML.


Answer 1:


mainFrame.GetSource(visitor) is asynchronous: it returns immediately and delivers the source later through the visitor, so there is no return value to call GetString() on.

Unfortunately, you have to think asynchronously here. This is the way to do it:

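class Visitor(object):
    def Visit(self, value):
        # CEF calls Visit() later, once the frame's source is available.
        print("This is the HTML source:")
        print(value)

# Keep a reference to the visitor so it stays alive until Visit() runs.
myvisitor = Visitor()
mainFrame = browser.GetMainFrame()
mainFrame.GetSource(myvisitor)  # returns immediately; the result arrives in Visit()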

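One more thing to beware of: GetSource() holds the visitor object (myvisitor in the example above) only as a weak reference. In other words, you must keep that object alive yourself until the source has been passed back. If you put the last three lines of the snippet above inside a function, the visitor is garbage-collected as soon as the function returns, so either make sure the function does not return until the job is done or store the visitor somewhere longer-lived, such as a module-level variable or an attribute of a long-lived object.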
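The lifetime issue described above can be reproduced without CEF at all. The following is a minimal sketch in plain Python, not cefpython code: FakeFrame and its deliver() method are hypothetical stand-ins that only mimic the behaviour described in this answer (GetSource() returns nothing, keeps a weak reference, and calls Visit() later). It compares a visitor that is dropped too early with one that is kept alive.

```python
import gc
import weakref


class FakeFrame(object):
    """Hypothetical stand-in for a CEF frame. This is NOT the cefpython API."""

    def __init__(self, html):
        self._html = html
        self._pending = []  # weak references to visitors awaiting a callback

    def GetSource(self, visitor):
        # Returns immediately with None; only a weak reference is kept.
        self._pending.append(weakref.ref(visitor))

    def deliver(self):
        # Simulates the message loop handing the source back later.
        # Returns how many visitors were actually called.
        called = 0
        for ref in self._pending:
            visitor = ref()
            if visitor is not None:  # skipped if already garbage-collected
                visitor.Visit(self._html)
                called += 1
        self._pending = []
        return called


class Visitor(object):
    def __init__(self):
        self.source = None

    def Visit(self, value):
        # Store the string so the rest of the program can use it.
        self.source = value


frame = FakeFrame("<html><body>hello</body></html>")

# Case 1: the visitor is a temporary, so nothing keeps it alive.
frame.GetSource(Visitor())
gc.collect()
lost = frame.deliver()

# Case 2: a strong reference is held until the callback has run.
kept = Visitor()
result = frame.GetSource(kept)
got = frame.deliver()

print("callbacks for dropped visitor:", lost)
print("callbacks for kept visitor:", got)
print("GetSource() returned:", result)
print("source stored by Visit():", kept.source)
```

In a real CEFPython program the same idea applies: hold the visitor in a variable that outlives the call, and read the HTML from inside Visit() (or from an attribute that Visit() sets) rather than from the return value of GetSource().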



Source: https://stackoverflow.com/questions/44788353/need-to-get-html-source-as-string-cefpython
