Intermittent “getElementById() takes exactly 1 argument (2 given)” error. Can someone explain it?

#-*- coding:utf-8 -*-
import win32com.client, pythoncom
import time

ie = win32com.client.DispatchEx('InternetExplorer.Application.1')
ie.Visible = 1
ie.Navigate(\         


        
4 answers
  •  青春惊慌失措
    2021-01-01 18:59

    As a method of a COMObject, getElementById is built by win32com dynamically.
    On my computer, if the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, the generated method is almost equivalent to:

    def getElementById(self):
        return self._ApplyTypes_(3000795, 1, (12, 0), (), 'getElementById', None,)
    

    If the URL is www.baidu.com, it is almost equivalent to:

    def getElementById(self, v=pythoncom.Missing):
        ret = self._oleobj_.InvokeTypes(1088, LCID, 1, (9, 0), ((8, 1),), v)
        if ret is not None:
            # The CLSID is passed as a string; a bare {...} is not valid Python.
            ret = Dispatch(ret, 'getElementById', '{3050F1FF-98B5-11CF-BB82-00AA00BDCE0B}')
        return ret
    

    Obviously, if you pass an argument to the first version, you get a TypeError, because that method accepts nothing but self. If you call it without arguments instead, i.e. ie.Document.getElementById(), you don't get a TypeError but a com_error.
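The argument count in the error message includes the implicit self, which is why one visible argument is reported as "2 given". Here is a minimal pure-Python stand-in for the two generated variants above. The class names BrokenDocument and WorkingDocument and the helper try_lookup are hypothetical, and no COM is involved; it only illustrates the signature mismatch.

```python
class BrokenDocument(object):
    """Hypothetical stand-in: wrapper generated from empty type information."""

    def getElementById(self):
        # No parameter besides self, like the first generated variant.
        return None


class WorkingDocument(object):
    """Hypothetical stand-in: wrapper generated from correct type information."""

    def getElementById(self, v=None):
        # Accepts the element id, like the second generated variant.
        return "element:%s" % v


def try_lookup(document, element_id):
    """Return the lookup result, or the TypeError caused by a bad signature."""
    try:
        return document.getElementById(element_id)
    except TypeError as exc:
        return exc


good = try_lookup(WorkingDocument(), "browse_keyword")
bad = try_lookup(BrokenDocument(), "browse_keyword")
print(good)                 # element:browse_keyword
print(type(bad).__name__)   # TypeError
```

On Python 3 the wording of the message is different, but the exception type is the same.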

    Why did win32com build the wrong code?
    Let us look at ie and ie.Document. They are both COMObjects, or more precisely, win32com.client.CDispatch instances. CDispatch is just a wrapper class. The core is its _oleobj_ attribute, whose type is PyIDispatch.

    >>> ie, ie.Document                        # both are displayed as COMObject wrappers
    >>> ie.__class__, ie.Document.__class__    # both are win32com.client.CDispatch
    >>> oleobj = ie.Document._oleobj_
    >>> oleobj                                 # a PyIDispatch object
    
    

    To build getElementById, win32com needs to get the type information for the getElementById method from _oleobj_. Roughly, win32com uses the following procedure:

    typeinfo = oleobj.GetTypeInfo()
    typecomp = typeinfo.GetTypeComp()
    x, funcdesc = typecomp.Bind('getElementById', pythoncom.INVOKE_FUNC)
    ......
    

    funcdesc contains almost all the important information, e.g. the number and types of the parameters.
    If the URL is http://ieeexplore.ieee.org/xpl/periodicals.jsp, funcdesc.args is (), while the correct funcdesc.args should be ((8, 1, None),).
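To make those tuples easier to read, here is a small decoder for funcdesc.args entries. It is only a sketch: the helper names describe_arg and describe_args are hypothetical, the lookup tables cover just the standard OLE Automation codes that appear in this answer (8, 9 and 12), and treating the third slot as a default value is my assumption.

```python
# Standard OLE Automation VARTYPE codes (only the ones seen in this answer).
VARTYPES = {8: "VT_BSTR", 9: "VT_DISPATCH", 12: "VT_VARIANT"}
# Standard parameter direction flags.
PARAMFLAGS = {1: "PARAMFLAG_FIN", 2: "PARAMFLAG_FOUT"}


def describe_arg(arg):
    """Decode one (vartype, flags[, default]) entry from funcdesc.args."""
    vt, flags = arg[0], arg[1]
    default = arg[2] if len(arg) > 2 else None  # assumed meaning of slot 3
    type_name = VARTYPES.get(vt, "VT_%d" % vt)
    flag_names = [name for bit, name in sorted(PARAMFLAGS.items()) if flags & bit]
    return type_name, flag_names, default


def describe_args(args):
    """Decode a whole funcdesc.args tuple."""
    return [describe_arg(a) for a in args]


print(describe_args(((8, 1, None),)))  # [('VT_BSTR', ['PARAMFLAG_FIN'], None)]
print(describe_args(()))               # []
```

So the correct description says "one input string parameter", while the broken one says "no parameters at all".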

    Long story short, win32com retrieved the wrong type information, so it built the wrong method.
    I am not sure who is to blame, PyWin32 or IE. But based on my observation, I found nothing wrong in PyWin32's code. On the other hand, the following script runs perfectly in Windows Script Host.

    var ie = new ActiveXObject("InternetExplorer.Application");
    ie.Visible = 1;
    ie.Navigate("http://ieeexplore.ieee.org/xpl/periodicals.jsp");
    WScript.sleep(5000);
    ie.Document.getElementById("browse_keyword").value = "Computer";
    

    Duncan has already pointed out that IE's compatibility mode can prevent the problem. Unfortunately, there seems to be no way to enable compatibility mode from a script.
    However, I found a trick that bypasses the problem.

    First, visit a good site that serves an HTML page, and retrieve a correct Document object from it.

    ie = win32com.client.DispatchEx('InternetExplorer.Application')
    ie.Visible = 1
    ie.Navigate('http://www.haskell.org/arrows')
    time.sleep(5)
    document = ie.Document
    

    Then jump to the page that doesn't work:

    ie.Navigate('http://ieeexplore.ieee.org/xpl/periodicals.jsp')
    time.sleep(5)
    

    Now you can access the DOM of the second page via the old Document object.

    document.getElementById('browse_keyword').value = "Computer"
    

    If you use the new Document object, you will get a TypeError again.

    >>> ie.Document.getElementById('browse_keyword')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: getElementById() takes exactly 1 argument (2 given)
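The steps above can be wrapped in a helper that polls IE instead of sleeping for a fixed 5 seconds. This is only a sketch under assumptions: the function names wait_until_loaded and document_via_good_page are hypothetical, ie is expected to be an InternetExplorer.Application object created as shown earlier, and I have not verified that the trick behaves the same with polling as with time.sleep(5). Note that right after Navigate, IE may still report the previous page as complete, so the helper waits one poll interval before the first check.

```python
import time

READYSTATE_COMPLETE = 4  # READYSTATE_COMPLETE in the IE object model


def wait_until_loaded(ie, timeout=30.0, poll_interval=0.1):
    """Poll ie.Busy and ie.ReadyState until the page is loaded or time runs out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Sleep first, so a navigation that was just started can register.
        time.sleep(poll_interval)
        if not ie.Busy and ie.ReadyState == READYSTATE_COMPLETE:
            return True
    return False


def document_via_good_page(ie, good_url, target_url, timeout=30.0):
    """Grab the Document on good_url, then navigate to target_url and return it."""
    ie.Navigate(good_url)
    if not wait_until_loaded(ie, timeout):
        raise RuntimeError("timed out loading %s" % good_url)
    document = ie.Document  # wrapper built while the type information was correct
    ie.Navigate(target_url)
    if not wait_until_loaded(ie, timeout):
        raise RuntimeError("timed out loading %s" % target_url)
    return document  # the old wrapper, now used against the new page
```

Usage would be document = document_via_good_page(ie, 'http://www.haskell.org/arrows', 'http://ieeexplore.ieee.org/xpl/periodicals.jsp'), followed by document.getElementById('browse_keyword') as before.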
    
