Find if a value exists in a column in Excel using Python

萌比男神i 2020-12-18 13:50

I have an Excel file with one worksheet that has sediment collection data. I am running a long Python script.

In the worksheet is a column titled “CollectionYear.”

3 Answers
  •  攒了一身酷
    2020-12-18 14:26

    Here is what I learned from tackling a needle-in-a-haystack problem for a gigantic pile of .xls files. There are some things xlrd and friends can't (or won't) do, such as getting the formula of a cell. For that, you'll need to use the Microsoft Component Object Model (COM) [1].

    I recommend you find yourself a copy of Python Programming on Win32 by Mark Hammond. It's still useful 20 years later. Python Programming on Win32 covers the basics of the COM and how to access it using the pywin32 library (also from Mark Hammond).

    In a nutshell, you can think of the COM as an API between a server (say, Excel) and a client (such as a Python script) [2].

    import win32com.client
    
    # Connect to Excel server
    xl = win32com.client.Dispatch("Excel.Application")
    

    The COM API is reasonably well documented. Once you get used to the terminology, things become straightforward, albeit tedious. For example, an Excel file is technically a "Workbook". The "Workbooks" COM object has the Open method, which provides a handle for Python to interact with the "Workbook". (Did you notice the different 's' endings on those?)

    import win32com.client
    
    # Connect to Excel server
    xl = win32com.client.Dispatch("Excel.Application")
    
    myfile = r'C:\temp\myworkbook.xls'
    wb = xl.Workbooks.Open(Filename=myfile)
    
    

    A "Workbook" contains a "Sheet", accessed here through the "Sheets" COM object:

    import win32com.client
    
    # Connect to Excel server
    xl = win32com.client.Dispatch("Excel.Application")
    
    myfile = r'C:\temp\myworkbook.xls'
    wb = xl.Workbooks.Open(Filename=myfile)
    sht1 = wb.Sheets.Item(1)
    

    Finally, the 'Cells' property of a worksheet "returns a Range object that represents all the cells on the worksheet". The Range object then has a Find method which will search within the range. The LookIn parameter allows for searching cell values, formulas, and comments.

    import win32com.client
    
    # Connect to Excel server
    xl = win32com.client.Dispatch("Excel.Application")
    
    myfile = r'C:\temp\myworkbook.xls'
    wb = xl.Workbooks.Open(Filename=myfile)
    sht1 = wb.Sheets.Item(1)
    match = sht1.Cells.Find('search string')
    
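Find returns None in Python (Nothing in VBA) when no cell matches, so the result is worth checking before it is used. Here is a minimal sketch of a membership test, assuming late binding (hence the numeric values standing in for xlValues and xlWhole). The helper name `sheet_contains` is hypothetical and not part of pywin32.

```python
# Numeric values of Excel enumeration members. Late-bound scripts cannot
# look these up by name, so they are spelled out here.
XL_VALUES = -4163   # xlValues: search cell values
XL_WHOLE = 1        # xlWhole: match the entire cell contents


def sheet_contains(sheet, what, look_in=XL_VALUES, look_at=XL_WHOLE):
    """Return the address of the first matching cell, or None if absent.

    `sheet` is expected to be a worksheet COM object such as
    wb.Sheets.Item(1). The function name is illustrative only.
    """
    match = sheet.Cells.Find(What=what, LookIn=look_in, LookAt=look_at)
    if match is None:  # Find returns None when nothing matches
        return None
    return match.Address
```

With a real worksheet, `sheet_contains(sht1, 2019)` would return an address string or None. Calling Find on a narrower Range than Cells limits the search to that range.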

    The result of Find is a Range object (or None when nothing matches) which has many useful properties, like Formula, GetAddress, Value, and Text. You'll also find, as with anything Microsoft, that it's good enough for government work.

    Finally, don't forget to close the workbook and to quit Excel!

    import win32com.client
    
    # Connect to Excel server
    xl = win32com.client.Dispatch("Excel.Application")
    
    myfile = r'C:\temp\myworkbook.xls'
    wb = xl.Workbooks.Open(Filename=myfile)
    sht1 = wb.Sheets.Item(1)
    match = sht1.Cells.Find('search string')
    
    if match is not None:  # Find returns None when nothing matches
        print(match.Formula)
    
    wb.Close(SaveChanges=False)
    xl.Quit()
    
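If the search raises an exception, the Close call is never reached and the workbook stays open in a hidden Excel process. One way to guard against that is a small context manager. This is a sketch only: `open_workbook` is a hypothetical helper, and it assumes the workbook should always be closed without saving.

```python
from contextlib import contextmanager


@contextmanager
def open_workbook(xl, path, read_only=True):
    """Open a workbook and guarantee it is closed without saving.

    `xl` is an Excel.Application COM object. `open_workbook` is an
    illustrative name, not something pywin32 provides.
    """
    wb = xl.Workbooks.Open(Filename=path, ReadOnly=read_only)
    try:
        yield wb
    finally:
        # Runs even if the body raised, so the workbook is not left open.
        wb.Close(SaveChanges=False)
```

Usage would look like `with open_workbook(xl, myfile) as wb: ...`. Note that xl.Quit() still has to be called once, after all workbooks have been processed.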

    You can extend these ideas with Sheets.Item and Sheets.Count and iterate over all sheets in a workbook (or all workbooks in a directory). You can have lots of fun!
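The sheet iteration can be sketched as follows. COM collections are 1-based, so the loop runs from 1 to Sheets.Count inclusive. The function name `find_in_workbook` is hypothetical, and the sketch assumes every sheet is a regular worksheet (a chart sheet has no Cells property).

```python
def find_in_workbook(wb, what):
    """Return (sheet name, cell address) for the first hit on each sheet.

    `wb` is a Workbook COM object. Sheets with no match are skipped.
    """
    hits = []
    # COM collections are 1-based: valid indexes are 1..Count inclusive.
    for index in range(1, wb.Sheets.Count + 1):
        sheet = wb.Sheets.Item(index)
        match = sheet.Cells.Find(what)
        if match is not None:  # None means no match on this sheet
            hits.append((sheet.Name, match.Address))
    return hits
```

An empty list means the value was not found anywhere in the workbook.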

    The headaches you may encounter include VBA macros and embedded objects, as well as the various alerts each can produce. Performance is also an issue. The following property settings silence notifications and can dramatically improve performance:

    Application

    • xl.DisplayAlerts = False
    • xl.AutomationSecurity = msoAutomationSecurityForceDisable
    • xl.Interactive = False
    • xl.PrintCommunication = False
    • xl.ScreenUpdating = False
    • xl.StatusBar = False

    Workbook

    • wb.DoNotPromptForConvert = True
    • wb.EnableAutoRecover = False
    • wb.KeepChangeHistory = False

    Another potential issue is late/early binding. Basically, does Python have information about the COM object? This affects things like introspection and how COM objects are referenced. The win32com.client package uses late-bound automation by default.

    With late-bound automation, Python doesn't know much about the COM object:

    >>> import win32com.client
    >>> xl = win32com.client.Dispatch("Excel.Application")
    >>> xl
    <COMObject Excel.Application>
    >>> len(dir(xl))
    55
    

    With early-bound automation, Python has full knowledge of the object:

    >>> import win32com.client
    >>> xl = win32com.client.Dispatch("Excel.Application")
    >>> xl
    
    >>> len(dir(xl))
    125
    

    To enable early binding, you must run makepy.py which is included with pywin32. Running makepy.py will prompt for the library to bind with.

    (venv) c:\temp\venv\Lib\site-packages\win32com\client>python makepy.py
    

    The process creates a Python file (in Temp\) which maps the methods and properties of the COM object.

    (venv) c:\temp\venv\Lib\site-packages\win32com\client>python makepy.py
    Generating to C:\Users\Lorem\AppData\Local\Temp\gen_py\3.6\00020813-0000-0000-C000-000000000046x0x1x9.py
    Building definitions from type library...
    Generating...
    Importing module
    

    Early binding also provides access to COM constants, such as msoAutomationSecurityForceDisable and xlAscending, and makes attribute names case-sensitive (whereas late binding is not).
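A script that has to run under either binding mode can look constants up by name and fall back to hard-coded numbers. The sketch below assumes a hypothetical helper `com_constant` and a small hand-written fallback table. Under early binding the names resolve through win32com.client.constants, otherwise the table is used. As an alternative to running makepy.py by hand, pywin32 also provides win32com.client.gencache.EnsureDispatch, which generates the early-binding wrapper on demand.

```python
# Numeric values for a few Excel/Office constants, used when the generated
# early-binding wrappers (and therefore named constants) are unavailable.
FALLBACK_CONSTANTS = {
    "xlValues": -4163,
    "xlFormulas": -4123,
    "xlWhole": 1,
    "xlPart": 2,
    "xlAscending": 1,
    "msoAutomationSecurityForceDisable": 3,
}


def com_constant(name):
    """Resolve a COM constant by name, with a numeric fallback."""
    try:
        # Imported lazily so the module still loads where pywin32 is absent.
        from win32com.client import constants
        return getattr(constants, name)
    except (ImportError, AttributeError):
        # No pywin32, or no type library loaded that defines this name.
        return FALLBACK_CONSTANTS[name]
```

For example, `sht1.Cells.Find('search string', LookIn=com_constant("xlValues"))` would behave the same under both binding modes.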

    That should be enough info to implement a Python-to-Excel library (like xlwings), overkill notwithstanding.


    [1] Actually, xlwings works by utilizing the COM through pywin32. Here's to one less dependency!

    [2] This example uses win32com.client.Dispatch, which requires that processing happen through a single Excel instance. Use win32com.client.DispatchEx to create separate instances of Excel.
