From password-protected Excel file to pandas DataFrame

青春惊慌失措 2020-12-15 11:42

I can open a password-protected Excel file with this:

import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print "Excel


        
4 answers
  • 2020-12-15 11:55

    From David Hamann's site (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

    Use xlwings. Opening the file first launches the Excel application, so you can enter the password there.

    import pandas as pd
    import xlwings as xw
    
    PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
    wb = xw.Book(PATH)
    sheet = wb.sheets['sample']
    
    df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
    df
    
  • 2020-12-15 12:05

    Based on the suggestion provided by @ikeoddy, this should put the pieces together:

    How to open a password protected excel file using python?

    # Import modules
    import pandas as pd
    import win32com.client
    import os
    import getpass
    
    # Name file variables
    file_path = r'your_file_path'
    file_name = r'your_file_name.extension'
    
    full_name = os.path.join(file_path, file_name)
    # print(full_name)
    

    Getting command-line password input in Python

    # You are prompted to provide the password to open the file
    xl_app = win32com.client.Dispatch('Excel.Application')
    pwd = getpass.getpass('Enter file password: ')
    

    Workbooks.Open Method (Excel)

    xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
    xl_app.Visible = False
    xl_sh = xl_wb.Worksheets('your_sheet_name')
    
    # Get last_row
    row_num = 0
    cell_val = ''
    while cell_val is not None:
        row_num += 1
        cell_val = xl_sh.Cells(row_num, 1).Value
        # print(row_num, '|', cell_val, type(cell_val))
    last_row = row_num - 1
    # print(last_row)
    
    # Get last_column
    col_num = 0
    cell_val = ''
    while cell_val is not None:
        col_num += 1
        cell_val = xl_sh.Cells(1, col_num).Value
        # print(col_num, '|', cell_val, type(cell_val))
    last_col = col_num - 1
    # print(last_col)
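
The two scanning loops above do the same thing along different axes, so they could be folded into one helper. The following is only a sketch: `count_until_empty` and `fake_column` are hypothetical names, and a plain dict stands in for the worksheet so the behaviour can be checked without Excel.

```python
from typing import Any, Callable


def count_until_empty(get_value: Callable[[int], Any]) -> int:
    """Count consecutive non-empty cells starting at index 1.

    get_value(i) should return the cell value at 1-based index i,
    or None for an empty cell (which is what win32com returns).
    """
    index = 1
    while get_value(index) is not None:
        index += 1
    return index - 1


# Stand-in for a worksheet column: three filled cells, a gap, then one more.
# (Made-up data, used only to show where the scan stops.)
fake_column = {1: "name", 2: "alice", 3: "bob", 5: "carol"}
last_row = count_until_empty(fake_column.get)
print(last_row)  # 3: the scan stops at the first empty cell, so row 5 is missed
```

With the real sheet, the same helper would presumably be called as `count_until_empty(lambda r: xl_sh.Cells(r, 1).Value)` for rows and `count_until_empty(lambda c: xl_sh.Cells(1, c).Value)` for columns. Note that this approach, like the original loops, assumes column 1 and row 1 have no gaps.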
    

    ikeoddy's answer:

    content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
    # list(content)
    df = pd.DataFrame(list(content[1:]), columns=content[0])
    df.head()
    

    python win32 COM closing excel workbook

    xl_wb.Close(False)
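
To make the shape of `content` concrete: `Range.Value` returns a tuple of row tuples, with the header as the first row and `None` for empty cells. The sketch below uses made-up sample data and a hypothetical helper, `content_to_records`, to show the header/data split without needing Excel.

```python
from typing import Any, Dict, List, Sequence


def content_to_records(content: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn a header row plus data rows into a list of dicts."""
    header = [str(name) for name in content[0]]
    return [dict(zip(header, row)) for row in content[1:]]


# Made-up sample in the shape Range.Value returns: a tuple of row tuples.
sample_content = (
    ("id", "name", "score"),
    (1.0, "alice", 9.5),
    (2.0, "bob", None),  # an empty cell comes back as None
)
records = content_to_records(sample_content)
print(records[1])  # {'id': 2.0, 'name': 'bob', 'score': None}
```

Passing `records` to `pd.DataFrame(records)` should give the same frame as the `columns=content[0]` call above; which form to use is mostly a matter of taste.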
    
  • 2020-12-15 12:05

    Assuming that you can save the decrypted file back to disk using the win32com API (which I realize might defeat the purpose), you could then immediately call the top-level pandas function read_excel. You'll need to install some combination of xlrd (for Excel 2003 files), xlwt (also for 2003), and openpyxl (for Excel 2007 files) first, though. See the pandas documentation on reading Excel files. Currently pandas does not support reading Excel files through the win32com API. You're welcome to open a GitHub issue if you'd like.

  • 2020-12-15 12:19

    Assuming the starting cell is given as (StartRow, StartCol) and the ending cell is given as (EndRow, EndCol), I found the following worked for me:

    # Get the content in the rectangular selection region
    # content is a tuple of tuples
    content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value 
    
    # Transfer content to pandas dataframe
    dataframe = pandas.DataFrame(list(content))
    

    Note: Excel cell B5 is given as row 5, col 2 in win32com. Also, we need list(...) to convert the tuple of tuples to a list of tuples, since at the time of writing pandas.DataFrame did not accept a tuple of tuples directly.
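
Since `Cells(row, col)` takes numeric indices while Excel shows addresses like B5, a small converter can help when picking StartRow/StartCol and EndRow/EndCol. This is a minimal sketch: `a1_to_row_col` is a hypothetical helper, and it only handles simple addresses without sheet names or `$` signs.

```python
import re
from typing import Tuple


def a1_to_row_col(address: str) -> Tuple[int, int]:
    """Convert an A1-style address such as 'B5' to 1-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", address)
    if match is None:
        raise ValueError(f"not a simple A1-style address: {address!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number with digits A=1 ... Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


print(a1_to_row_col("B5"))    # (5, 2)
print(a1_to_row_col("AA10"))  # (10, 27)
```

The result can be unpacked straight into the call from the answer, for example `xlws.Cells(*a1_to_row_col("B5"))`.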
