python-docx: extract a table from a Word .docx file

小蘑菇 2020-12-15 10:56

I know this question has been asked before, but those answers do not work for me. I have a Word file that contains one table, and I want that table as the output of my Python program.

1 answer
  • 2020-12-15 11:08

    Your code works fine for me. How about inserting it into a dataframe?

    import pandas as pd
    from docx import Document
    
    document = Document('test_word.docx')
    table = document.tables[0]  # first table in the document
    
    data = []
    
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
    
        # The first row holds the column headers.
        if i == 0:
            keys = tuple(text)
            continue
        # Map every later row onto those headers.
        row_data = dict(zip(keys, text))
        data.append(row_data)
    
    print(data)  # print once, after all rows have been collected
    
    df = pd.DataFrame(data)
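
    The header-plus-rows step above can be pulled out into a small helper, which makes it testable without a Word file. This is only a sketch: `rows_to_records` is a hypothetical name, and it assumes the table has already been read into a list of lists of strings, for example with `[[c.text for c in r.cells] for r in table.rows]`.

```python
def rows_to_records(rows):
    """Turn [[header...], [values...], ...] into a list of dicts.

    Assumes the first row is the header; whitespace around cell text is stripped.
    """
    if not rows:
        return []
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (v.strip() for v in row))) for row in rows[1:]]


# Sample rows shaped like the table in this answer (all values are strings).
sample = [["1", "2", "3", "4"], ["5", "6", "7", "8"], ["9", "10", "11", "12"]]
records = rows_to_records(sample)
print(records[0])
```

    The resulting list can be passed straight to `pd.DataFrame(records)`. Note that if two header cells carry the same text, the later one overwrites the earlier one in each dict.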
    

    How can I display a particular row and column in that table? You can extract rows and columns by position with iloc:

    # iloc[row, column]; the values are strings because cell.text returns str
    df.iloc[0, :].tolist()   # ['5', '6', '7', '8']  - row index 0
    df.iloc[:, 0].tolist()   # ['5', '9', '13', '17']  - column index 0
    df.iloc[0, 0]            # '5'  - cell (0, 0)
    df.iloc[1:, 2].tolist()  # ['11', '15', '19']  - column index 2, skipping the first row
    

    and so on...

    However, if your columns have names (here the names are the strings '1' to '4', taken from the header row), you can select a column by name:

    # df["name"].tolist()
    df["1"].tolist() # ['5', '9', '13', '17'] - column named "1"
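
    Selection by name also works for single cells and for several columns at once via `loc`, the label-based counterpart of `iloc`. A minimal sketch, assuming a DataFrame with the same string column names as the sample table (only two data rows are rebuilt here to keep it short):

```python
import pandas as pd

# Two rows of the sample table; values and column names are strings.
df = pd.DataFrame(
    [["5", "6", "7", "8"], ["9", "10", "11", "12"]],
    columns=["1", "2", "3", "4"],
)

cell = df.loc[0, "1"]            # row label 0, column named "1"
subset = df.loc[:, ["1", "3"]]   # all rows, two columns selected by name
print(cell)
print(subset.values.tolist())
```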
    

    print(df)
    

    This prints the following, which is how the table looks in my sample doc:

        1   2   3   4
    0   5   6   7   8
    1   9   10  11  12
    2   13  14  15  16
    3   17  18  19  20
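
    One caveat: `cell.text` always returns a string, so every value in the DataFrame is a string even though it prints like a number. If numeric values are needed, one option is `pd.to_numeric`. The sketch below rebuilds the sample table by hand instead of reading a file, and assumes every cell holds a valid number.

```python
import pandas as pd

# Same shape as the sample table; values are strings, as python-docx returns them.
df = pd.DataFrame(
    [["5", "6", "7", "8"], ["9", "10", "11", "12"],
     ["13", "14", "15", "16"], ["17", "18", "19", "20"]],
    columns=["1", "2", "3", "4"],
)

numeric = df.apply(pd.to_numeric)        # convert each column to a numeric dtype
first_row = numeric.iloc[0, :].tolist()  # values are now integers
col_sum = int(numeric["1"].sum())        # arithmetic works after conversion
print(first_row, col_sum)
```

    By default `pd.to_numeric` raises an error on text that is not a number; passing `errors="coerce"` turns such cells into NaN instead.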
    