Programmatically convert pandas dataframe to markdown table

后端 未结 13 1267
梦毁少年i
梦毁少年i 2020-12-13 03:44

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+-----------         


        
13条回答
  •  伪装坚强ぢ
    2020-12-13 04:36

    Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

    # Enforce UTF-8 encoding
    import sys
    stdin, stdout = sys.stdin, sys.stdout
    reload(sys)
    sys.stdin, sys.stdout = stdin, stdout
    sys.setdefaultencoding('UTF-8')
    
    # SQLite3 database
    import sqlite3
    # Pandas: Data structures and data analysis tools
    import pandas as pd
    
    # Read database, attach as Pandas dataframe
    db = sqlite3.connect("Applications.db")
    df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
    db.close()
    df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']
    
    # Parse Dataframe and apply Markdown, then save as 'table.md'
    cols = df.columns
    df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
    df3 = pd.concat([df2, df])
    df3.to_csv("table.md", sep="|", index=False)
    

    An important precursor to this is that the shortest_sentence and longest_sentence columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '') to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian), but rather that UTF-8 is used instead of the default ASCII.

    I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):

    | Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
    |-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
    | data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
    | data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
    | data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |
    

    Thus, a Markdown table without problems with encoding.

提交回复
热议问题