Importing CSV with line breaks in Excel 2007

悲哀的现实 2020-11-29 20:49

I'm working on a feature to export search results to a CSV file to be opened in Excel. One of the fields is a free-text field, which may contain line breaks, commas, quota

23 Answers
  •  野趣味 (original poster)
    2020-11-29 21:26

    Overview

    Almost 10 years after the original post, Excel hasn't improved at importing CSV files. However, I found that it is much better at importing HTML tables. So, one can use Python to convert the CSV to HTML and then import the resulting HTML into Excel.

    The advantages of this approach are: (a) it works reliably, (b) you don't need to send your data to a third-party service (e.g. Google Sheets), (c) most users need no extra "fat" installations (LibreOffice, Numbers etc.), (d) it works at a higher level than meddling with CR/LF characters and BOM markers, (e) there is no need to fiddle with locale settings.

    Steps

    The following steps can be run in any bash-like shell as long as Python 3 is installed. Although Python can read CSV directly, csvkit is used here for an intermediate conversion to JSON. This lets us avoid dealing with CSV intricacies in our own Python code.

    First, save the following script as json2html.py. The script reads a JSON file from stdin and dumps it as an HTML table:

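    #!/usr/bin/env python3
    import sys, json, html
    
    if __name__ == '__main__':
        header_emitted = False
        # Wrap each escaped value in a header or data cell; None becomes "".
        make_th = lambda s: "<th>%s</th>" % (html.escape(s if s else ""))
        make_td = lambda s: "<td>%s</td>" % (html.escape(s if s else ""))
        # Build one table row from a sequence of values and a cell factory.
        make_tr = lambda l, make_cell: "<tr>%s</tr>" % ( "".join([make_cell(v) for v in l]) )
        print("<table>\n")
        for line in json.load(sys.stdin):
            # Split each JSON object into its keys (column names) and values.
            lk, lv = zip(*line.items())
            if not header_emitted:
                # The keys of the first record become the header row.
                print(make_tr(lk, make_th))
                header_emitted = True
            print(make_tr(lv, make_td))
        print("</table>")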
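    The row handling in the script rests on two standard-library behaviours that are easy to check in isolation: zip(*record.items()) splits one JSON object into a tuple of keys and a tuple of values, and html.escape neutralises markup while leaving line breaks untouched. A minimal sketch follows; the record is made-up sample data, not output from a real export.

```python
import html

# Hypothetical record, shaped like one element of the JSON array the script reads.
record = {"id": "1", "comment": "first line\nsecond line with <b>markup</b> & a comma, too"}

# Split the record into parallel tuples: column names and cell values.
keys, values = zip(*record.items())

# Escape each value the way the script does; None (an empty cell) becomes "".
escaped = [html.escape(v if v else "") for v in values]

print(keys)
print(escaped[1])
```

    The line break inside the comment survives escaping unchanged, while <, > and & are turned into entities, so free text cannot break the table markup.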
    

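    Then, install csvkit in a virtual environment and use csvjson to feed the input file to our script. It is a good idea to disable cell type guessing with the -I argument, so that every value reaches the script as a string; html.escape only accepts strings, so a cell guessed to be a number would make the script fail: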

    $ virtualenv -p python3 pyenv
    $ . ./pyenv/bin/activate
    $ pip install csvkit
    $ csvjson -I input.csv | python3 json2html.py > output.html
    

    Now output.html can be imported in Excel. Line breaks in cells will have been preserved.

    Optionally, you may want to clean up your Python virtual environment:

    $ deactivate
    $ rm -rf pyenv
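
    The steps above mention that Python can also read CSV directly. For readers who would rather skip csvkit, a minimal sketch using only the standard library's csv and html modules might look like the following. The function name csv_to_html_table and the sample data are illustrative assumptions, not part of the original script, and this variant has not been verified against Excel's importer here.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Convert CSV text (first row = header) into an HTML table string."""
    # csv.reader handles quoted fields, including embedded commas,
    # doubled quotes and line breaks inside a cell.
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return "<table>\n</table>"

    def make_tr(cells, tag):
        # Escape every cell and wrap it in <th> or <td>.
        return "<tr>%s</tr>" % "".join(
            "<%s>%s</%s>" % (tag, html.escape(cell), tag) for cell in cells
        )

    lines = ["<table>", make_tr(rows[0], "th")]
    lines.extend(make_tr(row, "td") for row in rows[1:])
    lines.append("</table>")
    return "\n".join(lines)


# Made-up sample: the second field contains a comma, quotes and a line break.
sample = 'id,comment\r\n1,"first line\nsaid ""hi"", then left"\r\n'
table = csv_to_html_table(sample)
print(table)
```

    To run it on a real file, the text could be read with open("input.csv", newline="", encoding="utf-8"); the encoding is an assumption and depends on how the CSV was exported. Passing newline="" matters because it lets the csv module, rather than the file object, decide where a record ends.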
    
