pandas dataframe as latex or html table nbconvert

Is it possible to get a nicely formatted table from a pandas dataframe in ipython notebook when using nbconvert to latex & PDF?

The default seems to be just a le

3 Answers

Happy的楠姐

2020-12-28 11:17

I wrote my own mako-based template scheme for this. I think it's actually quite an easy workflow once you commit to working through it yourself. After that, you begin to see that templating the metadata of your desired format, so that it is factored out of the code (and doesn't add a third-party dependency), is a very nice way to solve it.

Here is the workflow I came up with.

1. Write the .mako template that accepts your DataFrame as an argument (and possibly other args) and converts it to the TeX format you want (example below).
2. Make a wrapper class (I call it to_tex) that exposes the API you desire (e.g. so you can pass it your data objects and it handles the mako render calls internally).
3. Within the wrapper class, decide how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a PDF?
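
The steps above can be sketched in a few lines. This is only an illustration under loud assumptions: the standard library's string.Template stands in for mako so the sketch has no third-party dependency, a plain list of rows stands in for the DataFrame, and the names TABLE_TEMPLATE, render_table, and TexWriter are hypothetical.

```python
from string import Template

# Step 1: a template holding the TeX "metadata"; $title, $colspec and $body are slots.
# string.Template is a stand-in for mako here (an assumption, not the original tool).
TABLE_TEMPLATE = Template(r"""\begin{table} \caption{$title}
\begin{tabular}{$colspec}
\hline
$body
\hline
\end{tabular}
\end{table}
""")


def render_table(rows, title):
    """Fill the template from a list of equal-length rows of numbers."""
    body = "\n".join(
        " & ".join("%.2f" % value for value in row) + r" \\" for row in rows
    )
    colspec = " ".join("c" for _ in rows[0])
    return TABLE_TEMPLATE.substitute(title=title, colspec=colspec, body=body)


class TexWriter(object):
    """Step 2: a small wrapper that hides the render call (hypothetical name)."""

    def __init__(self, rows, title):
        self.tex_string = render_table(rows, title)

    def save(self, path):
        # Step 3: decide what to do with the output; here it is only written to disk.
        with open(path, "w") as handle:
            handle.write(self.tex_string)


writer = TexWriter([[1.0, 2.5], [3.25, 4.0]], "Demo table")
print(writer.tex_string)
```

Keeping the layout in the template means the calling code only ever passes data and a title; changing the table's look does not touch the Python side.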

In my case, I was working on generating preliminary results for a research paper and needed to format tables into a complicated double-sorted structure with nested column names, etc. Here's an example of what one of the tables looks like:

[Figure: example output from the templated TeX tool]

Here is the mako template for this (warning, gross):

<%page args="df, table_title, group_var, sort_var"/>
<%
"""
Template for country/industry two-panel double sorts TeX table.
Inputs: 
-------
df: pandas DataFrame
    Must be 17 x 12 and have rows and columns that positionally
    correspond to the entries of the table.

table_title: string
    String used for the title of the table.

group_var: string
    String naming the grouping variable for the horizontal sorts.
    Should be 'Country' or 'Industry'.

sort_var: string (raw)
    String naming the variable that is being sorted, e.g.
    "beta" or "ivol". Note that if you want the symbol to
    be rendered as a TeX symbol, then pass a raw Python
    string as the arg and include the needed TeX markup in
    the passed string. If the string isn't raw, some of the
    TeX markup might be interpreted as special characters.

Returns:
--------
When used with mako.template.Template.render, will produce
a raw TeX string that can be rendered into a PDF containing
the specified data.

Author:
-------
Ely M. Spears, 05/21/2013

"""
# Python imports and helper function definitions.
import numpy as np  
def format_helper(x):
    return str(np.round(x,2))
%>


<%text>
\documentclass[10pt]{article}
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\setlength{\parskip}{1em}
\setlength{\parindent}{0in}
\renewcommand*\arraystretch{1.5}
\author{Ely Spears}


\begin{document}
\begin{table} \caption{</%text>${table_title}<%text>}
\begin{center}
    \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
    \hline
    & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
    \cline{2-7} \cline{9-14}
    & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
    Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
    \hline
    \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
    \hline
    Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
        & </%text>${format_helper(df.iloc[6, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
        & </%text>${format_helper(df.iloc[7, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7, 11])}<%text> \\


    \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8, 11])}<%text>  \\
    \hline
    \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
    \hline
    Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
        & </%text>${format_helper(df.iloc[15, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15, 11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
        & </%text>${format_helper(df.iloc[16, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16, 11])}<%text> \\
    \hline
    \end{tabular}
\end{center}
\end{table}
\end{document}
</%text>
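
Most data lines of the template do the same thing: round twelve values to two decimals, join the first six and the last six with " & ", and leave an empty spacer cell between the two panels. A minimal sketch of that per-row logic follows, using a plain list in place of a DataFrame row. The function format_row and its label argument are hypothetical names, and the built-in round stands in for np.round.

```python
def format_helper(x):
    """Round to two decimals and convert to text (built-in round replaces np.round)."""
    return str(round(x, 2))


def format_row(label, values):
    """Build one TeX table line from a row label and 12 numeric values.

    The first six values fill the left panel, the last six the right panel,
    and the empty cell between them is the spacer column.
    """
    if len(values) != 12:
        raise ValueError("expected 12 values, got %d" % len(values))
    left = " & ".join(format_helper(v) for v in values[:6])
    right = " & ".join(format_helper(v) for v in values[6:])
    # "\\\\" is two literal backslashes, the TeX end-of-row marker.
    return "%s & %s & & %s \\\\" % (label, left, right)


row = [0.5, 0.75, 1.0, 1.25, 1.5, -1.0, 2.0, 1.5, 1.0, 0.5, 0.25, 1.75]
print(format_row("Low", row))
```

A helper like this could replace the repeated template lines with a loop, at the cost of moving some layout knowledge back into Python.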

My wrapper to_tex.py looks like this (with example usage in the if __name__ == "__main__" section):

"""
to_tex.py

Class for handling strings of TeX code and producing the
rendered PDF via pdflatex. Assumes pdflatex can be called
via the operating system.
"""
import os
from subprocess import Popen


class to_tex(object):
    """
    Publishes a TeX string to a PDF rendering with pdflatex.
    """
    def __init__(self, tex_string, tex_file, display=False):
        """
        Publish a string to a .tex file, which will be
        rendered into a .pdf file via pdflatex.
        """
        self.tex_string    = tex_string
        self.tex_file      = tex_file
        self.__to_tex_file()
        self.__to_pdf_file(display)
        print("Render status:", self.render_status)

    def __to_tex_file(self):
        """
        Writes a tex string to a file.
        """
        with open(self.tex_file, 'w') as t_file:
            t_file.write(self.tex_string)

    def __to_pdf_file(self, display=False):
        """
        Compile a tex file to a pdf file with the
        same file path and name.
        """
        try:
            # Use the current directory when tex_file has no directory part.
            out_dir = os.path.dirname(self.tex_file) or "."
            proc = Popen(["pdflatex", "-output-directory", out_dir, self.tex_file])
            proc.communicate()
            # A non-zero exit code means pdflatex itself reported a failure.
            if proc.returncode == 0:
                self.render_status = "success"
            else:
                self.render_status = "pdflatex exited with code %d" % proc.returncode
        except Exception as e:
            self.render_status = str(e)

        # Launch a display of the pdf if requested.
        if (self.render_status == "success") and display:
            try:
                proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                proc.communicate()
            except OSError:
                # The viewer is optional, so a missing evince is ignored.
                pass

if __name__ == "__main__":
    from mako.template import Template
    template_file = "path/to/template.mako"
    tex_file = "path/to/output.tex"  # placeholder path for the generated .tex file
    t = Template(filename=template_file)
    tex_str = t.render(arg1="arg1")  # ... plus any further arguments your template declares
    tex_wrapper = to_tex(tex_str, tex_file)

My choice was to pipe the TeX string directly to pdflatex and to leave displaying the result as an option.
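
If you want the compile step on its own, here is a minimal sketch of the same idea using subprocess.run. It is not the wrapper above: the function name compile_tex and its compiler parameter are hypothetical, and it assumes a pdflatex that accepts the -interaction=nonstopmode and -output-directory options.

```python
import os
import shutil
import subprocess


def compile_tex(tex_file, compiler="pdflatex"):
    """Run a TeX compiler on tex_file and return a short status string."""
    # Report a missing compiler instead of raising from subprocess.
    if shutil.which(compiler) is None:
        return "%s not found on PATH" % compiler
    out_dir = os.path.dirname(tex_file) or "."
    result = subprocess.run(
        [compiler, "-interaction=nonstopmode", "-output-directory", out_dir, tex_file],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    if result.returncode != 0:
        return "%s failed with exit code %d" % (compiler, result.returncode)
    return "success"


# With a compiler name that does not exist, the function only reports the problem.
print(compile_tex("some_tex_file_name.tex", compiler="no-such-tex-compiler"))
```

Returning a status string rather than raising mirrors the render_status attribute of the wrapper, so the caller decides whether a failed compile is fatal.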

A small snippet of code actually using this with a DataFrame is here:

import pandas

# Assume calculation work is done prior to this ...
all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
all_df = pandas.concat([all_beta, all_alpha], axis=1)

# Render result in TeX
tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
tex_file = "/my_project/some_tex_file_name.tex"

from mako.template import Template
t = Template(filename=tex_mako)
tex_str = t.render(all_df, table_title, group_var, tex_risk_name)

import my_project.to_tex as to_tex
tex_obj = to_tex.to_tex(tex_str, tex_file)

难免孤独

2020-12-28 11:18
There is a simpler approach, which is discussed in this GitHub issue. Basically, you add a _repr_latex_ method to the DataFrame class, a procedure that pandas describes in its official documentation.

I did this in a notebook like this:
```
import pandas as pd

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    # Raw string so that "\c" is not treated as an (invalid) escape sequence.
    return r"\centering{%s}" % self.to_latex()

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame
```
The following code:
```
d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df
```
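turns into an HTML table if evaluated live in the notebook, and it converts into a (centered) table in PDF format (on current Jupyter installations the corresponding command is jupyter nbconvert --to pdf notebook.ipynb):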
```
$ ipython nbconvert --to latex --post PDF notebook.ipynb
```
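The monkey patch works because IPython's rich display looks for methods such as _repr_html_ and _repr_latex_ on the object a cell returns. Here is a minimal sketch of that mechanism without pandas. The TinyTable class is hypothetical and only illustrates the hooks; which representation ends up in the exported document depends on the nbconvert exporter, so treat this as an illustration rather than a guarantee.

```python
class TinyTable(object):
    """Toy table exposing the rich-display hooks that IPython looks for."""

    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def _repr_html_(self):
        # Representation intended for the live notebook front end.
        head = "".join("<th>%s</th>" % h for h in self.header)
        body = "".join(
            "<tr>%s</tr>" % "".join("<td>%s</td>" % c for c in row)
            for row in self.rows
        )
        return "<table><tr>%s</tr>%s</table>" % (head, body)

    def _repr_latex_(self):
        # Representation intended for LaTeX-based export.
        lines = [r"\begin{tabular}{%s}" % ("r" * len(self.header))]
        lines.append(" & ".join(self.header) + r" \\")
        lines.append(r"\hline")
        for row in self.rows:
            lines.append(" & ".join(str(c) for c in row) + r" \\")
        lines.append(r"\end{tabular}")
        return "\n".join(lines)


table = TinyTable(["one", "two"], [[1.0, 4.0], [2.0, 3.0]])
print(table._repr_latex_())
```

Evaluating table as the last expression of a notebook cell would let IPython call both methods and store both representations with the cell output.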

执念已碎

2020-12-28 11:19
The simplest way available now is to display your DataFrame as a Markdown table. You may need to install the tabulate package for this.

In your code cell, when displaying the DataFrame, use the following:
```
from IPython.display import Markdown, display
display(Markdown(df.to_markdown()))
```
Since it is a Markdown table, nbconvert can easily translate it into LaTeX.
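To see what is being handed to nbconvert, it helps to remember that a Markdown pipe table is just plain text. The sketch below builds one by hand from a header and rows. The function rows_to_markdown is hypothetical and is not how pandas or tabulate implement to_markdown; their output differs in padding and alignment.

```python
def rows_to_markdown(header, rows):
    """Render a header and rows as a pipe-delimited Markdown table."""
    lines = ["| " + " | ".join(header) + " |"]
    # The separator row is what marks the first line as a table header.
    lines.append("|" + "|".join("---" for _ in header) + "|")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


text = rows_to_markdown(["one", "two"], [[1.0, 4.0], [2.0, 3.0]])
print(text)
```

In a notebook, a string like this could be wrapped in IPython.display.Markdown in the same way as the to_markdown output above.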