pandas dataframe as latex or html table nbconvert

孤街浪徒 2020-12-28 10:39

Is it possible to get a nicely formatted table from a pandas DataFrame in an IPython notebook when using nbconvert to convert to LaTeX and PDF?

The default seems to be just a le

3 Answers
  • 2020-12-28 11:17

    I wrote my own Mako-based template scheme for this. I think it's actually quite an easy workflow if you commit to working through it yourself once. After that, you begin to see that templating the formatting details of your target format, so that they are factored out of the code (and don't represent a third-party dependency), is a very nice way to solve the problem.

    Here is the workflow I came up with.

    1. Write the .mako template that accepts your dataframe as an argument (and possibly other args) and converts it to the TeX format you want (example below).

    2. Make a wrapper class (I call it to_tex) that provides the API you want (e.g. so you can pass it your data objects and it handles the Mako render calls internally).

    3. Within the wrapper class, decide how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a PDF?
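
    The three steps can be sketched in a few lines. This is a minimal sketch, not the author's code: it swaps in the standard library's string.Template for Mako so it runs with no extra installs, and the names TABLE_TEMPLATE, ToTex and render are hypothetical. It takes a list of column names and a list of rows rather than a DataFrame.

```python
from string import Template

# Step 1: the table layout lives in a template, not in the code.
# string.Template stands in for Mako here so the sketch needs no installs;
# a literal "$" (e.g. TeX math) would have to be written as "$$".
TABLE_TEMPLATE = Template(r"""\begin{table}
\caption{$title}
\begin{tabular}{$colspec}
\hline
$header \\
\hline
$body
\hline
\end{tabular}
\end{table}""")


class ToTex:
    """Step 2: a thin wrapper that hides the template call behind a small API."""

    def __init__(self, template=TABLE_TEMPLATE, digits=2):
        self.template = template
        self.digits = digits

    def _fmt(self, value):
        # Round floats for display; pass everything else through as text.
        if isinstance(value, float):
            return f"{value:.{self.digits}f}"
        return str(value)

    def render(self, title, columns, rows):
        # Cells are joined with "&" and each row ends with "\\", as TeX expects.
        body = "\n".join(
            " & ".join(self._fmt(v) for v in row) + r" \\" for row in rows)
        return self.template.substitute(
            title=title,
            colspec="l" + "r" * (len(columns) - 1),  # text label, numeric columns
            header=" & ".join(str(c) for c in columns),
            body=body,
        )


# Step 3: decide what to do with the output; this sketch just prints it.
if __name__ == "__main__":
    print(ToTex().render("Example", ["Group", "Low", "High"],
                         [["A", 1.234, 5.678], ["B", 2.0, 3.5]]))
```
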

    In my case, I was working on generating preliminary results for a research paper and needed to format tables into a complicated double-sorted structure with nested column names, etc. Here's an example of what one of the tables looks like:

    [Image: example output from the templated TeX tool]

    Here is the Mako template for this (warning: it's gross):

    <%page args="df, table_title, group_var, sort_var"/>
    <%
    """
    Template for country/industry two-panel double sorts TeX table.
    Inputs: 
    -------
    df: pandas DataFrame
        Must be 17 x 12 and have rows and columns that positionally
        correspond to the entries of the table.
    
    table_title: string
        String used for the title of the table.
    
    group_var: string
        String naming the grouping variable for the horizontal sorts.
        Should be 'Country' or 'Industry'.
    
    sort_var: string (raw)
        String naming the variable that is being sorted, e.g.
        "beta" or "ivol". Note that if you want the symbol to
        be rendered as a TeX symbol, then pass a raw Python
        string as the arg and include the needed TeX markup in
        the passed string. If the string isn't raw, some of the
        TeX markup might be interpreted as special characters.
    
    Returns:
    --------
    When used with mako.template.Template.render, will produce
    a raw TeX string that can be rendered into a PDF containing
    the specified data.
    
    Author:
    -------
    Ely M. Spears, 05/21/2013
    
    """
    # Python imports and helper function definitions.
    import numpy as np  
    def format_helper(x):
        return str(np.round(x,2))
    %>
    
    
    <%text>
    \documentclass[10pt]{article}
    \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
    \usepackage{array}
    \newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \setlength{\parskip}{1em}
    \setlength{\parindent}{0in}
    \renewcommand*\arraystretch{1.5}
    \author{Ely Spears}
    
    
    \begin{document}
    \begin{table} \caption{</%text>${table_title}<%text>}
    \begin{center}
        \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
        \hline
        & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
        \cline{2-7} \cline{9-14}
        & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
        Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
        \hline
        \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
        \hline
        Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
            & </%text>${format_helper(df.iloc[6,5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6,11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
            & </%text>${format_helper(df.iloc[7,5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7,11])}<%text> \\
    
    
        \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8,11])}<%text>  \\
        \hline
        \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
        \hline
        Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
            & </%text>${format_helper(df.iloc[15,5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15,11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
            & </%text>${format_helper(df.iloc[16,5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16,11])}<%text> \\
        \hline
        \end{tabular}
    \end{center}
    \end{table}
    \end{document}
    </%text>
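
    Every full data row above (Low through High, in both panels) repeats the same pattern: format the row's 12 values, join the first six with ' & ', leave one empty spacer cell, then join the last six. Pulled out into plain Python as a hypothetical helper, tex_row (not part of the template), the pattern looks roughly like this, and it makes it easy to check that each row fills exactly the 14 columns declared in the tabular spec:

```python
def tex_row(label, values, split=6, digits=2):
    """Build one data row the way the template does (hypothetical helper).

    `values` holds the row's 12 numbers: the first `split` go in the left
    panel (CAPM beta), the rest in the right panel (CAPM alpha), with an
    empty spacer cell between them.
    """
    cells = [str(round(v, digits)) for v in values]  # similar to str(np.round(x, 2))
    left, right = cells[:split], cells[split:]
    return f"{label} & " + " & ".join(left) + " & & " + " & ".join(right) + r" \\"


row = tex_row("Low", [0.1234] * 12)
# 1 label + 6 left + 1 spacer + 6 right = 14 cells, i.e. 13 "&" separators,
# matching the 14 columns declared in the tabular spec.
assert row.count("&") == 13
```
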
    

    My wrapper to_tex.py looks like this (with example usage in the if __name__ == "__main__" section):

    """
    to_tex.py
    
    Class for handling strings of TeX code and producing the
    rendered PDF via PDF LaTeX. Assumes ability to call PDFLaTeX
    via the operating system.
    """
    class to_tex(object):
        """
        Publishes a TeX string to a PDF rendering with pdflatex.
        """
        def __init__(self, tex_string, tex_file, display=False):
            """
            Publish a string to a .tex file, which will be
            rendered into a .pdf file via pdflatex.
            """
            self.tex_string    = tex_string
            self.tex_file      = tex_file
            self.__to_tex_file()
            self.__to_pdf_file(display)
            print "Render status:", self.render_status
    
        def __to_tex_file(self):
            """
            Writes a tex string to a file.
            """
            with open(self.tex_file, 'w') as t_file:
                t_file.write(self.tex_string)
    
        def __to_pdf_file(self, display=False):
            """
            Compile a tex file to a pdf file with the
            same file path and name.
            """
            try:
                import os
                from subprocess import Popen
                proc = Popen(["pdflatex", "-output-directory", os.path.dirname(self.tex_file), self.tex_file])
                proc.communicate()
                self.render_status = "success"
            except Exception as e:
                self.render_status = str(e)
    
            # Launch a display of the pdf if requested.
            if (self.render_status == "success") and display:
                try:
                    proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                    proc.communicate()
                except:
                    pass
    
    if __name__ == "__main__":
        from mako.template import Template
        template_file = "path/to/template.mako"
        t = Template(filename=template_file)
        tex_str = t.render(arg1="arg1", ...)
        tex_wrapper = to_tex(tex_str, )
    

    My choice was to send the TeX string straight to pdflatex and leave displaying the resulting PDF as an option.
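
    One variation, if you'd rather not leave .aux and .log files next to your output, is to compile in a temporary directory and copy back only the PDF. This is a sketch under assumptions, not the wrapper above: compile_tex and the intermediate file name table.tex are made up for illustration, and it uses subprocess.run (Python 3.5+) instead of Popen so the exit code and the LaTeX log come back in one call.

```python
import os
import shutil
import subprocess
import tempfile


def compile_tex(tex_string, pdf_path, latex_cmd="pdflatex"):
    """Compile a TeX string to pdf_path and return (ok, message).

    Auxiliary files (.aux, .log) stay in a temporary directory that is
    deleted afterwards, so only the finished PDF is copied out.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tex_file = os.path.join(tmp, "table.tex")
        with open(tex_file, "w") as fh:
            fh.write(tex_string)
        try:
            result = subprocess.run(
                [latex_cmd, "-interaction=nonstopmode",
                 "-output-directory", tmp, tex_file],
                stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        except FileNotFoundError as exc:
            # The LaTeX executable itself could not be found.
            return False, str(exc)
        if result.returncode != 0:
            # Return the tail of the LaTeX log to help diagnose the error.
            return False, result.stdout.decode(errors="replace")[-2000:]
        shutil.copy(os.path.join(tmp, "table.pdf"), pdf_path)
    return True, "success"
```
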

    Here is a small snippet that actually uses this with a DataFrame:

    # Assume calculation work is done prior to this ...
    import pandas
    from mako.template import Template
    import my_project.to_tex as to_tex
    
    # Stack estimates above t-statistics, then put beta and alpha side by side.
    all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
    all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
    all_df = pandas.concat([all_beta, all_alpha], axis=1)
    
    # Render result in TeX
    tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
    tex_file = "/my_project/some_tex_file_name.tex"
    
    t = Template(filename=tex_mako)
    # Pass arguments by name so they line up with the <%page args=.../> declaration.
    tex_str = t.render(df=all_df, table_title=table_title,
                       group_var=group_var, sort_var=tex_risk_name)
    
    tex_obj = to_tex.to_tex(tex_str, tex_file)
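
    Because the template indexes the frame purely by position, a wrong shape is easy to miss: a frame that is too small fails with an IndexError partway through rendering, and extra rows are silently ignored. A small guard run before t.render catches that early. check_positional_layout is a hypothetical helper, and the 17 x 12 default comes from the template's docstring:

```python
def check_positional_layout(df, n_rows=17, n_cols=12):
    """Raise if df does not have the exact shape the template indexes by position.

    Works with anything exposing a (rows, cols) `.shape`, such as a DataFrame.
    """
    rows, cols = df.shape
    if (rows, cols) != (n_rows, n_cols):
        raise ValueError(
            f"template expects a {n_rows} x {n_cols} frame, got {rows} x {cols}")
    return df
```

    It would slot in as t.render(df=check_positional_layout(all_df), ...) in the snippet above.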
    
  • 2020-12-28 11:18

    There is a simpler approach, discussed in this GitHub issue. Basically, you add a _repr_latex_ method to the DataFrame class, a procedure documented in the official pandas documentation.

    I did this in a notebook like this:

    import pandas as pd
    
    pd.set_option('display.notebook_repr_html', True)
    
    def _repr_latex_(self):
        # Raw string, so "\c" is not treated as a (invalid) escape sequence.
        return r"\centering{%s}" % self.to_latex()
    
    pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame
    

    The following code:

    d = {'one' : [1., 2., 3., 4.],
         'two' : [4., 3., 2., 1.]}
    df = pd.DataFrame(d)
    df
    

    turns into an HTML table when evaluated live in the notebook, and into a (centered) table when converted to PDF:

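    $ ipython nbconvert --to latex --post PDF notebook.ipynb

    (That is the older IPython syntax; with current Jupyter the equivalent command is: jupyter nbconvert --to pdf notebook.ipynb)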
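
    Why the monkey patch works: roughly speaking, IPython asks a displayed object for every rich representation it defines (_repr_html_, _repr_latex_, and so on) and stores them all in the cell output. The live notebook renders the HTML one, while nbconvert's LaTeX exporter prefers the text/latex one. The toy class below is hypothetical and independent of pandas; it only shows an object offering both representations:

```python
class TinyTable:
    """Toy object exposing two rich-display representations (no escaping; demo only)."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def _repr_html_(self):
        # Picked up as text/html, which the live notebook renders.
        head = "".join(f"<th>{c}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
            for row in self.rows)
        return f"<table><tr>{head}</tr>{body}</table>"

    def _repr_latex_(self):
        # Picked up as text/latex, which the LaTeX/PDF export can use.
        lines = [r"\begin{tabular}{" + "r" * len(self.columns) + "}",
                 " & ".join(str(c) for c in self.columns) + r" \\",
                 r"\hline"]
        lines += [" & ".join(str(v) for v in row) + r" \\" for row in self.rows]
        lines.append(r"\end{tabular}")
        return "\n".join(lines)
```
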
    
  • 2020-12-28 11:19

    The simplest way available now is to display your DataFrame as a Markdown table. You may need to install the tabulate package for this, since DataFrame.to_markdown depends on it.

    In your code cell, display the DataFrame with the following:

    from IPython.display import Markdown, display
    display(Markdown(df.to_markdown()))
    

    Since the output is a Markdown table, nbconvert can easily translate it into LaTeX.
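
    If installing tabulate is not an option, a hand-built pipe table gives nbconvert the same kind of Markdown to convert. This is a rough, hypothetical stand-in for df.to_markdown(), not pandas' implementation: it takes plain lists, leaves out the index, and does not pad columns the way tabulate does.

```python
def to_pipe_table(columns, rows, digits=2):
    """Return a Markdown pipe table: header row, alignment row, then data rows."""

    def fmt(value):
        # Fixed number of decimals for floats; anything else is shown as text.
        return f"{value:.{digits}f}" if isinstance(value, float) else str(value)

    lines = ["| " + " | ".join(str(c) for c in columns) + " |",
             "|" + "|".join("---:" for _ in columns) + "|"]  # ---: right-aligns
    lines += ["| " + " | ".join(fmt(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)
```

    In a notebook it would be used as display(Markdown(to_pipe_table(list(df.columns), df.values.tolist()))).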
