Split a pandas dataframe using unique; Render as separate tables in Jinja2

Question

I'm working on a project wherein I load a .csv file into a pandas dataframe and make a .PDF report, using Python 3.6 + pandas + jinja2 + weasyprint.

csv -> pandas -> jinja2 -> weasyprint

Here's my challenge: one of my pandas dataframes contains info that I want to split by the unique entries in one of its columns, and then display each of those splits as a separate table in jinja2.

Sample dataframe:

         Clothing  Color   Size
    0    Shirt     Blue    M
    1    Shirt     Blue    L
    2    Shirt     Black   L
    3    Pants     Black   L
    4    Pants     Blue    XL
    5    Jacket    Blue    L
    6    Jacket    Brown   L

I can successfully send this single pandas dataframe to jinja2 as a whole by assigning df.to_html() to a key in the jinja2 template_vars:

template_vars = {"title" : "My Title",
                 "df": df.to_html()
                 }

and then referencing it in the template.html file like this:

<div class="table_standard campaign_kpis_table">
 {{ df }}
</div>

Now, I want to be able to take the pandas dataframe and split it into several dataframes based on the unique values in one column, in this example 'Color'.

I can do that by using:

df_splits = df['Color'].unique()

Then, knowing all the unique entries, I can iterate over them, and get the separate dataframe "splits". Please tell me if there's a better word than 'split' to use here, and I'll edit. :)

So I can easily print those separate splits in the terminal by doing something like:

for color in df_splits:
    # Keep only the rows whose Color matches the current unique value
    df_split = df.loc[df['Color'] == color]
    print(df_split)

But how do I send those off to jinja2 to be displayed as separate tables?

For my purposes, there could be anywhere from 1 to 10 unique Colors in the original dataframe. The number of unique entries will always change, depending on what's in stock. As I understand it, this will create 1 to 10 separate dataframe splits.

What do I write so that jinja2 creates tables for each of these dataframe splits?

Please let me know if you need more code, or if I've failed to adequately explain my situation.

The full code for the " -> jinja2 -> weasyprint" section is based on https://pbpython.com/pdf-reports.html.

It currently looks like this:

from jinja2 import Environment, FileSystemLoader
env = Environment(loader=FileSystemLoader('.'))

template = env.get_template("templates/template.html")

template_vars = {"df": df.to_html()}

html_out = template.render(template_vars)

with open("/path/to/report.html", "w") as fh:
    fh.write(html_out)

from weasyprint import HTML
HTML(string=html_out).write_pdf("/path/to/report.pdf", stylesheets=["templates/style.css"])

Answer 1:

Instead of unique(), you are looking for pandas groupby() (see the pandas documentation for DataFrame.groupby).
Here are the steps:

You can create a groupby object using gb = df.groupby('Color'). You can think of the object as looking like this:

[('Black', <black_dataframe>),
 ('Blue',  <blue_dataframe>),
 ('Brown', <brown_dataframe>)]

The enclosing structure is not really a list, but it acts like one: it's iterable, and each item it yields is a (key, dataframe) pair.
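To see what the groupby object actually yields, here is a small check on the question's sample data (built inline with pd.DataFrame instead of being read from a CSV). It iterates the (key, sub-dataframe) pairs, confirms each group holds the same rows as the df.loc filter from the question, and uses gb.ngroups to count the colors, which adapts to however many are in stock. This is only a sketch for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [("Shirt", "Blue", "M"), ("Shirt", "Blue", "L"), ("Shirt", "Black", "L"),
     ("Pants", "Black", "L"), ("Pants", "Blue", "XL"),
     ("Jacket", "Blue", "L"), ("Jacket", "Brown", "L")],
    columns=["Clothing", "Color", "Size"],
)

gb = df.groupby("Color")

# Each item is a (key, sub-DataFrame) pair; keys come out sorted by default
for color, group in gb:
    print(color, len(group))  # prints: Black 2, Blue 4, Brown 1 (one per line)

# Each group contains exactly the rows the question's boolean-mask filter selects
for color, group in gb:
    assert group.equals(df.loc[df["Color"] == color])

# The number of groups follows the data, so 1 to 10 colors needs no special handling
print(gb.ngroups)
```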

Make a template called mytemplate.html in the same folder as the code that renders it (for now).

<!DOCTYPE html>
<html>
<head lang="en">
    <meta charset="UTF-8">
    <title>{{title}}</title>
</head>
<body>
    {% for results in data.values() %}
        {{ results | safe }}
    {% endfor %}
</body>
</html>

You would send it off with:

from pathlib import Path
import io
import webbrowser as wb

import pandas as pd
from jinja2 import Environment, FileSystemLoader

# Sample data from the question
df = pd.read_csv(io.StringIO('Shirt,Blue,M\nShirt,Blue,L\nShirt,Black,L\nPants,Black,L\nPants,Blue,XL\nJacket,Blue,L\nJacket,Brown,L\n'), names=['Clothing', 'Color', 'Size'])
gb = df.groupby('Color')

env = Environment(loader=FileSystemLoader('.'))
template = env.get_template("mytemplate.html")

# One HTML table per color, keyed by the color name
data = {color: group.to_html() for color, group in gb}
template_vars = {'title': "Tables, Tables everywhere", 'data': data}
html_out = template.render(template_vars)

# Write the report and open it in the default browser
out_file = Path('myoutfile.html')
out_file.write_text(html_out)
wb.open(out_file.absolute().as_uri())
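A possible variation, sketched under the assumption that you also want each table labelled with its color: the template above loops over data.values(), so the color names are dropped. Looping over data.items() keeps them. The inline template, the <h2> heading and index=False are choices made here for illustration, not part of the answer; Environment.from_string is used so the snippet runs without mytemplate.html on disk:

```python
import pandas as pd
from jinja2 import Environment

# Sample data from the question
df = pd.DataFrame(
    [("Shirt", "Blue", "M"), ("Shirt", "Blue", "L"), ("Shirt", "Black", "L"),
     ("Pants", "Black", "L"), ("Pants", "Blue", "XL"),
     ("Jacket", "Blue", "L"), ("Jacket", "Brown", "L")],
    columns=["Clothing", "Color", "Size"],
)

# Inline template (an assumption; the answer loads mytemplate.html from disk)
template = Environment().from_string(
    """<h1>{{ title }}</h1>
{% for color, table in data.items() %}
<h2>{{ color }}</h2>
{{ table | safe }}
{% endfor %}"""
)

# One HTML table per unique Color, keyed by the color name (sorted by groupby)
data = {color: group.to_html(index=False) for color, group in df.groupby("Color")}

html_out = template.render(title="Tables, Tables everywhere", data=data)
print(html_out)
```

Because the loop is driven by whatever keys are in data, the same template produces one table for a single color or ten tables for ten colors.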

Answer 2:

Pass the dataframe and df_splits directly to jinja2, and then use a for loop in the template:

import pandas as pd
from jinja2 import Environment

df = pd.DataFrame([('Shirt','Blue','M'), ('Shirt','Blue','L'), ('Shirt','Black','L'), ('Pants','Black','L'), ('Pants','Blue','XL'), ('Jacket','Blue','L'), ('Jacket','Brown','L')], columns=['Clothing', 'Color', 'Size'])

env = Environment()
tmpl = env.from_string( '''
{% for df_split in df_splits %}
<div>
{{df.loc[df['Color'] == df_split].to_html()}}
</div>
{% endfor %}''')

print(tmpl.render(df=df, df_splits=df['Color'].unique()))
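One caveat worth checking for either answer: the templates emit df.to_html() as-is, which works here because Environment() has autoescaping off by default. If you turn it on (autoescape=True below is an assumption, for example if you later switch to select_autoescape for .html files), the table markup gets escaped into visible text unless you mark it with the safe filter, as Answer 1 does. A minimal sketch on a two-row slice of the sample data:

```python
import pandas as pd
from jinja2 import Environment

df = pd.DataFrame(
    [("Shirt", "Blue", "M"), ("Pants", "Black", "L")],
    columns=["Clothing", "Color", "Size"],
)

# autoescape=True is an assumption here; Answer 2 uses the default (off)
env = Environment(autoescape=True)

# Without | safe, the <table> markup is escaped to &lt;table ... and shows up as text
escaped = env.from_string("{{ table }}").render(table=df.to_html())

# With | safe, the markup is passed through unchanged and renders as a real table
marked_safe = env.from_string("{{ table | safe }}").render(table=df.to_html())

print(escaped[:40])
print(marked_safe[:40])
```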

Source: https://stackoverflow.com/questions/58071189/split-a-pandas-dataframe-using-unique-render-as-separate-tables-in-jinja2

Tags

python

pandas

jinja2