Adding a pandas.DataFrame to another one with its own name

妖精的绣舞 submitted on 2021-01-29 10:15:26

Question


I have data that I want to retrieve from a couple of text files in a folder. For each file in the folder, I create a pandas.DataFrame to store the data. For now it works correctly, and all the files have the same number of rows.

Now what I want to do is to add each of these dataframes to a 'master' dataframe containing all of them. I would like to add each of these dataframes to the master dataframe with their file name.
I already have the file name.

For example, let's say I have 2 dataframes, each with its own file name. I want to add them to the master dataframe with a header above each of the 2 dataframes showing the name of its file.

What I have tried so far is the following:

import glob
import pandas as pd

# T0 data
t0_path = "C:/Users/AlexandreOuimet/Box Sync/Analyse Opto/Crunch/GF data crunch/T0/*.txt"
t0_folder = glob.glob(t0_path)
t0_data = pd.DataFrame()

for file in t0_folder:
    raw_data = parseGFfile(file)
    file_data = pd.DataFrame(raw_data, columns=['wavelength', 'max', 'min'])
    file_name = getFileName(file)

    t0_data.insert(loc=len(t0_data.columns), column=file_name, value=file_data)

Could someone help me with this please?
Thank you :)

Edit: I think I was not clear enough. This is what I am expecting as an output:
[image: output]


Answer 1:


You may be looking for the concat function. Here's an example:

import pandas as pd

A = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})
B = pd.DataFrame({'Col1': [7, 8, 9], 'Col2': [10, 11, 12]})

a_filename = 'a_filename.txt'
b_filename = 'b_filename.txt'

A['filename'] = a_filename
B['filename'] = b_filename

C = pd.concat((A, B), ignore_index = True)

print(C)

Output:

   Col1  Col2        filename
0     1     4  a_filename.txt
1     2     5  a_filename.txt
2     3     6  a_filename.txt
3     7    10  b_filename.txt
4     8    11  b_filename.txt
5     9    12  b_filename.txt
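
If the goal is instead a file-name header above each file's columns, as the question describes, a side-by-side variant might look like the sketch below. It reuses the A and B frames from the example above and assumes every frame has the same number of rows (as stated in the question); the name D is only illustrative.

```python
import pandas as pd

A = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})
B = pd.DataFrame({'Col1': [7, 8, 9], 'Col2': [10, 11, 12]})

# axis=1 places the frames side by side; keys= adds an outer column
# level holding one label (here, the file name) per input frame
D = pd.concat([A, B], axis=1, keys=['a_filename.txt', 'b_filename.txt'])

# Each file's block of columns can then be selected by file name
print(D['b_filename.txt'])
```

The result has two-level (MultiIndex) columns, so D['a_filename.txt'] returns the original Col1 and Col2 of the first file.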



Answer 2:


There are a couple of changes to make here in order to make this happen in an easy way. I'll list the changes and the reasoning below:

  1. Specify which columns your master DataFrame will have.
  2. Instead of using the getFileName function that it seems like you were trying to define, you can simply create a new column called "file_name" that holds, for every record in that DataFrame, the file path used to build it. That way, when you combine the DataFrames, each record's origin is clear. I commented where you can make edits if you want to use string methods to clean up the file names.
  3. At the end, don't use insert. For combining DataFrames with the same columns (a union operation if you're familiar with SQL or with set theory), you can use pd.concat. Older pandas versions also offered the DataFrame.append method for this, but it was deprecated in pandas 1.4 and removed in 2.0.

import glob
import pandas as pd

# T0 data
t0_path = "C:/Users/AlexandreOuimet/Box Sync/Analyse Opto/Crunch/GF data crunch/T0/*.txt"
t0_folder = glob.glob(t0_path)
t0_data = pd.DataFrame(columns=['wavelength', 'max', 'min', 'file_name'])

for file in t0_folder:
    raw_data = parseGFfile(file)
    file_data = pd.DataFrame(raw_data, columns=['wavelength', 'max', 'min'])
    file_data['file_name'] = file  # You can make edits here

    # Stack this file's rows under the rows collected so far
    t0_data = pd.concat([t0_data, file_data], ignore_index=True)
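
With many files, growing the master DataFrame inside the loop copies it on every iteration. A common alternative is to collect the per-file frames in a list and concatenate once at the end. The sketch below is only an illustration: load_file is a hypothetical stand-in for parseGFfile, and the paths and values are made up so that the example runs on its own.

```python
import os
import pandas as pd

def load_file(path):
    # Hypothetical stand-in for parseGFfile: rows of (wavelength, max, min)
    return [(400, 0.9, 0.1), (500, 0.8, 0.2)]

# Made-up paths; in the real script this would be glob.glob(t0_path)
paths = ['T0/sample_a.txt', 'T0/sample_b.txt']

frames = []
for path in paths:
    file_data = pd.DataFrame(load_file(path), columns=['wavelength', 'max', 'min'])
    # Keep only the base name, without directory or extension
    file_data['file_name'] = os.path.splitext(os.path.basename(path))[0]
    frames.append(file_data)

# One concatenation at the end; note that pd.concat raises ValueError
# if the list is empty (for example, when the folder has no matching files)
t0_data = pd.concat(frames, ignore_index=True)
print(t0_data)
```

Cleaning the name with os.path here is one option for the "make edits here" step; any other string handling would work just as well.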


Source: https://stackoverflow.com/questions/60994604/adding-a-pandas-dataframe-to-another-one-with-its-own-name
