Question
Data File 1:
data_20150801.csv
Time Header Header Header Header
2015-08-01 07:00 14.4 14.4 14.4 68
2015-08-01 07:01 14.4 14.4 14.4 68
Data File 2:
data2_20150801.csv
Time Header Header
2015-08-01 00:00 90 12312
2015-08-01 00:01 232 13213
......
2015-08-01 07:00 1000 1500
2015-08-01 07:01 2312 1245
2015-08-01 07:02 1232 1232
2015-08-01 07:03 1231 1232
I'd like to merge those two .csv files to get a file that looks like this:
Time Header Header Header Header Header Header
2015-08-01 07:00 14.4 14.4 14.4 68 1000 1500
So basically I need to copy the rows from data2_ and insert them at the matching time points in data_. I tried doing it manually with Notepad++, but the problem is that sometimes there is no entry for a given minute in data2_, so I would need to find the missing time step and skip that point by hand.
I have done a few things in Python, but I'm still a beginner and lack the experience to know how to start tackling a problem like this.
I'm using a Mac and found the cat command, which combines the .csv files in a folder into one .csv file. Is there a way to do this line by line while preserving the timestamps?
Answer 1:
You could do this quite easily with Python pandas, although it is probably over-engineering:
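import pandas as pd
# Read both files; header=0 takes the column names from the first line.
d_one = pd.read_csv('data.csv', sep=',', engine='python', header=0)
d_two = pd.read_csv('data2.csv', sep=',', engine='python', header=0)
# Join on the shared timestamp column (named 'Time' in the sample files above).
# Only timestamps present in both files are kept.
d_three = pd.merge(d_one, d_two, on='Time')
d_three.to_csv('output.csv', sep=',')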
I haven't had the chance to test this code, but it should do what you want. You may need to change the commas to tabs (depending on the file), and so on.
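One thing to watch with a pandas merge: by default pd.merge performs an inner join, so any minute that is missing from either file is dropped from the result. If you would rather keep every row of the first file and leave the gaps empty, you can pass how='left'. Here is a minimal sketch of the difference. The in-memory sample data and the column names temp and count are made up for illustration, and it assumes the timestamp column is called Time:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two files; real code would pass file paths.
csv_one = io.StringIO(
    "Time,temp\n"
    "2015-08-01 07:00,14.4\n"
    "2015-08-01 07:01,14.4\n"
)
csv_two = io.StringIO(
    "Time,count\n"
    "2015-08-01 07:00,1000\n"  # note: no entry for 07:01
)

d_one = pd.read_csv(csv_one)
d_two = pd.read_csv(csv_two)

# Default (inner) join: only timestamps found in both files survive.
inner = pd.merge(d_one, d_two, on="Time")

# Left join: every row of d_one is kept; missing values become NaN.
left = pd.merge(d_one, d_two, on="Time", how="left")

print(inner)
print(left)
```

Which variant you want depends on whether a minute without data in the second file should be skipped or kept with blank cells.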
Answer 2:
Not being a Python expert, I would use two dictionaries, using the date-time stamp as key and a list for the other columns as data.
Load one file into one dictionary, and the other file into the other. Then it's pretty simple to merge the two dictionaries using keys that are the same in both.
As for reading the files, there is a standard csv module that you can use.
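The two-dictionary idea could be sketched roughly as follows. This is only an illustration, not tested against the real files: the helper names load_rows and merge_by_timestamp, the column names, and the in-memory sample data are all made up, and it assumes comma-separated files with the timestamp in the first column.

```python
import csv
import io


def load_rows(handle, delimiter=","):
    """Read a CSV stream into (header, {timestamp: [other columns]})."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header, rows


def merge_by_timestamp(first, second):
    """Combine rows whose timestamp appears in both dictionaries."""
    # Timestamps like '2015-08-01 07:00' sort correctly as plain strings.
    return [[ts] + first[ts] + second[ts] for ts in sorted(first) if ts in second]


# Hypothetical stand-ins for the two files.
file_one = io.StringIO(
    "Time,A,B,C,D\n"
    "2015-08-01 07:00,14.4,14.4,14.4,68\n"
    "2015-08-01 07:01,14.4,14.4,14.4,68\n"
)
file_two = io.StringIO(
    "Time,E,F\n"
    "2015-08-01 00:00,90,12312\n"
    "2015-08-01 07:00,1000,1500\n"
)

header_one, rows_one = load_rows(file_one)
header_two, rows_two = load_rows(file_two)

# Drop the second file's 'Time' column name so it is not repeated.
merged_header = header_one + header_two[1:]
merged = merge_by_timestamp(rows_one, rows_two)

print(merged_header)
for row in merged:
    print(row)
```

With real files you would open each one with open(path, newline="") instead of io.StringIO, and write the result with csv.writer(...).writerows(merged). Timestamps that exist in only one file are simply left out, which takes care of the missing minutes.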
Answer 3:
Regarding the solution that proposed pandas, I would add index=False to the to_csv line, turning it into:
d_three.to_csv('output.csv',sep=',', index=False)
This will keep the DataFrame's index column out of the output file.
Source: https://stackoverflow.com/questions/32717819/merge-csv-files-with-timestamps