Loop through rows of one CSV file to find corresponding data in another

I have an interesting problem:

file1.csv has a few hundred rows like:

Code,DTime
1,2010-12-26 17:01
2,2010-12-26 17:07
2,2010-12-26 17:15

3 Answers

生来不讨喜

2020-12-06 21:08

Unless you only need to do this once, you should really use a database. Add a column to table2 that contains DATETIME without the seconds, so that you can join on exact matches, not with LIKE.

It WILL be fast, and even faster if you index those columns. And if you can store file1.csv in the database too, you don't need iterations: You can get the entire set of results in a single select query. This is the kind of stuff SQL is made for.

PS. If you decide to pursue this approach, you can ask for help with the query.
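
For illustration, here is a minimal sketch of that idea using Python's built-in sqlite3 module. It is not the asker's actual schema: the file2 column layout (id, D, Sym, DateTime, Bid, Ask) is assumed from the other answers, and the names `table1`, `dtime_minute`, `load_and_join`, and the sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up sample data; the real file2.csv layout is an assumption.
FILE1_SAMPLE = (
    "Code,DTime\n"
    "1,2010-12-26 17:01\n"
    "2,2010-12-26 17:07\n"
)
FILE2_SAMPLE = (
    "id,D,Sym,DateTime,Bid,Ask\n"
    "1,x,EURUSD,2010-12-26 17:01:05,1.31,1.32\n"
    "2,x,EURUSD,2010-12-26 17:03:00,1.30,1.31\n"
    "3,x,EURUSD,2010-12-26 17:07:30,1.29,1.30\n"
)


def load_and_join(file1, file2):
    """Load both CSV streams into SQLite and join them on the minute."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (code TEXT, dtime TEXT)")
    conn.execute(
        "CREATE TABLE table2 (id TEXT, d TEXT, sym TEXT, datetime TEXT, "
        "bid TEXT, ask TEXT, dtime_minute TEXT)"
    )

    rows1 = csv.reader(file1)
    next(rows1)  # skip header
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)

    rows2 = csv.reader(file2)
    next(rows2)  # skip header
    # Extra column: "YYYY-MM-DD HH:MM:SS"[:16] is the timestamp without seconds.
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?, ?, ?, ?, ?, ?)",
        (row + [row[3][:16]] for row in rows2),
    )
    # Index the join column so exact-match lookups stay fast.
    conn.execute("CREATE INDEX idx_table2_minute ON table2 (dtime_minute)")

    result = conn.execute(
        "SELECT t1.code, t1.dtime, t2.datetime, t2.bid, t2.ask "
        "FROM table1 AS t1 "
        "JOIN table2 AS t2 ON t2.dtime_minute = t1.dtime "
        "ORDER BY t1.dtime, t2.datetime"
    ).fetchall()
    conn.close()
    return result


matches = load_and_join(io.StringIO(FILE1_SAMPLE), io.StringIO(FILE2_SAMPLE))
for match in matches:
    print(match)
```

With real files you would pass `open("file1.csv", newline="")` and `open("file2.csv", newline="")` instead of the in-memory samples, and probably use an on-disk database so the import only has to happen once.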


野性不改

2020-12-06 21:17

If you don't have duplicate DTime values and both files are sorted by time, this should work:

import csv

file1reader = csv.reader(open("file1.csv"), delimiter=",")
file2reader = csv.reader(open("file2.csv"), delimiter=",")

header1 = next(file1reader)  # skip header
header2 = next(file2reader)  # skip header

for Code, DTime in file1reader:
    for id_, D, Sym, DateTime, Bid, Ask in file2reader:
        if DateTime.startswith(DTime):  # found it
            print("%s %s %s" % (DateTime, Bid, Ask))  # output data
            break  # file2reader keeps its position, so the next search resumes here

Edit: a version that parses the timestamps and takes the first file2 row at or after each file1 time:

import csv
from datetime import datetime

file1reader = csv.reader(open("file1.csv"), delimiter=",")
file2reader = csv.reader(open("file2.csv"), delimiter=",")

header1 = next(file1reader)  # skip header
header2 = next(file2reader)  # skip header

for Code, DTime in file1reader:
    DTime = datetime.strptime(DTime, "%Y-%m-%d %H:%M")
    for id_, D, Sym, DateTime, Bid, Ask in file2reader:
        DateTime = datetime.strptime(DateTime, "%Y-%m-%d %H:%M:%S")
        if DateTime >= DTime:  # found the first row at or after DTime
            print("%s %s %s" % (DateTime, Bid, Ask))  # output data
            break  # file2reader keeps its position, so the next search resumes here


伪装坚强ぢ

2020-12-06 21:21
You can create a dictionary from file2, where the key is the prefix of the time you want and the value is either the first row or all the rows matching this prefix. Then it's simply a matter of doing something like:
```
entries = file2Dict.get(file1Entry)
if entries:
   print("First entry is %s" % entries[0])
```
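
As a rough sketch of how that dictionary could be built (Python 3, standard library only): the key is the first 16 characters of the file2 timestamp, i.e. the time without seconds, and the value is the list of all matching rows. The file2 column layout (DateTime in the fourth column) is assumed from the other answer, and `build_minute_index`, `first_matches`, and the sample rows are hypothetical names and data for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data; the real file2.csv layout is an assumption.
FILE1_SAMPLE = (
    "Code,DTime\n"
    "1,2010-12-26 17:01\n"
    "2,2010-12-26 17:07\n"
    "2,2010-12-26 17:15\n"
)
FILE2_SAMPLE = (
    "id,D,Sym,DateTime,Bid,Ask\n"
    "1,x,EURUSD,2010-12-26 17:01:05,1.31,1.32\n"
    "2,x,EURUSD,2010-12-26 17:01:40,1.30,1.31\n"
    "3,x,EURUSD,2010-12-26 17:07:30,1.29,1.30\n"
)


def build_minute_index(file2):
    """Map 'YYYY-MM-DD HH:MM' to every file2 row within that minute."""
    reader = csv.reader(file2)
    next(reader)  # skip header
    index = defaultdict(list)
    for row in reader:
        index[row[3][:16]].append(row)  # row[3] is assumed to be DateTime
    return index


def first_matches(file1, file2Dict):
    """Return (Code, DTime, first matching file2 row or None) per file1 row."""
    reader = csv.reader(file1)
    next(reader)  # skip header
    results = []
    for code, dtime in reader:
        entries = file2Dict.get(dtime)  # .get() does not create missing keys
        first = entries[0] if entries else None
        results.append((code, dtime, first))
    return results


file2Dict = build_minute_index(io.StringIO(FILE2_SAMPLE))
for code, dtime, first in first_matches(io.StringIO(FILE1_SAMPLE), file2Dict):
    print(code, dtime, first)
```

This reads file2 once and then answers each file1 lookup in constant time, at the cost of holding all file2 rows in memory. It also does not depend on either file being sorted.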