Loop through rows of one CSV file to find corresponding data in another


I have an interesting problem:

file1.csv has a few hundred rows like:

Code,DTime
1,2010-12-26 17:01
2,2010-12-26 17:07
2,2010-12-26 17:15
3 answers
  • 2020-12-06 21:08

    Unless you only need to do this once, you should really use a database. Add a column to table2 that contains DATETIME without the seconds, so that you can join on exact matches, not with LIKE.

    It WILL be fast, and even faster if you index those columns. And if you can store file1.csv in the database too, you don't need iterations: You can get the entire set of results in a single select query. This is the kind of stuff SQL is made for.

    PS. If you decide to pursue this approach, you can ask for help with the query.
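
As a rough illustration of the database approach described above, here is a minimal sketch using Python's built-in `sqlite3` module. The file2 column layout (ID, D, Sym, DateTime, Bid, Ask), the table and column names, the helper `load_and_join`, and the sample rows are all assumptions made for illustration; they do not come from the question. The extra `minute` column holds the timestamp without seconds, so the join is an exact match instead of a LIKE.

```python
import csv
import io
import sqlite3


def load_and_join(file1, file2):
    """Load both CSV streams into an in-memory SQLite database and join them by minute."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (code TEXT, dtime TEXT)")
    conn.execute(
        "CREATE TABLE table2 (id TEXT, d TEXT, sym TEXT, "
        "datetime TEXT, bid REAL, ask REAL, minute TEXT)"
    )

    rows1 = csv.reader(file1)
    next(rows1)  # skip header
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)

    rows2 = csv.reader(file2)
    next(rows2)  # skip header
    # "minute" is the timestamp without its seconds:
    # "2010-12-26 17:01:05" -> "2010-12-26 17:01"
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?, ?, ?, ?, ?, ?)",
        (row + [row[3][:16]] for row in rows2),
    )
    conn.execute("CREATE INDEX idx_table2_minute ON table2 (minute)")

    # One query returns every match; no Python-level nested loop is needed.
    query = """
        SELECT t1.code, t2.datetime, t2.bid, t2.ask
        FROM table1 AS t1
        JOIN table2 AS t2 ON t2.minute = t1.dtime
        ORDER BY t1.rowid, t2.datetime
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample data standing in for file1.csv and file2.csv.
file1 = io.StringIO(
    "Code,DTime\n"
    "1,2010-12-26 17:01\n"
    "2,2010-12-26 17:07\n"
)
file2 = io.StringIO(
    "ID,D,Sym,DateTime,Bid,Ask\n"
    "1,D,EURUSD,2010-12-26 17:01:05,1.25,1.5\n"
    "2,D,EURUSD,2010-12-26 17:01:40,1.5,1.75\n"
    "3,D,EURUSD,2010-12-26 17:07:10,1.75,2.0\n"
)

matches = load_and_join(file1, file2)
for match in matches:
    print(match)
```

For real files, open handles from `open("file1.csv", newline="")` could be passed in place of the `io.StringIO` objects, and a file path instead of `":memory:"` would keep the tables between runs.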

  • 2020-12-06 21:17

    If you don't have duplicate DTime values and both files are sorted by time, this should work:

    import csv
    
    file1reader = csv.reader(open("file1.csv"), delimiter=",")
    file2reader = csv.reader(open("file2.csv"), delimiter=",")
    
    header1 = next(file1reader)  # skip header
    header2 = next(file2reader)  # skip header
    
    for Code, DTime in file1reader:
        # file2reader is an iterator, so this loop resumes where the last search stopped
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            if DateTime.startswith(DTime):  # found it
                print(DateTime, Bid, Ask)   # output data
                break                       # the next search continues from the following row
    

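    Edit: to match the first file2 row at or after each DTime, rather than an exact minute prefix, compare parsed datetimes: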

    import csv
    from datetime import datetime
    
    file1reader = csv.reader(open("file1.csv"), delimiter=",")
    file2reader = csv.reader(open("file2.csv"), delimiter=",")
    
    header1 = next(file1reader)  # skip header
    header2 = next(file2reader)  # skip header
    
    for Code, DTime in file1reader:
        DTime = datetime.strptime(DTime, "%Y-%m-%d %H:%M")
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            DateTime = datetime.strptime(DateTime, "%Y-%m-%d %H:%M:%S")
            if DateTime >= DTime:  # found the first row at or after DTime
                print(DateTime, Bid, Ask)  # output data
                break  # the next search continues from the following row
    
  • 2020-12-06 21:21

    You can create a dictionary from file2, where the key is the time prefix you want to match on and the value is either the first row or all the rows matching that prefix. Then it's simply a matter of doing something like:

    entries = file2Dict.get(file1Entry)
    if entries:
        print("First entry is %s" % entries[0])
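
The answer does not show how the dictionary is built, so here is one possible sketch. It assumes file2 has the columns ID, D, Sym, DateTime, Bid, Ask with timestamps like `2010-12-26 17:01:05`, so that the first 16 characters give the minute prefix used in file1. The helper names `build_minute_index` and `lookup` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def build_minute_index(file2):
    """Map each 'YYYY-MM-DD HH:MM' prefix to the list of file2 rows in that minute."""
    index = defaultdict(list)
    reader = csv.reader(file2)
    next(reader)  # skip header
    for row in reader:
        date_time = row[3]  # assumed position of the DateTime column
        index[date_time[:16]].append(row)
    return index


def lookup(file1, index):
    """Yield (Code, DTime, first matching file2 row or None) for each file1 row."""
    reader = csv.reader(file1)
    next(reader)  # skip header
    for code, dtime in reader:
        entries = index.get(dtime)  # .get() does not add missing keys
        first = entries[0] if entries else None
        yield code, dtime, first


# Made-up sample data standing in for file1.csv and file2.csv.
file1 = io.StringIO(
    "Code,DTime\n"
    "1,2010-12-26 17:01\n"
    "2,2010-12-26 17:07\n"
    "2,2010-12-26 17:15\n"
)
file2 = io.StringIO(
    "ID,D,Sym,DateTime,Bid,Ask\n"
    "1,D,EURUSD,2010-12-26 17:01:05,1.25,1.5\n"
    "2,D,EURUSD,2010-12-26 17:01:40,1.5,1.75\n"
    "3,D,EURUSD,2010-12-26 17:07:10,1.75,2.0\n"
)

index = build_minute_index(file2)
results = list(lookup(file1, index))
for code, dtime, first in results:
    print(code, dtime, first)
```

Because file2 is read once and every file1 row is a single dictionary lookup, this version does not depend on either file being sorted and copes with repeated DTime values in file1.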
    