Find all the numbers in one file that are not in another file in Python

灰色年华 2021-01-04 02:08

There are two files, say FileA and FileB, and we need to find all the numbers in FileA that are not in FileB. All the numbers in FileA are sorted and all t

5 answers
  •  旧时难觅i
    2021-01-04 02:32

    If memory is limited and you need a linear-time solution, you can read both files line by line with iter, as long as the files are line-based; otherwise see this:

    First, generate some test files in your terminal:

    seq 0 3 100 > 3k.txt
    seq 0 2 100 > 2k.txt
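
    If seq is not available (for example on Windows), the same test files could probably be produced from Python instead. This is only a minimal sketch: seq_lines and write_seq are hypothetical helper names introduced here, not part of the original answer.

```python
def seq_lines(first, step, last):
    """Mimic `seq first step last`: one integer per line, `last` inclusive."""
    return "".join(f"{n}\n" for n in range(first, last + 1, step))


def write_seq(path, first, step, last):
    """Write the sequence to `path`, overwriting any existing file."""
    with open(path, "w") as f:
        f.write(seq_lines(first, step, last))
```

    Calling write_seq("3k.txt", 0, 3, 100) and write_seq("2k.txt", 0, 2, 100) should give files equivalent to the two seq commands above.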
    

    Then run this code:

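    # Both files must be sorted ascending, with one integer per line.
    with open("3k.txt") as fa, open("2k.txt") as fb:
        i1 = (int(line) for line in fa)
        i2 = (int(line) for line in fb)
        a = next(i1, None)  # None marks an exhausted file
        b = next(i2, None)
        aNotB = []
        # bNotA = []
        while a is not None and b is not None:
            if a < b:
                # a is smaller than everything left in the second file
                aNotB.append(a)
                a = next(i1, None)
            elif a > b:
                # bNotA.append(b)
                b = next(i2, None)
            else:
                # present in both files: skip it
                a = next(i1, None)
                b = next(i2, None)
        # The second file ran out first: the rest of the first file is unmatched.
        if a is not None:
            aNotB.append(a)
            aNotB.extend(i1)
        # if b is not None:
        #     bNotA.append(b)
        #     bNotA.extend(i2)
    print(aNotB)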
    

    Output:

    [3, 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93, 99]

    If you want the results for both aNotB and bNotA, you can uncomment the commented-out bNotA lines.
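
    Since the stated concern is memory, the result list itself may also be worth avoiding. The same two-pointer walk could be wrapped in a generator that yields each missing number as it is found. This is a sketch under the assumption that both inputs are sorted ascending; sorted_difference and ints_from are hypothetical names, not from the original answer.

```python
def sorted_difference(a_values, b_values):
    """Yield items of sorted a_values that do not occur in sorted b_values."""
    a_iter, b_iter = iter(a_values), iter(b_values)
    done = object()  # unique sentinel, so 0 or None in the data is not misread
    a = next(a_iter, done)
    b = next(b_iter, done)
    while a is not done and b is not done:
        if a < b:
            yield a
            a = next(a_iter, done)
        elif a > b:
            b = next(b_iter, done)
        else:
            # Match: advance only a, so repeated values in a are also skipped.
            a = next(a_iter, done)
    if a is not done:
        # b_values ran out first: everything left in a_values is unmatched.
        yield a
        yield from a_iter


def ints_from(path):
    """Lazily read one integer per non-blank line of a text file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield int(line)
```

    With the test files above, iterating over sorted_difference(ints_from("3k.txt"), ints_from("2k.txt")) should produce the same numbers as the printed list, one at a time.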

    Timing comparison with Andrej Kesely's answer:

    $ seq 0 3 1000000 > 3k.txt
    $ seq 0 2 1000000 > 2k.txt
    $ time python manual_iter.py        
    python manual_iter.py  0.38s user 0.00s system 99% cpu 0.387 total
    $ time python heapq_groupby.py        
    python heapq_groupby.py  1.11s user 0.00s system 99% cpu 1.116 total
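
    The heapq_groupby.py script timed above is not shown in this answer. For context only, a merge-and-group approach might look roughly like the sketch below. This is a guess based on the file name, not the other answer's actual code, and difference_via_merge is a hypothetical name. It assumes both inputs are sorted ascending.

```python
from heapq import merge
from itertools import groupby


def difference_via_merge(a_values, b_values):
    """Yield values found only in a_values, by merging two sorted streams."""
    # Tag each value with its source: 0 for the first input, 1 for the second.
    tagged_a = ((v, 0) for v in a_values)
    tagged_b = ((v, 1) for v in b_values)
    merged = merge(tagged_a, tagged_b)  # lazy, keeps the overall sort order
    for value, group in groupby(merged, key=lambda pair: pair[0]):
        sources = {src for _, src in group}
        if sources == {0}:  # never seen in the second input
            yield value
```

    The extra tuple creation, heap bookkeeping, and grouping per element would be one plausible reason a version like this runs slower than the hand-written loop, although the timings above are the only measurements given here.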
    
