Is it safe to mix readline() and line iterators in Python file processing?

梦如初夏 2020-12-11 14:37

Is it safe to read some lines with readline() and then also use "for line in file"? Is it guaranteed that both use the same file position?

Usually, I

3 answers
  • 2020-12-11 15:15

    This works out well in the long run. It ignores the fact that you're processing a file, and works with any sequence. Also, having the explicit iterator object (rdr) hanging around allows you to skip lines inside the body of the for loop without messing anything up.

    with open("myfile.txt", "r") as source:
        rdr = iter(source)
        heading = next(rdr)
        for line in rdr:
            process(line)
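
A minimal sketch of the "skip lines inside the loop body" idea mentioned above. The `#skip` marker, the `read_records` name, and the sample data are assumptions made up for illustration; they are not part of the original answer.

```python
import io


def read_records(source):
    """Return (heading, records); the line after each '#skip' marker is dropped."""
    rdr = iter(source)
    heading = next(rdr, None)  # None if the source is empty
    records = []
    for line in rdr:
        if line.startswith("#skip"):
            # Advance the same iterator the for loop is using, so the
            # following line is consumed here and never reaches the loop.
            next(rdr, None)
            continue
        records.append(line.rstrip("\n"))
    return heading, records


# Hypothetical sample input standing in for a real file.
sample = io.StringIO("name\nalice\n#skip\nignored\nbob\n")
heading, records = read_records(sample)
print(repr(heading), records)
```

Because the loop and the explicit next() call share one iterator, there is only one notion of "current line", which is what keeps this safe.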
    
  • 2020-12-11 15:23

    It is safe if the mechanisms are under control.

    =============================

    .

    Iterating after a readline() call is no problem.

    Calling readline() after an iteration, however, is a problem.

    I created a 'rara.txt' file with this text (each line has a length of 5 bytes because of the '\r\n' line ending under Windows):

    1AA
    2BB
    3CC
    4DD
    5EE
    6FF
    7GG
    8HH
    9II
    10j
    11k
    12l
    13m
    14n
    15o
    

    And I executed (Python 2):

    FI  = open("rara.txt",'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    for line in FI:
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    The result is

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
    
    
    Traceback (most recent call last):
      File "E:\Python\NNN codes\esssssai.py", line 16, in <module>
        lineR = FI.readline()
    ValueError: Mixing iteration and read methods would lose data
    

    .

    A strange thing is that if we reset the "cursor" with seek(), using a position obtained from tell(), readline() works again after an iteration (the Python 2 documentation notes that repositioning the file with seek() to an absolute position flushes the read-ahead buffer):

    FI  = open("rara.txt",'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            pos = FI.tell()
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    
    FI.seek(pos)
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    for line in FI:
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    Result:

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
    
    ''   len==0  FI.tell() after FI.readline() :  75
    ''   len==0  FI.tell() after FI.readline() :  75 
    
    
    FI.tell() after iteration 'for line in FI' :  75 
    

    Anyway, note that even though the loop reads only 4 lines (thanks to the counter cnt), the cursor is already at the end of the file from the very first iteration: everything ahead of the current position is read at once into the read-ahead buffer when the iteration begins (this file is small enough to fit in it entirely).

    So pos = FI.tell() before the break doesn't give the position after the 4 lines read, but the position of the end of the file.
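
For comparison, here is a sketch of the same experiment under Python 3, which is an assumption on my part since the transcript above is Python 2. In Python 3, a file opened with 'rb' is a buffered reader whose tell() accounts for its own buffer, so iteration, readline(), and tell() agree on one position. The temporary file simply recreates the 75-byte sample.

```python
import os
import tempfile

# Recreate the sample: 15 lines of 3 characters plus '\r\n' (75 bytes).
names = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH", "9II",
         "10j", "11k", "12l", "13m", "14n", "15o"]
data = "".join(name + "\r\n" for name in names).encode("ascii")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    with open(path, "rb") as fi:
        first = fi.readline()
        pos_after_readline = fi.tell()

        # Consume exactly four more lines through iteration, then stop.
        for cnt, line in enumerate(fi, start=1):
            if cnt == 4:
                break
        pos_after_loop = fi.tell()

        # readline() resumes right after the last iterated line.
        next_line = fi.readline()
finally:
    os.remove(path)

print(first, pos_after_readline, pos_after_loop, next_line)
```

Here the position after the loop is 25 rather than 75, and readline() returns the sixth line without raising.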


    .

    We must do something special if we want to call readline() again after an iteration, starting from the exact point where the 4-line read ended: track the position ourselves by adding up len(line).

    FI  = open("rara.txt",'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    pos = FI.tell()
    for line in FI:
        cnt += 1
        pos += len(line)
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell()
    print "    pos   after iteration 'for line in FI' : ",pos,'\n'
    
    FI.seek(pos)
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    Result:

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75
        pos   after iteration 'for line in FI' :  25 
    
    '6FF\r\n'   len==5  FI.tell() after FI.readline() :  30
    '7GG\r\n'   len==5  FI.tell() after FI.readline() :  35 
    
    cnt==1   '8HH\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '9II\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '10j\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '11k\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==5   '12l\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==6   '13m\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==7   '14n\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==8   '15o\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
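
The position-tracking trick above can be packaged as a small generator. This is a sketch written for Python 3; `lines_with_offsets` is a hypothetical helper name, and the in-memory `io.BytesIO` buffer stands in for a file opened in 'rb' mode.

```python
import io


def lines_with_offsets(binary_file):
    """Yield (start_offset, line) pairs by summing the byte length of each line."""
    pos = binary_file.tell()
    for line in binary_file:
        yield pos, line
        pos += len(line)  # valid only for bytes, i.e. binary mode


buf = io.BytesIO(b"1AA\r\n2BB\r\n3CC\r\n")
buf.readline()  # read the first line with readline()
pairs = list(lines_with_offsets(buf))  # then iterate over the rest

# Jump back to a recorded offset and resume with readline().
buf.seek(pairs[1][0])
resumed = buf.readline()
print(pairs, resumed)
```

The recorded offsets do not depend on what tell() reports during iteration, which is the point of the technique.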
    

    .

    All these manipulations are possible only because the file was opened in binary mode. I am on Windows, which writes '\r\n' as the line ending in text mode even when told to write (in 'w' mode) something like 'abcdef\n',

    while on the other hand Python converts (in 'r' mode) every '\r\n' to '\n', so len(line) no longer matches the number of bytes on disk.

    That's a mess, and to control all this, files must be opened in 'rb' if we want to do precise position manipulations.
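
A small sketch of why the byte counting needs binary mode, written for Python 3 (an assumption; the experiments above are Python 2). In Python 3 the default text mode translates '\r\n' to '\n' on input on every platform, so line lengths differ between the two modes.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"1AA\r\n2BB\r\n")  # two 5-byte lines on disk
    path = tmp.name

try:
    # Binary mode: lines are bytes exactly as stored, '\r\n' included.
    with open(path, "rb") as fb:
        binary_lengths = [len(line) for line in fb]

    # Text mode with default newline handling: '\r\n' becomes '\n'.
    with open(path, "r", encoding="ascii") as ft:
        text_lengths = [len(line) for line in ft]
finally:
    os.remove(path)

print(binary_lengths, text_lengths)
```

Summing the text-mode lengths would put the computed position one byte short per line, so a later seek() would land in the wrong place.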


    .

    You know what? I love playing these games with file positions.

  • 2020-12-11 15:38

    No, it isn't safe:

    As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.

    You could use next() to skip the first line here. You should also test for StopIteration, which will be raised if the file is empty.

    with open('myfile.txt') as f:
        try:
            header = next(f)  # consume the first line
        except StopIteration:
            print("File is empty")
        for line in f:
            pass  # do stuff with line
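
As an alternative to catching StopIteration, next() accepts a default value. The sketch below assumes that variant; `split_header` is a hypothetical helper name and `io.StringIO` stands in for an open text file.

```python
import io


def split_header(lines):
    """Return (header, remaining_lines); header is None for empty input."""
    it = iter(lines)
    header = next(it, None)  # the default avoids a try/except for StopIteration
    if header is None:
        return None, []
    # Keep using the same iterator so no line is read twice or skipped.
    return header, list(it)


empty_result = split_header(io.StringIO(""))
full_result = split_header(io.StringIO("id,name\n1,a\n2,b\n"))
print(empty_result, full_result)
```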
    