Is it safe to mix readline() and line iterators in Python file processing?

梦如初夏 2020-12-11 14:37

Is it safe to read some lines with readline() and also use for line in file, and are both guaranteed to use the same file position?

Usually, I

3 answers
  •  悲哀的现实
    2020-12-11 15:23

    It is safe as long as you keep the mechanisms under control.

    =============================

    .

    Iterating after a readline() call causes no problem.

    But calling readline() after an iteration does, at least in Python 2, which the code below uses.

    I created a file 'rara.txt' with the following text (each line is 5 bytes long because of the '\r\n' line ending used on Windows):

    1AA
    2BB
    3CC
    4DD
    5EE
    6FF
    7GG
    8HH
    9II
    10j
    11k
    12l
    13m
    14n
    15o
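To reproduce the experiment without depending on how an editor saves line endings, the sample file can be generated byte for byte. This is an illustrative sketch for Python 3: the helper name make_rara and the temporary-directory location are assumptions, not part of the original experiment.

```python
import os
import tempfile

# Line bodies of rara.txt; each one is stored with a Windows-style CR LF ending.
BODIES = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH",
          "9II", "10j", "11k", "12l", "13m", "14n", "15o"]


def make_rara(path):
    """Write the sample file in binary mode so the CR LF bytes reach the disk unchanged."""
    data = b"".join(body.encode("ascii") + b"\r\n" for body in BODIES)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


path = os.path.join(tempfile.mkdtemp(), "rara.txt")
size = make_rara(path)
print(size)  # 15 lines x 5 bytes = 75, the end-of-file position reported by tell()
```

Writing in "wb" mode is the design choice here: a text-mode write on Windows would also produce CR LF, but on other systems it would not, and the byte counts below depend on it.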
    

    Then I ran:

    # Python 2 code (print is a statement); 'rb' keeps the '\r\n' bytes intact
    FI = open("rara.txt", 'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    for line in FI:
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    The result is:

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
    
    
    Traceback (most recent call last):
      File "E:\Python\NNN codes\esssssai.py", line 16, in <module>
        lineR = FI.readline()
    ValueError: Mixing iteration and read methods would lose data
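This ValueError comes from Python 2 file objects (the code above uses Python 2 print statements). In Python 3 the io module implements for line in f on a binary file by calling readline() on the same buffered object, so, as far as I know, the two can be mixed freely there and tell() stays exact. A minimal sketch replaying the experiment under Python 3, assuming the same 75-byte rara.txt (recreated in a temporary directory so the block runs on its own):

```python
import os
import tempfile

BODIES = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH",
          "9II", "10j", "11k", "12l", "13m", "14n", "15o"]
path = os.path.join(tempfile.mkdtemp(), "rara.txt")
with open(path, "wb") as f:
    f.write(b"".join(s.encode("ascii") + b"\r\n" for s in BODIES))

positions = []
with open(path, "rb") as fi:
    first = fi.readline()                # b'1AA\r\n'
    positions.append(fi.tell())
    for cnt, line in enumerate(fi, 1):   # iterate after readline()
        positions.append(fi.tell())      # position right after this line
        if cnt == 4:
            break
    after_loop = fi.readline()           # readline() after iteration: no ValueError in Python 3
    positions.append(fi.tell())

print(positions)         # [5, 10, 15, 20, 25, 30]
print(repr(after_loop))  # b'6FF\r\n'
```

Unlike the Python 2 run, tell() advances by 5 after each iterated line instead of jumping straight to 75.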
    

    .

    A strange thing is that if we reset the "cursor" with seek(), using a position previously obtained from tell(), readline() works again after an iteration (I don't know the behind-the-scenes mechanism of this "cursor" reset):

    FI  = open("rara.txt",'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            pos = FI.tell()
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    
    FI.seek(pos)
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    for line in FI:
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    Result:

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
    
    ''   len==0  FI.tell() after FI.readline() :  75
    ''   len==0  FI.tell() after FI.readline() :  75 
    
    
    FI.tell() after iteration 'for line in FI' :  75 
    

    Note, however, that even though the loop reads only 4 lines (thanks to the counter cnt), the cursor is already at the end of the file from the very first iteration: everything from the position where the iteration starts has been read ahead in one go (for a file this small, that is the whole rest of the file).

    So pos = FI.tell() before the break does not give the position just after the 4 lines read, but the end-of-file position (75), which is why the two readline() calls after FI.seek(pos) return ''.
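Another way around the problem, which avoids the read-ahead buffer altogether, is to drive the loop with readline() itself through the two-argument form iter(callable, sentinel). The Python 2 documentation attributes the hidden read-ahead buffer to the file's next() method, so a loop made only of readline() calls should keep tell() exact; in Python 3 it works as well. A sketch under those assumptions, again recreating rara.txt in a temporary directory:

```python
import os
import tempfile

BODIES = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH",
          "9II", "10j", "11k", "12l", "13m", "14n", "15o"]
path = os.path.join(tempfile.mkdtemp(), "rara.txt")
with open(path, "wb") as f:
    f.write(b"".join(s.encode("ascii") + b"\r\n" for s in BODIES))

offsets = []
with open(path, "rb") as fi:
    fi.readline()                      # consume b'1AA\r\n' first, as in the answer
    # iter(fi.readline, b"") calls fi.readline() until it returns b"" (end of file),
    # so every line is fetched by readline() and no read-ahead happens.
    for cnt, line in enumerate(iter(fi.readline, b""), 1):
        offsets.append(fi.tell())
        if cnt == 4:
            break
    pos = fi.tell()                    # just after b'5EE\r\n', not the end of the file
    next_line = fi.readline()

print(offsets)          # [10, 15, 20, 25]
print(pos)              # 25
print(repr(next_line))  # b'6FF\r\n'
```

The sentinel b"" matches what readline() returns at end of file in binary mode.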


    .

    To call readline() again after an iteration, starting exactly where the 4 iterated lines ended, we must compute the position ourselves by adding up the lengths of the lines read:

    FI  = open("rara.txt",'rb')
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
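    pos = FI.tell()          # still exact: no iteration has happened yet
    for line in FI:
        cnt += 1
        pos += len(line)     # count the bytes ourselves; FI.tell() is useless inside the loop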
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
        if cnt==4:
            break
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell()
    print "    pos   after iteration 'for line in FI' : ",pos,'\n'
    
    FI.seek(pos)
    
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell()
    lineR = FI.readline()
    print repr(lineR)+'   len=='+str(len(lineR))+\
          '  FI.tell() after FI.readline() : ',FI.tell(),'\n'
    
    cnt = 0
    for line in FI:
        cnt += 1
        print 'cnt=='+str(cnt)+'   '+repr(line)+'   len=='+str(len(line))+\
              "  FI.tell() after 'line in FI' : ",FI.tell()
    print "\nFI.tell() after iteration 'for line in FI' : ",FI.tell(),'\n'
    

    Result:

    '1AA\r\n'   len==5  FI.tell() after FI.readline() :  5 
    
    cnt==1   '2BB\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '3CC\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '4DD\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '5EE\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75
        pos   after iteration 'for line in FI' :  25 
    
    '6FF\r\n'   len==5  FI.tell() after FI.readline() :  30
    '7GG\r\n'   len==5  FI.tell() after FI.readline() :  35 
    
    cnt==1   '8HH\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==2   '9II\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==3   '10j\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==4   '11k\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==5   '12l\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==6   '13m\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==7   '14n\r\n'   len==5  FI.tell() after 'line in FI' :  75
    cnt==8   '15o\r\n'   len==5  FI.tell() after 'line in FI' :  75
    
    FI.tell() after iteration 'for line in FI' :  75 
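The bookkeeping in this last program can be wrapped in a small generator that yields each line together with the byte offset where it starts. A sketch for Python 3, assuming a binary-mode file whose tell() is exact before the loop starts (true here, since only readline() has been used so far); the name lines_with_offsets is hypothetical. Under Python 3 tell() would be accurate anyway, but the logic is the same pos += len(line) trick used above.

```python
import os
import tempfile

BODIES = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH",
          "9II", "10j", "11k", "12l", "13m", "14n", "15o"]
path = os.path.join(tempfile.mkdtemp(), "rara.txt")
with open(path, "wb") as f:
    f.write(b"".join(s.encode("ascii") + b"\r\n" for s in BODIES))


def lines_with_offsets(fi):
    """Yield (start_offset, line) pairs for a file opened in binary mode.

    Offsets are computed by adding up len(line) from the initial tell(),
    so they do not rely on tell() being correct during iteration.
    """
    pos = fi.tell()
    for line in fi:
        yield pos, line
        pos += len(line)


with open(path, "rb") as fi:
    fi.readline()                                # b'1AA\r\n', tell() == 5
    end = None
    for cnt, (start, line) in enumerate(lines_with_offsets(fi), 1):
        if cnt == 4:
            end = start + len(line)              # offset just past the 4th iterated line
            break
    fi.seek(end)                                 # resume exactly there
    resumed = [fi.readline(), fi.readline()]

print(end)      # 25
print(resumed)  # [b'6FF\r\n', b'7GG\r\n']
```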
    

    .

    All these manipulations are possible only because the file was opened in binary mode. I am on Windows, where a file opened in text mode ('w') gets '\r\n' line endings on disk even when the program writes something like 'abcdef\n',

    while in the other direction Python translates (in mode 'r') every '\r\n' into '\n'. In text mode, len(line) would then be 4 instead of the 5 bytes the line actually occupies on disk, so adding up line lengths would give wrong positions.

    That's a mess, and to keep control of all this, files must be opened in 'rb' whenever precise position handling is needed.
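Both points can be checked with a short sketch under Python 3 (an assumption: the answer itself uses Python 2). With the default universal-newline text mode, each '\r\n' comes back as '\n', so summed lengths undercount the bytes on disk; and Python 3 text files go further, refusing tell() (it raises OSError) after a for loop over the file has been left with break. The sample file is recreated in a temporary directory so the block runs on its own.

```python
import os
import tempfile

BODIES = ["1AA", "2BB", "3CC", "4DD", "5EE", "6FF", "7GG", "8HH",
          "9II", "10j", "11k", "12l", "13m", "14n", "15o"]
path = os.path.join(tempfile.mkdtemp(), "rara.txt")
with open(path, "wb") as f:
    f.write(b"".join(s.encode("ascii") + b"\r\n" for s in BODIES))

with open(path, "rb") as fb:                      # binary mode: bytes as stored
    binary_lines = [fb.readline() for _ in range(4)]
with open(path, "r", encoding="ascii") as ft:     # text mode, universal newlines
    text_lines = [ft.readline() for _ in range(4)]

print(sum(len(x) for x in binary_lines))  # 20 bytes really occupied on disk
print(sum(len(x) for x in text_lines))    # 16: each '\r\n' was translated to '\n'

# Text mode also disables tell() once next() has been used and the loop did not reach EOF.
with open(path, "r", encoding="ascii") as ft:
    for cnt, line in enumerate(ft, 1):
        if cnt == 4:
            break
    try:
        ft.tell()
        tell_refused = False
    except OSError:
        tell_refused = True

print(tell_refused)  # True
```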


    .

    You know what? I love these games with file positions.
