Remove very last character in file

说谎 2020-11-29 08:00

After looking all over the Internet, I've come to this.

Let's say I have already made a text file that reads: Hello World

Well, I want to remo

7 answers
  •  南笙 (original poster)
    2020-11-29 08:35

    The accepted answer by Martijn is simple and mostly works, but it does not account for text files with:

    • UTF-8 encoding containing non-English characters (UTF-8 is the default text encoding for Python 3 on most Linux and macOS systems)
    • one newline character at the end of the file (which is the default in Linux editors like vim or gedit)

    If the text file contains non-English characters, none of the answers provided so far would work.
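
    To illustrate the problem, here is a minimal sketch of what happens when bytes, rather than characters, are cut from the end of UTF-8 data. The sample string is the one used later in this answer; the variable names are only for illustration.

```python
# Illustration only: removing one byte is not the same as removing
# one character when the data is UTF-8 encoded.
text = 'Здравей свят\n'
data = text.encode('utf-8')

# 12 characters plus a newline, but every Cyrillic letter takes 2 bytes.
print(len(text), len(data))  # 13 24

# Dropping only the final byte removes the newline, not the last letter.
print(repr(data[:-1].decode('utf-8')))

# Dropping two bytes cuts the last letter in half and leaves invalid UTF-8.
try:
    data[:-2].decode('utf-8')
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False
print(truncation_ok)  # False
```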

    The following example solves both problems and also allows removing more than one character from the end of the file:

    import os
    
    
    def truncate_utf8_chars(filename, count, ignore_newlines=True):
        """
        Truncate the last `count` characters of a UTF-8 encoded text file.
        :param filename: Path to the text file to modify in place
        :param count: Number of UTF-8 characters to remove from the end of the file
        :param ignore_newlines: If True, newline bytes (CR/LF) met while scanning backwards
            are skipped and not counted; they are removed along with the truncated text
        """
        with open(filename, 'rb+') as f:
            size = os.fstat(f.fileno()).st_size
    
            offset = 1
            chars = 0
            while offset <= size:
                f.seek(-offset, os.SEEK_END)
                b = ord(f.read(1))
    
                if ignore_newlines:
                    if b == 0x0D or b == 0x0A:
                        offset += 1
                        continue
    
                if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                    # This is the first byte of a UTF8 character
                    chars += 1
                    if chars == count:
                        # Once `count` characters have been found, move the current position back
                        # by one byte (to include the byte just read) and truncate the file there
                        f.seek(-1, os.SEEK_CUR)
                        f.truncate()
                        return
                offset += 1
    

    How it works:

    • Reads only the last few bytes of a UTF-8 encoded text file in binary mode
    • Iterates the bytes backwards, looking for the start of a UTF-8 character
    • Once `count` characters (not counting newlines) have been found, truncates the file at the first byte of the last character found
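
    The test for "start of a UTF-8 character" can be sketched on its own as follows. The helper name `is_char_start` is hypothetical and not part of the function above; it expresses the same check in a single comparison.

```python
def is_char_start(b):
    """Return True if byte value `b` begins a UTF-8 character."""
    # 0xxxxxxx: single-byte (ASCII) character
    # 11xxxxxx: lead byte of a multi-byte character
    # 10xxxxxx: continuation byte, never the start of a character
    return (b & 0b11000000) != 0b10000000


# One 1-byte, one 2-byte and one 3-byte character.
data = 'aя€'.encode('utf-8')
starts = [is_char_start(b) for b in data]
print(starts)  # [True, True, False, True, False, False]
```

    Counting the True values gives the number of characters, which is how the backwards scan knows when `count` characters have been passed.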

    Sample text file - bg.txt:

    Здравей свят
    

    How to use:

    filename = 'bg.txt'
    print('Before truncate:', open(filename, encoding='utf-8').read())
    truncate_utf8_chars(filename, 1)
    print('After truncate:', open(filename, encoding='utf-8').read())
    

    Outputs:

    Before truncate: Здравей свят
    After truncate: Здравей свя
    

    This works with both UTF-8 and ASCII encoded files.
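
    The reason is that ASCII is a subset of UTF-8: every ASCII character is encoded as the same single byte, with the high bit clear. A quick check, using the "Hello World" text from the question:

```python
text = 'Hello World'
ascii_bytes = text.encode('ascii')
utf8_bytes = text.encode('utf-8')

# Both encodings produce identical bytes for pure ASCII text.
same_bytes = ascii_bytes == utf8_bytes

# Every byte is below 0x80, so each one counts as the start of a character.
all_single = all(b < 0x80 for b in utf8_bytes)
print(same_bytes, all_single)  # True True
```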
