Remove very last character in file

说谎 2020-11-29 08:00

After looking all over the Internet, I've come to this.

Let's say I have already made a text file that reads: Hello World

Well, I want to remove the very last character of this file.

7 answers

南笙 (original poster)

2020-11-29 08:35

Martijn's accepted answer is simple and mostly works, but it does not account for text files with:

UTF-8 encoding containing non-English (non-ASCII) characters (UTF-8 is the default encoding for open() in Python 3 on most Linux and macOS systems, where the locale's preferred encoding is UTF-8)
a single newline character at the end of the file (which editors such as vim and gedit add by default on Linux)

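If the text file contains non-English characters, neither of the answers provided so far would work: in UTF-8 such characters take two or more bytes, so removing a single byte leaves a partial, invalid character at the end of the file.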
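To make both problems concrete, here is a small in-memory sketch (no file involved) of what a naive "remove the last byte" approach does. The helper drop_last_byte is hypothetical and only stands in for byte-level truncation; it is not taken from any of the other answers.

```python
def drop_last_byte(data: bytes) -> bytes:
    """Hypothetical naive approach: remove exactly one byte from the end."""
    return data[:-1]


# Problem 1: a trailing newline is removed instead of the last visible character.
ascii_bytes = "Hello World\n".encode("utf-8")
print(drop_last_byte(ascii_bytes))  # b'Hello World' (the 'd' is still there)

# Problem 2: each Cyrillic letter takes two bytes in UTF-8, so dropping one
# byte leaves half a character behind and the result no longer decodes.
cyrillic_bytes = "Здравей свят".encode("utf-8")
try:
    drop_last_byte(cyrillic_bytes).decode("utf-8")
except UnicodeDecodeError as exc:
    print("Invalid UTF-8:", exc.reason)
```

The same thing happens on disk when a file is truncated by one byte: the newline disappears in the first case, and the file ends in an invalid UTF-8 sequence in the second.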

The following example solves both problems and also allows removing more than one character from the end of the file:

import os


def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates the last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to modify
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: If True, newline characters (CR/LF) at the very end of the
        file are skipped and not counted; they are removed along with the truncated text
    """
    with open(filename, 'rb+') as f:
        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            # Read the byte at position `size - offset`, walking backwards
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            # Skip newline bytes only at the end of the file, i.e. before any
            # real character has been counted
            if ignore_newlines and chars == 0 and b in (0x0D, 0x0A):
                offset += 1
                continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF-8 character (0xxxxxxx or 11xxxxxx);
                # continuation bytes (10xxxxxx) are passed over
                chars += 1
                if chars == count:
                    # When `count` characters have been found, move the current position back
                    # one byte (to include the byte just checked) and truncate the file there
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1

How it works:

Reads only the last few bytes of a UTF-8 encoded text file in binary mode
Iterates the bytes backwards, looking for the start of a UTF-8 character
Once `count` characters have been found (not counting trailing newlines when ignore_newlines is set), truncates the file at the first byte of the `count`-th character from the end
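The backwards scan relies on UTF-8's byte layout: ASCII bytes look like 0xxxxxxx, the first byte of a multi-byte character looks like 11xxxxxx, and continuation bytes look like 10xxxxxx. The sketch below applies the same first-byte check to an in-memory bytes object instead of a file; the helper names is_lead_byte, count_utf8_chars and last_char_start are hypothetical and only illustrate the idea.

```python
def is_lead_byte(b: int) -> bool:
    """Return True if byte `b` starts a UTF-8 character.

    ASCII bytes look like 0xxxxxxx and lead bytes of multi-byte characters
    look like 11xxxxxx; continuation bytes (10xxxxxx) return False.
    """
    return b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000


def count_utf8_chars(data: bytes) -> int:
    """Count characters by counting lead bytes (no decoding needed)."""
    return sum(1 for b in data if is_lead_byte(b))


def last_char_start(data: bytes) -> int:
    """Index of the first byte of the last character in non-empty `data`,
    found by scanning backwards past continuation bytes."""
    i = len(data) - 1
    while i > 0 and not is_lead_byte(data[i]):
        i -= 1
    return i


data = "Здравей свят".encode("utf-8")
print(len(data))               # 23 (bytes)
print(count_utf8_chars(data))  # 12 (characters)
print(data[:last_char_start(data)].decode("utf-8"))  # Здравей свя
```

Cutting a valid UTF-8 byte string at a lead byte always leaves valid UTF-8, which is why the function seeks back one byte before calling truncate().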

Sample text file - bg.txt:

Здравей свят

How to use:

filename = 'bg.txt'
with open(filename, encoding='utf-8') as f:
    print('Before truncate:', f.read())
truncate_utf8_chars(filename, 1)
with open(filename, encoding='utf-8') as f:
    print('After truncate:', f.read())

Outputs:

Before truncate: Здравей свят
After truncate: Здравей свя

This also works with ASCII-encoded files, because ASCII is a subset of UTF-8.
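A quick way to convince yourself of the ASCII case: ASCII text encodes to exactly the same bytes in ASCII and in UTF-8, and every one of those bytes has its high bit clear, so the function's first-byte check treats each byte as one character. A minimal check:

```python
text = "Hello World"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Identical byte sequences: ASCII text is already valid UTF-8.
print(ascii_bytes == utf8_bytes)           # True
# Every byte matches 0xxxxxxx, so each byte is its own character.
print(all(b < 0x80 for b in ascii_bytes))  # True
```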
