After looking all over the Internet, I\'ve come to this.
Let\'s say I have already made a text file that reads:
Hello World
Well, I want to remo
Accepted answer of Martijn is simple and kind of works, but does not account for text files with:
vim or gedit)If the text file contains non-English characters, neither of the answers provided so far would work.
What follows is an example, that solves both problems, which also allows removing more than one character from the end of the file:
import os
def truncate_utf8_chars(filename, count, ignore_newlines=True):
"""
Truncates last `count` characters of a text file encoded in UTF-8.
:param filename: The path to the text file to read
:param count: Number of UTF-8 characters to remove from the end of the file
:param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
"""
with open(filename, 'rb+') as f:
last_char = None
size = os.fstat(f.fileno()).st_size
offset = 1
chars = 0
while offset <= size:
f.seek(-offset, os.SEEK_END)
b = ord(f.read(1))
if ignore_newlines:
if b == 0x0D or b == 0x0A:
offset += 1
continue
if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
# This is the first byte of a UTF8 character
chars += 1
if chars == count:
# When `count` number of characters have been found, move current position back
# with one byte (to include the byte just checked) and truncate the file
f.seek(-1, os.SEEK_CUR)
f.truncate()
return
offset += 1
How it works:
Sample text file - bg.txt:
Здравей свят
How to use:
filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())
Outputs:
Before truncate: Здравей свят
After truncate: Здравей свя
This works with both UTF-8 and ASCII encoded files.