I want to open my file.txt and split all data from this file.
Here is my file.txt:
some_data1 some_data2 some_data3 some_da
The \xef\xbb\xbf is the byte order mark (BOM) for UTF-8. Each \x is an escape sequence: the two characters that follow it are hexadecimal digits giving the value of one byte.
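To see what those escapes stand for, here is a small sketch that compares the raw bytes with the standard `codecs.BOM_UTF8` constant and shows how two codecs treat them. The sample bytes and variable names are made up for illustration; they are not taken from the question's file.

```python
import codecs

# Bytes as they might sit on disk: a UTF-8 BOM followed by text (made-up sample)
raw = b'\xef\xbb\xbfsome_data1 some_data2\n'

# The three leading bytes are exactly the UTF-8 byte order mark
has_bom = raw.startswith(codecs.BOM_UTF8)

# Plain 'utf-8' keeps the BOM as the character U+FEFF;
# 'utf-8-sig' drops a leading BOM while decoding
with_bom = raw.decode('utf-8')
without_bom = raw.decode('utf-8-sig')
```

After running this, `with_bom` still begins with u'\ufeff', while `without_bom` starts directly with the text.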
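The \n is a newline character. To remove it, you can use rstrip(). Strings are immutable, so rstrip() returns a new string and the result has to be assigned back:
data = data.rstrip()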
data_list = data.split(' ')
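As a rough illustration of why the return value of rstrip() matters, and of how split(' ') differs from split() with no argument, consider the sketch below. The sample string (with a deliberate double space) and the variable names are assumptions made for the example.

```python
# Made-up sample line with a double space and a trailing newline
data = 'some_data1 some_data2  some_data3\n'

data.rstrip()             # the returned string is discarded here
unchanged = data          # so data still ends with '\n'

stripped = data.rstrip()  # keep the new string instead

# split(' ') cuts at every single space, so a double space yields an empty item
by_space = stripped.split(' ')

# split() with no argument splits on any run of whitespace, newline included
by_whitespace = data.split()
```

If the file may contain repeated spaces or tabs, split() with no argument is probably the more forgiving choice, and it makes the rstrip() step unnecessary.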
To remove the byte order mark, you can use io.open (assuming you're using 2.6 or 2.7) to open the file with the utf-8-sig encoding, which decodes UTF-8 and skips a leading BOM. Plain utf-8 would leave the BOM in the text as u'\ufeff'. Note that io.open can be a bit slower, as it's implemented in Python (at least in 2.6) - if speed or older versions of Python matter, take a look at codecs.open.
Try something like this:
import io
# Make sure we don't lose the list when we close the file
data_list = []
# Use `with` to ensure the file gets cleaned up properly
with io.open('file.txt', 'r', encoding='utf-8-sig') as file:
    data = file.read()  # Be careful when using read() with big files
    data = data.rstrip()  # Chomp the newline character; rstrip() returns a new string
    data_list = data.split(' ')
print data_list
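For the codecs.open route mentioned above, a minimal self-contained sketch might look like the following. It writes a throwaway file that starts with a BOM so it can run anywhere; the temporary file stands in for file.txt, and its contents are made up for the example.

```python
import codecs
import os
import tempfile

# Create a throwaway file that starts with a UTF-8 BOM (stand-in for file.txt)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'some_data1 some_data2 some_data3\n')

# codecs.open decodes while reading; 'utf-8-sig' skips the BOM if one is present
with codecs.open(path, 'r', encoding='utf-8-sig') as f:
    # split() with no argument also discards the trailing newline
    data_list = f.read().split()

os.remove(path)
```

Here data_list ends up holding the three items with no BOM and no newline attached. On recent Python 3 releases the built-in open() with encoding='utf-8-sig' does the same job, so codecs.open is mainly of interest for older interpreters.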