Split function adds \xef\xbb\xbf…\n to my list

予麋鹿 2020-11-28 07:05

I want to open my file.txt and split all of its data into a list.

Here is my file.txt:

some_data1 some_data2 some_data3 some_da         


        
3 Answers
  •  慢半拍i (OP)
    2020-11-28 07:59

    The \xef\xbb\xbf is the byte order mark (BOM) for UTF-8. \x is an escape sequence indicating that the next two characters are a hexadecimal byte value, so \xef\xbb\xbf stands for the three bytes EF BB BF.
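A minimal sketch (Python 3, with hypothetical file contents) of where the stray characters in the question come from: splitting the raw, undecoded bytes leaves the BOM attached to the first item and the newline attached to the last.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM, space-separated items, and a trailing newline.
raw = codecs.BOM_UTF8 + b"some_data1 some_data2 some_data3\n"

# Each \xNN escape is one byte written as two hex digits, so the BOM is the bytes EF BB BF.
assert codecs.BOM_UTF8 == bytes([0xEF, 0xBB, 0xBF])
# Those three bytes are the UTF-8 encoding of the character U+FEFF.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

# Splitting the undecoded bytes leaves the BOM on the first item and the newline on the last.
parts = raw.split(b" ")
print(parts)  # [b'\xef\xbb\xbfsome_data1', b'some_data2', b'some_data3\n']
```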

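    The \n is a newline character. To remove it, use rstrip(). Strings are immutable, so rstrip() returns a new string instead of changing data in place; assign the result back: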

    data = data.rstrip()  # keep the stripped copy; rstrip() does not modify data itself
    data_list = data.split(' ')
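A small Python 3 illustration of the point above, using a hypothetical line of data: calling rstrip() without assigning the result leaves the string unchanged. The second half shows an alternative the answer does not use: split() with no argument splits on any run of whitespace, including newlines, which may suit files with more than one line.

```python
data = "some_data1 some_data2 some_data3\n"  # hypothetical file contents

data.rstrip()                 # result is discarded; str is immutable
print(repr(data))             # 'some_data1 some_data2 some_data3\n'

data = data.rstrip()          # rebind the name to the stripped copy
print(data.split(" "))        # ['some_data1', 'some_data2', 'some_data3']

# Alternative: split() with no argument treats any whitespace run (spaces, tabs, newlines)
# as one separator and ignores leading/trailing whitespace.
multi_line = "a b\nc  d\n"
print(multi_line.split(" "))  # ['a', 'b\nc', '', 'd\n']
print(multi_line.split())     # ['a', 'b', 'c', 'd']
```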
    

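    To remove the byte order mark, decode the file with the 'utf-8-sig' codec, which strips a leading BOM (plain 'utf-8' keeps it as the character u'\ufeff'). On Python 2.6 or 2.7 you can pass the encoding to io.open; on Python 3 the built-in open() accepts the same encoding argument. Note that in Python 2.6 the io module is implemented in pure Python and can be slow; if speed matters there, or you are on an older version of Python, take a look at codecs.open instead.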
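A minimal Python 3 check of the codec difference described above, using a throwaway temporary file whose contents are made up for the demo: the same bytes read with 'utf-8' keep the BOM as '\ufeff', while 'utf-8-sig' drops it. (On Python 3, io.open is the same function as the built-in open.)

```python
import io
import os
import tempfile

# Write a small throwaway file that starts with a UTF-8 BOM (hypothetical contents).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfsome_data1 some_data2\n")

# Plain 'utf-8' decodes the BOM into the character U+FEFF and keeps it.
with io.open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()

# 'utf-8-sig' consumes a leading BOM if one is present.
with io.open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()

os.remove(path)

print(repr(with_bom))     # '\ufeffsome_data1 some_data2\n'
print(repr(without_bom))  # 'some_data1 some_data2\n'
```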

    Try something like this:

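    import io
    
    # Start with an empty list (optional: names assigned inside `with` stay in scope after it)
    data_list = []
    
    # Use `with` so the file is closed even if an error occurs;
    # 'utf-8-sig' removes the leading byte order mark, plain 'utf-8' would keep it
    with io.open('file.txt', 'r', encoding='utf-8-sig') as f:
        data = f.read()  # read() loads the whole file into memory; avoid it for very large files
        data = data.rstrip()  # strip the trailing newline; rstrip() returns a new string
        data_list = data.split(' ')
    
    print(data_list)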
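The read() comment above is about memory use. As a hedged alternative for large files (not part of the original answer), the file can be processed one line at a time. split_words is a hypothetical helper name, and the sketch assumes Python 3.

```python
import io
import os
import tempfile


def split_words(path):
    """Hypothetical helper: yield whitespace-separated items from a UTF-8 file, line by line."""
    # 'utf-8-sig' drops a leading BOM; iterating over the file object reads one line at a time,
    # so the whole file is never loaded into memory at once.
    with io.open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            # split() with no argument also discards the trailing newline.
            for item in line.split():
                yield item


# Usage with a throwaway file standing in for file.txt (made-up contents).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfsome_data1 some_data2\nsome_data3\n")

words = list(split_words(path))  # consuming the generator fully also closes the file
os.remove(path)
print(words)  # ['some_data1', 'some_data2', 'some_data3']
```

Because split_words is a generator, a caller that only needs to scan the items can loop over it directly instead of building the full list.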
    
