I can't read in data to R

猫巷女王i 2021-01-07 07:32

I am trying to read in some data from a text file that looks like this:

2009-08-09 - 2009-08-15 0   2   0
2009-08-16 - 2009-08-22 0   1   0
2009-08-23          


        
3 answers
  •  慢半拍i (original poster)
    2021-01-07 08:18

    What you see here:

    ÿþ
    

    is the Byte Order Mark (BOM) for UTF-16LE or UCS-2LE. See the Wikipedia article on the byte order mark for an explanation. Your file might contain characters that need this encoding, or it might have been created by Windows software that saves files with a BOM. The BOM is placed at the very beginning of a file, before all other data.
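
To confirm that a file really starts with this BOM, you can look at its first two bytes. The bytes FF FE display as ÿþ when interpreted as Latin-1, which is why they show up that way. A minimal sketch, where the helper name `utf16_bom_kind` and the demo file are made up for illustration:

```r
# Hypothetical helper: report which UTF-16 BOM (if any) a file starts with.
utf16_bom_kind <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 2L)
  if (length(first_bytes) < 2L) return("none")
  if (identical(first_bytes, as.raw(c(0xFF, 0xFE)))) return("UTF-16LE")
  if (identical(first_bytes, as.raw(c(0xFE, 0xFF)))) return("UTF-16BE")
  "none"
}

# Demo file (an assumption, not the asker's data): BOM FF FE, then "0\n" in UTF-16LE
demo <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFF, 0xFE, 0x30, 0x00, 0x0A, 0x00)), demo)

bom_kind <- utf16_bom_kind(demo)
print(bom_kind)
```

On your own data, call `utf16_bom_kind("file.dat")` instead of using the demo file.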

    R reads these bytes as ordinary characters and treats them as the start of the data. Try one of the following:

    (1) If you don't need this encoding, simply open your data in a text editor (like Vim), change the encoding, save, and read into R. (In Vim do :write ++enc=utf-8 new_file_name.txt, then close the file and open the newly saved version, then do :set nobomb, just to be sure, then :wq.)

    (2) If you need the encoding or don't want to go through a text editor, tell R what encoding the file is in. You might experiment with:

    read.table("file.dat", fileEncoding = "UTF-16")
    read.table("file.dat", fileEncoding = "UTF-16LE")
    read.table("file.dat", fileEncoding = "UTF-16-LE")
    read.table("file.dat", fileEncoding = "UCS-2LE")
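
To try this without touching the real file, the following sketch builds a small UTF-16LE file with a BOM and reads it back. The rows are modeled on the lines shown in the question and assume whitespace-separated fields; the variable names are illustrative, and it assumes your platform's iconv knows the name "UTF-16LE".

```r
# Demo data: two whitespace-separated rows shaped like the question's file
lines <- c("2009-08-09 - 2009-08-15 0 2 0",
           "2009-08-16 - 2009-08-22 0 1 0")
txt <- paste0(paste(lines, collapse = "\n"), "\n")

# Encode the text as UTF-16LE and prepend the BOM bytes FF FE
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
demo_file <- tempfile(fileext = ".dat")
writeBin(c(as.raw(c(0xFF, 0xFE)), body), demo_file)

# Read it back, telling R which encoding the file uses
dat <- read.table(demo_file, fileEncoding = "UTF-16LE",
                  stringsAsFactors = FALSE)
str(dat)
```

If the first value of the first column still seems to carry an invisible leading character, the BOM was not stripped on your platform; `fileEncoding = "UTF-16"` is worth trying in that case.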
    

    If none of these works, try the solution given in the related question "How to detect the right encoding for read.csv?" and check the R Data Import/Export manual, which has a section on files with a BOM.
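
One way to experiment more systematically is to loop over candidate encodings and keep the ones that read without an error or warning. This is only a sketch of that idea, not necessarily what the linked question recommends; `try_encodings` and the demo file are hypothetical.

```r
# Hypothetical helper: return the candidate encodings for which
# read.table() completes without raising an error or a warning.
try_encodings <- function(path, candidates) {
  ok <- vapply(candidates, function(enc) {
    tryCatch({
      read.table(path, fileEncoding = enc)
      TRUE
    }, warning = function(w) FALSE, error = function(e) FALSE)
  }, logical(1))
  candidates[ok]
}

# Demo file (an assumption): "1 2\n3 4\n" as UTF-16LE with a BOM
demo_file <- tempfile(fileext = ".dat")
body <- iconv("1 2\n3 4\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), body), demo_file)

working <- try_encodings(demo_file, c("UTF-8", "UTF-16LE"))
print(working)
```

An encoding that reads cleanly is not guaranteed to be the right one, so inspect the resulting data frame before trusting it.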
