Strange unicode characters when reading in file in node.js app

青春惊慌失措 2020-12-19 01:30

I am attempting to write a node app that reads in a set of files, splits them into lines, and puts the lines into an array. Pretty simple. It works on quite a few files exce

4 Answers
  • 2020-12-19 02:04

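    I ran the following in the Windows command prompt to convert the file's encoding (type reads a UTF-16 file that starts with a BOM and writes it out in the console's current code page):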

    type file.txt > file2.txt
    
  • 2020-12-19 02:06

    Your file is in UTF-16 Big Endian, not UTF-8.

    var data = fs.readFileSync("test.sql", "utf16le"); // correct only for little-endian input; the BOM is not stripped
    

    Unfortunately, Node.js natively supports only little-endian UTF-16 (the "utf16le" encoding, with "ucs2" as an alias). The docs do not spell out the distinction, but strictly speaking the label UTF-16LE implies no BOM, while plain UTF-16 uses a BOM to signal byte order. For a big-endian file you therefore have to use iconv or convert the file to UTF-8 some other way.

    Example:

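    var Iconv = require('iconv').Iconv,
        fs = require("fs");
    
    // Read the raw bytes; passing no encoding returns a Buffer.
    var buffer = fs.readFileSync("test.sql"),
        iconv = new Iconv("UTF-16", "UTF-8");
    
    // convert() returns a Buffer of UTF-8 bytes; decode it to a string.
    var result = iconv.convert(buffer).toString("utf8");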
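
    If installing iconv is not an option, big-endian input can also be handled with Node's built-in Buffer methods. The following is a minimal sketch, assuming Node 5.10 or later (for buffer.swap16) and assuming a big-endian file starts with a BOM; decodeUtf16 is a hypothetical helper name, not a library function.

```javascript
// Decode a UTF-16 buffer using only Node's built-in Buffer API.
// decodeUtf16 is a hypothetical helper, not part of Node or iconv.
function decodeUtf16(buffer) {
  var hasBigEndianBom =
    buffer.length >= 2 && buffer[0] === 0xfe && buffer[1] === 0xff;
  var hasLittleEndianBom =
    buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe;

  if (hasBigEndianBom) {
    // swap16() requires an even length, so ignore a trailing odd byte.
    var evenLength = buffer.length - (buffer.length % 2);
    // Copy first: swap16() swaps bytes in place and would otherwise
    // modify the caller's buffer. The copy also skips the 2-byte BOM.
    var swapped = Buffer.from(buffer.subarray(2, evenLength));
    swapped.swap16();
    return swapped.toString("utf16le");
  }

  // Assumption: without a big-endian BOM, treat the data as little-endian.
  return buffer.toString("utf16le", hasLittleEndianBom ? 2 : 0);
}
```

    It could be called as decodeUtf16(fs.readFileSync("test.sql")). Unlike iconv, this sketch does not guess the byte order of files without a BOM; it simply falls back to little-endian.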
    
  • 2020-12-19 02:09

    Is this perhaps the BOM (Byte-Order-Mark)? Make sure you save your files without the BOM or include code to strip the BOM.

    The BOM is usually invisible in text editors.

    I know Notepad++ has a feature where you can easily strip a BOM from a file. Encoding > Encode in UTF-8 without BOM.
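
    Stripping the BOM in code can be sketched as follows. Once a file has been decoded to a string, the BOM shows up as the single character U+FEFF at the start. The names stripBom and toLines are hypothetical helpers, not Node.js APIs; toLines covers the line-splitting step described in the question.

```javascript
// stripBom is a hypothetical helper name, not a Node.js API.
function stripBom(text) {
  // A decoded BOM appears as the single character U+FEFF at index 0.
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Split decoded text into an array of lines,
// accepting both \n and \r\n line endings.
function toLines(text) {
  return stripBom(text).split(/\r?\n/);
}
```

    For a UTF-8 file this could be used as toLines(fs.readFileSync(path, "utf8")), where path is whatever file is being read.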

  • 2020-12-19 02:16

    Use iconv-lite, a pure-JavaScript alternative to iconv, and decode the file as a stream:

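    var fs = require('fs');
    var iconv = require('iconv-lite');
    
    var result = "";
    // sourcefile is the path to the input file.
    // 'win1251' is only an example; pass the file's actual encoding.
    var stream = fs.createReadStream(sourcefile)
        .on("error", function (err) {
            // handle read errors
        })
        .pipe(iconv.decodeStream('win1251'))
        .on("error", function (err) {
            // handle decoding errors
        })
        .on("data", function (data) {
            result += data;
        })
        .on("end", function () {
            // use result
        });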
    