Strange unicode characters when reading in file in node.js app


I am attempting to write a node app that reads in a set of files, splits them into lines, and puts the lines into an array. Pretty simple. It works on quite a few files exce

4 answers

走了就别回头了

2020-12-19 02:04
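I ran the following in the Windows command prompt to convert the file. Strictly speaking this changes the encoding rather than the endianness: `type` recognizes the UTF-16 BOM and writes the text out in the console's active code page.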
```
type file.txt > file2.txt
```

野趣味

2020-12-19 02:06
Your file is in UTF-16 Little Endian, not UTF-8.
```
var fs = require("fs");
var data = fs.readFileSync("test.sql", "utf16le").replace(/^\uFEFF/, ""); // Node does not strip the BOM, so remove it here
```
Node.js itself only decodes the little-endian form of UTF-16 (`utf16le`, also accepted as `ucs2`). Strictly, the UTF-16LE label implies no BOM, and Node.js does not strip one for you. If the file is big endian or in some other encoding, you have to use iconv or convert the file to UTF-8 some other way.

Example:
```
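var Iconv  = require('iconv').Iconv,
    fs = require("fs");

// Read the raw bytes; do not pass an encoding here.
var buffer = fs.readFileSync("test.sql"),
    iconv = new Iconv( "UTF-16", "UTF-8"); // (source encoding, target encoding)

// convert() returns a Buffer of UTF-8 bytes.
var result = iconv.convert(buffer).toString("utf8");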
```
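
If pulling in iconv is not an option, the same idea can be sketched with Node's built-in Buffer methods alone: look at the first bytes for a BOM, and for big-endian UTF-16 swap each byte pair before decoding as `utf16le`. The helper names `decodeWithBom` and `readLinesAnyEncoding` are made up for this illustration, and falling back to UTF-8 when no BOM is present is an assumption, not real encoding detection.

```javascript
var fs = require("fs");

// Hypothetical helper: decode a Buffer to a string based on its BOM.
// Falls back to UTF-8 when no BOM is found (an assumption, not detection).
function decodeWithBom(buffer) {
    if (buffer.length >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) {
        // UTF-16 little endian: Node decodes this natively; skip the 2 BOM bytes.
        return buffer.toString("utf16le", 2);
    }
    if (buffer.length >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) {
        // UTF-16 big endian: copy the payload, swap each byte pair in place,
        // then decode the result as little endian.
        var body = Buffer.from(buffer.subarray(2));
        if (body.length % 2 !== 0) {
            // swap16() requires an even length; drop a dangling last byte.
            body = body.subarray(0, body.length - 1);
        }
        body.swap16();
        return body.toString("utf16le");
    }
    if (buffer.length >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) {
        // UTF-8 with BOM: skip the 3 BOM bytes.
        return buffer.toString("utf8", 3);
    }
    return buffer.toString("utf8");
}

// Hypothetical helper: read a file in any of the encodings above and split it into lines.
function readLinesAnyEncoding(path) {
    return decodeWithBom(fs.readFileSync(path)).split(/\r?\n/);
}
```

For the file in the question this would be called as `readLinesAnyEncoding("test.sql")`. Files in legacy code pages (for example win1251) carry no BOM and would still need iconv or iconv-lite.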

粉色の甜心

2020-12-19 02:09

Is this perhaps the BOM (Byte-Order-Mark)? Make sure you save your files without the BOM or include code to strip the BOM.

The BOM is usually invisible in text editors.

Notepad++ can strip a BOM from a file: Encoding > Encode in UTF-8 without BOM.
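
For the "include code to strip the BOM" route, a minimal sketch could look like the following. It assumes the file really is UTF-8 and that the only problem is a leading U+FEFF; `stripBom` and `readLines` are hypothetical names chosen for this example.

```javascript
var fs = require("fs");

// Hypothetical helper: drop a leading U+FEFF (the BOM) if one is present.
function stripBom(text) {
    return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Hypothetical helper: read a UTF-8 file and split it into lines,
// accepting both Windows (\r\n) and Unix (\n) line endings.
function readLines(path) {
    var text = stripBom(fs.readFileSync(path, "utf8"));
    return text.split(/\r?\n/);
}
```

Node keeps the BOM as the first character of the decoded string, which is why the check is done on the string rather than on the raw bytes.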


刺人心

2020-12-19 02:16

Use iconv-lite, a lighter alternative to iconv:

```
var fs = require("fs");
var iconv = require('iconv-lite');

var result = "";
var stream = fs.createReadStream(sourcefile) // sourcefile: path of the file to read
    .on("error",function(err){
        //handle read error
    })
    .pipe(iconv.decodeStream('win1251')) // use the file's actual encoding here, e.g. 'utf16le'
    .on("error",function(err){
        //handle decode error
    })
    .on("data",function(data){
        result += data;
    })
    .on("end",function(){
       //use result
    });
```
