Reading a UTF-16 CSV file by char

流过昼夜 submitted on 2019-12-08 09:07:46

Question


Currently I am trying to read a UTF-16 encoded CSV file char by char and convert each char into ASCII so I can process it. I later plan to convert my processed data back to UTF-16, but that is beside the point right now.

I know right off the bat I am doing this completely wrong, as I have never attempted anything like this before:

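#include <stdio.h>
#include <wchar.h>

int main(void)
{
    FILE *fp;
    int ch;
    if (!(fp = fopen("x.csv", "r"))) return 1;
    /* Read first, then test, so ch is never used uninitialized
       and EOF itself is never printed. */
    while ((ch = fgetc(fp)) != EOF)
    {
        ch = (wchar_t) ch;
        ch = (char) ch;
        printf("%c", ch);
    }
    fclose(fp);
    return 0;
}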

Wishfully thinking, I was hoping that would work by magic for some reason, but that was not the case. How can I read a UTF-16 CSV file and convert it to ASCII? My guess is that since each UTF-16 char is two bytes (I think?), I'm going to have to read two bytes at a time from the file into a variable of some datatype I am not sure of. Then I guess I will have to check the bits of this variable to make sure it is valid ASCII and convert it from there? I don't know how I would do this, though, and any help would be great.
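The plan described in the question (read two bytes into one variable, then check its value) can be sketched as follows. This is a minimal sketch, not code from the thread: it assumes the file is UTF-16LE (low byte first), and the helper names `read_utf16le_unit` and `utf16le_to_ascii` are hypothetical. The "datatype" is a 16-bit unsigned integer (`uint16_t`), and the ASCII check is simply `unit < 0x80`. Characters outside the Basic Multilingual Plane take two code units (a surrogate pair), but both halves are above 0x7F, so the same check rejects them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-16LE code unit (low byte first).
 * Returns 1 and stores the unit in *unit on success, 0 at end of input,
 * or -1 if the input ends halfway through a unit. */
int read_utf16le_unit(FILE *fp, uint16_t *unit)
{
    int lo = fgetc(fp);
    if (lo == EOF)
        return 0;
    int hi = fgetc(fp);
    if (hi == EOF)
        return -1;                      /* odd byte count: truncated */
    *unit = (uint16_t)(lo | (hi << 8));
    return 1;
}

/* Hypothetical helper: copy a UTF-16LE stream to `out` as ASCII,
 * skipping a leading byte-order mark (U+FEFF, stored as FF FE).
 * Returns the number of characters written, or -1 on a non-ASCII
 * code unit, truncated input, or a write error. */
long utf16le_to_ascii(FILE *in, FILE *out)
{
    uint16_t unit;
    long count = 0;
    int first = 1;
    int rc;

    while ((rc = read_utf16le_unit(in, &unit)) == 1) {
        int is_bom = first && unit == 0xFEFF;
        first = 0;
        if (is_bom)
            continue;                   /* the BOM is a marker, not data */
        if (unit >= 0x80)
            return -1;                  /* no ASCII equivalent */
        if (fputc(unit, out) == EOF)
            return -1;
        count++;
    }
    return rc == 0 ? count : -1;
}
```

Both streams should be opened in binary mode (`"rb"` / `"wb"`). With an input file holding the bytes `FF FE 78 00 2C 00 31 00`, `utf16le_to_ascii` writes `x,1` and returns 3.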


Answer 1:


You should use fgetwc. The code below should work if the file starts with a byte-order mark and a locale named en_US.UTF-16 is available.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
  /* Requires a locale with this exact name to be installed. */
  setlocale(LC_ALL, "en_US.UTF-16");

  FILE *fp = fopen("x.csv", "rb");
  if (fp) {
    /* Check both bytes of the byte-order mark; FE FF means big-endian. */
    int b0 = fgetc(fp);
    int b1 = fgetc(fp);
    int order = (b0 == 0xFE && b1 == 0xFF);

    wint_t ch;
    while ((ch = fgetwc(fp)) != WEOF) {
      putchar(order ? ch >> 8 : ch);
    }
    putchar('\n');

    fclose(fp);
    return 0;
  } else {
    perror("opening x.csv");
    return 1;
  }
}
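Answer 1 depends on a locale named `en_US.UTF-16` being installed, which is not guaranteed. The C standard also says wide-character input functions such as `fgetwc` shall not be applied to a stream that has already been used with byte functions such as `fgetc`, so reading the BOM with `fgetc` and the rest with `fgetwc` is not portable. A locale-independent alternative is to read the byte-order mark yourself and assemble each code unit from two bytes. This is a sketch under stated assumptions: the names `detect_utf16_order` and `read_utf16_unit` are hypothetical, and the no-BOM path calls `rewind`, so it assumes a seekable stream such as a regular file opened with `"rb"`.

```c
#include <stdio.h>

/* The two byte orders a UTF-16 file can use. */
enum utf16_order { UTF16_LE, UTF16_BE };

/* Hypothetical helper: inspect the first two bytes for a byte-order mark.
 * FF FE means little-endian and FE FF means big-endian; the BOM is consumed.
 * Without a BOM, rewinds to the start (so `fp` must be seekable) and
 * returns `fallback`. */
enum utf16_order detect_utf16_order(FILE *fp, enum utf16_order fallback)
{
    int b0 = fgetc(fp);
    int b1 = fgetc(fp);
    if (b0 == 0xFF && b1 == 0xFE)
        return UTF16_LE;
    if (b0 == 0xFE && b1 == 0xFF)
        return UTF16_BE;
    rewind(fp);                 /* no BOM: those two bytes are data */
    return fallback;
}

/* Hypothetical helper: read one code unit in the given byte order.
 * Returns 1 on success, 0 at end of input, -1 if the input is truncated. */
int read_utf16_unit(FILE *fp, enum utf16_order order, unsigned *unit)
{
    int b0 = fgetc(fp);
    if (b0 == EOF)
        return 0;
    int b1 = fgetc(fp);
    if (b1 == EOF)
        return -1;
    *unit = (order == UTF16_LE) ? (unsigned)(b0 | (b1 << 8))
                                : (unsigned)((b0 << 8) | b1);
    return 1;
}
```

A caller would run `detect_utf16_order` once after `fopen`, then loop on `read_utf16_unit` and treat any unit below 0x80 as ASCII. Which fallback to use for BOM-less files is a judgment call; the Unicode FAQ suggests big-endian, but many Windows-produced files are little-endian.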



Answer 2:


This is my solution, thanks to the comments under my original question. Since every character in the CSV file is valid ASCII, the solution was as simple as taking every other byte:

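#include <stdio.h>

int main(void)
{
    FILE *fp;
    int ch, i = 1;
    /* Binary mode, so the bytes are seen exactly as stored. */
    if (!(fp = fopen("x.csv", "rb"))) return 1;
    while ((ch = fgetc(fp)) != EOF)
    {
        /* Assumes little-endian UTF-16: the ASCII byte of each pair comes
           first. An FF FE byte-order mark also occupies an odd position. */
        if (i % 2)
        {
            /* ch is valid ASCII: process it here */
        }
        i++;
    }
    fclose(fp);

    return 0;
}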
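For the question's later step of writing the processed data back out as UTF-16, a minimal sketch follows. It assumes little-endian output and pure-ASCII text; the function `write_ascii_as_utf16le` and its `with_bom` parameter are hypothetical. Every ASCII character becomes one code unit whose high byte is zero, so the conversion emits each byte followed by `0x00`. The output stream should be opened in binary mode (`"wb"`) so the C library does not translate any bytes.

```c
#include <stdio.h>

/* Hypothetical helper: write an ASCII string to `out` as UTF-16LE,
 * optionally preceded by the byte-order mark U+FEFF (stored as FF FE).
 * Returns 0 on success, or -1 on a non-ASCII byte or a write error
 * (bytes written before the failure are not undone). */
int write_ascii_as_utf16le(FILE *out, const char *s, int with_bom)
{
    if (with_bom && (fputc(0xFF, out) == EOF || fputc(0xFE, out) == EOF))
        return -1;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 0x80)
            return -1;          /* not ASCII: refuse rather than guess */
        /* Low byte first (the character itself), then the zero high byte. */
        if (fputc(c, out) == EOF || fputc(0x00, out) == EOF)
            return -1;
    }
    return 0;
}
```

For example, `write_ascii_as_utf16le(out, "a,b\n", 1)` writes the bytes `FF FE 61 00 2C 00 62 00 0A 00`.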


Source: https://stackoverflow.com/questions/12125659/reading-a-utf-16-csv-file-by-char
