I am currently trying to read a UTF-16 encoded CSV file char by char and convert each char to ASCII so I can process it. I plan to convert my processed data back to UTF-16 later, but that is beside the point right now.
I know right off the bat I am doing this completely wrong, as I have never attempted anything like this before:
int main(void)
{
    FILE *fp;
    int ch;
    if(!(fp = fopen("x.csv", "r"))) return 1;
    while(ch != EOF)
    {
        ch = fgetc(fp);
        ch = (wchar_t) ch;
        ch = (char) ch;
        printf("%c", ch);
    }
    fclose(fp);
    return 0;
}
I was hoping, wishfully, that this would work by magic, but that was not the case. How can I read a UTF-16 CSV file and convert it to ASCII? My guess is that since each UTF-16 char is two bytes (I think?), I will have to read two bytes at a time from the file into a variable of some data type I am not sure of. Then I guess I will have to check the bits of this variable to make sure it is valid ASCII and convert it from there? I don't know how I would do this, though, and any help would be great.
You should use fgetwc. The code below should work in the presence of a byte-order mark and an available locale named en_US.UTF-16.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void) {
    setlocale(LC_ALL, "en_US.UTF-16");
    FILE *fp = fopen("x.csv", "rb");
    if (fp) {
        /* Read the two byte-order mark bytes; FE FF means big-endian. */
        int b0 = fgetc(fp);
        int b1 = fgetc(fp);
        int order = (b0 == 0xFE && b1 == 0xFF);
        wint_t ch;
        while ((ch = fgetwc(fp)) != WEOF) {
            putchar(order ? ch >> 8 : ch);
        }
        putchar('\n');
        fclose(fp);
        return 0;
    } else {
        perror("opening x.csv");
        return 1;
    }
}
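If no UTF-16 locale is available, the byte-order mark can also be checked by hand. Below is a minimal sketch, assuming the first bytes of the file have already been read into a buffer. The enum and the function name utf16_detect_bom are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum utf16_order { UTF16_UNKNOWN = 0, UTF16_LE, UTF16_BE };

/* Inspect the first two bytes of a buffer for a UTF-16 byte-order mark. */
enum utf16_order utf16_detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return UTF16_UNKNOWN;      /* too short to hold a BOM */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return UTF16_LE;           /* FF FE: little-endian */
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return UTF16_BE;           /* FE FF: big-endian */
    return UTF16_UNKNOWN;          /* no BOM found */
}
```

The two bytes would typically come from an fread on a file opened with "rb". When the result is UTF16_UNKNOWN, the caller has to pick a byte order some other way.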
This is my solution, thanks to the comments under my original question. Since every character in the CSV file is valid ASCII, the solution was as simple as this:
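#include <stdio.h>

int main(void)
{
    FILE *fp;
    int ch, i = 1;
    /* Binary mode, so the raw bytes are read without translation. */
    if(!(fp = fopen("x.csv", "rb"))) return 1;
    while((ch = fgetc(fp)) != EOF)
    {
        if(i % 2)
        {
            /* ch is valid ASCII; process it here */
        }
        i++;
    }
    fclose(fp);
    return 0;
}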
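For files where not every character is guaranteed to be ASCII, the "read two bytes and check them" idea from the question can be made explicit. Here is a rough sketch, assuming the raw bytes are already in memory and the byte order is known. The function utf16_to_ascii and its parameters are hypothetical names chosen for this example, and surrogate pairs are simply rejected as non-ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-16 bytes to a NUL-terminated ASCII string.
 * Returns the number of chars written (excluding the NUL), or -1 if a code
 * unit is outside ASCII or the output buffer is too small.
 * A trailing odd byte, if any, is ignored. */
long utf16_to_ascii(const unsigned char *in, size_t in_len, int big_endian,
                    char *out, size_t out_cap)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i + 1 < in_len; i += 2) {
        /* Combine two bytes into one 16-bit code unit. */
        unsigned hi = big_endian ? in[i] : in[i + 1];
        unsigned lo = big_endian ? in[i + 1] : in[i];
        unsigned unit = (hi << 8) | lo;
        if (i == 0 && unit == 0xFEFF)
            continue;              /* skip a leading byte-order mark */
        if (unit > 0x7F)
            return -1;             /* not representable in ASCII */
        if (n + 1 >= out_cap)
            return -1;             /* no room for the char plus NUL */
        out[n++] = (char)unit;
    }
    if (out_cap == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Unlike counting odd and even bytes, this checks the high byte of every code unit, so a stray non-ASCII character is reported instead of being silently mangled.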
Source: https://stackoverflow.com/questions/12125659/reading-a-utf-16-csv-file-by-char