Accented/umlauted characters in C?

Posted by 断了今生、忘了曾经 on 2019-12-23 13:58:10

Question


I'm just learning C and got an assignment where we have to translate plain text into Morse code and back. (I am mostly familiar with Java, so bear with me on the terms I use.)

To do this, I have an array with the strings for all letters.

char *letters[] = {
".- ", "-... ", "-.-. ", "-.. ", ".", "..-." etc

I wrote a function for returning the position of the desired letter.

int letter_nr(unsigned char c)
{
    return c-97;
}

This works, but the assignment specification requires handling the Swedish letters åäö. The Swedish alphabet is the same as the English one, with these three letters at the end. I tried checking for them like so:

int letter_nr(unsigned char c)
{
    if (c == 'å')
        return 26;
    if (c == 'ä')
        return 27;
    if (c == 'ö')
        return 28;
    return c-97;
}

Unfortunately, when I test this function, I get the same value for all three of these letters: 98. Here is my main function for testing:

int main()
{   
    unsigned char letter;

    while(1)
    {
        printf("Type a letter to get its position: ");
        scanf("%c", &letter);
        printf("%d\n", letter_nr(letter));
    }
    return 0;
}

What can I do to resolve this?


Answer 1:


In general, encoding is quite complicated. On the other hand, if you just want a dirty solution specific to your compiler/platform, then add something like this to your code:

printf("letter 0x%x is number %d\n", letter, letter_nr(letter));

It will print the hex value of each byte read for your umlauts. Then just replace the letters in your if statements with those numbers.

EDIT: You say that you always get 98, so your scanf read 98 + 97 = 195 = 0xC3 from the console. According to this table, 0xC3 is the first byte of the two-byte UTF-8 sequences for the accented letters in the Latin-1 block (for example LATIN SMALL LETTER N WITH TILDE). Are you on Mac OS X?

EDIT: This is my final version. It is quite hacky, but it works for me :)

#include <stdio.h>

// Scan for a letter and return its position in the Morse table.
// Recognises the UTF-8 encoding of the Swedish letters.
int letter_nr()
{
  unsigned char letter;
  // read the first byte; give up if nothing could be read
  if(scanf("%c", &letter) != 1)
    return -1;
  if(0xC3 == letter)
  {
    // 0xC3 starts a two-byte UTF-8 sequence, so read the second byte
    if(scanf("%c", &letter) != 1)
      return -1;
    // LATIN SMALL LETTER A WITH RING ABOVE = å
    if(0xA5 == letter)
      return 26;
    // LATIN SMALL LETTER A WITH DIAERESIS = ä
    if(0xA4 == letter)
      return 27;
    // LATIN SMALL LETTER O WITH DIAERESIS = ö
    if(0xB6 == letter)
      return 28;

    printf("Unknown letter. 0x%x. ", letter);
    return -1;
  }
  // it seems to be regular ASCII
  return letter - 97;
} // letter_nr

int main()
{   
    while(1)
    {
        printf("Type a letter to get its position: ");

        int val = letter_nr();
        if(-1 != val)
          printf("Morse code is %d.\n", val);
        else
          printf("Unknown Morse code.\n");

        // strip the remaining newline
        unsigned char new_line;
        scanf("%c", &new_line);
    }
    return 0;
}



Answer 2:


The encoding of character constants actually depends on your locale settings.

The safest bet is to use wide characters and the corresponding functions. You declare the alphabet as const wchar_t* alphabet = L"abcdefghijklmnopqrstuvwxyzåäö", and the individual characters as L'ö'.

This small example program works for me (also on a UNIX console with UTF-8) - try it.

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char** argv)
{
    wint_t letter = L'\0';
    setlocale(LC_ALL, ""); /* Initialize locale, to get the correct conversion to/from wchars */
    while(1)
    {
        if(!letter)
            printf("Type a letter to get its position: ");

        letter = fgetwc(stdin);
        if(letter == WEOF) {
            putchar('\n');
            return 0;
        } else if(letter == L'\n' || letter == L'\r') {
            letter = L'\0'; /* skip newlines - and print the instruction again */
        } else {
            /* print the character value, and don't print the instruction again;
               the cast makes the argument match the %ld format */
            printf("%ld\n", (long)letter);
        }
    }
    return 0;
}

Example session:

Type a letter to get its position: a
97
Type a letter to get its position: A
65
Type a letter to get its position: Ö
214
Type a letter to get its position: ö
246
Type a letter to get its position: Å
197
Type a letter to get its position: <^D>

I understand that on Windows, this does not work with characters outside the Unicode BMP, but that's not an issue here.




Answer 3:


Hmmm ... at first I'd say the "funny" characters are not chars. You cannot pass one of them to a function accepting a char argument and expect it to work.

Try this:

#include <stdio.h>

int main(void)
{
    char buf[100];
    printf("Enter a string with funny characters: ");
    fflush(stdout);
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return 1;
    /* now print it, as if it was a sequence of `char`s */
    for (char *p = buf; *p; p++) {
        /* cast so bytes >= 0x80 print as 128..255 rather than negative */
        printf("The character '%c' has value %d\n", *p, (unsigned char)*p);
    }
    return 0;
}

Now try the same with wide characters: #include <wchar.h> and replace printf with wprintf, fgets with fgetws, etc ...



Source: https://stackoverflow.com/questions/1725124/accented-umlauted-characters-in-c
