How to fix UTF encoding for whitespaces?

感动是毒 2020-12-16 14:10

In my C# code, I am extracting text from a PDF document. When I do that, I get a string that's in UTF-8 or Unicode encoding (I'm not sure which). When I use Encoding

3 answers
  •  轮回少年
    2020-12-16 14:34

    In UTF-8, the byte sequence C2 A0 (194 160) encodes U+00A0, NO-BREAK SPACE. ISO/IEC 8859-1 defines the same character at 0xA0: a space at which a line break must not be inserted. Text-processing software normally assumes that a line may be broken at any ordinary whitespace character (this is how word wrap is usually implemented), and the no-break space exists to suppress that. To fix the problem, replace each no-break space in your string with a normal space (U+0020).
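
The replacement described above could be sketched in C# roughly as follows. This is only a minimal illustration, assuming the extracted PDF text is already held in a .NET `string`; the class name `WhitespaceFixer` and both method names are hypothetical and do not come from the question.

```csharp
using System;
using System.Text;

// Hypothetical helper class, named here for illustration only.
public static class WhitespaceFixer
{
    // U+00A0 NO-BREAK SPACE; its UTF-8 encoding is the two bytes 0xC2 0xA0.
    private const char NoBreakSpace = '\u00A0';

    // Returns a copy of the input in which every no-break space
    // has been replaced by an ordinary space (U+0020).
    public static string ReplaceNoBreakSpaces(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Replace(NoBreakSpace, ' ');
    }

    // Diagnostic helper: renders the UTF-8 bytes of a string as hex,
    // which makes a stray C2-A0 pair easy to spot.
    public static string ToUtf8Hex(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }
}
```

For example, `WhitespaceFixer.ToUtf8Hex("A\u00A0B")` gives `41-C2-A0-42`, and after `ReplaceNoBreakSpaces` the same string gives `41-20-42`. Note that a .NET `string` is stored as UTF-16 in memory, so the C2 A0 bytes only appear once the text is encoded as UTF-8; the replacement itself works on the character U+00A0.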
