Replace tabs (“\t”) in flat file with “Unit Separator” (0x1f) in C#

Submitted by 自古美人都是妖i on 2019-12-08 03:28:28

Question


I have been having trouble finding the metacharacter for the 'Unit Separator' to replace the tabs in a flat file.

So far I have this:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\0x1f")));  //this does not work
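A quick check of why this first attempt fails: C# has no "\0x.." hex escape, so "\0x1f" is read as the escape \0 (the NUL character, U+0000) followed by the three literal characters x, 1 and f. The class and method names below are hypothetical, chosen only for this sketch.

```csharp
using System;

public static class EscapeCheck
{
    // "\0x1f" is the escape \0 (U+0000, NUL) followed by the literal text "x1f".
    public static string AttemptedSeparator() => "\0x1f";

    // Lists each character as its UTF-16 code unit in hex, separated by spaces.
    public static string Describe(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = ((int)s[i]).ToString("X4");
        return string.Join(" ", parts);
    }
}
```

Describe(AttemptedSeparator()) gives "0000 0078 0031 0066", so each tab is replaced by four characters starting with NUL rather than by a single U+001F.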

I have also tried:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u"))); //also doesn't work

AND

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", 0x1f)));  //also doesn't work

How do I correctly use hex as a parameter? Also, what is the metacharacter for the 'Unit Separator'?
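On the hex question: a hex literal such as 0x1f is an int, and string.Replace has no overload that takes an int, so it has to be converted to char first. A minimal sketch, assuming a single-character replacement is wanted; HexParameter and TabToUnitSeparator are hypothetical names.

```csharp
using System;

public static class HexParameter
{
    // 0x1f is an int literal; casting it gives the char U+001F (Unit Separator).
    public const char UnitSeparator = (char)0x1f;

    // string.Replace(char, char) swaps every tab for the single US character.
    public static string TabToUnitSeparator(string line) => line.Replace('\t', UnitSeparator);

    // The same character written as a \u escape in a char literal.
    public static bool SameChar() => UnitSeparator == '\u001f';
}
```

Using the char overload of Replace also makes it explicit that exactly one character is substituted for each tab.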


Answer 1:


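The Unit Separator is the control character

U+001F

In a C# string literal it is written with the \u escape sequence as "\u001f" (\u is always followed by exactly four hex digits). You should be able to use it like this: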

    File.WriteAllLines(outputFile,
        File.ReadLines(inputFile)
        .Select(t => t.Replace("\t", "\u001f")));
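The same replacement can be pulled into a small pure function so it can be checked without touching the disk. A sketch; UnitSeparatorConverter and ConvertLines are hypothetical names, and in the code above its result would be passed straight to File.WriteAllLines.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnitSeparatorConverter
{
    // "\u001f" is exactly one char: U+001F, the ASCII Unit Separator.
    private const string UnitSeparator = "\u001f";

    // Lazily replaces every tab in every line, like the Select in the answer.
    public static IEnumerable<string> ConvertLines(IEnumerable<string> lines) =>
        lines.Select(line => line.Replace("\t", UnitSeparator));
}
```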

EDIT: Since a discussion about control characters came up, I'll add this definition for posterity's sake.

A special, non-printing character that begins, modifies, or ends a function, event, operation or control operation. The ASCII character set defines 32 control characters. Originally, these codes were designed to control teletype machines. Now, however, they are often used to control display monitors, printers, and other modern devices.
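To confirm that U+001F falls in that block of control codes, .NET's char.IsControl can be used. It also reports DEL (U+007F) and U+0080 to U+009F as control characters, so the sketch below counts only the range 0x00 to 0x1F; ControlCharacters and its methods are hypothetical names.

```csharp
public static class ControlCharacters
{
    // Counts the codes 0x00..0x1F that char.IsControl recognises as control characters.
    public static int CountC0()
    {
        int count = 0;
        for (int code = 0x00; code <= 0x1f; code++)
            if (char.IsControl((char)code)) count++;
        return count;
    }

    // True because U+001F is one of those control codes.
    public static bool IsUnitSeparatorControl() => char.IsControl('\u001f');
}
```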

from here.

Also, here is a description of the unit separator:

The smallest data items to be stored in a database are called units in the ASCII definition. We would call them fields now. The unit separator separates these fields in a serial data storage environment. Most current database implementations require that fields of most types have a fixed length. Enough space in the record is allocated to store the largest possible member of each field, even if this is not necessary in most cases. This costs a large amount of space in many situations. The US control code allows all fields to have a variable length. If data storage space is limited (as in the sixties), this is a good way to preserve valuable space. On the other hand, serial storage is far less efficient than the table-driven RAM and disk implementations of modern times. I can't imagine a situation where modern SQL databases are run with the data stored on paper tape or magnetic reels...
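Because the unit separator marks field boundaries, reading such a record back is a split on that character. A minimal sketch, assuming one record per line with its variable-length fields separated by U+001F; UnitSeparatorRecords, ReadFields and WriteRecord are hypothetical names.

```csharp
public static class UnitSeparatorRecords
{
    public const char UnitSeparator = '\u001f';

    // Splits one record into its fields. Empty fields are kept, so
    // "a" + US + US + "c" yields three fields: "a", "", "c".
    public static string[] ReadFields(string record) => record.Split(UnitSeparator);

    // Joins fields back into a single record, one US between each pair.
    public static string WriteRecord(params string[] fields) =>
        string.Join(UnitSeparator.ToString(), fields);
}
```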

from here.




Answer 2:


This should get you where you need to be:

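        // Convert.ToInt32 accepts an optional "0x" prefix when the base is 16;
        // (char)0x1f or '\u001f' would give the same character directly.
        char unitSeparatorChar = (char)Convert.ToInt32("0x1f", 16);
        string contents = File.ReadAllText(inputFile);
        string convertedContents = contents.Replace('\t', unitSeparatorChar);
        File.WriteAllText(outputFile, convertedContents);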

I loaded the file into a string, converted it, and saved it again. You can combine these steps into a single statement, but note that the whole file is still held in memory as one string either way.
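Reading the file with File.ReadAllText keeps its entire contents in memory. For large files, a chunked streaming pass avoids that. A sketch, assuming the reader and writer are created with the file's encoding; StreamingConverter and CopyReplacingTabs are hypothetical names. Because it never splits lines, it copies line endings exactly as they were (File.WriteAllLines, by contrast, writes Environment.NewLine after every line).

```csharp
using System.IO;

public static class StreamingConverter
{
    // Copies reader to writer in fixed-size chunks, swapping each tab for U+001F.
    // Line endings pass through untouched because no line splitting happens.
    public static void CopyReplacingTabs(TextReader reader, TextWriter writer)
    {
        var buffer = new char[8192];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                if (buffer[i] == '\t') buffer[i] = '\u001f';
            writer.Write(buffer, 0, read);
        }
    }
}
```

With files, pass new StreamReader(inputFile) and new StreamWriter(outputFile), each inside a using statement so the output is flushed and closed.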




Answer 3:


I think the correct way to encode Unicode characters in C# is to use the \unnnn format. You can try replacing the tab with the string \u001f, like so:

File.WriteAllLines(outputFile,
    File.ReadLines(inputFile)
    .Select(t => t.Replace("\t", "\u001f")));

Does that work?
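The point about the \unnnn form is worth underlining, because C# also accepts a variable-length \x escape (one to four hex digits). \x1f works on its own, but it consumes any hex digits that follow, so a field that starts with a letter from A to F changes the character. A small demonstration; HexEscapePitfall is a hypothetical name.

```csharp
public static class HexEscapePitfall
{
    // \u always takes exactly four hex digits: US followed by "Ada".
    public static string WithU() => "\u001fAda";

    // \x takes up to four hex digits, so "\x1fAda" is parsed as \x1fAd + "a":
    // U+1FAD followed by 'a', not US followed by "Ada".
    public static string WithX() => "\x1fAda";
}
```

This is why "\u001f" is the safer way to write the separator inside longer string literals.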



Source: https://stackoverflow.com/questions/31947128/replace-tabs-t-in-flat-file-with-unit-separator-0x1f-in-c-sharp
