Removing Control Characters from a File

野的像风 2020-12-03 05:20

I want to delete all the control characters from my file using Linux Bash commands.

There are some control characters like EOF (0x1A) especially which are causing th

4 Answers
  • 2020-12-03 05:25

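    A little late to the party: cat -v <file>, which I think is the easiest of the lot to remember! Note that it does not delete control characters; it makes them visible in caret notation (0x1A shows up as ^Z), so it is mainly useful for inspecting the file.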
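To see what that inspection looks like, here is a minimal sketch using a made-up sample line that contains the 0x1A byte from the question (octal 032). The helper name show_ctrl and the sample text are illustrative assumptions, not part of the original answer.

```shell
# show_ctrl: hypothetical wrapper around cat -v, which prints control
# characters in caret notation instead of emitting the raw bytes.
show_ctrl() {
  cat -v
}

# Sample input with an embedded 0x1A (Ctrl-Z) byte; prints: abc^Zdef
printf 'abc\032def\n' | show_ctrl
```

The byte is still in the data; only its display changes, so a separate deletion step is needed afterwards.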

  • 2020-12-03 05:32

    Based on this answer on unix.stackexchange, this should do the trick:

    $ col -b < scriptfile.raw > scriptfile.clean
    
  • 2020-12-03 05:32

    Try grep, like:

    grep -o "[[:print:][:space:]]*" in.txt > out.txt
    

    which will print only printable characters (letters, digits, punctuation, and the space character) plus whitespace characters such as tab, newline, vertical tab, form feed, and carriage return.
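One thing worth checking with this approach: grep -o prints each match on its own line, so a control character in the middle of a line splits that line in two. The sketch below illustrates this with made-up input; keep_printable is a hypothetical helper name.

```shell
# keep_printable: hypothetical wrapper around the grep command above.
# -o prints every matching run separately, one run per output line.
keep_printable() {
  grep -o "[[:print:][:space:]]*"
}

# "ab" and "cd" are separated by a 0x1A byte on a single input line;
# the output has them on two separate lines.
printf 'ab\032cd\n' | keep_printable
```

If the original line structure matters, a tr-based deletion is likely the safer choice.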

    To be less restrictive, and remove only control characters ([:cntrl:]), delete them by:

    tr -d "[:cntrl:]" < in.txt > out.txt
    

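    If you want to keep \n and \r (both are part of [:cntrl:]), temporarily replace them with something else and restore them afterwards. This assumes the placeholder bytes \275 and \276 do not already occur in the file, which may not hold for UTF-8 text:

    tr '\r\n' '\275\276' < file.txt | tr -d "[:cntrl:]" | tr '\275\276' '\r\n'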
    
  • 2020-12-03 05:35

    Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:

    $ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
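A variant sketch of the same idea, assuming you also want to keep tabs (\011) and to drop DEL (\177); the command above deletes tabs and leaves DEL in place. The helper name strip_ctrl and the sample input are illustrative assumptions.

```shell
# strip_ctrl: hypothetical helper that deletes ASCII control bytes
# while keeping tab (\011), newline (\012) and carriage return (\015).
strip_ctrl() {
  tr -d '\000-\010\013\014\016-\037\177'
}

# Sample line with 0x1A and DEL embedded; od -c shows the surviving bytes.
printf 'a\032b\tc\177d\r\n' | strip_ctrl | od -c
```

Used on a file, this would be strip_ctrl < file.txt > newfile.txt.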
    