I want to delete all the control characters from my file using linux bash commands.
There are some control characters like EOF (0x1A) especially which are causing th
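A little late to the party: cat -v <file>
This only displays the control characters (in caret notation, e.g. ^Z for 0x1A) rather than deleting them, but I think it is the easiest to remember of the lot!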
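For instance, here is a small sketch of using it to check whether any control characters are present before cleaning up. The sample function is just made-up test input, not anything from the question:

```shell
# Hypothetical sample input: a SUB/Ctrl-Z byte (0x1A) between "foo" and "bar".
sample() { printf 'foo\032bar\n'; }

# cat -v writes the byte in caret notation (^Z) instead of sending it raw to the terminal.
sample | cat -v

# Count the lines that contain at least one control character.
sample | LC_ALL=C grep -c '[[:cntrl:]]'
```

With a real file you would run cat -v and grep -c on the file name instead of piping from sample.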
Based on this answer on unix.stackexchange, this should do the trick:
$ col -b < scriptfile.raw > scriptfile.clean
Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
which will print only printable characters (letters, digits, punctuation, and space) and whitespace characters such as tab, newline, vertical tab, form feed, and carriage return.
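One thing to watch for with this approach: grep -o prints every match on a line of its own, so a control character in the middle of a line splits that line in two. A quick sketch with made-up input:

```shell
# One input line with a SUB byte (0x1A) in the middle.
printf 'foo\032bar\n' | grep -o '[[:print:][:space:]]*'
# grep -o prints each match on its own line, so "foo" and "bar"
# come out as two separate lines.
```

If the line structure has to survive, the tr-based commands are probably the safer choice.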
To be less restrictive, and remove only control characters ([:cntrl:]), delete them by:
tr -d "[:cntrl:]"
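Keep in mind that tr reads standard input and that newlines are control characters as well, so this joins all lines together. A small sketch with made-up input:

```shell
# Two lines, the first ending in a stray SUB byte (0x1A).
printf 'one\032\ntwo\n' | tr -d '[:cntrl:]'
# The SUB byte and both newlines are deleted, so the output is the
# single unterminated string "onetwo".
```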
If you want to keep \n (which is part of [:cntrl:]), then replace it temporarily with something else, e.g.
tr '\r\n' '\275\276' < file.txt | tr -d "[:cntrl:]" | tr '\275\276' '\r\n'
Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
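Note that the list above also deletes tab (\011) and leaves DEL (\177) in place. If you would rather keep tabs and drop DEL, a possible variant looks like this. The function name strip_ctrl is only for illustration, and the choice of which bytes to keep is my assumption about what is usually wanted:

```shell
# Delete the C0 control bytes except tab (\011), LF (\012) and CR (\015),
# and also delete DEL (\177). LC_ALL=C makes tr work on plain bytes.
strip_ctrl() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}

# Made-up input: a tab, a SUB byte (0x1A) and a DEL byte on one line.
printf 'a\tb\032c\177\n' | strip_ctrl | od -c
```

Usage on a real file would be strip_ctrl < file.txt > newfile.txt. Piping through od -c is just a way to confirm which bytes survived.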