Read a file by bytes in BASH

囚心锁ツ 2020-12-06 00:19

I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in Bash? P.S. I need to get the hex value of these bytes.

7 answers
  • 2020-12-06 00:59

    Did you try xxd? It gives a hex dump directly, which is what you want.

    For your case, the command would be:

    xxd -c 1 /path/to/input_file | while read offset hex char; do
      #Do something with $hex
    done
    

    Note: derive the character from $hex rather than using the char field or a plain while read line. This is required because read splits on whitespace and will not capture whitespace bytes properly.
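
    When only the hex values matter, the parsing can be avoided altogether. A minimal sketch, assuming xxd is installed; the function name hex_bytes is hypothetical and not part of the original answer. The -p flag selects plain hex output and -c 1 puts one byte on each line, so there is no offset or character column to split:

```bash
# hex_bytes [FILE]: print one two-digit hex value per line, one line per byte.
# With no FILE (or "-"), xxd reads standard input.
hex_bytes() {
    xxd -p -c 1 "${1:--}"
}
```

    It could then be used as: hex_bytes /path/to/input_file | while read -r hex; do ...; done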

  • 2020-12-06 01:03

    Although I would rather have expanded Perleone's own post (as it was his basic concept!), my edit was rejected, and I was kindly advised that this should be posted as a separate answer. Fair enough, so I will do that.

    Considerations in short for the improvements on Perleone's original script:

    • seq would be total overkill here. A simple while loop with the variable a as a (likewise simple) counter will do the job just fine (and much quicker too).
    • The max value, $(cat $1 | wc -c) must be assigned to a variable, otherwise it will be recalculated every time and make this alternate script run even slower than the one it was derived from.
    • There's no need to waste a function on a simple usage info line. However, it is necessary to know about the (mandatory) curly braces around two commands, for without the { }, the exit 1 command will be executed in either case, and the script interpreter will never make it to the loop. (Last note: ( ) will work too, but not in the same way! Parentheses will spawn a subshell, whilst curly braces will execute commands inside them in the current shell.)
    #!/bin/bash
    
    test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
    
    a=0
    max=$(cat "$1" | wc -c)
    while [[ $((++a)) -le $max ]]; do   # -le so that the last byte is included
      cat "$1" | head -c$a | tail -c1 | \
      xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
    done
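
    The difference between { } and ( ) described in the last bullet can be observed directly. This is only an illustrative sketch: each variant runs in its own child shell (sh -c) so that the exit cannot end the calling script, and the variable names are hypothetical.

```bash
# With braces, exit 1 runs in the current (child) shell, so "loop reached" is never printed.
with_braces=$(sh -c 'false || { echo "usage"; exit 1; }; echo "loop reached"') || true

# With parentheses, exit 1 only ends the subshell, and execution continues afterwards.
with_parens=$(sh -c 'false || ( echo "usage"; exit 1 ); echo "loop reached"') || true

printf 'braces: %s\n' "$with_braces"
printf 'parens: %s\n' "$with_parens"
```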
    
  • 2020-12-06 01:04

    I have a suggestion, but would like feedback from everybody, and mainly some personal advice from the user syntaxerror.

    I don't know much about bash, but I thought it might be better to store the output of "cat $1" in a variable. The problem is that the echo command will also add a small overhead, right?

    test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
    a=0
    rfile=$(cat "$1")
    max=$(echo "$rfile" | wc -c)   # echo appends a newline, so max is length + 1
    while [[ $((++a)) -lt $max ]]; do
      echo "$rfile" | head -c$a | tail -c1 | \
      xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
    done
    

    In my opinion this should perform better, but I haven't benchmarked it.
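
    One caveat with keeping the file in a variable: command substitution strips trailing newlines (and bash variables cannot hold null bytes), so the variable is not always a byte-exact copy. A small sketch of the effect, using a temporary file and hypothetical variable names:

```bash
tmp=$(mktemp)
printf 'ab\n\n' > "$tmp"                      # 4 bytes, the last two are newlines

file_bytes=$(wc -c < "$tmp" | tr -d ' ')      # size of the file itself
content=$(cat "$tmp")                         # trailing newlines are stripped here
var_bytes=$(printf '%s' "$content" | wc -c | tr -d ' ')
rm -f "$tmp"

echo "file: $file_bytes bytes, variable: $var_bytes bytes"
```

    For text files this rarely matters, but for arbitrary binary input it means the loop would not see every byte.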

  • 2020-12-06 01:10

    Full rewrite: September 2019!

    A lot shorter and simpler than previous versions! (Somewhat faster, but not by much.)

    Yes, bash can read and write binary:

    Syntax:

    LANG=C IFS= read -r -d '' -n 1 foo
    

    will populate $foo with 1 binary byte. Unfortunately, as bash strings cannot hold null bytes ($\0), reading one byte at a time is required.

    As for getting the value of the byte read, I had missed this in man bash (have a look at the 2016 post at the bottom of this answer):

     printf [-v var] format [arguments]
     ...
         Arguments to non-string format specifiers are treated as C constants,
         except that ..., and if  the leading character is a  single or double
         quote, the value is the ASCII value of the following character.
    

    So:

    read8() {
        local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
        read -r -d '' -n 1 _r8_car
        printf -v "$_r8_var" %d "'$_r8_car"
    }
    

    This populates the named variable (default: $OUTBIN) with the decimal ASCII value of the first byte from STDIN.
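
    Applied to the original question, read8 could drive a loop that prints every byte as hex. A minimal sketch, assuming bash: the copy of read8 below adds an explicit return at end of input so that the loop can stop (that return is an assumption, not part of the function above), and hexdump_bytes is a hypothetical name. It has only been reasoned through for ASCII input; bytes above 127 depend on how the locale is handled.

```bash
#!/bin/bash
# read8 [varname]: store the decimal value of the next byte from stdin in varname.
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car || return 1    # added: fail at end of input
    printf -v "$_r8_var" %d "'$_r8_car"       # an empty string (null byte) yields 0
}

# hexdump_bytes: print each byte of stdin as two hex digits, one per line.
hexdump_bytes() {
    local byte
    while read8 byte; do
        printf '%02x\n' "$byte"
    done
}
```

    Usage would be along the lines of: hexdump_bytes < /path/to/input_file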

    read16() {
        local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
        read8 _r16_lb &&
        read8 _r16_hb
        printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
    }
    

    This populates the named variable (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN...

    Of course, to switch endianness, swap the order of the two reads:

        read8 _r16_hb &&
        read8 _r16_lb
    

    And so on:

    # Usage:
    #       read[8|16|32|64] [varname] < binaryStdInput
    
    read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
        read -r -d '' -n 1 _r8_car
        printf -v "$_r8_var" %d "'$_r8_car" ;}
    read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
        read8  _r16_lb && read8  _r16_hb
        printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
    read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
        read16 _r32_lw && read16 _r32_hw
        printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
    read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
        read32 _r64_ll && read32 _r64_hl
        printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}
    

    So you could source this; then, if your /dev/sda is GPT-partitioned:

    read totsize < <(blockdev --getsz /dev/sda)
    read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
    echo $((totsize-gptbackup))
    1
    

    The answer should be 1. The first GPT header is at sector 1, one sector is 512 bytes, and the backup GPT location is stored at byte 32 of the header. With bs=8, 512 bytes are 64 blocks and 32 bytes are 4 blocks, so 544 bytes are 68 blocks to skip. See GUID Partition Table at Wikipedia.
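
    The skip value can be derived rather than hard-coded. A short sketch of the arithmetic, with hypothetical variable names and the same assumptions as above (512-byte sectors, header at sector 1, field at byte offset 32, dd block size 8):

```bash
sector_size=512     # bytes per sector (assumed, as in the text)
header_sector=1     # the first GPT header lives in sector 1
field_offset=32     # byte offset of the backup location inside the header
block_size=8        # dd block size, one 64-bit value per block

skip=$(( (header_sector * sector_size + field_offset) / block_size ))
echo "$skip"        # 68
```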

    Quick small write function...

    write () { 
        local i=$((${2:-64}/8)) o= v r
        r=$((i-1))
        for ((;i--;)) {
            printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
            o+=$v
        }
        printf "$o"
    }
    

    This function defaults to 64 bits, little endian.

    Usage: write <integer> [bits:64|32|16|8] [switchto big endian]
    
    • With two parameters, the second must be one of 8, 16, 32 or 64 and sets the bit length of the generated output.
    • With any dummy third parameter (even an empty string), the function switches to big endian.


    read64 foo < <(write -12345);echo $foo
    -12345
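
    To see the byte order on the wire, the output can be piped through od. The function below is a more verbose sketch of the same idea, not the author's write function: the name int_to_bytes and its "big" keyword are hypothetical, and it assumes bash.

```bash
#!/bin/bash
# int_to_bytes VALUE [BITS] [big]: emit VALUE as BITS/8 raw bytes (default 64, little endian).
int_to_bytes() {
    local value=$1 bits=${2:-64} order=${3:-little} n i sh oct out=
    n=$(( bits / 8 ))
    for (( i = 0; i < n; i++ )); do
        if [[ $order == big ]]; then
            sh=$(( 8 * (n - 1 - i) ))     # most significant byte first
        else
            sh=$(( 8 * i ))               # least significant byte first
        fi
        printf -v oct '\\%03o' "$(( (value >> sh) & 255 ))"   # e.g. \002
        out+=$oct
    done
    printf "$out"                         # the octal escapes become raw bytes
}
```

    For example, int_to_bytes 258 16 | od -An -t x1 shows 02 01, while int_to_bytes 258 16 big | od -An -t x1 shows 01 02.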
    

    ...

    First post 2015...

    Upgrade: a bash-specific version (with bashisms)

    With newer versions of the printf built-in, you can do a lot without having to fork ($(...)), which makes your script a lot faster.

    First, let's see (using seq and sed) how to parse hd output:

    echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
        /0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
        /[1-9]$/{s/^.*\(.\)/\1/;H};
        ${x;s/\n//g;p}';hd < <(echo Hello good world!)
    0         1         2         3         4         5         6         7
    012345678901234567890123456789012345678901234567890123456789012345678901234567
    00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
    00000010  21 0a                                             |!.|
    00000012
    

    Here the hexadecimal part begins at column 10 and ends at column 56, with values spaced 3 characters apart and one extra space at column 34.

    So parsing this could be done with:

    while read line ;do
        for x in ${line:10:48};do
            printf -v x \\%o 0x$x
            printf $x
          done
      done < <( ls -l --color | hd )
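
    The column arithmetic can be checked on a single line without running hd. This sketch takes the first sample line shown earlier as a literal string and applies the same substring expansion; it assumes bash, and the array name hex_fields is hypothetical:

```bash
#!/bin/bash
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'

# Unquoted on purpose: word splitting separates the hex values,
# and the extra space in the middle of the dump is swallowed.
hex_fields=( ${line:10:48} )

echo "${#hex_fields[@]} values, first ${hex_fields[0]}, last ${hex_fields[15]}"
```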
    

    Old original post

    Edit 2: for hexadecimal, you could use hd

    echo Hello world | hd
    00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|
    

    or od

    echo Hello world | od -t x1 -t c
    0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
              H   e   l   l   o       w   o   r   l   d  \n
    

    In short:

    while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done
    

    Try it:

    while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)
    

    Explanation:

    while IFS= read -rn1 car  # Empty IFS so every character is read, whitespace included
        do [ "$car" ] &&      # Test whether something was read
            echo -n "$car" || # If so, echo it
            echo              # Otherwise it was an end-of-line, so print one
      done
    

    Edit: the question was edited, and hex values are needed!

    od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done
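
    Wrapped in a function, the same pipeline could look like the sketch below. The name hex_each_byte is hypothetical. The -v flag is added here because od otherwise replaces repeated identical output lines with a single *, which would silently drop bytes from the loop:

```bash
# hex_each_byte [FILE...]: print one hex pair per line for every input byte.
# With no FILE, od reads standard input.
hex_each_byte() {
    od -An -v -t x1 "$@" | while read -r line; do
        for hex in $line; do          # unquoted: split the line into hex pairs
            printf '%s\n' "$hex"
        done
    done
}
```

    For example, printf 'A\n' | hex_each_byte prints 41 and 0a on separate lines.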
    

    Demo:

    od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
        while read line;do                    # Read line of HEX pairs
            for char in $line;do              # For each pair
                printf "\x$char"              # Print translate HEX to binary
          done
      done
    

    Demo 2: We have both hex and binary

    od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
        while read line;do                    # Read line of HEX pairs
            for char in $line;do              # For each pair
                bin="$(printf "\x$char")"     # translate HEX to binary
                dec=$(printf "%d" 0x$char)    # translate to decimal
                [ $dec -lt 32  ] ||           # if character not printable
                ( [ $dec -gt 128 ] &&         # change bin to a single dot.
                  [ $dec -lt 160 ] ) && bin="."
                str="$str$bin" 
                echo -n $char \               # Print HEX value and a space
                ((i++))                       # count printed values
                if [ $i -gt 15 ] ;then
                    i=0
                    echo "  -  $str"
                    str=""
                  fi
          done
      done
    

    New post, September 2016:

    This could be useful in very specific cases (I've used it to manually copy GPT partitions between two disks, at a low level, without having /usr mounted...)

    Yes, bash can read binary!

    ... but only one byte at a time... (because `char(0)' cannot be read correctly, the only way to handle it is to check for end-of-file: if no character was read and end of file has not been reached, then the character read was a char(0)).
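
    The null-byte rule can be shown in isolation. A minimal sketch, assuming bash; the function name count_nul_bytes is hypothetical. With -d '' the delimiter is the null byte, so a successful read that leaves the variable empty means a null byte was consumed, while a failed read means end of input:

```bash
#!/bin/bash
# count_nul_bytes: count the null bytes on standard input.
count_nul_bytes() {
    local char count=0
    while LANG=C IFS= read -r -d '' -n 1 char; do
        # read succeeded but stored nothing: the byte was a null byte
        if [[ -z $char ]]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}
```

    For example, printf 'a\0b\0' | count_nul_bytes prints 2.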

    This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

    This uses recent bashisms and requires bash v4.3 or higher.

    #!/bin/bash
    
    printf -v ascii \\%o {32..126}
    printf -v ascii "$ascii"
    
    printf -v cntrl %-20sE abtnvfr
    
    values=()
    todisplay=
    address=0
    printf -v fmt8 %8s
    fmt8=${fmt8// / %02x}
    
    while LANG=C IFS= read -r -d '' -n 1 char ;do
        if [ "$char" ] ;then
            printf -v char "%q" "$char"
            ((${#char}==1)) && todisplay+=$char || todisplay+=.
            case ${#char} in
             1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
               7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
               5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                    values+=($((${#char}+7)));;
               * ) echo >&2 ERROR: $char;;
            esac
          else
            values+=(0)
          fi
    

        if [ ${#values[@]} -gt 15 ] ;then
            printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
            ((address+=16))
            values=() todisplay=
          fi
      done
    
    if [ "$values" ] ;then
            ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
                fmt="${fmt8:0:${#values[@]}*5}"
            printf "%08x $fmt%$((
                    50-${#values[@]}*3-(${#values[@]}>8?1:0)
                ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
    fi
    printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}
    

    You could try/use this, but don't try to compare performance!

    time hd < <(seq 1 10000|gzip)|wc
       1415   25480  111711
    real    0m0.020s
    user    0m0.008s
    sys     0m0.000s
    
    time ./hex.sh < <(seq 1 10000|gzip)|wc
       1415   25452  111669
    real    0m2.636s
    user    0m2.496s
    sys     0m0.048s
    

    Same job: about 20 ms for hd vs. about 2600 ms for my bash script.

    ... but if you want to read 4 bytes in a file header, or even a sector address on a hard drive, this could do the job...
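
    For comparison, when external tools are available, the same "first 4 bytes of a header" task can be sketched with od alone rather than with the pure bash script. The function name file_magic is hypothetical; -N 4 limits od to the first four bytes:

```bash
# file_magic FILE: print the first four bytes of FILE as eight hex digits.
file_magic() {
    od -An -v -t x1 -N 4 "$1" | tr -d ' \n'
}
```

    On an ELF binary, for instance, this would print 7f454c46.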

  • 2020-12-06 01:10

    Use read with the -n option.

    while IFS= read -r -n 1 ch; do
      echo "$ch"
    done < moemoe.txt
    
  • 2020-12-06 01:12

    Using read, a single character can be read at a time as follows:

    IFS= read -r -n 1 c
    echo "$c"
    

    [ANSWER]

    Try this:

    #!/bin/bash
    # data file
    INPUT=/path/to/input.txt
    
    # while loop
    while IFS= read -r -n1 char
    do
            # display one character at a time
        echo  "$char"
    done < "$INPUT"
    

    From this link


    Second method: using awk, loop through the input character by character

    awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql


    Third way:

    $ fold -1 /home/cscape/Desktop/table.sql  | awk '{print $0}'
    

    EDIT: To print each char as HEX number:

    Suppose I have a file named file:

    $ cat file
    123A3445F 
    

    I have written an awk script (named x.awk) that reads the file character by character and prints each one in hex:

    $ cat x.awk
    #!/bin/awk -f
    
    BEGIN    { _ord_init() }
    
    function _ord_init(    low, high, i, t)
    {
        low = sprintf("%c", 7) # BEL is ascii 7
        if (low == "\a") {    # regular ascii
            low = 0
            high = 127
        } else if (sprintf("%c", 128 + 7) == "\a") {
            # ascii, mark parity
            low = 128
            high = 255
        } else {        # ebcdic(!)
            low = 0
            high = 255
        }
    
        for (i = low; i <= high; i++) {
            t = sprintf("%c", i)
            _ord_[t] = i
        }
    }
    function ord(str,    c)
    {
        # only first character is of interest
        c = substr(str, 1, 1)
        return _ord_[c]
    }
    
    function chr(c)
    {
        # force c to be numeric by adding 0
        return sprintf("%c", c + 0)
    }
    
    { x=$0; printf("%s , %x\n",$0, ord(x) )} 
    

    To write this script I used the awk documentation.
    Now you can use this awk script as follows:

    $ fold -1 /home/cscape/Desktop/file  | awk -f x.awk
    1 , 31
    2 , 32
    3 , 33
    A , 41
    3 , 33
    4 , 34
    4 , 34
    5 , 35
    F , 46
    

    NOTE: the value of A is 41 in hexadecimal. To print in decimal, change %x to %d in the last line of the script x.awk.

    Give it a try!
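
    For single ASCII characters, the same lookup is also possible without awk, using the printf rule that a leading quote yields the character's numeric value. A small sketch; the function name char_to_hex is hypothetical:

```bash
# char_to_hex CHAR: print the hex value of the first character of CHAR.
char_to_hex() {
    printf '%x\n' "'$1"
}

char_to_hex A    # 41
char_to_hex F    # 46
```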
