Best way to change words into numbers using specific word list


广开言路 2021-01-14 20:58

I have a text file that contains one tweet per line, and it needs to be converted into a format suitable for machine learning. I'm using Python and basic Unix text manipulation (regex) to achieve

3 answers

暗喜 (original poster)

2021-01-14 21:46

In awk:

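awk '
# First file (lookup): store each word as a key of array a.
NR==FNR {
    a[$1]
    next
}

# Second file (tweets): print 1 for each word found in a, else 0.
{
    gsub(/!/, "", $0)  # Strip `!` before matching. Other rules can be added.
    for (i=1; i<=NF; i++) {
        if ($i in a) {
            printf "1 "
        }
        else {
            printf "0 "
        }
    }
    print ""
}' lookup tweets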

Test (you'll probably need to alter the `gsub` line to handle special cases):

[jaypal:~/Temp] cat lookup
:)
cool
happy
fun

[jaypal:~/Temp] cat tweets
this has been a fun day :)
i find python cool! it makes me happy

[jaypal:~/Temp] awk '
NR==FNR {
    a[$1]
    next
}

{
    gsub(/!/, "", $0)
    for (i=1; i<=NF; i++) {
        if ($i in a) {
            printf "1 "
        }
        else {
            printf "0 "
        }
    }
    print ""
}' lookup tweets
0 0 0 0 1 0 1
0 0 0 1 0 0 0 1
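
Since the question mentions Python, here is a rough sketch of the same idea in Python. It is only an illustration, not part of the awk answer: the function names `encode_line` and `encode_lines` and the `strip_chars` parameter are made up for this example, and the lookup words and tweets are hard-coded from the test above instead of being read from the `lookup` and `tweets` files. Unlike the awk version, it does not print a trailing space on each row.

```python
def encode_line(line, lookup, strip_chars="!"):
    """Return one 1/0 flag per whitespace-separated token in line.

    strip_chars plays the role of the gsub rule: every character in it
    is deleted before the line is split. (Hypothetical parameter.)
    """
    for ch in strip_chars:
        line = line.replace(ch, "")
    return [1 if token in lookup else 0 for token in line.split()]


def encode_lines(lines, lookup_words):
    """Encode each line and join its flags with single spaces."""
    lookup = set(lookup_words)  # set membership mirrors `$i in a`
    return [
        " ".join(str(flag) for flag in encode_line(line, lookup))
        for line in lines
    ]


# Sample data copied from the test above.
lookup_words = [":)", "cool", "happy", "fun"]
tweets = [
    "this has been a fun day :)",
    "i find python cool! it makes me happy",
]

for row in encode_lines(tweets, lookup_words):
    print(row)
```

To handle more special cases, you could pass a longer `strip_chars` string, but be careful not to strip characters that appear in lookup entries such as `:)`.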
