Question
I have a working Bash script to extract title tags. I need help with an AWK field separator for extracting meta tags from HTML, like these:
<meta name="keywords" content="key1, key2, key3">
My script extracts the title correctly, but the same approach doesn't work for meta name.
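#!/bin/bash
# htmls.txt lists one HTML file per line; read it line by line so
# file names containing spaces are not split into several words.
while IFS= read -r LINE
do
echo "$LINE"
awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$LINE" |
awk '{ if (NF > 0) printf("%s\n", $0); }'
done < htmls.txt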
I guess I need a regex solution. Any ideas?
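Answer 1:
First, install xml2, e.g.:
sudo apt-get install xml2
Then pipe the page through xml2, keep the meta entries, and print the last path component of each:
wget -q -O - http://www.latin.fm | xml2 | grep meta | awk -F/ '{print $NF}'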
Output:
@property=og:title
@content=Latin FM
...
Answer 2:
Just do this:
$ awk '/meta name/{ gsub(/.*meta name=\042|\042.*/,"");print }' file
keywords
To read the HTML from a website instead of a file, fetch it with wget and pipe it into the same awk command:
wget -O- -q "$url" | awk '/meta name/{ gsub(/.*meta name=\042|\042.*/,"");print }'
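The one-liner above prints the value of the name attribute (keywords). If the goal is the content attribute of a particular meta tag, which is what the question's example suggests, something like the following could work. It is only a sketch: the function name meta_content is made up for illustration, and it assumes double-quoted attributes and that each meta tag sits on a single line. It uses only POSIX awk features, so it does not rely on gawk's IGNORECASE.

```shell
#!/bin/bash
# meta_content NAME [FILE...]   (hypothetical helper, not part of any tool)
# Prints the content attribute of every <meta name="NAME" ...> tag.
# Assumes double-quoted attributes and one tag per line at most spanning
# a single line. Reads standard input when no FILE is given.
meta_content() {
    local name=$1
    shift
    awk -v name="$name" '
    {
        line = $0
        # Walk through every <meta ...> tag found on this line.
        while (match(line, /<[Mm][Ee][Tt][Aa][^>]*>/)) {
            tag  = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            # Case-insensitive check that the tag carries name="NAME".
            if (index(tolower(tag), "name=\"" tolower(name) "\"") == 0)
                continue
            # Print the text between content=" and the closing quote.
            if (match(tag, /[Cc][Oo][Nn][Tt][Ee][Nn][Tt]="[^"]*"/))
                print substr(tag, RSTART + 9, RLENGTH - 10)
        }
    }' "$@"
}
```

With the sample tag from the question saved in a file, meta_content keywords file should print key1, key2, key3. Because the function falls back to standard input, it can also sit at the end of the wget pipeline: wget -O- -q "$url" | meta_content keywords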
Source: https://stackoverflow.com/questions/7058567/how-can-i-extract-meta-tags-from-html-in-a-bash-awk-script