How to extract data from html table in shell script?

[愿得一人] 2020-11-30 11:42

I am trying to create a Bash script that would extract the data from an HTML table. Below is an example of the table from which I need to extract data:

6 answers
  •  -上瘾入骨i
    2020-11-30 12:13

    There are a lot of ways of doing this but here's one:

    grep '^<tr><td>' < "$FILENAME" \
    | sed \
        -e 's:<tr>::g' \
        -e 's:</tr>::g' \
        -e 's:</td>::g' \
        -e 's:<td>: :g' \
    | cut -c2-

You could use more sed(1) (-e 's:^ ::') instead of the cut -c2- to remove the leading space, but cut(1) doesn't get as much love as it deserves. The backslashes are just there for formatting; you can remove them to get a one-liner, or leave them in and make sure that each one is immediately followed by a newline.
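For comparison, here is a minimal sketch of the sed-only variant, where the final expression strips the leading space in place of cut -c2-. The sample table, the temporary file, and the function name extract_cells are all made up for illustration; the sketch assumes each row sits on its own line and that the tr and td tags carry no attributes.

```shell
#!/bin/sh
# Hypothetical sample input: one table row per line, bare <tr>/<td> tags.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table>
<tr><td>alice</td><td>30</td></tr>
<tr><td>bob</td><td>25</td></tr>
</table>
EOF

# extract_cells is an illustrative name, not part of the original answer.
# The last sed expression removes the leading space that the
# '<td>' -> ' ' substitution leaves behind, so cut is not needed.
extract_cells() {
    grep '^<tr><td>' < "$1" \
    | sed \
        -e 's:<tr>::g' \
        -e 's:</tr>::g' \
        -e 's:</td>::g' \
        -e 's:<td>: :g' \
        -e 's:^ ::'
}

result=$(extract_cells "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Each output line holds the cells of one row separated by single spaces, so a cell that itself contains a space would need a different separator.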

The basic strategy is to slowly pull the HTML apart piece by piece rather than trying to do it all at once with a single incomprehensible pile of regex syntax.
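One way to make that piece-by-piece strategy explicit is to give each step its own small filter, so any intermediate stage can be run and inspected alone. This is only a sketch: the function names and the sample table are invented here, and it again assumes one row per line with attribute-free tr and td tags.

```shell
#!/bin/sh
# Each filter reads stdin and does exactly one small job.
# All names below are illustrative, not from the original answer.
keep_rows()         { grep '^<tr><td>'; }                  # keep only data rows
drop_row_tags()     { sed -e 's:<tr>::g' -e 's:</tr>::g'; } # remove <tr> and </tr>
drop_cell_closers() { sed -e 's:</td>::g'; }               # remove </td>
split_cells()       { sed -e 's:<td>: :g' -e 's:^ ::'; }   # <td> -> space, trim first

# Hypothetical sample input.
html='<table>
<tr><td>x</td><td>1</td></tr>
<tr><td>y</td><td>2</td></tr>
</table>'

# Stop after any stage to see what is left at that point.
partial=$(printf '%s\n' "$html" | keep_rows | drop_row_tags)

# The full pipeline, one piece at a time.
rows=$(printf '%s\n' "$html" \
    | keep_rows | drop_row_tags | drop_cell_closers | split_cells)
printf '%s\n' "$rows"
```

If the output looks wrong, printing partial (or cutting the pipeline short at another stage) shows which substitution misbehaved.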

Parsing HTML with a shell pipeline isn't the best idea ever, but you can do it if the HTML is known to come in a very specific format. If there will be variation, then you'd be better off with a real HTML parser in Perl, Ruby, Python, or even C.
