How to extract characters between the delimiters using sed?

回眸只為那壹抹淺笑 submitted on 2021-02-12 11:41:22

Question


I have just started learning sed. I want to extract and print the characters between the > and < delimiters. Here is the text in my data file:

<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>

   <br /><b>Engineering & IT/Computer Science</b><br />

        <div id="ctl00_ContentPlaceHolder1_divEngITMerit">

                        <span id="ctl00_ContentPlaceHolder1_lblEngITSelListNo">3rd Provisional Selection List</span>

                <tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>

                                Name:

                                <span id="ctl00_ContentPlaceHolder1_lblName">SIDRA SHAHID</span>

                                Father Name:

                                <span id="ctl00_ContentPlaceHolder1_lblFatherName">SHAHID RAFEEQ AHMAD</span>

I have written the command:

sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p' myfile.txt

The problem is that it only returns the text between some of the > and < pairs. For example, it prints 12029, but not "Selected in MS COMPUTER SCIENCE". What am I doing wrong?


Answer 1:


If you only need the strings between the tags, that amounts to deleting the tags and leaving the text between them untouched, right?

sed 's/<[^>]*>//g'

It replaces every tag (a "<" followed by everything up to the next ">") with the empty string, so only the text between the tags remains.
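A minimal sketch building on this answer, assuming a POSIX sh and sed; the function name extract_text is hypothetical. Deleting the tags outright runs adjacent cells together: the <td> line in the question comes out as "IT/Computer Science/Software7 (out of 471)Selected in MS COMPUTER SCIENCE". This version replaces each tag with a newline instead, then trims surrounding blanks and drops the lines that end up empty, so each text run lands on its own line.

```shell
#!/bin/sh
# extract_text (hypothetical helper): print the text found between tags,
# one run per line. Reads the files given as arguments, or stdin if none.
#
# Pass 1 replaces every tag ("<" up to the next ">") with a newline, so
# text runs that shared a line (e.g. adjacent <td> cells) are split apart.
# A backslash followed by a real newline in the replacement is the POSIX
# way to insert a newline with sed.
# Pass 2 trims leading/trailing blanks and deletes lines left empty.
extract_text() {
  sed 's/<[^>]*>/\
/g' "$@" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

Run it as extract_text myfile.txt. Lines of plain text with no tags, such as "Name:" and "Father Name:", pass through unchanged apart from the trimming.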




Answer 2:


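In sed, the s command has a g flag that applies the substitution to every match on the same line instead of only the first one. Your pattern starts with ^ and ends with .*, so its single match already covers the whole line; drop those and add the g flag: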

sed -n 's/>\([^<]*\)</\1/pg' myfile.txt

might suffice.
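A small sketch of what the g flag changes, assuming a POSIX sh and sed. replace_first and replace_all are hypothetical helper names, and they drop the answer's -n and p flags so that every line is printed. Without g, only the first >text< match on a line is rewritten; with g, every non-overlapping match is.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the g flag with the answer's pattern.
# Without g: s/// rewrites only the first ">text<" match on each line.
replace_first() {
  sed 's/>\([^<]*\)</\1/'
}
# With g: s/// rewrites every non-overlapping ">text<" match on each line.
replace_all() {
  sed 's/>\([^<]*\)</\1/g'
}
```

For example, printf '%s\n' 'x>a<y>b<z' | replace_first prints xay>b<z, while replace_all prints xaybz. On the first line of the sample data, replace_all leaves <span id="ctl00_ContentPlaceHolder1_lblRollNo"12029/span>: only the ">" and "<" immediately around each text run are removed, so the rest of each tag stays on the line and still has to be stripped, for instance with the tag-deleting substitution from the first answer.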



Source: https://stackoverflow.com/questions/7684729/how-to-extract-characters-between-the-delimiters-using-sed
