How can I get the content of the subfield with batch script?

Submitted by 大憨熊 on 2019-12-02 07:54:13

Question


I have the following xml:

<datafield tag="007G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">688845614</subfield>
</datafield>

and I am trying to extract the content of <subfield code="0">, which here is 688845614.

This is my code:

@echo off
for /F "tokens=2 delims=>/<" %%i in ('findstr "007G" curlread.txt') do echo %%i
pause

but the only output I get is <datafield tag="007G">.

There could be many <datafield tag="007G"> elements in the XML document, and I need to get the <subfield code="0"> value from each of them.


Answer 1:


It's always better to parse structured markup language as hierarchical data, rather than as flat text to scrape.

To return the data from only the first <subfield code="0"> node, replace your findstr command as follows:

powershell "([xml](gc curlread.txt)).selectSingleNode('//subfield[@code=0]/text()').data"

If you have multiple <subfield code="0"> nodes and want the data from all of them, use:

powershell "([xml](gc curlread.txt)).selectNodes('//subfield[@code=0]/text()') | %%{ $_.data }"

XPath for the win. You can also specify only <subfield code="0"> nodes that are children of <datafield tag="007G"> by modifying the XPath selector like this:

//datafield[@tag=\"007G\"]/subfield[@code=0]/text()

Important: Quotation marks in the XPath must be backslash-escaped, because the whole PowerShell command is itself enclosed in double quotes.
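
Putting the pieces together, a complete batch-file line using the restricted selector might look like the sketch below. It assumes the same curlread.txt file as in the commands above and that the file has a single root element. The doubled %% is only needed inside a batch file, where it stands for PowerShell's % (ForEach-Object) alias.

```batch
@echo off
rem Sketch: print the code="0" subfield of every 007G datafield.
rem Assumes curlread.txt is well-formed XML with a single root element.
powershell "([xml](gc curlread.txt)).selectNodes('//datafield[@tag=\"007G\"]/subfield[@code=0]/text()') | %%{ $_.data }"
```

When typing the command directly at a cmd prompt rather than saving it in a .bat file, use a single % instead of %%.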


Edit: Given the XML you pasted in your comment below:

<datafield tag="007G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">688845614</subfield>
</datafield>
<datafield tag="008G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">68614</subfield>
</datafield>

... be advised that this is not well-formed XML. A well-formed XML document has a single root element. Before your data can be parsed, you'll have to enclose it in a root tag.

Here's an example of how to do that:

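@echo off & setlocal

set "xml=curlread.xml"
rem // Note that quotation marks in the XPath must be backslash escaped
set "xpath=//datafield[@tag=\"007G\"]/subfield[@code=0]/text()"

rem // Join the file's lines into one string before formatting. Otherwise -f
rem // treats each line as a separate argument and {0} holds only the first line.
for /f "delims=" %%I in (
    'powershell "([xml]('<r>{0}</r>' -f ((gc %xml%) -join ''))).selectNodes('%xpath%') | %%{$_.data}"'
) do (
    set "subfield=%%I"

    rem // Delayed expansion is needed to read a variable set inside the loop
    setlocal enabledelayedexpansion
    echo something useful with !subfield!
    endlocal
)
pause
goto :EOF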


Source: https://stackoverflow.com/questions/41106654/how-can-i-get-the-content-of-the-subfield-with-batch-script
