Identify and replace elements of XML using BeautifulSoup in Python

南楼画角 提交于 2020-01-04 14:19:36

问题


I am trying to use BeautifulSoup4 to find and replace specific elements within an XML. More specifically, I want to find all instances of 'file_name'(in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document - which is saved in the 'file_name_replacement_dir' variable. My first task, the bit i'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.

The XML

      <ParameterGroup name="Experiment_22">
        <Parameter name="Data is Row Oriented" type="bool" value="1"/>
        <Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
        <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
        <Parameter name="First Row" type="unsignedInteger" value="1"/>

There are actually 44 experiments with 4 different file names (So 11 with file name 1, 11 with file name 2 and so on). So the above snippet of XML is repeated 44 times, just with different files stored in the "File Name" line.

My Code so far

xml_dir = 'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml=model_dir+'\\'+model_name
file_name_replacement_dir = D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models
soup = BeautifulSoup(open(xml))
print soup.find_all('parametergroup name="Experiment_22"')

This however returns an empty list. I've also tried a few other functions in place of 'soup.findall()' but still haven't been able to find a handle to the filename. Does anybody know how to do what I'm trying to do?


回答1:


xml = '<ParameterGroup name="Experiment_22">\
<Parameter name="Data is Row Oriented" type="bool" value="1"/>\
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>\
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>\
<Parameter name="First Row" type="unsignedInteger" value="1"/>\
</ParameterGroup>'

from bs4 import BeautifulSoup
import os
soup = BeautifulSoup(xml)

for tag in soup.find_all("parameter", {'name': 'File Name'}):
    tag['value'] = os.path.join('new_dir', tag['value'])

print soup
  • Close your XML 'ParameterGroup' tag.
  • Capitalisation of tags may not work with BeautifulSoup, try parameter in lower case.
  • use os.path to manipulate paths so that it works cross-platforms.



回答2:


Your selector for find_all is wrong you need to separate the tag name and attribute like so:

find_all("Parameter",{'name':'File Name'})

That will get you all the file name tags directly. If you really need the parent tag then pass in "ParameterGroup" without the attribute dictionary.

Not sure if BeautifulSoup require lower casing your tags, you may have to experiment with that.



来源:https://stackoverflow.com/questions/31069377/identify-and-replace-elements-of-xml-using-beautifulsoup-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!