问题
I am trying to use BeautifulSoup4 to find and replace specific elements within an XML. More specifically, I want to find all instances of 'file_name'(in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document - which is saved in the 'file_name_replacement_dir' variable. My first task, the bit i'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.
The XML
<ParameterGroup name="Experiment_22">
<Parameter name="Data is Row Oriented" type="bool" value="1"/>
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
<Parameter name="First Row" type="unsignedInteger" value="1"/>
There are actually 44 experiments with 4 different file names (So 11 with file name 1, 11 with file name 2 and so on). So the above snippet of XML is repeated 44 times, just with different files stored in the "File Name" line.
My Code so far
xml_dir = 'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml=model_dir+'\\'+model_name
file_name_replacement_dir = D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models
soup = BeautifulSoup(open(xml))
print soup.find_all('parametergroup name="Experiment_22"')
This however returns an empty list. I've also tried a few other functions in place of 'soup.findall()' but still haven't been able to find a handle to the filename. Does anybody know how to do what I'm trying to do?
回答1:
xml = '<ParameterGroup name="Experiment_22">\
<Parameter name="Data is Row Oriented" type="bool" value="1"/>\
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>\
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>\
<Parameter name="First Row" type="unsignedInteger" value="1"/>\
</ParameterGroup>'
from bs4 import BeautifulSoup
import os
soup = BeautifulSoup(xml)
for tag in soup.find_all("parameter", {'name': 'File Name'}):
tag['value'] = os.path.join('new_dir', tag['value'])
print soup
- Close your XML 'ParameterGroup' tag.
- Capitalisation of tags may not
work with BeautifulSoup, try
parameterin lower case. - use
os.pathto manipulate paths so that it works cross-platforms.
回答2:
Your selector for find_all is wrong you need to separate the tag name and attribute like so:
find_all("Parameter",{'name':'File Name'})
That will get you all the file name tags directly. If you really need the parent tag then pass in "ParameterGroup" without the attribute dictionary.
Not sure if BeautifulSoup require lower casing your tags, you may have to experiment with that.
来源:https://stackoverflow.com/questions/31069377/identify-and-replace-elements-of-xml-using-beautifulsoup-in-python