I've got a HUUUGE HTML file here saved on my system, which contains data from a product catalogue. The data is structured such that for each product record the name is bet
There are two ways of solving this sort of problem: string manipulation with regexes (as suggested by gnovice) or parsing the file (or a mix of the two). Parsing is often best if your file is very well structured; regexes win for messy files.
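For comparison, the regex route might look something like the sketch below. The question doesn't show the actual markup, so the `<name>` and `<code>` tags and the sample string are purely hypothetical; swap in whatever delimiters your file really uses.

```matlab
% Hypothetical markup: the <name> and <code> tags are assumptions,
% not taken from the question.
txt = ['<name>hat</name><code>1829493</code>' ...
       '<name>dress</name><code>18</code>'];

% Lazy match (.*?) so each token stops at the first closing tag.
tokens = regexp(txt, '<name>(.*?)</name>', 'tokens');

% regexp returns one 1x1 cell per match; unwrap them into a flat cell array.
names = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
```

This works on the raw text, so it doesn't care whether the rest of the file is well formed, which is why it tends to suit messy input.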
Here's the parsing solution.
Start by downloading xmliotools and calling xml_read on your file. Your example isn't completely reproducible, so here are two different versions of the data.
Save this to test1.xml:
'hat'
'1829493'
'cyan'
'dress'
'18'
'dark purple'
Save this to test2.xml:
-
'hat'
'1829493'
'cyan'
-
'dress'
'18'
'dark purple'
Now compare the two results:
x1 = xml_read('test1.xml')
x2 = xml_read('test2.xml')
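If xmliotools isn't to hand, a similar parse can be sketched with MATLAB's built-in xmlread, which returns a Java DOM document rather than a struct. This is a different technique from xml_read, and the element names (`catalogue`, `product`, `name`, `code`, `colour`) and the file contents below are assumptions made up for illustration, since the original markup isn't shown.

```matlab
% Hypothetical catalogue: all element names here are assumptions.
xmlText = ['<catalogue>' ...
    '<product><name>hat</name><code>1829493</code><colour>cyan</colour></product>' ...
    '<product><name>dress</name><code>18</code><colour>dark purple</colour></product>' ...
    '</catalogue>'];

% Write the sample to a temporary file so the sketch is self-contained.
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);

% Parse with the built-in DOM parser and collect every <name> element.
doc = xmlread(fname);
nodes = doc.getElementsByTagName('name');
names = cell(1, nodes.getLength);
for k = 1:nodes.getLength
    % DOM node lists are zero-based; getTextContent returns a Java string.
    names{k} = char(nodes.item(k - 1).getTextContent);
end

delete(fname);
```

The DOM route is more verbose than a struct, but it needs no download and makes the nesting of the file explicit.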