Extracting data between two tags in a large HTML file
I have a huge HTML file (about 50 MB) saved locally that contains data from a product catalogue. Each product's name sits between <name> and </name> tags, and the other attributes are tagged the same way. A product has up to three attributes (name, product ID, and color), but not every product has all of them.

How can I extract this data for each product without mixing up attributes between products?

Example of the file contents:

.... <name>'hat'</name> blah blah blah <prodId>'1829493'</prodId> blah blah blah <color>'cyan'</color> blah blah blah blah blah blah
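
One possible approach, sketched in Python (the question names no language, so that choice is an assumption). It scans for the three tags with a regular expression and groups the values into one record per product. The grouping rule is also an assumption: tags within a product are taken to appear in the order name, prodId, color, as in the sample, so any tag that is not later in that order than the previous one is treated as the start of a new product. The names `extract_products` and `extract_from_file` are hypothetical helpers, not part of any library.

```python
import re

# Assumed tag order within one product, taken from the sample in the question.
FIELDS = ("name", "prodId", "color")

# Matches <name>...</name>, <prodId>...</prodId> or <color>...</color>.
TAG_RE = re.compile(r"<(name|prodId|color)>(.*?)</\1>", re.DOTALL)


def extract_products(html):
    """Group tag values into one dict per product.

    Assumption: within a product the tags appear in FIELDS order, so a tag
    that is not later in that order than the previous tag starts a new product.
    """
    products = []
    current = {}
    last_index = -1
    for match in TAG_RE.finditer(html):
        field = match.group(1)
        # Drop surrounding whitespace and the single quotes seen in the sample.
        value = match.group(2).strip().strip("'")
        index = FIELDS.index(field)
        if index <= last_index:
            # Order went backwards or repeated: close the current product.
            products.append(current)
            current = {}
        current[field] = value
        last_index = index
    if current:
        products.append(current)
    return products


def extract_from_file(path, encoding="utf-8"):
    # A 50 MB file should normally fit in memory, so it is read in one go.
    # The encoding is a guess; adjust it to match the actual file.
    with open(path, encoding=encoding) as f:
        return extract_products(f.read())
```

Products with missing attributes simply end up with fewer keys in their dict. If the tags can carry attributes, be nested, or appear in a different order, this regex-based sketch will not be reliable, and a real parser such as the standard library's `html.parser` would be a safer starting point.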