I wanted to extract attributes from an XML file using Pig Latin.
This is a sample of the XML file:
There are two bugs in piggybank's XPath class:
The ignoreNamespace logic breaks searching for XML attributes: https://issues.apache.org/jira/browse/PIG-4751
The ignoreNamespace parameter defaults to true and cannot be overridden: https://issues.apache.org/jira/browse/PIG-4752
Here is my workaround using XPathAll:
XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 as (test:chararray)
If you still need to ignore namespaces, match elements by local-name() instead:
XPathAll(x, '//*[local-name()=\'BOOK\']//*[local-name()=\'TITLE\']/@test', true, false).$0 as (test:chararray)
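To check the two XPath expressions outside Pig, here is a minimal plain-Java sketch using the JDK's javax.xml.xpath. It is not piggybank code: the class name XPathAttrDemo, the helper eval, and both sample documents (including the urn:example namespace) are made up for illustration, and the parser is assumed to be namespace-aware. It shows why the plain path finds the attribute only when the elements have no namespace, while the local-name() form works in both cases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathAttrDemo {
    // Hypothetical sample documents; the real file layout may differ.
    static final String PLAIN_XML =
        "<BOOK><TITLE test=\"a\">Pig</TITLE></BOOK>";
    static final String NS_XML =
        "<BOOK xmlns=\"urn:example\"><TITLE test=\"a\">Pig</TITLE></BOOK>";

    // The two expressions from the answer above.
    static final String PLAIN_PATH = "BOOK/TITLE/@test";
    static final String LOCAL_NAME_PATH =
        "//*[local-name()='BOOK']//*[local-name()='TITLE']/@test";

    // Parses xml with a namespace-aware parser and returns the string value
    // of the first node matched by expr, or "" when nothing matches.
    static String eval(String xml, String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        // No namespace: both expressions find the attribute.
        System.out.println("[" + eval(PLAIN_XML, PLAIN_PATH) + "]");      // [a]
        System.out.println("[" + eval(PLAIN_XML, LOCAL_NAME_PATH) + "]"); // [a]
        // Default namespace: the plain path matches nothing.
        System.out.println("[" + eval(NS_XML, PLAIN_PATH) + "]");         // []
        System.out.println("[" + eval(NS_XML, LOCAL_NAME_PATH) + "]");    // [a]
    }
}
```

The unprefixed test attribute is in no namespace even when its element is, so @test needs no local-name() wrapper.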