I\'m trying to write a regular expression using the PCRE library in PHP.
I need a regex to match only &
, >
and <
cha
In the end I've opted to use the Tidy library in PHP. The code I used is shown below:
// Specify configuration
$config = array(
'input-xml' => true,
'show-warnings' => false,
'numeric-entities' => true,
'output-xml' => true);
$tidy = new tidy();
$tidy->parseFile('feed.xml', $config, 'latin1');
$tidy->cleanRepair()
This works perfectly correcting all the encoding errors and converting invalid characters to XML entities.