There are many ways to represent the more than 1 million Unicode code points. Take the Latin capital "A" with macron (Ā). This is Unicode code point U+0100.
You can leverage iconv (or a few other functions) to convert a code point number to a UTF-8 string:
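function unichr($i)
{
    // Pack the code point as a 32-bit little-endian integer (UCS-4LE), then convert it to UTF-8.
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

// Collect every code point below the surrogate range...
$codeunits = array();
for ($i = 0; $i < 0xD800; $i++) {
    $codeunits[] = unichr($i);
}
// ...then from 0xE000 up to, but not including, 0xFFFF.
for ($i = 0xE000; $i < 0xFFFF; $i++) {
    $codeunits[] = unichr($i);
}
$all = implode($codeunits);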
(I avoided the surrogate range 0xD800–0xDFFF as they aren't valid to put in UTF-8 themselves; that would be “CESU-8”.)
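As one of the "few other functions", here is a minimal sketch that uses mb_chr instead of iconv. It assumes PHP 7.2 or later with the mbstring extension loaded. The names unichr_mb, $chars and $allMb are made up for this illustration, and the loop bounds are chosen to cover the same code points as the loops above.

```php
<?php
// Hypothetical helper (not from the answer above): code point -> UTF-8 string via mbstring.
function unichr_mb(int $codePoint): string
{
    $char = mb_chr($codePoint, 'UTF-8');
    if ($char === false) {
        // mb_chr() returns false for values it cannot encode, such as surrogates.
        throw new InvalidArgumentException(sprintf('Cannot encode U+%04X', $codePoint));
    }
    return $char;
}

// Same ranges as above: 0x0000-0xD7FF and 0xE000-0xFFFE, skipping the surrogates.
$chars = [];
foreach ([[0x0000, 0xD7FF], [0xE000, 0xFFFE]] as [$first, $last]) {
    for ($cp = $first; $cp <= $last; $cp++) {
        $chars[] = unichr_mb($cp);
    }
}
$allMb = implode('', $chars);

// Print the UTF-8 bytes of U+0100 (Ā) in hex, then check that the full string is valid UTF-8.
var_dump(bin2hex(unichr_mb(0x0100)));
var_dump(mb_check_encoding($allMb, 'UTF-8'));
```

The explicit false check is there so that a surrogate or out-of-range value fails loudly rather than silently adding an empty element to the array.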