I'm working on a feature that requires me to fetch the contents of a web page and then check whether certain text is present in that page. It's a backlink-checking tool.
You could try using PHP's DOM extension. When you create a new DOMDocument you can specify the encoding of the underlying document / web page. According to this website, everything is handled internally as UTF-8. You could then find the DOM nodes you are interested in and compare each node's text content (its textContent property) against the text you are looking for.
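As a rough sketch of the DOM approach described above, something like the following might work. The function name findBacklinkTexts, the prefix match on href, and the example URLs are assumptions made for illustration, not part of any library. Note that loadHTML() relies on the charset the page itself declares, so pages without a declaration may be misread.

```php
<?php
// Hypothetical helper: returns the link text of every <a> element whose
// href starts with $targetUrl (case-insensitive prefix match).
function findBacklinkTexts(string $html, string $targetUrl): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument('1.0', 'UTF-8');

    // Real-world HTML is often malformed; collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, $targetUrl) === 0) {
            // textContent is returned as UTF-8 by the DOM extension.
            $texts[] = trim($anchor->textContent);
        }
    }
    return $texts;
}

// Usage with an inline page; in practice $html would come from
// something like file_get_contents($pageUrl).
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
      . '</head><body><p>See <a href="https://example.com/page">Example site</a>'
      . ' and <a href="https://other.test/">Other</a>.</p></body></html>';

$found = findBacklinkTexts($html, 'https://example.com');
echo count($found) > 0 ? "Backlink found\n" : "No backlink\n";
```

Matching on the href prefix is only one possible rule; a stricter check could parse both URLs with parse_url() and compare hosts.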
If you were not working with web pages, which come with a declared character encoding, I would suggest using the multibyte string functions, in particular mb_detect_encoding and mb_convert_encoding.
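For text without a declared encoding, a minimal sketch using those two functions could look like this. It assumes the mbstring extension is installed; the helper names toUtf8 and containsText and the candidate encoding list are hypothetical choices. Detection is a best guess, so the candidate list should be narrowed to the encodings you actually expect.

```php
<?php
// Hypothetical helper: guess the encoding of $text from a short candidate
// list and convert it to UTF-8. Returns the input unchanged if no
// candidate matches.
function toUtf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode (third argument) rejects candidates for which the
    // bytes are not a valid sequence.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $detected);
}

// Hypothetical helper: case-insensitive search. $needle is assumed to be
// UTF-8 already; only $haystack is normalised.
function containsText(string $haystack, string $needle): bool
{
    return mb_stripos(toUtf8($haystack), $needle, 0, 'UTF-8') !== false;
}

// "caf\xE9" is "café" encoded as ISO-8859-1, which is not valid UTF-8.
$latin1 = "Visit caf\xE9 today";
echo containsText($latin1, "caf\xC3\xA9") ? "match\n" : "no match\n";
```

Normalising both sides to UTF-8 before comparing avoids false negatives when the same character is stored as different bytes.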