I\'m reading out lots of texts from various RSS feeds and inserting them into my database.
Of course, there are several different character encodings used in the fee
Your encoding looks like you encoded into UTF-8 twice; that is, from some other encoding, into UTF-8, and again into UTF-8. As if you had ISO 8859-1, converted from ISO 8859-1 to UTF-8, and treated the new string as ISO 8859-1 for another conversion into UTF-8.
Here's some pseudocode of what you did:
$inputstring = getFromUser();
$utf8string = iconv($current_encoding, 'utf-8', $inputstring);
$flawedstring = iconv($current_encoding, 'utf-8', $utf8string);
You should try:
mb_detect_encoding()
or whatever you like to useThat is presuming that in the "middle" conversion you used ISO 8859-1. If you used Windows-1252, then convert into Windows-1252 (latin1). The original source encoding is not important; the one you used in flawed, second conversion is.
This is my guess at what happened; there's very little else you could have done to get four bytes in place of one extended ASCII byte.
The German language also uses ISO 8859-2 and Windows-1250 (Latin-2).