How to decode HTML Entities in C?


I'm interested in unescaping text in C, for example mapping &#92; to \. Does anyone know of a good library?

For reference, see Wikipedia's list of XML and HTML character entity references.

5 Answers

傲寒

2020-11-30 11:42

```
#include <QMap>
#include <QString>
#include <QStringList>

// Decodes the five entities listed in dec; anything else is copied through unchanged.
QString UNESC(const QString &txt) {
    static const QChar AMP = '&', SCL = ';';
    static const QMap<QString, QString> dec = {
        {"&lt;", "<"}, {"&gt;", ">"},
        {"&amp;", "&"}, {"&quot;", "\""}, {"&#039;", "'"} };

    if (!txt.contains(AMP)) { return txt; }

    QStringList bld;
    int bgn = 0, pos = 0;
    while ((pos = txt.indexOf(AMP, pos)) != -1) {
        bld << txt.mid(bgn, pos - bgn);       // literal text before the '&'

        const int scl = txt.indexOf(SCL, pos);
        // value() does not insert missing keys into the map, unlike operator[]
        const QString val = (scl == -1) ? QString()
                                        : dec.value(txt.mid(pos, scl - pos + 1));
        if (val.isEmpty()) {
            bld << QString(AMP);              // unknown entity: keep the '&' and rescan after it
            bgn = pos = pos + 1;
        } else {
            bld << val;
            bgn = pos = scl + 1;
        }
    }
    bld << txt.mid(bgn);                      // text after the last '&'

    return bld.join(QString());
}
```


自闭症患者

2020-11-30 11:47

I wrote my own unescape code; very simplified, but does the job: pn_util.c

难免孤独

2020-11-30 11:51
I had some free time today and wrote a decoder from scratch: entities.c, entities.h.

The only function with external linkage is
```
size_t decode_html_entities_utf8(char *dest, const char *src);
```
If src is a null pointer, the string is taken from dest, i.e., the entities are decoded in place. Otherwise, the decoded string is written to dest, which should point to a buffer big enough to hold strlen(src) + 1 characters, and src is left unchanged.

The function will return the length of the decoded string.

Please note that I haven't done any extensive testing, so there's a high probability of bugs...
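
The contract described above (decode in place when src is a null pointer, otherwise write into dest and return the decoded length) can be illustrated with a small stand-in. This is only a sketch and not the code from entities.c: the name decode_entities_sketch is hypothetical, and it assumes just five named entities plus numeric references in the ASCII range, so it never has to emit multi-byte UTF-8.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in that follows the same calling convention as the
 * answer's decode_html_entities_utf8(). It is NOT the code from entities.c. */
size_t decode_entities_sketch(char *dest, const char *src)
{
    static const struct { const char *name; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''},
    };
    assert(dest != NULL);
    if (src == NULL)
        src = dest;          /* in place: the output never overtakes the input */

    char *out = dest;
    while (*src != '\0') {
        size_t used = 0;     /* input bytes consumed by a recognised entity */
        char decoded = 0;

        if (src[0] == '&' && src[1] == '#') {
            /* Numeric reference: "&#92;" (decimal) or "&#x5C;" (hex). */
            int hex = (src[2] == 'x' || src[2] == 'X');
            const char *first = src + (hex ? 3 : 2);
            const char *p = first;
            unsigned long cp = 0;
            while (cp < 128 && (hex ? isxdigit((unsigned char)*p)
                                    : isdigit((unsigned char)*p))) {
                int c = tolower((unsigned char)*p++);
                cp = cp * (hex ? 16 : 10)
                   + (unsigned long)(c <= '9' ? c - '0' : c - 'a' + 10);
            }
            /* Assumption of this sketch: only ASCII code points are decoded. */
            if (p != first && *p == ';' && cp > 0 && cp < 128) {
                decoded = (char)cp;
                used = (size_t)(p - src) + 1;
            }
        } else if (src[0] == '&') {
            for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
                size_t n = strlen(table[i].name);
                if (strncmp(src, table[i].name, n) == 0) {
                    decoded = table[i].ch;
                    used = n;
                    break;
                }
            }
        }

        if (used > 0) {
            *out++ = decoded;
            src += used;
        } else {
            *out++ = *src++; /* ordinary byte or unknown entity: copy as is */
        }
    }
    *out = '\0';
    return (size_t)(out - dest);
}
```

Because every recognised entity is at least four bytes long and decodes to a single byte, the write position can never pass the read position, which is what makes the in-place mode safe here. A decoder that covers all of Unicode, as the answer's function name suggests, would additionally need to emit multi-byte UTF-8 sequences.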

情歌与酒

2020-11-30 11:51

For another open-source reference implementation in C, check out the command-line utility uni2ascii/ascii2uni. The relevant files are enttbl.{c,h} for the entity lookup and putu8.c, which converts UTF-32 down to UTF-8.

uni2ascii
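
The second step mentioned above, turning a decoded code point (UTF-32) into UTF-8 bytes, can be sketched as follows. This is an independent illustration of the standard UTF-8 bit layout and is not taken from putu8.c; the function name utf8_encode_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point cp into out, which must have room
 * for 4 bytes, and returns the number of bytes written. Nothing is
 * NUL-terminated. Returns 0 for surrogates and for values above U+10FFFF. */
size_t utf8_encode_sketch(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not valid code points */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the code point behind &euro; (U+20AC) comes out as the three bytes E2 82 AC.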


忘了有多久

2020-11-30 11:55

Function description: converts special HTML entities back to characters. You will need to adapt it to fit your requirements.

```
#include <string.h>

/* Assumed value: the original relies on a project-specific TITLE_SIZE. */
#ifndef TITLE_SIZE
#define TITLE_SIZE 256
#endif

/* Returns the input pointer when there is nothing to decode; otherwise a
 * pointer to a static buffer (overwritten by the next call, not thread-safe).
 * Output longer than TITLE_SIZE - 1 bytes is truncated. */
char* HtmlSpecialChars_Decode(char* encodedHtmlSpecialEntities)
{
    static char decodedHtmlSpecialChars[TITLE_SIZE];

    /* This mapping table can be extended if necessary. */
    static const struct {
        const char* encodedEntity;
        const char decodedChar;
    } entityToChars[] = {
        {"&lt;", '<'},
        {"&gt;", '>'},
        {"&amp;", '&'},
        {"&quot;", '"'},
        {"&#039;", '\''},
    };
    const size_t escapeArrayLen = sizeof(entityToChars) / sizeof(entityToChars[0]);

    if(strchr(encodedHtmlSpecialEntities, '&') == NULL)
        return encodedHtmlSpecialEntities;

    const char* in = encodedHtmlSpecialEntities;
    size_t out = 0;

    /* Single pass over the input, leaving room for the terminating NUL. */
    while(*in != '\0' && out < TITLE_SIZE - 1)
    {
        size_t matchedLen = 0;

        if(*in == '&')
        {
            /* Potential encoded char. */
            for(size_t j = 0; j < escapeArrayLen; j++)
            {
                size_t entityLen = strlen(entityToChars[j].encodedEntity);
                if(strncmp(in, entityToChars[j].encodedEntity, entityLen) == 0)
                {
                    decodedHtmlSpecialChars[out++] = entityToChars[j].decodedChar;
                    matchedLen = entityLen;
                    break;
                }
            }
        }

        if(matchedLen != 0)
            in += matchedLen;                       /* skip the whole entity */
        else
            decodedHtmlSpecialChars[out++] = *in++; /* copy other characters unchanged */
    }

    decodedHtmlSpecialChars[out] = '\0';
    return decodedHtmlSpecialChars;
}
```
