Resolving html entities with NSXMLParser on iPhone

佛祖请我去吃肉 2020-12-08 22:05

I think I have read every single web page relating to this problem, but I still cannot find a solution, so here I am.

I have an HTML web page which is not under my co

6 answers
  •  予麋鹿 (original poster)
    2020-12-08 22:24

    A possibly less hacky solution is to replace the DTD with a locally modified copy in which every external entity declaration is replaced with a local one.

    This is how I do it:

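    First, find the document's DTD declaration and point it at a local file. For example, replace this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>hi!</title></head>
    <body><p>Hello</p></body>
    </html>

    with this, where the file URL stands for the location of the DTD inside your app bundle:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "file:///path/to/App.app/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>hi!</title></head>
    <body><p>Hello</p></body>
    </html>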

    Download the DTD from the W3C URL and add it to your app bundle. You can find the path of the file with the following code:

    NSBundle* bundle = [NSBundle bundleForClass:[self class]];
    NSString* path = [[bundle URLForResource:@"xhtml1-transitional" withExtension:@"dtd"] absoluteString];
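
    Once the bundle path is known, the DOCTYPE swap from the first step can be done with a plain string replacement. The following is a minimal sketch, not the answerer's exact code: `RewriteDTDLocation` is a hypothetical helper, and it assumes the page declares the XHTML 1.0 Transitional DTD by its usual W3C URL.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of the original answer): rewrites the
    // DOCTYPE's system identifier so that it points at a local copy of the DTD.
    static NSString *RewriteDTDLocation(NSString *html, NSString *localDTDURL) {
        // Assumption: the page references the XHTML 1.0 Transitional DTD.
        NSString *remoteDTD =
            @"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";
        // Literal replacement; markup that does not mention the remote DTD
        // is returned unchanged.
        return [html stringByReplacingOccurrencesOfString:remoteDTD
                                               withString:localDTDURL];
    }
    ```

    Calling `RewriteDTDLocation(html, path)` with the `path` computed above should yield markup whose DOCTYPE points into the app bundle.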
    

    Open the DTD file and find any external entity reference:

    <!ENTITY % HTMLlat1 PUBLIC
       "-//W3C//ENTITIES Latin 1 for XHTML//EN"
       "xhtml-lat1.ent">
    %HTMLlat1;

    Replace it with the content of the entity file (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent in the above case).
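
    This edit is normally made once by hand, but it can also be scripted. The sketch below is one possible way to do it: `InlineEntitySets` is a hypothetical helper, and it assumes the contents of each `.ent` file have already been loaded into strings keyed by the parameter entity name (for example `HTMLlat1`).

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical one-off helper (not from the original answer): replaces each
    // parameter-entity reference such as "%HTMLlat1;" in the DTD text with the
    // contents of the corresponding entity file.
    static NSString *InlineEntitySets(NSString *dtd,
                                      NSDictionary<NSString *, NSString *> *contentsByName) {
        NSMutableString *result = [dtd mutableCopy];
        for (NSString *name in contentsByName) {
            // Builds the reference text, e.g. "%HTMLlat1;". The declaration
            // "<!ENTITY % HTMLlat1 ...>" has a space after '%' and is left alone.
            NSString *reference = [NSString stringWithFormat:@"%%%@;", name];
            [result replaceOccurrencesOfString:reference
                                    withString:contentsByName[name]
                                       options:NSLiteralSearch
                                         range:NSMakeRange(0, result.length)];
        }
        return result;
    }
    ```

    The XHTML 1.0 DTDs reference three entity sets (`xhtml-lat1.ent`, `xhtml-symbol.ent` and `xhtml-special.ent`), so the dictionary would typically hold three entries.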

    After all external references have been replaced, NSXMLParser should handle the entities properly without downloading every remote DTD and external entity file each time it parses an XML file.
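
    The answer does not show the parser setup, so the following is only a sketch of how the rewritten markup might be parsed. It assumes ARC, and it assumes that setting `shouldResolveExternalEntities` to `YES` is what allows NSXMLParser to follow the DOCTYPE to the local DTD. `TextCollector` and `ExtractText` are hypothetical names.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical delegate that accumulates all character data it is given.
    @interface TextCollector : NSObject <NSXMLParserDelegate>
    @property (nonatomic, strong) NSMutableString *text;
    @end

    @implementation TextCollector
    - (instancetype)init {
        if ((self = [super init])) {
            _text = [NSMutableString string];
        }
        return self;
    }

    // May be called several times per text node, so append rather than assign.
    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
        [self.text appendString:string];
    }

    - (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
        NSLog(@"Parse error: %@", parseError);
    }
    @end

    // Parses the data and returns the collected text, or nil if parsing failed.
    static NSString *ExtractText(NSData *xhtmlData) {
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xhtmlData];
        TextCollector *collector = [[TextCollector alloc] init];
        parser.delegate = collector;
        // Assumption: needed so the parser loads the DTD named in the DOCTYPE.
        parser.shouldResolveExternalEntities = YES;
        return [parser parse] ? [collector.text copy] : nil;
    }
    ```

    If an entity still fails to resolve, the error logged in `parser:parseErrorOccurred:` is a reasonable first place to look.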
