What is nokogiri % encoding $ character

二次信任 submitted on 2019-12-13 03:38:00

Question


Why do I get:

require 'nokogiri'
Nokogiri::HTML('<a href="/test_$4b.html">test</a>').to_html

=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><a href=\"/test_%244b.html\">test</a></body></html>\n"

I thought the $ symbol was valid in a URL?

Follow-up:

Why do browsers handle this differently? For example, in the page: http://www.pmlive.com/pharma_news/its_on_shire_and_abbvie_agree_32bn_takeover_586969

The link: http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_$5.3bn_deal_585883 works.

But Nokogiri would serialize this link as: http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_%245.3bn_deal_585883 which does not work (returns 404).

Are browsers deciding that a literal $ is actually safe and the better choice?


Answer 1:


RFC 3986 lists the dollar sign as a reserved sub-delimiter (page 12).

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
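The grammar above can be sketched as plain Ruby data to check which class a character falls into. This is only an illustration: the constant names and the reserved_kind helper are hypothetical and are not part of Nokogiri or Ruby's URI library.

```ruby
# Character classes copied from the RFC 3986 grammar quoted above.
GEN_DELIMS = [":", "/", "?", "#", "[", "]", "@"].freeze
SUB_DELIMS = ["!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "="].freeze
RESERVED   = (GEN_DELIMS + SUB_DELIMS).freeze

# Hypothetical helper: report which reserved class (if any) a single
# character belongs to.
def reserved_kind(char)
  return :gen_delim if GEN_DELIMS.include?(char)
  return :sub_delim if SUB_DELIMS.include?(char)
  :not_reserved
end

puts reserved_kind("$")  # sub_delim
puts reserved_kind("/")  # gen_delim
puts reserved_kind("a")  # not_reserved
```

So "$" is reserved, but only as a sub-delimiter: whether it actually acts as a delimiter depends on the scheme and on the application handling the URI.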

It also recommends how reserved characters should be handled:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
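As a rough illustration of what "percent-encoded" means here: each byte of the character is written as "%" followed by two hexadecimal digits. The percent_encode helper below is a hypothetical name used only for this sketch, not a library call.

```ruby
# Hypothetical helper: percent-encode every byte of the given string,
# regardless of whether it would normally need escaping.
def percent_encode(str)
  str.bytes.map { |byte| format("%%%02X", byte) }.join
end

puts percent_encode("$")  # %24
```

This is where the %24 in the serialized href comes from: "$" is byte 0x24 in ASCII.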

The authors of Nokogiri likely decided that, since their library may be used by anyone for any purpose, there is no way to determine automatically whether a reserved character would conflict. The "safest" way to handle it (short of testing a URI directly) is therefore to escape it, as the RFC recommends.
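If a particular server, like the one in the question, only accepts the literal "$", one possible workaround is to undo that single escape before requesting the URL. The sketch below assumes you already hold the escaped string; restore_dollar is a hypothetical helper and not a Nokogiri API.

```ruby
# Hypothetical helper: turn "%24" back into a literal "$" and leave every
# other percent-escape untouched.
def restore_dollar(href)
  href.gsub(/%24/i, "$")
end

escaped = 'http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_%245.3bn_deal_585883'
puts restore_dollar(escaped)
# http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_$5.3bn_deal_585883
```

Only %24 is decoded here on purpose: blindly unescaping the whole string could turn escaped delimiters such as %2F or %3F back into "/" and "?", which would change the meaning of the URL.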



Source: https://stackoverflow.com/questions/24878109/what-is-nokogiri-encoding-character
