PHP Security: how can encoding be misused?

◇◆丶佛笑我妖孽 submitted on 2019-11-28 21:21:09

How can a user cause harm if I do not use the mb_check_encoding function?

This is about overlong encodings.

Due to an unfortunate quirk of UTF-8 design, it is possible to make byte sequences that, if parsed with a naïve bit-packing decoder, would result in the same character as a shorter sequence of bytes - including a single ASCII character.

For example the character < is usually represented as byte 0x3C, but could also be represented using the overlong UTF-8 sequence 0xC0 0xBC (or even more redundant 3- or 4-byte sequences).
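To make the bit-packing concrete, here is a minimal sketch of such a naïve decoder. The function name naive_utf8_decode_first is made up for illustration, and it deliberately omits the minimum-value check that a conforming decoder performs, so it is an example of what not to ship.

```php
<?php
// Hypothetical naive decoder: packs the payload bits of the first sequence
// and never checks whether a shorter encoding exists (the overlong check).
function naive_utf8_decode_first(string $bytes): int
{
    $b0 = ord($bytes[0]);
    if ($b0 < 0x80) {
        return $b0;                                   // 1-byte form (ASCII)
    }
    if (($b0 & 0xE0) === 0xC0) {                      // 2-byte lead: 110xxxxx
        return (($b0 & 0x1F) << 6) | (ord($bytes[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {                      // 3-byte lead: 1110xxxx
        return (($b0 & 0x0F) << 12)
             | ((ord($bytes[1]) & 0x3F) << 6)
             | (ord($bytes[2]) & 0x3F);
    }
    throw new InvalidArgumentException('lead byte not handled in this sketch');
}

printf("0x%02X\n", naive_utf8_decode_first("\x3C"));         // 0x3C
printf("0x%02X\n", naive_utf8_decode_first("\xC0\xBC"));     // 0x3C
printf("0x%02X\n", naive_utf8_decode_first("\xE0\x80\xBC")); // 0x3C
```

All three inputs come out as code point 0x3C, which is exactly the ambiguity an attacker relies on.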

If you take this input and handle it in a Unicode-oblivious byte-based tool, then any character processing step being used in that tool may be evaded. The canonical example would be submitting 0xC0 0xBC to PHP, which has native byte strings. The typical use of htmlspecialchars to HTML-encode the character < would fail here because the expected byte 0x3C is not present. So the output of the script would still include the overlong-encoded <, and any browser reading that output could potentially read the sequence 0xC0 0xBC 0x73 0x63 0x72 0x69 0x70 0x74 as <script and hey presto! XSS.
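A rough illustration of the evasion, assuming a purely byte-based escaper. escape_angle_brackets is a hypothetical stand-in for any Unicode-oblivious step; it is not htmlspecialchars itself, because recent PHP versions of htmlspecialchars, when told the input is UTF-8, handle invalid sequences themselves (returning an empty string or substituting, depending on the flags).

```php
<?php
// Hypothetical byte-oriented escaper: it only knows about the byte 0x3C / 0x3E.
function escape_angle_brackets(string $s): string
{
    return str_replace(['<', '>'], ['&lt;', '&gt;'], $s);
}

$plain    = "<script";          // starts with the byte 0x3C
$overlong = "\xC0\xBCscript";   // overlong form of "<", no 0x3C byte anywhere

// The ordinary byte is found and escaped.
var_dump(escape_angle_brackets($plain) === '&lt;script');   // bool(true)

// The overlong sequence passes through untouched.
var_dump(escape_angle_brackets($overlong) === $overlong);   // bool(true)
```

Whether that second string is dangerous then depends entirely on how leniently the consumer of the output decodes it.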

Overlongs have been banned since way back and modern browsers no longer permit them. But this was a genuine problem for IE and Opera for a long time, and there's no guarantee every browser is going to get it right in future. And of course this is only one example - any place where a byte-oriented tool processes Unicode strings you've potentially got similar problems. The best approach, therefore, is to remove all overlongs at the earliest input phase.
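One way to do that at the input boundary is sketched below, assuming the mbstring extension is loaded. The helper name require_valid_utf8 is hypothetical, and the sketch rejects malformed input outright rather than trying to strip the bad bytes, which is a design choice and not the only option.

```php
<?php
// Hypothetical input gate: refuse anything that is not well-formed UTF-8.
// mb_check_encoding() treats overlong sequences such as 0xC0 0xBC as invalid.
function require_valid_utf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// Valid multi-byte text ("héllo", with é written as bytes 0xC3 0xA9) is accepted.
$ok = require_valid_utf8("h\xC3\xA9llo");

// The overlong "<" is refused before any other processing sees it.
try {
    require_valid_utf8("\xC0\xBCscript");
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

You would call something like this on every request value before it reaches escaping, storage or any other byte-oriented step.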

Seems like this is a complicated attack. The docs for mb_check_encoding mention an "Invalid Encoding Attack". Googling "Invalid Encoding Attack" brings up some interesting results that I will attempt to explain.

When this kind of data is sent to the server, the server performs some decoding to interpret the characters being sent over. It also runs security checks that look for the encoded versions of special characters that could be harmful.

When invalid encoding is sent to the server, the server still runs its decoding algorithm and it will evaluate the invalid encoding. This is where the trouble happens because the security checks may not be looking for invalid variants that would still produce harmful characters when run through the decoding algorithm.

Example of an attack requesting a full directory listing on a Unix system:

http://host/cgi-bin/bad.cgi?foo=..%c0%9v../bin/ls%20-al|
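The check-then-decode ordering problem can be sketched roughly like this. Note the assumptions: the payload below uses %c0%af (the overlong form of /) rather than the exact URL above, and lenient_decode is a made-up stand-in for whatever tolerant decoder runs after the security check.

```php
<?php
// Hypothetical lenient decoder: rewrites 2-byte sequences that pack an ASCII
// value back into the plain ASCII byte, instead of rejecting them as overlong.
function lenient_decode(string $bytes): string
{
    return preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function (array $m): string {
            $cp = ((ord($m[1]) & 0x1F) << 6) | (ord($m[2]) & 0x3F);
            return $cp < 0x80 ? chr($cp) : $m[0];   // leave real 2-byte characters alone
        },
        $bytes
    );
}

// Percent-decoding yields raw bytes; 0xC0 0xAF stands in for every "/".
$raw = rawurldecode('..%c0%af..%c0%afetc%c0%afpasswd');

// Step 1: the security check looks for "../" in the bytes and finds nothing.
$passesCheck = strpos($raw, '../') === false;
var_dump($passesCheck);            // bool(true)

// Step 2: a later, lenient decoding step turns the overlongs into real slashes.
echo lenient_decode($raw), "\n";   // ../../etc/passwd
```

Validating the encoding before the check (for example with mb_check_encoding) closes the gap, since the raw bytes here are not well-formed UTF-8.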

Here are some links if you would like a more detailed technical explanation of what is going on in the algorithms:

http://www.cgisecurity.com/owasp/html/ch11s03.html#id2862815

http://www.cgisecurity.com/fingerprinting-port-80-attacks-a-look-into-web-server-and-web-application-attack-signatures.html
