When encoding possibly unsafe data, is there a reason to encode >?
The HTML4 specification, in section 5.3.2, says that authors should use "&gt;" (ASCII decimal 62) in text instead of ">", so I believe you should encode the greater-than sign > as &gt; (because you should obey the standards).
Yes, because if these signs were not encoded, it would allow XSS on forms, social media sites and many other places, since an attacker could use a <script> tag. If you encode the signs, the browser will not execute the markup but will instead display the signs as text.
Current browsers' HTML parsers have no problems with unquoted >s.
However, unfortunately, using regular expressions to "parse" HTML in JS is pretty common (example: Ext.util.Format.stripTags). Also, poorly written command line tools, IDEs, Java classes etc. may not be sophisticated enough to determine where an opening tag ends.
So, you may run into problems with code like this:
<script data-usercontent=">malicious();//"></script>
(Note how the syntax highlighter treats this snippet!)
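To make the failure concrete, here is a minimal sketch of a regex-based tag stripper tripping over the snippet above. The function name naiveStripTags and its pattern are assumptions for illustration (a typical "everything between < and the next >" regex), not the actual code of any particular library.

```javascript
// Hypothetical naive tag stripper: treats everything from "<" up to the
// next ">" as a tag. Illustrative pattern only, not a library's code.
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, "");
}

// Raw ">" inside the attribute value: the regex ends the tag too early.
const raw = '<script data-usercontent=">malicious();//"></script>';
// Same markup with the ">" encoded as "&gt;".
const encoded = '<script data-usercontent="&gt;malicious();//"></script>';

console.log(JSON.stringify(naiveStripTags(raw)));     // "malicious();//\">"
console.log(JSON.stringify(naiveStripTags(encoded))); // ""
```

With the raw >, part of the attribute value leaks out of the supposedly stripped result; with &gt;, the same naive pattern removes the whole tag.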
Encoding HTML characters is always a delicate job. Always encode what needs to be encoded, and always follow the standards. Using double quotes is standard, and even quotes inside double quotes should be encoded. Always encode. Imagine something like this:
<div> this is my text an img></div>
The img> will probably be parsed by the browser as an image tag. Browsers always try to resolve unclosed tags or quotes. As basile says, use standards; otherwise you could get unexpected results without understanding the source of the errors.
Strictly speaking, to prevent HTML injection, you need only encode < as &lt;.
If user input is going to be put in an attribute, also encode " as &quot;.
If you're doing things right and using properly quoted attributes, you don't need to worry about >. However, if you're not certain of this, you should encode it just for peace of mind; it won't do any harm.
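The rules above could be sketched as a small pair of escaping helpers. The names escapeText and escapeAttr are hypothetical. Encoding & first is an addition not mentioned above; it is needed so that input which already contains entity-like text such as &lt; is displayed literally instead of being decoded by the browser.

```javascript
// Hypothetical helpers illustrating the rules above; not a library API.

// For text content: "&" and "<" are the essential ones; ">" is encoded
// as well, for peace of mind.
function escapeText(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must run first, or it would re-encode the entities below
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// For double-quoted attribute values: additionally encode the double quote.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

const userInput = '"><script>alert(1)</script>';
console.log("<p>" + escapeText(userInput) + "</p>");
console.log('<input value="' + escapeAttr(userInput) + '">');
```

The attribute helper only makes sense if the attribute really is wrapped in double quotes, as in the second console.log call; an unquoted attribute would need more characters encoded than this sketch handles.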
This is to prevent XSS injections (through users using any of your forms to submit raw HTML or JavaScript). When you escape your output, the browser knows not to parse or execute any of it and only displays it as text.
This may feel like less of an issue if you're not dealing with dynamic output based on user input; however, it's important to at least understand it, if not to make it a good habit.