Best practice for allowing Markdown in Python, while preventing XSS attacks?

孤街浪徒 2021-01-30 22:25

I need to let users enter Markdown content into my web app, which has a Python back end. I don’t want to needlessly restrict their entries (e.g. by not allowing any HTML,

2 answers
  •  忘掉有多难
    2021-01-30 23:00

    I was unable to determine “best practice,” but generally you have three choices when accepting Markdown input:

    1. Allow HTML within Markdown content (this is how Markdown originally/officially works, but if treated naïvely, this can invite XSS attacks).

    2. Just treat any HTML as plain text, essentially letting your Markdown processor escape the user’s input. Thus <small>…</small> in the input will not create small text but rather the literal text “<small>…</small>”.

    3. Throw out all HTML tags within Markdown. This is pretty user-hostile and may choke on text like <3 depending on implementation. This is the approach taken here on Stack Overflow.
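To make the difference between options 2 and 3 concrete, here is a minimal sketch using only the Python standard library. It works on a raw string rather than inside a Markdown processor, and the helper names `escape_html` and `strip_tags_naively` are made up for illustration. The regex is deliberately naive to show how tag stripping can choke on text like <3.

```python
import html
import re

sample = "I <3 Markdown, <small>really</small>"


def escape_html(text: str) -> str:
    """Option 2: escape &, < and > so any tags show up as literal text."""
    return html.escape(text, quote=False)


# Deliberately naive pattern: anything from "<" up to the next ">".
NAIVE_TAG = re.compile(r"<[^>]*>")


def strip_tags_naively(text: str) -> str:
    """Option 3 (naive): delete everything that looks like a tag."""
    return NAIVE_TAG.sub("", text)


print(escape_html(sample))
# I &lt;3 Markdown, &lt;small&gt;really&lt;/small&gt;

print(strip_tags_naively(sample))
# I really
```

Note how the naive stripper swallows everything from the "<" in "<3" up to the end of the opening <small> tag, so legitimate text is lost along with the markup.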

    My question regards case #1, specifically.

    Given that, what worked well for me is sending user input through

    1. Markdown for Python, which optionally supports Extra syntax and then through
    2. html5lib’s sanitizer.

    I threw a bunch of XSS attack attempts at this combination, and all failed (hurray!), while benign HTML tags still worked flawlessly.

    This way, you are in effect going with option #1 (as desired) except for potentially dangerous or malformed HTML snippets, which are treated as in option #2.
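The "allow safe tags, escape the rest" behaviour described above can be sketched with the standard library alone. This is not html5lib's sanitizer and not its API: it is a simplified stand-in built on `html.parser`, and the names `ALLOWED_TAGS`, `AllowlistSanitizer` and `sanitize` are hypothetical. In the pipeline from this answer, something like it would run on the HTML that the Markdown processor has already rendered.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; a real sanitizer ships a much more careful one.
ALLOWED_TAGS = {"b", "code", "em", "i", "p", "small", "strong"}


class AllowlistSanitizer(HTMLParser):
    """Keep allowlisted tags (without attributes); escape everything else."""

    def __init__(self):
        # Keep entity references as-is so they can be passed through untouched.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # all attributes are dropped
        else:
            # Show the disallowed tag as literal text (option #2 behaviour).
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Treat "<tag/>" like a plain start tag to avoid emitting it twice.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    # Comments, doctypes and processing instructions are silently dropped,
    # because their handlers are left as the default no-ops.


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<small onclick="x()">hi</small>'))   # <small>hi</small>
print(sanitize("<script>alert(1)</script>"))         # &lt;script&gt;alert(1)&lt;/script&gt;
print(sanitize("<img src=x onerror=alert(1)>"))      # &lt;img src=x onerror=alert(1)&gt;
```

Because this sketch drops every attribute, it cannot keep links or images. A production sanitizer such as the one in html5lib also has to decide which attributes and URL schemes are safe, which is a good reason to rely on a maintained library instead of rolling your own.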

    (Thanks to Y.H Wong for pointing me in the direction of that Markdown library!)
