zalgo

How does Zalgo text work?

独自空忆成欢 submitted on 2019-12-21 02:00:53
Question: I've seen weirdly formatted text called Zalgo like below written on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space. Is this a bug/flaw/exploit/hack in Unicode? Are these individual

How does Zalgo text work?

孤街醉人 submitted on 2019-12-03 07:46:07
I've seen weirdly formatted text called Zalgo like below written on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space. Is this a bug/flaw/exploit/hack in Unicode? Are these individual characters with weird properties? "What" is happening here? H̡̫̤̤̣͉̤ͭ̓̓̇͗̎̀ơ̯̗̱̘̮͒̄̀̈ͤ̀͡w͓̲͙͖̥͉̹͋ͬ̊ͦ̂̀̚
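In short, this is not a Unicode bug: each "letter" in that sample is one ordinary base character followed by a run of combining marks (general category Mn), and the renderer stacks every mark above or below the base instead of advancing to a new cell. The Python sketch below reproduces the effect. The `zalgo` and `describe` names, the use of the Combining Diacritical Marks block (U+0300 to U+036F), and the marks-per-letter count are illustrative assumptions, not how any particular Zalgo generator works.

```python
import random
import unicodedata

# Combining Diacritical Marks block (U+0300 to U+036F): every code point in it
# is a nonspacing mark that attaches to the preceding base character.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]


def zalgo(text: str, marks_per_letter: int = 6, seed: int = 0) -> str:
    """Follow every letter with `marks_per_letter` randomly chosen combining marks."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha():
            out.extend(rng.choice(COMBINING_MARKS) for _ in range(marks_per_letter))
    return "".join(out)


def describe(text: str) -> list[str]:
    """List each code point with its general category and Unicode name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


sample = zalgo("How")
print(sample)
print(len(sample), "code points")  # 3 letters + 3 * 6 marks = 21
for line in describe(sample[:7]):  # "H" and the six marks stacked on it
    print(line)
```

The string still reads left to right as three base letters; `len` counts code points, so every stacked mark adds to the length without adding width.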

Why do those Thai characters display on the web page with a long tail?

╄→尐↘猪︶ㄣ submitted on 2019-12-03 05:38:15
Question: ด้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้дд็็็็็้้้้้็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้ I found some interesting characters, pasted above, which take up only the width of three spaces. However, the actual length of the string is 380. I inspected the string in Python, and the encoded string is as follows: '\xe0\xb8\x94\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x89\xe0\xb9
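The bytes quoted in the question decode to U+0E14 THAI CHARACTER DO DEK followed by repeated U+0E49 MAI THO and U+0E47 MAITAIKHU, both nonspacing marks. Each Thai code point takes three bytes in UTF-8, which is why a Python 2 byte string reports a length (380) far larger than the width the text occupies. The sketch below decodes only the prefix quoted above (the rest of the byte string is elided in the question and is not reconstructed); `code_points` and `base_count` are illustrative helper names.

```python
import unicodedata

# Only the prefix quoted in the question; the remainder of the 380 bytes is elided.
raw = (
    b"\xe0\xb8\x94"          # one base consonant
    + b"\xe0\xb9\x89" * 5    # five repeated marks
    + b"\xe0\xb9\x87" * 5    # five more repeated marks
)

text = raw.decode("utf-8")  # 3 UTF-8 bytes per Thai code point


def code_points(s: str) -> list[tuple[str, str, str]]:
    """(code point, general category, name) for every character in `s`."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)) for ch in s]


def base_count(s: str) -> int:
    """Count characters that are not combining marks (roughly, those that get their own cell)."""
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


print(len(raw), "bytes,", len(text), "code points,", base_count(text), "base character")
for cp in code_points(text)[:3]:
    print(*cp)
```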

Why do those Thai characters display on the web page with a long tail?

余生长醉 submitted on 2019-12-02 18:57:19
ด้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้дд็็็็็้้้้้็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้ I found some interesting characters, pasted above, which take up only the width of three spaces. However, the actual length of the string is 380. I inspected the string in Python, and the encoded string is as follows: '\xe0\xb8\x94\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x89\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0\xb9\x87\xe0

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

▼魔方 西西 submitted on 2019-11-29 00:57:18
Question: I've read about how Zalgo text works, and I'm looking to learn how chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that needs to: a) either be stripped, assuming chat participants are to use only languages that don't require combining marks (i.e. you could write "fiancé" with a combining mark, but you'd be a bit Zalgo'ed yourself if you insisted on doing so); or, b) be reduced to a maximum of 8 consecutive
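Both options in the question can be expressed as one filter: normalize to NFC first, so that a sequence like "e" + U+0301 becomes the precomposed "é" and no longer counts as a mark, then drop every combining mark beyond the first `max_marks` in a row. `max_marks=8` corresponds to option (b) and `max_marks=0` to option (a). This is a sketch, assuming "combining character" means any code point whose general category is Mn, Mc or Me; `cap_marks` is a hypothetical name. Note that option (a) also deletes marks that Thai, Hindi and many other scripts require.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for Unicode general categories Mn, Mc and Me (combining marks)."""
    return unicodedata.category(ch).startswith("M")


def cap_marks(text: str, max_marks: int = 8) -> str:
    """Keep at most `max_marks` consecutive combining marks after each base character."""
    # NFC turns e.g. "e" + U+0301 into the single code point U+00E9, so ordinary
    # accented Latin letters stop counting as marks at all.
    text = unicodedata.normalize("NFC", text)
    out = []
    run = 0  # length of the current run of consecutive marks
    for ch in text:
        if is_mark(ch):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


zalgo_like = "H" + "\u0301\u0316\u0334" * 10  # "H" followed by 30 marks
print(cap_marks(zalgo_like))                   # "H" with only 8 marks left
print(cap_marks("fiance\u0301", max_marks=0))  # "fiancé" survives as U+00E9
```

Normalizing first matters for option (a): without it, a decomposed "é" would lose its accent.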

How to protect against diacritics such as Zalgo text

狂风中的少年 submitted on 2019-11-28 04:20:47
The character pictured above was tweeted a few months ago by Mikko Hyppönen, a computer security expert known for his work on computer viruses and TED talks on computer security. Out of respect for SO, I will only post an image of it, but you get the idea. It's obviously not something you'd want spreading around your website and freaking out visitors. Upon further inspection, the character appears to be a letter of the Thai alphabet combined with over 87 diacritics (is there even a limit?!). This got me thinking about security, localization, and how one might handle this sort of input. My
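On the aside "is there even a limit?!": Unicode itself places no limit on how many combining marks may follow a base character. The closest thing is the Stream-Safe Text Format in UAX #15, which asks for no more than 30 consecutive non-starters, but it exists to bound buffering during normalization rather than to stop abuse. A site therefore has to choose its own threshold. The sketch below measures the longest run of combining marks (general category M) after NFC normalization and flags input above a cutoff; the function names and the default of 4 are arbitrary assumptions, on the premise that ordinary text in most scripts stacks only a few marks on one letter, and should be tuned against real input.

```python
import unicodedata


def max_mark_run(text: str) -> int:
    """Length of the longest run of consecutive combining marks (category M*)."""
    longest = run = 0
    # NFC first, so decomposed accented letters are not counted as mark runs.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def looks_like_zalgo(text: str, threshold: int = 4) -> bool:
    """Flag text whose longest mark run exceeds `threshold` (an arbitrary cutoff)."""
    return max_mark_run(text) > threshold


comment = "\u0e01" + "\u0e49" * 87  # a Thai letter with 87 stacked marks
print(max_mark_run(comment), looks_like_zalgo(comment))
```

Rejecting or flagging leaves the decision to a moderator; rewriting the input (truncating the runs) is the alternative when silent cleanup is acceptable.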

What's up with these Unicode combining characters and how can we filter them?

醉酒当歌 submitted on 2019-11-28 02:48:49
กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ These recently showed up in Facebook comment sections. How can we sanitize this? Answer 1: What's up with these Unicode characters? That's a character with a series of combining characters. Because the combining
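The Facebook strings above are each the Thai consonant U+0E01 (KO KAI) followed by one mark repeated many times: SARA I (U+0E34), MAI THO (U+0E49) or MAITAIKHU (U+0E47). Repeating an identical mark back to back is rarely, if ever, meaningful in normal text, so one targeted filter is to collapse such repeats to a single copy. This is a sketch under that assumption, with a hypothetical function name; it does not catch Zalgo built from varied marks, so it is best paired with a cap on run length.

```python
import unicodedata


def collapse_repeated_marks(text: str) -> str:
    """Drop a combining mark when it is identical to the character just before it."""
    out = []
    for ch in text:
        if out and ch == out[-1] and unicodedata.category(ch).startswith("M"):
            continue  # same mark again: skip the duplicate
        out.append(ch)
    return "".join(out)


spam = " ".join("\u0e01" + mark * 20 for mark in ("\u0e34", "\u0e49", "\u0e47"))
print(collapse_repeated_marks(spam))  # each ก keeps a single mark
```

Only marks are collapsed, so doubled base letters such as the "oo" in "book" are left alone.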

How to protect against diacritics such as Zalgo text

假装没事ソ submitted on 2019-11-27 00:21:44
Question: The character pictured above was tweeted a few months ago by Mikko Hyppönen, a computer security expert known for his work on computer viruses and TED talks on computer security. Out of respect for SO, I will only post an image of it, but you get the idea. It's obviously not something you'd want spreading around your website and freaking out visitors. Upon further inspection, the character appears to be a letter of the Thai alphabet combined with over 87 diacritics (is there even a limit?!). This

What's up with these Unicode combining characters and how can we filter them?

▼魔方 西西 submitted on 2019-11-26 18:46:45
Question: กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ These recently showed up in Facebook comment sections. How can we sanitize this? Answer 1: What's up with

How does Zalgo text work?

本小妞迷上赌 submitted on 2019-11-25 23:23:09
Question: I've seen weirdly formatted text called Zalgo like below written on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space. Is this a bug/flaw/exploit/hack in Unicode? Are these