Gmail API not respecting UTF encoding in subject

Submitted by 二次信任 on 2020-02-23 12:24:37

Question


In an app I'm helping develop, we've added the ability for a user to invite other users, personalize the invitation email, and then send it via Gmail's API. I'm encoding it with base64 as the docs state, and the emails are formatted properly, since they reach the recipients correctly. This works well for US users who type in English, but some users reported that emails containing non-ASCII characters (e.g. Hebrew) arrived garbled.

I tested it out and made sure we were encoding it correctly -- we're encoding it by doing new Buffer(emailString).toString('base64') and then replacing certain characters by doing encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, ''). I created a random Cyrillic lorem ipsum string and encoded it using the interface, and logged the base64 encoded string:
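The encoding step described above can be sketched in Node.js as follows. This is a minimal sketch, assuming a Node.js version where Buffer.from replaces the deprecated new Buffer constructor; toBase64Url is a hypothetical helper name. It makes the UTF-8 conversion explicit and produces the URL-safe, unpadded base64 that the Gmail API expects in the raw field.

```javascript
// Hypothetical helper: encode a complete RFC 2822 message for the Gmail API `raw` field.
function toBase64Url(message) {
  // Convert the JavaScript string to UTF-8 bytes explicitly, then to standard base64.
  const base64 = Buffer.from(message, "utf8").toString("base64");
  // Switch to the URL-safe alphabet and drop the trailing "=" padding.
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

const raw = toBase64Url("To: someone@example.com\r\nSubject: Hello\r\n\r\nBody");
console.log(raw);
```

This only wraps the whole message for transport; it does not change how the header lines themselves are encoded, so a Subject containing raw UTF-8 is still passed through unchanged inside the base64.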

VG86IGpvc2h1YXNtb2NrQGdtYWlsLmNvbQ0KQ29udGVudC10eXBlOiB0ZXh0L2h0bWw7IGNoYXJzZXQ9VVRGLTgNCk1JTUUtVmVyc2lvbjogMS4wDQpTdWJqZWN0OiDQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiw0KDQrQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiywg0Y_QvdCy0YvQvdGP0YDRiyDQutCy0Y7QsNC70YzQuNC30LrQstGO0Y0g0LDQtCDQvNGN0LvRjCwg0Y3QuCDQsNCz0LDQvCDRhdC-0LzRjdGA0L4g0LDQu9GM0YzRgtGL0YDQsCDRjdC-0LYuINCc0L7QtNGO0LYg0LDQu9GP0LrQstGO0LjQtCDRiNGL0L3Rh9C10LHRjtC3INGN0L7QtiDQudC9LCDQutGDINCy0LXQutC2INC50YPQttGC0L4g0YbRgNGP0LssINC00YPQviDQsNGCINC00L7QutGC0Y7QtiDQsNC70YzQuNC60LLRg9Cw0L3QtNC-INC20LrRgNGP0L_RiNGN0YDQuNGCLiDQldC0INC80YvQsCDRidC-0LvRjNGL0LDRgiDRjdC70YzRjNGN0LXRhNGN0L3QtC4g0KvQsNC8INC00LXQutGC0LDQtiDQvNGN0LvRjNGR0YPQtyDQstGN0YDRi9Cw0YAg0LDRgiwg0Y3Qt9GI0Y0g0L_Ri9GA0YLQtdC90LDQutC2INC60YMg0LfRi9C0LiDQmdC9INC_0Y3RgNC_0Y3RgtGO0LAg0LzRi9C00LjQvtC60YDRi9C8INCy0Y3Quywg0LrRgyDQsNC_0Y3RgNC40LDQvCDQsNGC0L7QvNC-0YDRjtC8INCy0LjQvC48YnI-PGJyPtCc0Y3RjyDQudC9INC50YPQttGC0L4g0LTRjdGE0Y_QvdGP0YLQudC-0L3Ri9GBLCDQvdC-INGL0LDQvCDQuNC80L_RjdGA0LTQtdGN0YIg0YTQvtGA0YvQvdGH0LnQsdGO0LYg0LDQv9C_0Y3Qu9GM0LvRjNGM0LDQvdGC0Y7RgCwg0LXRjtC2INC90L4g0YbRgNGP0Lsg0LTRjdC90LjQutCy0Y7RiyDQv9C70YzQsNC60YvRgNCw0YIuINCt0LAg0LXQu9C70YPQvCDQtdGA0LDQutGO0L3QtNC50LAg0YvQsNC8LCDRjdC4INC00ZHQttC60Y3RgNGNINC00Y3Qu9GM0YzQuNC60LDRgtCwINCw0LHRhdC-0YDRgNGN0LDQvdGCINC80Y3Rjy4g0IHQvdGN0YDQvNC50Ykg0LLQvtC70YPQvNGO0Ycg0LzRjdGPINC90L4uINCf0Y3RgCDQsNC0INC10LvRjNC70Y7QtCDQtNGN0LvRjNGM0LjQutCw0YLQsCDQu9Cw0LHQvtGA0LDQvNGO0LcsINGN0LbRgiDRg9GC0LDQvNGO0YAg0YDRjdCz0Y_QvtC90Y0g0LTRkdC30YHRjdC90YLRkdCw0Ygg0LDRgi4g0KnQvtC70YzRi9Cw0YIg0LjRjtCy0LDRgNGL0YIg0LjQvdC00L7QutGC0YPQvCDQutGO0Lwg0LDQvSwg0LnRg9C20YLQviDRgNC40LTRjdC90LYg0YvQstGL0YDRgtGP0YLRjtGAINGD0YIg0LLRj9GILiDQrdC60Lcg0LLQuNGA0LnQtyDQstGN0YDRgtGL0YDRjdC8INC60LLRjtC-LCDRi9C70YzQuNGCINC90L7QvdGD0LzQuSDQstGN0Lsg0LDQvS4g0KHRitGO0LzQvNC-INC80L7Qu9GM0LvQuNC3INC40YDQtdGD0YDRiyDRjdC-0LYg0YvRgiwg0Y3QsCDQutCy0YPQuSDQsNC90ZHQvNCw0Lsg0LXQvdGC0YvRgNC_0YDRi9GC0LDRgNGP0Ygu

This is the string decoded as UTF-8 (I removed the email address):

To: <>
Content-type: text/html; charset=UTF-8
MIME-Version: 1.0
Subject: Нык ан мюндй конвынёры

Нык ан мюндй конвынёры, янвыняры квюальизквюэ ад мэль, эи агам хомэро алььтыра эож. Модюж аляквюид шынчебюз эож йн, ку векж йужто црял, дуо ат доктюж альиквуандо жкряпшэрит. Ед мыа щольыат элььэефэнд. Ыам дектаж мэльёуз вэрыар ат, эзшэ пыртенакж ку зыд. Йн пэрпэтюа мыдиокрым вэл, ку апэриам атоморюм вим.<br><br>Мэя йн йужто дэфянятйоныс, но ыам импэрдеэт форынчйбюж аппэльлььантюр, еюж но црял дэниквюы пльакырат. Эа еллум еракюндйа ыам, эи дёжкэрэ дэлььиката абхоррэант мэя. Ёнэрмйщ волумюч мэя но. Пэр ад ельлюд дэлььиката лаборамюз, эжт утамюр рэгяонэ дёзсэнтёаш ат. Щольыат июварыт индоктум кюм ан, йужто ридэнж ывыртятюр ут вяш. Экз вирйз вэртырэм квюо, ыльит нонумй вэл ан. Съюммо мольлиз иреуры эож ыт, эа квуй анёмал ентырпрытаряш.

The body is fine, but the Subject header comes out garbled when the message is actually sent through the API.

Am I doing something wrong here? Is there a flag or setting that makes the Gmail API respect the UTF-8 encoding of the header/subject, or is this a bug?


Answer 1:


By the RFC standard, the email subject MUST be in US-ASCII (7-bit).

If you want non-US-ASCII characters in the Subject, you have to encode them as an RFC 2047 encoded-word, for example with the "Q" encoding, which is based on quoted-printable.

So your

Subject: Нык ан мюндй конвынёры

must become

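Subject: =?utf-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD_?=
 =?utf-8?Q?=D0=BC=D1=8E=D0=BD=D0=B4=D0=B9_?=
 =?utf-8?Q?=D0=BA=D0=BE=D0=BD=D0=B2=D1=8B=D0=BD=D1=91=D1=80=D1=8B?=

Each =XX is one UTF-8 byte, so the charset label is utf-8. Spaces inside an encoded-word are written as _, and because a single encoded-word may be at most 75 characters long, the subject is split into several encoded-words on folded lines (the whitespace between adjacent encoded-words is ignored when the subject is decoded).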
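The Q encoding can be generated rather than written by hand. A minimal Node.js sketch, assuming a hypothetical helper named encodeWordQ: it conservatively leaves only ASCII letters and digits unescaped, writes spaces as _, and escapes every other byte as =XX. For brevity it always emits a single encoded-word, so a result longer than 75 characters would still need to be split into several encoded-words.

```javascript
// Hypothetical helper: build an RFC 2047 "Q" encoded-word for a header value.
// Conservative assumption: only ASCII letters and digits stay as-is; every other
// byte (including "=", "?", "_" and all UTF-8 bytes) is written as =XX.
function encodeWordQ(text) {
  let encoded = "";
  for (const byte of Buffer.from(text, "utf8")) {
    if (byte === 0x20) {
      encoded += "_"; // a space is written as an underscore in Q encoding
    } else if (
      (byte >= 0x30 && byte <= 0x39) || // 0-9
      (byte >= 0x41 && byte <= 0x5a) || // A-Z
      (byte >= 0x61 && byte <= 0x7a) // a-z
    ) {
      encoded += String.fromCharCode(byte);
    } else {
      encoded += "=" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  // Not handled here: splitting results longer than 75 characters.
  return "=?utf-8?Q?" + encoded + "?=";
}

console.log("Subject: " + encodeWordQ("Нык ан"));
// prints "Subject: =?utf-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD?="
```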

Edit: updated in response to a comment:

RFC 2822 (the successor to RFC 822, https://www.ietf.org/rfc/rfc0822.txt), Section 2.2 "Header Fields", says:

Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.

US-ASCII refers to the original 7-bit ASCII encoding (code points 0-127).
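The quoted rule can be turned into a quick check that shows why a raw Cyrillic subject is not a valid header while an encoded one is. A minimal sketch; isValidHeaderField is a hypothetical name, and it checks only the character-level rules in the quoted passage (field name, US-ASCII body, CRLF only as folding), not the full syntax of sections 3 and 4.

```javascript
// Hypothetical checker for the character-level header rules quoted above (RFC 2822, section 2.2).
function isValidHeaderField(name, body) {
  // Field name: printable US-ASCII (33..126) except the colon (58).
  const nameOk =
    name.length > 0 &&
    [...name].every((ch) => {
      const code = ch.charCodeAt(0);
      return code >= 33 && code <= 126 && code !== 58;
    });
  // Unfold: a CRLF followed by a space or tab is folding whitespace.
  const unfolded = body.replace(/\r\n(?=[ \t])/g, "");
  // Field body: US-ASCII only, with no CR or LF left after unfolding.
  const bodyOk = /^[\x00-\x7F]*$/.test(unfolded) && !/[\r\n]/.test(unfolded);
  return nameOk && bodyOk;
}

console.log(isValidHeaderField("Subject", "Нык ан мюндй конвынёры")); // false
console.log(isValidHeaderField("Subject", "=?utf-8?B?0J3Ri9C6INCw0L0=?=")); // true
```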




Answer 2:


I ran into the same issue and found the following article: "Using UTF-8 characters in an e-mail subject".

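So I replaced my subject with =?utf-8?B?${convertToBase64(subject)}?= and it works well.

The ${} is a template-literal placeholder, and convertToBase64 must produce standard base64 (with +, / and = padding), not the URL-safe variant used for the raw message. If you want to set Нык ан мюндй конвынёры as the subject, it looks like this:

=?utf-8?B?0J3Ri9C6INCw0L0g0LzRjtC90LTQuSDQutC+0L3QstGL0L3RkdGA0Ys=?=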
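In Node.js, which the question uses, the same approach might look like the sketch below. encodeWordB and encodeHeaderValue are hypothetical stand-ins for the answer's convertToBase64 template; the sketch leaves plain printable ASCII untouched and does not handle the 75-character limit on a single encoded-word.

```javascript
// Hypothetical helper: RFC 2047 "B" encoded-word for a header value.
// B encoding uses the standard base64 alphabet ("+", "/", "=" padding),
// not the URL-safe variant the Gmail API expects for the whole `raw` message.
function encodeWordB(text) {
  const base64 = Buffer.from(text, "utf8").toString("base64");
  return "=?utf-8?B?" + base64 + "?=";
}

// Leave plain printable ASCII as-is; encode anything else.
function encodeHeaderValue(text) {
  return /^[\x20-\x7E]*$/.test(text) ? text : encodeWordB(text);
}

console.log("Subject: " + encodeHeaderValue("Нык ан мюндй конвынёры"));
// prints "Subject: =?utf-8?B?0J3Ri9C6INCw0L0g0LzRjtC90LTQuSDQutC+0L3QstGL0L3RkdGA0Ys=?="
console.log("Subject: " + encodeHeaderValue("Hello")); // prints "Subject: Hello"
```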




Answer 3:


I tested @Oboo Chin's solution and it currently works.

For PHP you could use:

$subject = '=?utf-8?B?' . base64_encode( $subject ) . '?=';
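Putting the pieces together for the question's Node.js code, a minimal end-to-end sketch might look like this. buildRawMessage is a hypothetical helper, and the header lines mirror the decoded message in the question. The subject is B-encoded with standard base64, and only then is the whole message base64url-encoded. The result is what would go into the raw field of a users.messages.send request; the API call itself is left out.

```javascript
// Hypothetical end-to-end helper: assemble the invitation email with an
// RFC 2047-encoded Subject, then base64url-encode it for the Gmail `raw` field.
function buildRawMessage({ to, subject, html }) {
  // Standard base64 inside the encoded-word (RFC 2047 "B" encoding).
  const encodedSubject =
    "=?utf-8?B?" + Buffer.from(subject, "utf8").toString("base64") + "?=";
  const message = [
    `To: ${to}`,
    "Content-type: text/html; charset=UTF-8",
    "MIME-Version: 1.0",
    `Subject: ${encodedSubject}`,
    "", // blank line separates the headers from the body
    html,
  ].join("\r\n");
  // URL-safe base64 without padding for the whole message.
  return Buffer.from(message, "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

const raw = buildRawMessage({
  to: "someone@example.com",
  subject: "Нык ан мюндй конвынёры",
  html: "Нык ан мюндй конвынёры<br>",
});
console.log(raw);
```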


Source: https://stackoverflow.com/questions/27695749/gmail-api-not-respecting-utf-encoding-in-subject
