“missing word in phrase: charset not supported”, when using the mail package

可紊 posted on 2019-11-30 19:01:56

Question


I'm trying to parse emails with the mail package and I get errors like the ones below. Is this a bug in the mail package, or something I should handle myself?

missing word in phrase: charset not supported: "gb18030"

charset not supported: "koi8-r"

missing word in phrase: charset not supported: "ks_c_5601-1987"

How can I fix them? I think I should use the charset package, but I'm not sure how. Here's what an email header looks like:

Received: from smtpbg303.qq.com ([184.105.206.26]) by mx-ha.gmx.net
 (mxgmxus001) with ESMTPS (Nemesis) id 0MAOx2-1X2yNC2ZFC-00BaVU for
 <sormester@lobbyist.com>; Sat, 14 Jun 2014 18:11:48 +0200
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qq.com; s=s201307;
    t=1402762305; bh=imEvSr8IPsqWTXU63xUHRv+wuQG+Tcz2mPP9ai4rrE4=;
    h=X-QQ-FEAT:X-QQ-SSF:X-HAS-ATTACH:X-QQ-BUSINESS-ORIGIN:
     X-Originating-IP:In-Reply-To:References:X-QQ-STYLE:X-QQ-mid:From:To:Subject:Mime-Version:Content-Type:Content-Transfer-Encoding:Date:
     X-Priority:Message-ID:X-QQ-MIME:X-Mailer:X-QQ-Mailer:
     X-QQ-ReplyHash:X-QQ-SENDSIZE:X-QQ-FName:X-QQ-LocalIP;
    b=QXs4CveboS8nG6htN9W6amC3X+F7X3ZtFrt6jrjWI+RmbvqBuTCVmX9IlaqCX84H8
     n14x2Wp7x4kDYcNRqhe+HjTpf715TTQXc4d40b9e38frC/5qIhpMtYNsD8iEJwRzHW
     U3xi8Yq7OCIB303fIpytx8tOjexQpZKSHbJ7ecX0=
X-QQ-FEAT: zaIfg0hwV2pIDflZYPQUsuPPXG5wtRVHJU6PiOYLBBA=
X-QQ-SSF: 00010000000000F000000000000000L
X-HAS-ATTACH: no
X-QQ-BUSINESS-ORIGIN: 2
X-Originating-IP: 180.155.99.102
In-Reply-To: <trinity-b7c6d611-52fd-4afa-b739-2deb243532a6-1402761364579@3capp-mailcom-lxa05>
References: <97e07dab7c2d1a005ed928c4350690e0@hotels-desk.co.uk>,
 <tencent_105D3DC11702F53465C0025D@qq.com>
    <trinity-b7c6d611-52fd-4afa-b739-2deb243532a6-1402761364579@3capp-mailcom-lxa05>
X-QQ-STYLE: 
X-QQ-mid: webmail474t1402762303t356131
From: "=?gb18030?B?08bTzg==?=" <38438nx@qq.com>
To: "=?gb18030?B?V2lsaGVsbSBLdW1tZXI=?=" <sormester@lobbyist.com>
Subject: =?gb18030?B?u9i4tKO6ILvYuLSjulBhbGFjZSBXZXN0bWluc3Rl?=
 =?gb18030?B?cjogMDEtMDctMjAxNCAtIDA0LTA3LTIwMTQ=?=
Mime-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_539C743F_08A07490_0157E268"
Content-Transfer-Encoding: 8Bit
Date: Sun, 15 Jun 2014 00:11:43 +0800
X-Priority: 3
Message-ID: <tencent_573A737E73016B9F5A3D10C1@qq.com>
X-QQ-MIME: TCMime 1.0 by Tencent
X-Mailer: QQMail 2.x
X-QQ-Mailer: QQMail 2.x
X-QQ-ReplyHash: 170675637
X-QQ-SENDSIZE: 520
X-QQ-FName: 7B2EFFAD16B8462B84D3499A4CC7DDEF
X-QQ-LocalIP: 163.177.66.155
Envelope-To: <sormester@lobbyist.com>
X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3;
X-GMX-Antivirus: 0 (no virus found)

Edit:

I've tried to use the charset package, but it has no effect. I still get the same error on the same messages.

import "code.google.com/p/go-imap/go1/imap"

header := imap.AsBytes(rsp.MessageInfo().Attrs["RFC822.HEADER"])

r, err := charset.NewReader("UTF-8", bytes.NewReader(header))
if err != nil {
    log.Fatal(err)
}
fmt.Printf("new char is %v", r)

msg, err := mail.ReadMessage(r)
if err != nil {
    log.Fatal(err)
    return mgs, err
}

mg.From, err = msg.Header.AddressList("From")
if err != nil {
    log.Errorf("NO FROM msg %s, err %v", header, err)
    return
}
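
One likely reason the charset reader changes nothing: the header bytes themselves are plain ASCII. The non-ASCII names are wrapped in RFC 2047 encoded-words of the form =?charset?encoding?payload?=, so the gb18030 bytes only appear once the payload has been base64-decoded. A minimal sketch that takes two encoded-words from the header above apart by hand (splitEncodedWord is a hypothetical helper written for this illustration; it handles only the B encoding and does no charset conversion):

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// splitEncodedWord is a hypothetical helper, not part of any library.
// It splits one RFC 2047 encoded-word into its charset label and the
// raw (still charset-encoded) bytes of its payload.
func splitEncodedWord(w string) (charset string, raw []byte, err error) {
	if len(w) < 4 || !strings.HasPrefix(w, "=?") || !strings.HasSuffix(w, "?=") {
		return "", nil, errors.New("not an encoded-word")
	}
	// Between the delimiters: charset ? encoding ? payload
	parts := strings.Split(w[2:len(w)-2], "?")
	if len(parts) != 3 {
		return "", nil, errors.New("malformed encoded-word")
	}
	if !strings.EqualFold(parts[1], "B") {
		return "", nil, errors.New("only the B (base64) encoding is handled in this sketch")
	}
	raw, err = base64.StdEncoding.DecodeString(parts[2])
	return strings.ToLower(parts[0]), raw, err
}

func main() {
	words := []string{
		"=?gb18030?B?V2lsaGVsbSBLdW1tZXI=?=", // display name from the To header
		"=?gb18030?B?08bTzg==?=",             // display name from the From header
	}
	for _, w := range words {
		cs, raw, err := splitEncodedWord(w)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The raw bytes are in the labelled charset, not UTF-8.
		fmt.Printf("charset=%s bytes=% x\n", cs, raw)
	}
}
```

The To name happens to be pure ASCII once decoded, while the From name is four bytes that still have to be converted from gb18030 to UTF-8 by a real decoder.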

The mail package seems to be able to decode only RFC 2047 encoded-words in a few built-in charsets, and the charset package doesn't know "rfc2047" as a character set:

character set "rfc2047" not found

Could mahonia fix the issue?
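
RFC 2047 is a header encoding scheme rather than a character set, which would explain the "not found" error. In later Go releases (1.5 and up, as far as I know) the standard library's mime.WordDecoder has a CharsetReader hook for charsets it does not handle itself. Below is a minimal sketch of that hook using only the standard library. The asciiOnly type and the charsetReader and decodeHeader functions are hypothetical names for this illustration, and asciiOnly is a deliberately limited stand-in that accepts only the single-byte ASCII subset of gb18030. A real program would return a proper decoder here instead, for example one from golang.org/x/text or mahonia.

```go
package main

import (
	"fmt"
	"io"
	"mime"
)

// asciiOnly is a stand-in for a real charset decoder (an assumption for
// this sketch): it passes ASCII bytes through and fails on anything else.
type asciiOnly struct{ r io.Reader }

func (a asciiOnly) Read(p []byte) (int, error) {
	n, err := a.r.Read(p)
	for _, b := range p[:n] {
		if b >= 0x80 {
			return 0, fmt.Errorf("byte 0x%02x needs a real charset decoder", b)
		}
	}
	return n, err
}

// charsetReader is called by mime.WordDecoder for charsets other than
// utf-8, iso-8859-1 and us-ascii; the label arrives lower-cased.
func charsetReader(charset string, input io.Reader) (io.Reader, error) {
	switch charset {
	case "gb18030":
		// Plug a real gb18030 decoder in here.
		return asciiOnly{input}, nil
	}
	return nil, fmt.Errorf("charset not supported: %q", charset)
}

// decodeHeader decodes every RFC 2047 encoded-word found in a header value.
func decodeHeader(value string) (string, error) {
	dec := mime.WordDecoder{CharsetReader: charsetReader}
	return dec.DecodeHeader(value)
}

func main() {
	// To header from the question: the name is ASCII inside gb18030.
	to, err := decodeHeader(`"=?gb18030?B?V2lsaGVsbSBLdW1tZXI=?=" <sormester@lobbyist.com>`)
	fmt.Println(to, err)

	// A charset the hook does not know still fails, as in the question.
	_, err = decodeHeader(`=?koi8-r?B?8NLJ18XU?=`)
	fmt.Println(err)
}
```

net/mail's AddressParser has a WordDecoder field, so the same decoder can be handed to the address-parsing code instead of being called on raw header values.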


Answer 1:


I hope this helps someone who is considering Go for processing email (i.e. developing client apps). It seems the Go standard library is not mature enough for email processing: it doesn't handle multipart messages, different character sets, etc. After almost a day of trying different hacks and packages, I decided to throw the Go code away and use a good old JavaMail solution.
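
For readers who would rather stay in Go: net/mail itself does not split multipart bodies, but the standard library's mime/multipart package can at least walk the parts of a message like the one in the question. A minimal sketch, assuming a made-up two-part message (the boundary b1, the sample text and the partTypes helper are illustrative, not taken from the question); converting each part's charset is still left to the caller:

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// sample is a made-up multipart/alternative message for illustration.
const sample = "Mime-Version: 1.0\r\n" +
	"Content-Type: multipart/alternative; boundary=\"b1\"\r\n" +
	"\r\n" +
	"--b1\r\n" +
	"Content-Type: text/plain; charset=\"gb18030\"\r\n" +
	"\r\n" +
	"hello\r\n" +
	"--b1\r\n" +
	"Content-Type: text/html; charset=\"gb18030\"\r\n" +
	"\r\n" +
	"<p>hello</p>\r\n" +
	"--b1--\r\n"

// partTypes returns the Content-Type of every top-level part of a raw
// message, or the message's own media type if it is not multipart.
func partTypes(raw string) ([]string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return []string{mediaType}, nil
	}
	mr := multipart.NewReader(msg.Body, params["boundary"])
	var types []string
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break // no more parts
		}
		if err != nil {
			return nil, err
		}
		types = append(types, part.Header.Get("Content-Type"))
	}
	return types, nil
}

func main() {
	types, err := partTypes(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, t := range types {
		fmt.Printf("part %d: %s\n", i, t)
	}
}
```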




Answer 2:


Alexey Vasiliev's MIT-licensed http://github.com/le0pard/go-falcon/ includes a parser package that applies whichever encoding package is needed to decode the headers (the meat is in utils.go).

package main

import (
        "bufio"
        "bytes"
        "fmt"
        "net/textproto"
        "github.com/le0pard/go-falcon/parser"
)

var msg = []byte(`Subject: =?gb18030?B?u9i4tKO6ILvYuLSjulBhbGFjZSBXZXN0bWluc3Rl?=
 =?gb18030?B?cjogMDEtMDctMjAxNCAtIDA0LTA3LTIwMTQ=?=

`)


func main() {
        tpr := textproto.NewReader(bufio.NewReader(bytes.NewBuffer(msg)))
        mh, err := tpr.ReadMIMEHeader()
        if err != nil {
                panic(err)
        }
        for name, vals := range mh {
                for _, val := range vals {
                        val = parser.MimeHeaderDecode(val)
                        fmt.Print(name, ": ", val, "\n")
                }
        }
}

It looks like its parser.FixEncodingAndCharsetOfPart is used by the package to decode/convert content as well, though with a couple of extra allocations caused by converting the []byte body to/from a string. If the API doesn't work for you, you might at least be able to use the code to see how it can be done.
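
For a rough idea of what decoding a part's content involves, here is a minimal standard-library sketch. It is not go-falcon's implementation; partCharset and decodeBody are hypothetical helpers for this illustration. It undoes the Content-Transfer-Encoding and reads the charset label from the Content-Type, and the resulting bytes would still need converting to UTF-8 with an encoding package such as encoding/simplifiedchinese.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"mime"
	"mime/quotedprintable"
	"strings"
)

// partCharset returns the lower-cased charset parameter of a
// Content-Type value, or "" if there is none or it cannot be parsed.
func partCharset(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

// decodeBody undoes the Content-Transfer-Encoding of a part body.
// The returned bytes are still in the part's own charset.
func decodeBody(transferEncoding, body string) ([]byte, error) {
	var r io.Reader = strings.NewReader(body)
	switch strings.ToLower(strings.TrimSpace(transferEncoding)) {
	case "base64":
		r = base64.NewDecoder(base64.StdEncoding, r)
	case "quoted-printable":
		r = quotedprintable.NewReader(r)
	}
	// 7bit, 8bit and binary bodies are passed through unchanged.
	return io.ReadAll(r)
}

func main() {
	fmt.Println(partCharset(`text/plain; charset="GB18030"`))

	raw, err := decodeBody("base64", "V2lsaGVsbSBL\r\ndW1tZXI=")
	fmt.Printf("%q %v\n", raw, err)

	raw, err = decodeBody("quoted-printable", "caf=E9")
	fmt.Printf("% x %v\n", raw, err)
}
```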

Found via godoc.org's "...and is imported by 3 packages" link from encoding/simplifiedchinese -- hooray godoc.org!




Answer 3:


I've been using github.com/jhillyerd/enmime which seems to have no trouble with this. It'll parse out both headers and body content. Given an io.Reader r:

// Parse message body
env, _ := enmime.ReadEnvelope(r)
// Headers can be retrieved via Envelope.GetHeader(name).
fmt.Printf("From: %v\n", env.GetHeader("From"))
// Address-type headers can be parsed into a list of decoded mail.Address structs.
alist, _ := env.AddressList("To")
for _, addr := range alist {
  fmt.Printf("To: %s <%s>\n", addr.Name, addr.Address)
}
fmt.Printf("Subject: %v\n", env.GetHeader("Subject"))

// The plain text body is available as env.Text.
fmt.Printf("Text Body: %v chars\n", len(env.Text))

// The HTML body is stored in env.HTML.
fmt.Printf("HTML Body: %v chars\n", len(env.HTML))

// env.Inlines is a slice of inlined attachments.
fmt.Printf("Inlines: %v\n", len(env.Inlines))

// env.Attachments contains the non-inline attachments.
fmt.Printf("Attachments: %v\n", len(env.Attachments))


Source: https://stackoverflow.com/questions/24902453/missing-word-in-phrase-charset-not-supported-when-using-the-mail-package
