Why does this Python program send empty emails when I encode it with utf-8? [duplicate]

↘锁芯ラ 提交于 2020-08-06 07:21:11

问题


Before encoding the msg variable, I was getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)

So I did some research, and finally encoded the variable:

msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
server.sendmail('...@gmail.com', '...@gmail.com', msg)

Here's the rest of the code on request:

def remind_me(path, time, day_freq):

for filename in glob.glob(os.path.join(path, '*.docx')):
    # file_count = sum(len(files))
    # random_file = random.randint(0, file_number-1)
    doc = docx.Document(filename)
    p_number = len(doc.paragraphs)

    text = ''
    while text == '':
        rp = random.randint(0, p_number-1) # random paragraph number
        text = doc.paragraphs[rp].text # gives the entire text in the paragraph

    base = os.path.basename(filename)
    print(os.path.splitext(base)[0] + ': ' + text)
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login('...@gmail.com', 'password')
    msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
    server.sendmail('...@gmail.com', '...@gmail.com', msg)
    server.quit()

Now, it sends empty emails instead of delivering the message. Does it return None? If so, why?

Note: Word documents contain some characters like ş, ö, ğ, ç.


回答1:


The msg argument to smtplib.sendmail should be a bytes sequence containing a valid RFC5322 message. Taking a string and encoding it as UTF-8 is very unlikely to produce one (if it's already ASCII, encoding it does nothing useful; and if it isn't, you are most probably Doing It Wrong).

To explain why that is unlikely to work, let me provide a bit of background. The way to transport non-ASCII strings in MIME messages depends on the context of the string in the message structure. Here is a simple message with the word "Hëlló" embedded in three different contexts which require different encodings, none of which accept raw UTF-8 easily.

From: me <sender@example.org>
To: you <recipient@example.net>
Subject: =?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"

--fooo
Content-type: text/plain; charset="utf-8"
Content-transfer-encoding: quoted-printable

H=C3=ABll=C3=B3 is bare quoted-printable (RFC2045),
like what you see in the Subject header but without
the RFC2047 wrapping.

--fooo
Content-type: application/octet-stream; filename*=UTF-8''H%C3%ABll%C3%B3

This is a file whose name has been RFC2231-encoded.

--fooo--

There are recent extensions which allow for parts of messages between conforming systems to contain bare UTF-8 (even in the headers!) but I have a strong suspicion that this is not the scenario you are in. Maybe tangentially see also https://en.wikipedia.org/wiki/Unicode_and_email

Returning to your code, I suppose it could work if base is coincidentally also the name of a header you want to add to the start of the message, and text contains a string with the rest of the message. You are not showing enough of your code to reason intelligently about this, but it seems highly unlikely. And if text already contains a valid MIME message, encoding it as UTF-8 should not be necessary or useful (but it clearly doesn't, as you get the encoding error).

Let's suppose base contains Subject and text is defined thusly:

text='''=?utf-8?B?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"
....'''

Now, the concatenation base + ': ' + text actually produces a message similar to the one above (though I reordered some headers to put Subject: first for this scenario) but again, I imagine this is not how things actually are in your code.

If your goal is to send an extracted piece of text as the body of an email message, the way to do that is roughly

from email.mime.text import MIMEText

body_text = os.path.splitext(base)[0] + ': ' + text
sender = 'you@example.net'
recipient = 'me@example.org'

message = MIMEText(body_text)
message[subject] = 'Extracted text'
message[from] = sender
message[to] = recipient
server = smtplib.SMTP('smtp.gmail.com', 587)
# ... smtplib setup, login, authenticate?
server.send_message(message)

The MIMEText() invocation builds an email object with room for a sender, a subject, a list of recipients, and a body; its as_text() method returns a representation which looks roughly similar to the ad hoc example message above (though simpler still, with no multipart structure) which is suitable for transmitting over SMTP. It transparently takes care of putting in the correct character set and applying suitable content-transfer encodings for non-ASCII header elements and body parts (payloads).

Python's standard library contains fairly low-level functions so you have to know a fair bit in order to connect all the pieces correctly. There are third-party libraries which hide some of this nitty-gritty; but you would exepect anything with email to have at the very least both a subject and a body, as well as of course a sender and recipients.



来源:https://stackoverflow.com/questions/48555555/why-does-this-python-program-send-empty-emails-when-i-encode-it-with-utf-8

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!