Download a csv file from gmail using python

我与影子孤独终老i posted on 2020-01-01 07:14:44

Question


I tried several Python scripts to download a CSV attachment from Gmail, but none of them worked. Is this possible? If so, which Python script should I use? Thank you.


Answer 1:


TL;DR

  • If you want to skip the details in this answer, I've put together a GitHub repo that makes getting CSV data from Gmail as simple as:

    from gmail import *
    
    # get all attachments from e-mails containing 'test'
    search_query = "test"
    service = get_gmail_service()
    csv_dfs = query_for_csv_attachments(service, search_query)
    print(csv_dfs)
    
  • Here is the repo: https://github.com/robertdavidwest/google_api

  • Just follow the instructions in the README and have fun. Please feel free to contribute!

THE LONG ANSWER - directly using google-api-python-client and oauth2client

  • Follow this link and click on the button: "ENABLE THE GMAIL API"

    https://developers.google.com/gmail/api/quickstart/python

    After the setup, you will download a file called credentials.json

  • Install the required Python packages:

    pip install --upgrade google-api-python-client oauth2client
    
  • The following code snippet connects to your Gmail account from Python:

    from googleapiclient.discovery import build
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    # SCOPES was not defined in the original snippet; read-only access is
    # assumed here, which is enough to search messages and fetch attachments
    SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
    GMAIL_CREDENTIALS_PATH = 'credentials.json' # downloaded
    GMAIL_TOKEN_PATH = 'token.json' # this will be created
    
    store = file.Storage(GMAIL_TOKEN_PATH)
    creds = store.get()
    if not creds or creds.invalid:
        # first run: opens a browser window so you can authorize access
        flow = client.flow_from_clientsecrets(GMAIL_CREDENTIALS_PATH, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build('gmail', 'v1', http=creds.authorize(Http()))
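
  • oauth2client has since been deprecated in favor of google-auth. A rough sketch of the same connection step with the newer libraries, along the lines of the current Gmail quickstart, might look like the following. It assumes `pip install google-auth google-auth-oauthlib google-api-python-client`; the function name get_gmail_service_google_auth and the read-only scope are illustrative choices, not part of the original answer:

```python
import os

# Assumed scope: read-only access to messages and attachments.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def get_gmail_service_google_auth(credentials_path='credentials.json',
                                  token_path='token.json'):
    """Build a Gmail API client, running the OAuth flow on first use."""
    # Imported lazily so this module loads even before the packages are installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    creds = None
    if os.path.exists(token_path):
        # Reuse the token saved by a previous run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # First run: opens a browser window to authorize access.
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(token_path, 'w') as fh:
            fh.write(creds.to_json())
    return build('gmail', 'v1', credentials=creds)
```

    The returned object is used exactly like the service built with oauth2client.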
    
  • With this service you can now read your e-mails and any attachments they contain.

  • First, query your e-mails with a search string to find the ids of the e-mails that have the attachments:

    search_query = "ABCD"
    results = service.users().messages().list(userId='me', q=search_query).execute()
    msgs = results.get('messages', [])  # the key is absent when nothing matches
    msg_ids = [msg['id'] for msg in msgs]
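
  • Note that messages.list returns results one page at a time, so a query with many matches has to follow nextPageToken to see them all. A minimal sketch, where list_all_message_ids is a hypothetical helper name and service is the object built in the connection step:

```python
def list_all_message_ids(service, query, user_id='me'):
    """Collect the ids of every message matching `query`, following pagination."""
    msg_ids = []
    page_token = None
    while True:
        response = service.users().messages().list(
            userId=user_id, q=query, pageToken=page_token).execute()
        # 'messages' is missing from the response when a page has no matches.
        msg_ids.extend(m['id'] for m in response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return msg_ids
```

    Usage: msg_ids = list_all_message_ids(service, "ABCD")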
    
  • Now, for each message id, you can find the associated attachments in the e-mail.

  • This part is a little messy, so bear with me. First we obtain a list of "attachment parts" (and attachment filenames) from the e-mail. These are the components of the e-mail that contain attachments:

    messageId = 'XYZ'
    msg = service.users().messages().get(userId='me', id=messageId).execute()
    parts = msg.get('payload').get('parts', [])  # empty for single-part messages
    all_parts = []
    for p in parts:
        if p.get('parts'):
            # nested multipart: take its children
            all_parts.extend(p.get('parts'))
        else:
            all_parts.append(p)
    
    att_parts = [p for p in all_parts if p['mimeType']=='text/csv']
    filenames = [p['filename'] for p in att_parts]
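
  • The loop above only unpacks one level of nesting, and some mail clients label CSV attachments with a generic MIME type such as application/octet-stream. A slightly more defensive sketch walks the payload recursively and also matches on the .csv extension; iter_parts and find_csv_parts are hypothetical helper names:

```python
def iter_parts(payload):
    """Yield every leaf part of a Gmail message payload, at any nesting depth."""
    sub_parts = payload.get('parts')
    if not sub_parts:
        yield payload
        return
    for sub in sub_parts:
        yield from iter_parts(sub)


def find_csv_parts(payload):
    """Return the leaf parts that look like CSV attachments."""
    return [
        p for p in iter_parts(payload)
        if p.get('mimeType') == 'text/csv'
        or (p.get('filename') or '').lower().endswith('.csv')
    ]
```

    Usage: att_parts = find_csv_parts(msg['payload'])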
    
  • Now we can obtain the attached csv from each part:

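    messageId = 'XYZ'
    part = att_parts[0]  # repeat for each part in att_parts
    data = part['body'].get('data')
    attachmentId = part['body'].get('attachmentId')
    if not data:
        # the attachment is not inlined in the message, so fetch it by id
        att = service.users().messages().attachments().get(
                userId='me', id=attachmentId, messageId=messageId).execute()
        data = att['data']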
    
  • Now you have the csv data, but it's base64url-encoded, so finally we decode it and convert the result into a pandas dataframe:

    import base64
    import pandas as pd
    from io import StringIO  # Python 3 replacement for the old StringIO module
    # decode to bytes, then to text (assumes the csv is UTF-8 encoded)
    str_csv = base64.urlsafe_b64decode(data.encode('UTF-8')).decode('UTF-8')
    df = pd.read_csv(StringIO(str_csv))
    
  • And that's it! You have a pandas dataframe with the contents of the csv attachment. You can work with this dataframe, or write it to disk with pd.DataFrame.to_csv if you simply want to download the csv. Use the list of filenames we obtained earlier if you want to preserve the filename.
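
  • If you only want the file on disk, pandas is not needed: the decoded bytes can be written out directly. A minimal sketch, where save_attachment and its out_dir argument are hypothetical, and data and the filename come from the earlier steps:

```python
import base64
import os


def save_attachment(data, filename, out_dir='.'):
    """Decode a base64url attachment body and write it to out_dir/filename."""
    # Restore '=' padding in case it was stripped; this adds nothing
    # when the length is already a multiple of 4.
    padded = data + '=' * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(padded.encode('UTF-8'))
    # Keep only the final path component so a crafted filename
    # cannot write outside out_dir.
    safe_name = os.path.basename(filename) or 'attachment.csv'
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_name)
    with open(path, 'wb') as fh:
        fh.write(raw)
    return path
```

    Usage: save_attachment(data, filenames[0], 'downloads')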




Answer 2:


I got it. This is not my own work: I found some code, combined the pieces, and modified them into the script below. In the end, it worked.

import email
import imaplib
import os

print('Proceeding')

userName = 'yourgmail@gmail.com'
passwd = 'yourpassword'

# Attachments are saved to ./DataFiles
detach_dir = '.'
if 'DataFiles' not in os.listdir(detach_dir):
    os.mkdir('DataFiles')

try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com')
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        print('Not able to sign in!')
        raise RuntimeError('login failed')

    # The mailbox name contains a space, so it must be quoted in Python 3
    imapSession.select('"[Gmail]/All Mail"')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        print('Error searching Inbox.')
        raise RuntimeError('search failed')

    # Iterate over every e-mail and save its attachments
    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')
        if typ != 'OK':
            print('Error fetching mail.')
            raise RuntimeError('fetch failed')

        emailBody = messageParts[0][1]
        mail = email.message_from_bytes(emailBody)  # fetch returns bytes in Python 3
        for part in mail.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            fileName = part.get_filename()

            if bool(fileName):
                filePath = os.path.join(detach_dir, 'DataFiles', fileName)
                # Skip files that were already downloaded
                if not os.path.isfile(filePath):
                    print(fileName)
                    with open(filePath, 'wb') as fp:
                        fp.write(part.get_payload(decode=True))
    imapSession.close()
    imapSession.logout()

    print('Done')

except Exception as exc:
    print('Not able to download all attachments:', exc)
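
The script above saves every attachment it finds. To keep only CSV files, the per-message logic can be pulled into a small function that is also easy to try out without a mail server. A sketch using only the standard library; extract_csv_attachments is a hypothetical name:

```python
import email


def extract_csv_attachments(raw_message):
    """Return (filename, content_bytes) pairs for CSV attachments in a raw RFC822 message."""
    mail = email.message_from_bytes(raw_message)
    found = []
    for part in mail.walk():
        # Multipart containers hold other parts and carry no file content themselves.
        if part.get_content_maintype() == 'multipart':
            continue
        file_name = part.get_filename()
        if file_name and file_name.lower().endswith('.csv'):
            found.append((file_name, part.get_payload(decode=True)))
    return found
```

Inside the fetch loop it would be called as extract_csv_attachments(messageParts[0][1]), and each returned pair can then be written to the DataFiles directory.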


Source: https://stackoverflow.com/questions/41749236/download-a-csv-file-from-gmail-using-python
