My CSV file has some email addresses, and some of them are incomplete. How do I make them fully recognizable using Python?

喜欢而已 submitted on 2019-12-11 17:18:13

Question


I'm a beginner in data science with Python. I'm working on a dataset for which I have to do the following tasks using the Python petl library:

a. Clean the data in clinics.csv. This involves using Python and regex to standardise the email addresses so they are usable as an HTML link, and

b. Output the merged and cleaned data into a CSV file named clinic_locations.csv.

So far I've been able to handle part of point (b), i.e. I've extracted the data from the XML file and combined it with the CSV file. The problem is that

I can't clean the data in my CSV file.

This is my CSV file:

ID  Name    Suburb  State   Postcode    Email
1   Hurstville Clinic   Hurstville  NSW 1493    hurstville
2   Sydney Centre Clinic    Sydney  NSW 2000    sydney@myclinic.com.au
3   Auburn Clinic   Auburn  NSW 2144    auburn@myclinic.com.au
4   Riverwood Clinic    Riverwood   NSW 2210    riverwood@myclinic.com.au

As you can see, the data in the Email column is incomplete, so the addresses are unusable as links. Can anyone help me get started?

Update: the output that I'm getting is

   ID                     Name       Suburb State  Postcode  \
0   1        Hurstville Clinic   Hurstville   NSW      1493
1   2     Sydney Centre Clinic       Sydney   NSW      2000
2   3            Auburn Clinic       Auburn   NSW      2144
3   4         Riverwood Clinic    Riverwood   NSW      2210
4   5        Fingal Bay Clinic   Fingal Bay   NSW      2315
5   6        Harrington Clinic   Harrington   NSW      2427
6   7       Back Forest Clinic  Back Forest   NSW      2535
7   8         Jindabyne Clinic    Jindabyne   NSW      2627
8   9          Benolong Clinic     Benolong   NSW      2830
9  10  Melbourne Centre Clinic    Melbourne   VIC      3000

                         Email
0   hurstville@myclinic.com.au
1       sydney@myclinic.com.au
2       auburn@myclinic.com.au
3    riverwood@myclinic.com.au
4   fingal bay@myclinic.com.au
5   harrington@myclinic.com.au
6  back forest@myclinic.com.au
7   jindabyne @myclinic.com.au
8     benolong@myclinic.com.au
9    melbourne@myclinic.com.au

Answer 1:


I hope this helps, assuming all the email addresses share the same domain:

import pandas as pd

# Provide the complete path to your file
df = pd.read_csv("clinic_locations.csv")

# Append the domain to every value that does not already contain an '@'
df['Email'] = df['Email'].apply(lambda x: x if '@' in str(x) else str(x) + '@myclinic.com.au')

# Show the first ten rows of the data frame
print(df.head(10))
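
The output in the question's update still contains stray spaces (for example `fingal bay@myclinic.com.au` and `jindabyne @myclinic.com.au`), which the lambda above does not touch. Below is a minimal sketch of a regex-based clean-up using only the standard library. The function name `normalise_email` and the constant `DEFAULT_DOMAIN` are hypothetical, and two things are assumptions rather than anything stated in the question: that every clinic uses the `myclinic.com.au` domain, and that whitespace inside an address should simply be removed.

```python
import re

# Assumption: every clinic shares this domain (taken from the sample rows).
DEFAULT_DOMAIN = "myclinic.com.au"


def normalise_email(value, domain=DEFAULT_DOMAIN):
    """Return a cleaned address, or '' when there is nothing to build one from."""
    # Treat None as empty, then drop all whitespace (assumption: spaces are
    # removed, not replaced by another character).
    text = re.sub(r"\s+", "", "" if value is None else str(value))
    if not text:
        return ""
    # Split at the first '@'; existing_domain is '' when no '@' is present.
    local, _, existing_domain = text.partition("@")
    if not local:
        return ""
    # Keep a domain that is already there, otherwise append the default one.
    return f"{local}@{existing_domain or domain}"


if __name__ == "__main__":
    for raw in ["hurstville", "jindabyne @myclinic.com.au", "fingal bay"]:
        print(repr(raw), "->", normalise_email(raw))
```

With pandas this could be applied as `df['Email'] = df['Email'].fillna('').apply(normalise_email)`, so that missing values stay empty instead of turning into `nan@myclinic.com.au`. Whether `fingal bay` should become `fingalbay` or something like `fingal.bay` depends on the real mailbox names, so check that against the actual addresses before relying on it.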


Source: https://stackoverflow.com/questions/48700691/my-csv-file-have-some-email-addresses-some-of-them-have-incomplete-address-how
