Question
I'm a beginner in data science with Python. I'm working on a dataset for which I have to do the following tasks using the Python petl library:
a. Clean the data in clinics.csv. This involves using Python and regex to
standardise email addresses so they are usable as an HTML link, and
b. output the merged and cleaned data into a CSV file with the name
clinic_locations.csv.
So far I've been able to handle part of point (b): I've extracted the data from the XML file and combined it with the CSV file. The problem is that
I can't clean the data in my CSV file.
This is my CSV file:
ID Name Suburb State Postcode Email
1 Hurstville Clinic Hurstville NSW 1493 hurstville
2 Sydney Centre Clinic Sydney NSW 2000 sydney@myclinic.com.au
3 Auburn Clinic Auburn NSW 2144 auburn@myclinic.com.au
4 Riverwood Clinic Riverwood NSW 2210 riverwood@myclinic.com.au
As you can see, the data in the Email column is incomplete, so the links are unusable. Can anyone help me get started?
Update: the output I'm getting is:
ID Name Suburb State Postcode \
0 1 Hurstville Clinic Hurstville NSW 1493
1 2 Sydney Centre Clinic Sydney NSW 2000
2 3 Auburn Clinic Auburn NSW 2144
3 4 Riverwood Clinic Riverwood NSW 2210
4 5 Fingal Bay Clinic Fingal Bay NSW 2315
5 6 Harrington Clinic Harrington NSW 2427
6 7 Back Forest Clinic Back Forest NSW 2535
7 8 Jindabyne Clinic Jindabyne NSW 2627
8 9 Benolong Clinic Benolong NSW 2830
9 10 Melbourne Centre Clinic Melbourne VIC 3000
Email
0 hurstville@myclinic.com.au
1 sydney@myclinic.com.au
2 auburn@myclinic.com.au
3 riverwood@myclinic.com.au
4 fingal bay@myclinic.com.au
5 harrington@myclinic.com.au
6 back forest@myclinic.com.au
7 jindabyne @myclinic.com.au
8 benolong@myclinic.com.au
9 melbourne@myclinic.com.au
Answer 1:
I hope this helps, assuming all the email IDs share the same domain:
import pandas as pd

# Provide the complete path to your file
df = pd.read_csv("clinic_locations.csv")
# Append the shared domain to any value that does not contain "@"
df['Email'] = df['Email'].apply(lambda x: x if '@' in str(x) else str(x) + '@myclinic.com.au')
# Show the first 10 rows of the data frame
print(df.head(10))
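The updated output in the question still contains addresses with spaces in them ("fingal bay@myclinic.com.au", "jindabyne @myclinic.com.au"), which would not work as links. Below is a minimal regex-based sketch of one way to handle that. The helper names clean_email and mailto_link are made up for illustration, and the sketch assumes that dropping the whitespace gives the intended address and that every clinic uses the myclinic.com.au domain; neither is confirmed by the question.

```python
import re

# Assumption: every clinic shares this domain, as in the sample data.
DEFAULT_DOMAIN = "myclinic.com.au"


def clean_email(value, domain=DEFAULT_DOMAIN):
    """Hypothetical helper: strip whitespace, lower-case, add the domain if missing."""
    # Treat None and NaN (NaN is the only value not equal to itself) as empty.
    if value is None or value != value:
        return ""
    # Remove all whitespace, e.g. "fingal bay@..." becomes "fingalbay@...".
    text = re.sub(r"\s+", "", str(value)).lower()
    if not text:
        return ""
    # Values without "@" are assumed to be just the local part of the address.
    if "@" not in text:
        text = f"{text}@{domain}"
    return text


def mailto_link(email):
    """Hypothetical helper: build the href value for an HTML mail link."""
    return f"mailto:{email}"
```

With the data frame from the answer above, this could be applied as df['Email'] = df['Email'].apply(clean_email), and the result written out with df.to_csv("clinic_locations.csv", index=False).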
Source: https://stackoverflow.com/questions/48700691/my-csv-file-have-some-email-addresses-some-of-them-have-incomplete-address-how