I am pretty new to Python and I am trying to cleanse some data. I've attached a link to the data file (two tabs: raw data and desired outcome). Please help!
What I am t
Try this:
1.) Delete rows 1-23
import pandas as pd
df = pd.read_excel('/home/mayankp/Downloads/Example2.xlsx', sheet_name=0, index_col=None, header=None, skiprows=23)
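If you want to see what skiprows and header=None do without opening the Excel file, here is a small sketch. It uses pd.read_csv on an in-memory string as a stand-in for read_excel (both accept these two parameters); the sample text and the name demo are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the spreadsheet: two junk lines, then two data rows.
raw = "report title,\ngenerated on,\n1,us-campaign-article1\n2,us-campaign-article2\n"

# skiprows=2 drops the two junk lines; header=None keeps integer column labels (0, 1).
demo = pd.read_csv(io.StringIO(raw), header=None, skiprows=2)
print(demo)
```

Because header=None is used, the columns are labelled 0, 1, ... which is why the next step refers to the second spreadsheet column as df[1].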
2.) Split column B into multiple columns using '-' as a delimiter and 3.) assign column names to the new columns
Both of these steps can be done in one go:
sub_df = df[1].str.split('-', expand=True).rename(columns = lambda x: "string"+str(x+1))
In [179]: sub_df
Out[179]:
   string1   string2             string3      string4     string5
1       us  campaign            article1   scrolldown  findoutnow
2       us  campaign            article1  scrollright        None
3       us  campaign            article1   findoutnow        None
4       us  campaign  payablesmanagement   findoutnow        None
Above is what the sample looks like after splitting on '-'.
Now drop the original column from df and insert these new columns in its place:
df = df.drop(1, axis=1)
df = pd.concat([df,sub_df], axis=1)
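To try steps 2 and 3 end to end without the Excel file, here is a minimal sketch on a small in-memory frame. The frame df_demo and its values are made up; it only assumes the same layout as above (integer column labels, with the '-'-delimited text in column 1).

```python
import pandas as pd

# Hypothetical frame shaped like the one read with header=None:
# column 0 is an id, column 1 holds the '-'-delimited text, column 2 is numeric.
df_demo = pd.DataFrame({
    0: [1, 2],
    1: ["us-campaign-article1-scrolldown-findoutnow",
        "us-campaign-article1-scrollright"],
    2: [10, 20],
})

# Split on '-' into separate columns and name them string1, string2, ...
parts = df_demo[1].str.split("-", expand=True).rename(
    columns=lambda x: "string" + str(x + 1)
)

# Drop the original text column and attach the new ones alongside the rest.
result = pd.concat([df_demo.drop(1, axis=1), parts], axis=1)
print(result)
```

Rows with fewer pieces than the longest row are padded with a missing value in the trailing columns, which is where the None entries in the output above come from.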
4.) Keep the numeric columns
The remaining columns are already intact, so no change is needed for this step.
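If you want to confirm which columns pandas treats as numeric, select_dtypes is one way to check. This is only a sketch: the frame mixed and its column names are hypothetical, not taken from your file.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
mixed = pd.DataFrame({
    "string1": ["us", "us"],
    "clicks": [3, 5],
    "rate": [0.1, 0.2],
})

# include="number" keeps integer and float columns and leaves out the text one.
numeric_cols = mixed.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```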
Let me know if this helps.