I have a dataframe in which one column is a URL and the other is a name. I'm simply trying to add a third column that takes the URL and creates an HTML link.
The column
With zip and old-school string formatting:
df['sourceURL'] = ['<a href="%s">%s</a>' % (x, y) for x, y in zip(df['url'], df['source'])]
With an f-string:
df['sourceURL'] = [f'<a href="{x}">{y}</a>' for x, y in zip(df['url'], df['source'])]
pd.Series.apply has access only to a single series, i.e. the series on which you are calling the method. In other words, the function you supply, irrespective of whether it is named or an anonymous lambda, will only have access to df['source'].
To access multiple series by row, you need pd.DataFrame.apply along axis=1:
def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)
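To see the difference between the two apply methods concretely, here is a minimal sketch that records what type of object the supplied function receives in each case. The one-row frame is a hypothetical stand-in that reuses the column names from the question.

```python
import pandas as pd

# Hypothetical one-row frame, using the column names from the question.
df = pd.DataFrame({'source': ['BBC'], 'url': ['http://www.bbc.o.uk']})

# Series.apply: the function receives one scalar value at a time,
# so it cannot reach any other column.
seen_by_series_apply = df['source'].apply(lambda v: type(v).__name__)

# DataFrame.apply with axis=1: the function receives each row as a Series,
# so every column is reachable by label.
seen_by_frame_apply = df.apply(lambda row: type(row).__name__, axis=1)

print(seen_by_series_apply.iloc[0])  # str
print(seen_by_frame_apply.iloc[0])   # Series
```

The row-wise call hands the function a full Series for every row, which is why it can read both x['url'] and x['source'].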
Note that there is overhead in constructing and passing a Series for each row in this way; pd.DataFrame.apply with axis=1 is just a thinly veiled, inefficient loop.
You may find a list comprehension more efficient:
df['sourceURL'] = ['<a href="{0}">{1}</a>'.format(i, j)
                   for i, j in zip(df['url'], df['source'])]
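If you want to check the efficiency claim on your own data, a rough timing sketch along these lines may help. The frame size (10,000 rows), the repeat count, and the helper names with_apply and with_comprehension are arbitrary choices for illustration, and the actual timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 10,000 identical rows; the size is arbitrary.
df = pd.DataFrame({'source': ['BBC'] * 10_000,
                   'url': ['http://www.bbc.o.uk'] * 10_000})

def with_apply(frame):
    # Row-wise apply: each row is passed to the lambda as a Series.
    return frame.apply(
        lambda x: '<a href="{0}">{1}</a>'.format(x['url'], x['source']),
        axis=1)

def with_comprehension(frame):
    # Plain Python loop over the two columns zipped together.
    return ['<a href="{0}">{1}</a>'.format(i, j)
            for i, j in zip(frame['url'], frame['source'])]

# Both approaches should build identical strings.
assert with_apply(df).tolist() == with_comprehension(df)

# Time three runs of each approach and report the totals.
t_apply = timeit.timeit(lambda: with_apply(df), number=3)
t_comp = timeit.timeit(lambda: with_comprehension(df), number=3)
print(f'apply: {t_apply:.3f}s, list comprehension: {t_comp:.3f}s')
```

The equality check matters more than the exact numbers: it confirms the two versions are interchangeable before you pick one on speed.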
Here's a working demo:
import pandas as pd

df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                  columns=['source', 'url'])

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)

print(df)

  source                  url                              sourceURL
0    BBC  http://www.bbc.o.uk  <a href="http://www.bbc.o.uk">BBC</a>