TypeError: string indices must be integers using pandas apply with lambda

北城以北 submitted on 2021-01-19 05:02:30

Question


I have a dataframe, one column is a URL, the other is a name. I'm simply trying to add a third column that takes the URL, and creates an HTML link.

The column newsSource has the Link name, and url has the URL. For each row in the dataframe, I want to create a column that has:

<a href="[the url]">[newsSource name]</a>

Trying the below throws the error

File "C:\Users\AwesomeMan\Documents\Python\MISC\News Alerts\simple_news.py", line 254, in df['sourceURL'] = df['url'].apply(lambda x: '{1}'.format(x, x[0]['newsSource']))
TypeError: string indices must be integers

df['sourceURL'] = df['url'].apply(lambda x: '<a href="{0}">{1}</a>'.format(x, x['source']))

But I've used x[colName] before? The below line works fine, it simply creates a column of the source's name:

df['newsSource'] = df['source'].apply(lambda x: x['name'])

Why suddenly ("suddenly" to me) is it saying I can't access the indices?


Answer 1:


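pd.Series.apply only has access to a single series, i.e. the series on which you call the method. In other words, the function you supply, whether it is named or an anonymous lambda, only receives the values of df['url'], one at a time. Each x is therefore a plain string, and x['source'] tries to index that string with another string, hence the TypeError. Your earlier line, df['source'].apply(lambda x: x['name']), presumably worked because each value in df['source'] was a dict, so x['name'] was a key lookup on that dict rather than a column lookup.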
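The difference can be seen in a small sketch. The frame below is made up for illustration, and it assumes the source column holds dicts with a 'name' key, which is inferred from the question's working line rather than stated in it.

```python
import pandas as pd

# Toy data (hypothetical): 'source' holds dicts, 'url' holds plain strings.
df = pd.DataFrame({
    'source': [{'name': 'BBC'}],
    'url': ['http://www.bbc.o.uk'],
})

# Series.apply passes each element of the series to the function.
url_types = df['url'].apply(lambda x: type(x).__name__).tolist()
source_types = df['source'].apply(lambda x: type(x).__name__).tolist()
print(url_types, source_types)

# Works: x is a dict, so x['name'] is a key lookup.
names = df['source'].apply(lambda x: x['name']).tolist()
print(names)

# Fails: x is a str, and a str cannot be indexed with another str.
try:
    df['url'].apply(lambda x: x['source'])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```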

To access multiple series by row, you need pd.DataFrame.apply along axis=1:

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)

Note there is an overhead associated with passing an entire series in this way; pd.DataFrame.apply is just a thinly veiled, inefficient loop.

You may find a list comprehension more efficient:

df['sourceURL'] = ['<a href="{0}">{1}</a>'.format(i, j) \
                   for i, j in zip(df['url'], df['source'])]
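As a rough way to check this on your own data, the sketch below builds a hypothetical 1,000-row frame, confirms that both approaches produce the same strings, and times each with timeit. The row count and repeat count are arbitrary assumptions, and the timings will vary by machine and pandas version, so no figures are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical frame: the same row repeated 1,000 times.
df = pd.DataFrame({
    'source': ['BBC'] * 1000,
    'url': ['http://www.bbc.o.uk'] * 1000,
})

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

def with_apply():
    # Row-wise apply: each row is passed to return_link as a Series.
    return df.apply(return_link, axis=1).tolist()

def with_zip():
    # List comprehension over the two columns in parallel.
    return ['<a href="{0}">{1}</a>'.format(i, j)
            for i, j in zip(df['url'], df['source'])]

same_result = with_apply() == with_zip()
apply_seconds = timeit.timeit(with_apply, number=5)
zip_seconds = timeit.timeit(with_zip, number=5)
print(same_result, apply_seconds, zip_seconds)
```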

Here's a working demo:

import pandas as pd

df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                  columns=['source', 'url'])

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)

print(df)

  source                  url                              sourceURL
0    BBC  http://www.bbc.o.uk  <a href="http://www.bbc.o.uk">BBC</a>



Answer 2:


With zip and old-school % string formatting:

df['sourceURL'] = ['<a href="%s">%s</a>' % (x, y) for x, y in zip(df['url'], df['source'])]

The same with an f-string (Python 3.6+):

df['sourceURL'] = [f'<a href="{x}">{y}</a>' for x, y in zip(df['url'], df['source'])]


Source: https://stackoverflow.com/questions/51564120/typeerror-string-indices-must-be-integers-using-pandas-apply-with-lambda
