python pandas extracting numbers within text to a new column

Submitted by 三世轮回 on 2021-02-08 04:44:23

Question


I have the following text in column A:

A   
hellothere_3.43  
hellothere_3.9

I would like to extract only the numbers into a new column B (next to A), e.g.:

B                      
3.43   
3.9

I use str.extract('(\d.\d\d)', expand=True), but this only captures 3.43, because the pattern matches numbers with that exact number of digits. Is there a way to make it more generic?

Many thanks!
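A minimal sketch reproducing the problem with the two sample values above. In the question's pattern \d.\d\d the unescaped . matches any single character, and exactly two digits must follow it, so hellothere_3.9 has no match and its row comes back as NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})

# Fixed-width pattern from the question: one digit, any character
# (the "." is unescaped), then exactly two digits.
fixed = df["A"].str.extract(r"(\d.\d\d)", expand=False)
print(fixed)
```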


Answer 1:


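Use a regular expression that does not hard-code the number of digits. The pattern \d*\.?\d+ matches optional leading digits, an optional decimal point, and one or more trailing digits.

Example: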

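import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
# Raw string avoids invalid-escape warnings for \d and \.;
# expand=False returns a Series (of strings) instead of a one-column DataFrame.
df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=False)
print(df)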

Output:

                 A     B
0  hellothere_3.43  3.43
1   hellothere_3.9   3.9
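Note that str.extract returns strings (object dtype), so column B above holds text such as "3.43" rather than floats. If numeric values are needed, a minimal sketch is to convert after extraction with pd.to_numeric; errors="coerce" keeps rows without a match as NaN instead of raising. The third row, "no_number_here", is a hypothetical value added only to show that case.

```python
import pandas as pd

# Third row is a hypothetical value with no number in it.
df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9", "no_number_here"]})

# Same pattern as above, as a raw string; expand=False returns a Series.
extracted = df["A"].str.extract(r"(\d*\.?\d+)", expand=False)

# Convert the extracted text to float; rows with no match stay NaN.
df["B"] = pd.to_numeric(extracted, errors="coerce")
print(df)
print(df["B"].dtype)
```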



Answer 2:


I think splitting the string and applying a lambda is quite clean.

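import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
# Split each value on "_" and convert the part after it to float.
# Assumes every value has a number right after the first "_".
df["B"] = df["A"].str.split("_").apply(lambda x: float(x[1]))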

I haven't done any proper comparison, but it seems faster than the regex solution on small tests.
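The speed claim above comes from informal small tests. The sketch below is one way to check it on your own data with the standard-library timeit module. The 10,000-row synthetic column is hypothetical, and no timings are claimed here since they depend on data size and pandas version. It first confirms that both approaches return the same floats (the regex result is converted with astype(float) so the comparison is like for like).

```python
import timeit

import pandas as pd

# Hypothetical synthetic data in the same "text_number" shape as the question.
values = [f"hellothere_{i % 100}.{i % 7}" for i in range(10_000)]
df = pd.DataFrame({"A": values})


def with_regex(s: pd.Series) -> pd.Series:
    # Answer 1's approach, converted to float for a fair comparison.
    return s.str.extract(r"(\d*\.?\d+)", expand=False).astype(float)


def with_split(s: pd.Series) -> pd.Series:
    # Answer 2's approach: split on "_" and convert the second piece.
    return s.str.split("_").apply(lambda x: float(x[1]))


# Both approaches should agree before their speed is compared.
pd.testing.assert_series_equal(
    with_regex(df["A"]), with_split(df["A"]), check_names=False
)

for func in (with_regex, with_split):
    seconds = timeit.timeit(lambda: func(df["A"]), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```

If the split version turns out faster on your data, keep in mind that it relies on every value having a number right after the first "_": a value without an underscore raises IndexError, whereas the regex version returns NaN for that row.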



Source: https://stackoverflow.com/questions/50830059/python-pandas-extracting-numbers-within-text-to-a-new-column
