Pandas string extract all the matches

旧城冷巷雨未停 submitted on 2021-02-16 14:39:25

Question


I am learning regex operations with the pandas Series string methods. I was able to extract the first number from each string, but my regex does not match the second number. How can I capture both numbers?

Note that for the second row, the second element should be NaN.

CODE:

import re
import pandas as pd

df = pd.DataFrame({'a': ["number 1.23 has 1.2 ",
                         "number 12.2 has 12 "]})

# Verbose-mode pattern: with re.X, whitespace and newlines inside it are ignored
pat = r""".+\s+
(\d+\.\d+)
.+
((?:\d+\.\d+)?)
.+"""

df['a'].str.extract(pat, flags=re.X, expand=True)

Gives:

0     1
1.23
12.2

Expected:

0     1
1.23  1.2
12.2  NaN

How to fix the regex?

I am very new to regex, so please be considerate and forgive my ignorance.


Answer 1:


You may use .str.findall with the \d+\.\d+ regex:

>>> df['a'].str.findall(r"\d+\.\d+").to_frame()
             a
0  [1.23, 1.2]
1       [12.2]

Or,

>>> pd.DataFrame(df['a'].str.findall(r"\d+\.\d+").tolist())
      0     1
0  1.23   1.2
1  12.2  None

The pattern matches:

  • \d+ - one or more digits
  • \. - a literal dot
  • \d+ - one or more digits.
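To see the pattern on its own, here is a minimal sketch that applies it with Python's standard re module to the two sample strings from the question. The variable names (pattern, first, second) are only illustrative. Because the pattern requires a dot between two runs of digits, the bare "12" in the second string is not matched.

```python
import re

# Same pattern as above: digits, a literal dot, digits
pattern = re.compile(r"\d+\.\d+")

# findall returns every non-overlapping match as a list of strings
first = pattern.findall("number 1.23 has 1.2 ")
second = pattern.findall("number 12.2 has 12 ")

print(first)   # ['1.23', '1.2']
print(second)  # ['12.2']
```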

Note that, unlike .str.extractall (which could also be used here), .str.findall does not require the pattern to contain a capturing group.
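As a rough sketch of the .str.extractall alternative mentioned above (not part of the original answer), the pattern is wrapped in a capturing group and the result is pivoted with unstack so that each match gets its own column. The names matches and wide are illustrative. The values remain strings, so a conversion such as astype(float) would be a separate step if numbers are needed.

```python
import pandas as pd

df = pd.DataFrame({'a': ["number 1.23 has 1.2 ",
                         "number 12.2 has 12 "]})

# extractall needs at least one capturing group; it returns one row per match,
# indexed by (original row label, match number)
matches = df['a'].str.extractall(r"(\d+\.\d+)")

# Take the single captured column and pivot the match number into columns;
# rows with fewer matches are filled with a missing value
wide = matches[0].unstack()

print(wide)
```

Here the second row has only one match, so its second column is left as a missing value, which corresponds to the layout the question asks for.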



Source: https://stackoverflow.com/questions/56064849/pandas-string-extract-all-the-matches
