pandas ValueError: pattern contains no capture groups


When using a regular expression, I get:

import re
string = r'http://www.example.com/abc.html'
result = re.search('^.*com', string).group()


3 answers

你的背包

2020-12-10 14:06

Try urllib.parse from Python's standard library, which works well for this purpose:

from urllib.parse import urlparse
df['domain'] = df.url.apply(lambda x: urlparse(x).netloc)
print(df)

  index                              url           domain
0     1  http://www.example.com/abc.html  www.example.com
1     2    http://www.hello.com/def.html    www.hello.com
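
For reference, here is a minimal stdlib-only sketch of what urlparse returns for one of the URLs above. Note that netloc is only the host, so if you want the same "http://www.example.com" prefix the regex answers produce, you have to put the scheme back yourself. The variable names are just for illustration.

```python
from urllib.parse import urlparse

parts = urlparse("http://www.example.com/abc.html")

scheme = parts.scheme   # the part before "://"
netloc = parts.netloc   # the host, without scheme or path
path = parts.path       # everything after the host

# Rebuild the "scheme://host" prefix from the parsed pieces.
prefix = f"{scheme}://{netloc}"
print(prefix)  # http://www.example.com
```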


余生分开走

2020-12-10 14:14
According to the docs, you need to specify a capture group (i.e., parentheses) for str.extract to, well, extract.

Series.str.extract(pat, flags=0, expand=True)
For each subject string in the Series, extract groups from the first match of regular expression pat.

Each capture group constitutes its own column in the output.
```
df.url.str.extract(r'(.*\.com)')

                        0
0  http://www.example.com
1    http://www.hello.com
```
```
# If you need named capture groups,
df.url.str.extract(r'(?P<URL>.*\.com)')

                      URL
0  http://www.example.com
1    http://www.hello.com
```
Or, if you need a Series,
```
df.url.str.extract(r'(.*\.com)', expand=False)

0    http://www.example.com
1      http://www.hello.com
Name: url, dtype: object
```
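
To put the pieces together, here is a rough sketch that reproduces the error from the title and then shows the fixed calls. The DataFrame is rebuilt from the two sample URLs in this thread, and the scheme/host group names are my own choice for illustration, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({"url": ["http://www.example.com/abc.html",
                           "http://www.hello.com/def.html"]})

# No parentheses: str.extract has no group to return and raises ValueError.
try:
    df["url"].str.extract(r"^.*\.com")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One capture group with expand=False: a Series of matched prefixes.
prefix = df["url"].str.extract(r"(^.*\.com)", expand=False)
print(prefix.tolist())

# Two named groups: one output column per group.
parts = df["url"].str.extract(r"(?P<scheme>\w+)://(?P<host>[^/]+)")
print(list(parts.columns))
```

This is also why the plain re.search(...).group() call in the question works without parentheses: group() with no argument returns the whole match, whereas str.extract only returns what the capture groups matched.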

梦如初夏

2020-12-10 14:31

You need to call str.extract on the url column and wrap the pattern in () to create a match group:

df['new'] = df['url'].str.extract(r'(^.*com)')
print(df)
  index                              url                     new
0     1  http://www.example.com/abc.html  http://www.example.com
1     2    http://www.hello.com/def.html    http://www.hello.com
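
One caveat with a pattern like this: .* is greedy and "com" is not anchored to a dot, so it matches up to the last "com" anywhere in the string. That is fine for the two sample URLs, but a path containing "com" changes the result. A small sketch with plain re, using a made-up URL that is not from the question:

```python
import re

# Hypothetical URL whose path happens to contain "com" (inside "welcome").
url = "http://www.example.com/welcome.html"

# Without the escaped dot, the greedy match runs on to the "com" in "welcome".
loose = re.search(r"(^.*com)", url).group(1)

# Requiring a literal ".com" stops at the end of the host.
strict = re.search(r"(^.*\.com)", url).group(1)

print(loose)   # http://www.example.com/welcom
print(strict)  # http://www.example.com
```

The escaped version is still a heuristic (it assumes a .com host and no ".com" later in the path), so urlparse is probably the safer choice if the URLs are not all this regular.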
