pandas ValueError: pattern contains no capture groups

迷失自我 2020-12-10 13:59

When using a regular expression, I get:

import re
string = r'http://www.example.com/abc.html'
result = re.search('^.*com', string).group()
3 Answers
  • 2020-12-10 14:06

    Try the standard-library urllib.parse module, which works well for this purpose. Note that netloc returns only the host, without the scheme:

    from urllib.parse import urlparse
    df['domain']=df.url.apply(lambda x:urlparse(x).netloc)
    print(df)
    
      index                              url           domain
    0     1  http://www.example.com/abc.html  www.example.com
    1     2    http://www.hello.com/def.html    www.hello.com
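The urlparse approach above yields the host only, while the regex in the question also keeps the scheme. A minimal sketch of rebuilding scheme plus host from the parsed parts, assuming plain Python strings rather than a DataFrame column; the helper name url_origin is hypothetical and not part of the answer:

```python
from urllib.parse import urlparse


def url_origin(url: str) -> str:
    """Return 'scheme://host' for a URL (hypothetical helper for illustration)."""
    parts = urlparse(url)
    # netloc is the host part; scheme is 'http', 'https', ...
    return f"{parts.scheme}://{parts.netloc}"


urls = ["http://www.example.com/abc.html", "http://www.hello.com/def.html"]
origins = [url_origin(u) for u in urls]
print(origins)
```

The same function could presumably be passed to df.url.apply in place of the lambda if the scheme is wanted in the new column.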
    
  • 2020-12-10 14:14

    According to the docs, you need to specify a capture group (i.e., parentheses) for str.extract to, well, extract.

    Series.str.extract(pat, flags=0, expand=True)
    For each subject string in the Series, extract groups from the first match of regular expression pat.

    Each capture group constitutes its own column in the output.

    df.url.str.extract(r'(.*\.com)')
    
                            0
    0  http://www.example.com
    1    http://www.hello.com
    

    # If you need named capture groups,
    df.url.str.extract(r'(?P<URL>.*\.com)')
    
                          URL
    0  http://www.example.com
    1    http://www.hello.com
    

    Or, if you need a Series,

    df.url.str.extract(r'(.*\.com)', expand=False)
    
    0    http://www.example.com
    1      http://www.hello.com
    Name: url, dtype: object
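To illustrate the point that each capture group becomes its own column, here is a small sketch with two named groups. The sample frame mirrors the data shown in this thread; the group names scheme and host and the pattern itself are assumptions chosen for illustration:

```python
import pandas as pd

# Sample data mirroring the frame used in the answers.
df = pd.DataFrame({
    "index": [1, 2],
    "url": ["http://www.example.com/abc.html", "http://www.hello.com/def.html"],
})

# Two named capture groups give two output columns,
# and the column names are taken from the group names.
parts = df["url"].str.extract(r"^(?P<scheme>\w+)://(?P<host>[^/]+)")
print(parts)
```

With unnamed groups the columns would be numbered 0, 1, ... instead.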
    
  • 2020-12-10 14:31

    You need to wrap the pattern in parentheses () so that it contains a capture group:

    df['new'] = df['url'].str.extract(r'(^.*com)')
    print(df)
      index                              url                     new
    0     1  http://www.example.com/abc.html  http://www.example.com
    1     2    http://www.hello.com/def.html    http://www.hello.com
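Why the question's pattern works with re.search but not with str.extract can be seen with the re module alone: .group() with no argument returns group 0, the whole match, whereas str.extract only returns numbered or named groups and raises the ValueError from the title when the pattern has none. A minimal sketch; the variable names are illustrative:

```python
import re

url = "http://www.example.com/abc.html"

no_group = re.compile(r"^.*com")     # the pattern from the question
one_group = re.compile(r"(^.*com)")  # the same pattern wrapped in a capture group

# .groups is the number of capture groups in a compiled pattern.
print(no_group.groups, one_group.groups)

# group() is group 0, the entire match; group(1) is the first capture group.
whole = no_group.search(url).group()
captured = one_group.search(url).group(1)
print(whole, captured)
```

Both calls return the same text here, which is why adding the parentheses is enough to make the pandas version behave like the re.search version.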
    