问题
I am trying to figure out how to extract customer ID from string that looks loke this:
{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
I am trying to extract customer ID from strings that contain error code 101 with following code:
select regexp_extract(field, '\"customer_id":"(.*)', 0) from table_name
where field rlike '"error_code":"101"'
But this gives me a following result:
"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
Expected result:
5b0e9b23e423b0d33c9f7ddfd
Could you please help me with this?
回答1:
You can use below regex:
"customer_id":"([\w\d]+)"
Demo : https://regex101.com/r/MEOGw8/1
Test:
{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
Match:
Match 1
Full match 63-104 `"customer_id":"5b0e9b23e423b0d33c9f7ddfd"`
Group 1. 78-103 `5b0e9b23e423b0d33c9f7ddfd`
SQL Statement:
select regexp_extract(field, '"customer_id":"([\w\d]+)"',1, 1) from table_name
where field rlike '"error_code":"101"'
回答2:
Your regex matches from "customer_id":"
until the end of the line because you use .*
which will match any charcter zero or more times and you use 0
as the last parameter of regexp_extract.
which refers to the entire extracted string.
To match what is between the double quotes, you could match not a double quote and capture that in a group ([^"]+)
using a negated character class:
"customer_id":"([^"]+)"
Or you might specify the character ranges in a character class, repeat it one or more times ([a-f0-9]+)
and capture that in a group:
"customer_id":"([a-f0-9]+)"
Your value is in the first capturing group which I think you could specify using 1
as the third parameter for regexp_extract.
regexp_extract(field, '"customer_id":"([a-f0-9]+)"', 1)
来源:https://stackoverflow.com/questions/51100923/regexp-extract-in-impala