REGEXP_EXTRACT in Impala

跟風遠走 submitted on 2019-12-25 00:35:12

Question


I am trying to figure out how to extract the customer ID from a string that looks like this:

{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}

I am trying to extract the customer ID from strings that contain error code 101 with the following code:

select regexp_extract(field, '\"customer_id":"(.*)', 0) from table_name
where field rlike '"error_code":"101"'

But this gives me the following result:

"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}

Expected result:

5b0e9b23e423b0d33c9f7ddfd

Could you please help me with this?


Answer 1:


You can use the regex below:

"customer_id":"([\w\d]+)"

Demo: https://regex101.com/r/MEOGw8/1

Test:

{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}

Match:

Match 1
Full match  63-104  `"customer_id":"5b0e9b23e423b0d33c9f7ddfd"`
Group 1.    78-103  `5b0e9b23e423b0d33c9f7ddfd`
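The regex101 offsets above can be reproduced with Python's re module as a quick sanity check. Python's engine is not RE2, which Impala uses, but for a pattern this simple the two should agree. This is a local illustration only, not Impala code.

```python
import re

# The sample row from the question.
field = ('{"param":"success","value":"10","level":"0","error_code":"101",'
         '"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", '
         '"last_activity_ts": "123523465"}')

# Answer 1's pattern: the capture group holds only the ID characters.
pattern = r'"customer_id":"([\w\d]+)"'

m = re.search(pattern, field)
print(m.span(), m.group(0))   # (63, 104) "customer_id":"5b0e9b23e423b0d33c9f7ddfd"
print(m.span(1), m.group(1))  # (78, 103) 5b0e9b23e423b0d33c9f7ddfd
```

Python reports [start, end) offsets, so 63-104 and 78-103 line up with the regex101 output.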

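SQL statement (regexp_extract in Impala takes three arguments; the third, 1, selects the first capture group):

select regexp_extract(field, '"customer_id":"([\w\d]+)"', 1) from table_name
where field rlike '"error_code":"101"'

If you run the query through impala-shell, you may need to double the backslashes ([\\w\\d]+), because the shell treats \ as an escape character.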



Answer 2:


Your regex matches from "customer_id":" until the end of the line for two reasons. First, .* matches any character zero or more times, as many as it can. Second, you pass 0 as the last parameter of regexp_extract, which refers to the entire matched string rather than to the capture group.
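Both problems can be seen separately in a small Python sketch. Python's re module behaves like RE2 for this pattern. The sketch applies the question's pattern (minus the stray backslash before the first quote) and reads group 0 and group 1.

```python
import re

# The sample row from the question.
field = ('{"param":"success","value":"10","level":"0","error_code":"101",'
         '"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", '
         '"last_activity_ts": "123523465"}')

# (.*) is greedy: it consumes everything up to the end of the string.
m = re.search(r'"customer_id":"(.*)', field)

whole_match = m.group(0)  # index 0: the entire matched text
capture = m.group(1)      # index 1: only the (.*) group

print(whole_match)  # the same text as the result shown in the question
print(capture)      # starts with the ID but still runs to the closing }
```

Switching to index 1 alone is not enough: the group still runs to the end of the line because of the greedy .*.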

To match only what is between the double quotes, you could use a negated character class. It matches anything that is not a double quote, and you capture that in a group ([^"]+):

"customer_id":"([^"]+)"
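A quick Python check of the negated character class, using the question's row plus a hypothetical second row whose ID is not hexadecimal ("ABC-123" is made up for illustration).

```python
import re

# The sample row from the question.
field = ('{"param":"success","value":"10","level":"0","error_code":"101",'
         '"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", '
         '"last_activity_ts": "123523465"}')

# [^"]+ accepts any run of characters except a double quote, so the
# capture stops at the quote that closes the customer_id value.
pattern = r'"customer_id":"([^"]+)"'
print(re.search(pattern, field).group(1))  # 5b0e9b23e423b0d33c9f7ddfd

# Hypothetical row with a non-hex ID: the negated class still captures it.
other = '{"error_code":"101","customer_id":"ABC-123"}'
print(re.search(pattern, other).group(1))  # ABC-123
```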

Or you could list the allowed characters in a character class, repeat it one or more times ([a-f0-9]+), and capture that in a group:

"customer_id":"([a-f0-9]+)"
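The stricter hex class behaves differently from the negated class when an ID contains other characters. This Python sketch uses the same hypothetical "ABC-123" row as a contrast.

```python
import re

# The sample row from the question.
field = ('{"param":"success","value":"10","level":"0","error_code":"101",'
         '"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", '
         '"last_activity_ts": "123523465"}')
other = '{"error_code":"101","customer_id":"ABC-123"}'  # hypothetical non-hex ID

# Only lowercase hex digits are allowed inside the capture group.
pattern = r'"customer_id":"([a-f0-9]+)"'

hex_match = re.search(pattern, field)
print(hex_match.group(1))         # 5b0e9b23e423b0d33c9f7ddfd

# The closing quote must follow the hex run directly, so an ID
# containing other characters does not match at all.
print(re.search(pattern, other))  # None
```

The hex class doubles as a check on the ID format, but an ID outside lowercase hex yields no match. The negated class is the more forgiving of the two.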

Your value is in the first capturing group, which you can select by passing 1 as the third parameter of regexp_extract:

regexp_extract(field, '"customer_id":"([a-f0-9]+)"', 1)
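The whole query can be mimicked locally to check a pattern before running it in Impala. The helpers regexp_extract and rlike below are hypothetical Python stand-ins, not Impala code. The sketch assumes RLIKE succeeds when the pattern matches anywhere in the string, which is how the question's WHERE clause uses it. The second row is made up to show the filter dropping it.

```python
import re
from typing import Optional


def regexp_extract(subject: str, pattern: str, index: int) -> Optional[str]:
    """Hypothetical stand-in for Impala's regexp_extract(subject, pattern, index).

    Returns group `index` of the first match (0 = whole match). Returning
    None when nothing matches is this sketch's choice, not necessarily Impala's.
    """
    m = re.search(pattern, subject)
    return m.group(index) if m else None


def rlike(subject: str, pattern: str) -> bool:
    """Hypothetical stand-in for `subject RLIKE pattern` (match anywhere)."""
    return re.search(pattern, subject) is not None


rows = [
    ('{"param":"success","value":"10","level":"0","error_code":"101",'
     '"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", '
     '"last_activity_ts": "123523465"}'),
    # Hypothetical row with a different error code; the filter drops it.
    '{"error_code":"200","customer_id":"0123abcd"}',
]

# select regexp_extract(field, '"customer_id":"([a-f0-9]+)"', 1) from table_name
# where field rlike '"error_code":"101"'
result = [
    regexp_extract(field, r'"customer_id":"([a-f0-9]+)"', 1)
    for field in rows
    if rlike(field, r'"error_code":"101"')
]
print(result)  # ['5b0e9b23e423b0d33c9f7ddfd']
```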


Source: https://stackoverflow.com/questions/51100923/regexp-extract-in-impala
