Regex for access log in Hive SerDe

生来不讨喜 2020-12-06 19:16

I want to extract (ip, requestUrl, timeStamp) from the access logs and load them into a Hive database. One line from the access log is as follows:


66.249.68.6 - -          


        
3 answers
  •  执念已碎
    2020-12-06 19:50

    Escape every backslash by doubling it (`\\`), and end the regex with `.*` (this is important!):

    CREATE EXTERNAL TABLE access_log (
            `ip`                STRING,
            `time_local`        STRING,
            `method`            STRING,
            `uri`               STRING,
            `protocol`          STRING,
            `status`            STRING,
            `bytes_sent`        STRING,
            `referer`           STRING,
            `useragent`         STRING
            )
        ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
        WITH SERDEPROPERTIES (
        'input.regex'='^(\\S+) \\S+ \\S+ \\[([^\\[]+)\\] "(\\w+) (\\S+) (\\S+)" (\\d+) (\\d+) "([^"]+)" "([^"]+)".*'
    )
    STORED AS TEXTFILE
    LOCATION '/tmp/access_logs/';
    

    P.S. Hive version: 0.7.1
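    A quick way to sanity-check the regex before creating the table is to run it locally. The Python sketch below is an illustration, not part of the original answer, and rests on two assumptions: that Hive unescapes each doubled backslash in the `'input.regex'` literal to a single one, and that RegexSerDe requires the whole line to match (like Java's `Matcher.matches()`, approximated here with `re.fullmatch`), loading non-matching rows as NULLs. The sample line is a hypothetical Combined Log Format entry, not the line from the question, and Python's `re` only approximates `java.util.regex` (the two agree on the constructs used here).

```python
import re

# The 'input.regex' value copied from the Hive DDL, with its doubled backslashes.
HIVE_LITERAL = r'^(\\S+) \\S+ \\S+ \\[([^\\[]+)\\] "(\\w+) (\\S+) (\\S+)" (\\d+) (\\d+) "([^"]+)" "([^"]+)".*'

# Assumption: Hive's string-literal parsing turns each "\\" into "\", so the
# regex engine sees single backslashes. Simulate that unescaping here.
REGEX = HIVE_LITERAL.replace("\\\\", "\\")

# Same pattern without the trailing ".*", for comparison.
STRICT = REGEX[:-2]

# Column names from the CREATE TABLE statement, in capture-group order.
COLUMNS = ["ip", "time_local", "method", "uri", "protocol",
           "status", "bytes_sent", "referer", "useragent"]


def parse_line(line, pattern=REGEX):
    """Return a column -> value dict, or None if the line does not match.

    Assumption: RegexSerDe requires the entire line to match (Java's
    Matcher.matches()), which re.fullmatch mirrors.
    """
    m = re.fullmatch(pattern, line)
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


# Hypothetical Combined Log Format line (not the line from the question).
SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

if __name__ == "__main__":
    row = parse_line(SAMPLE)
    # The three fields from the question: ip, request URL, timestamp.
    print(row["ip"], row["uri"], row["time_local"])
    # prints: 127.0.0.1 /apache_pb.gif 10/Oct/2000:13:55:36 -0700

    # The same line with a hypothetical extra trailing field appended.
    with_extra = SAMPLE + " 0.123"
    print(parse_line(with_extra) is not None)      # True: ".*" absorbs the tail
    print(parse_line(with_extra, STRICT) is None)  # True: no match without ".*"
```

    The last check illustrates why the trailing `.*` matters: if your log lines carry extra fields after the user agent, a pattern without it would not match the whole line, and (under the assumption above) those rows would load as NULLs.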
