Parse Apache log in PHP using preg_match

日久生厌 2020-12-23 14:34

I need to save data in a table (for reporting, stats, etc.) so a user can search by time, user agent, etc. I have a script that runs every day that reads the Apache log and

5 answers
  •  刺人心 (original poster)
    2020-12-23 14:51

    Having seen (and written) a lot of erroneous log parsing, here is a hopefully valid regex, tested on 50k lines of logs without a single diff. It takes into account that:

    • auth_user can have spaces
    • response_size can be -
    • http_start_line contains at least one space (HTTP/0.9), or two
    • http_start_line may contain double quotes
    • referrer can be empty, have spaces, or double quotes (it's just an HTTP header)
    • user_agent can be empty too, and can contain double quotes and spaces
    • It's hard to distinguish between referrer and user_agent. Let's just hope the " " between the two is discriminating enough; yet the infamous " " can also show up inside the referrer or inside the user agent, so basically, we're screwed here.
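
To make the last point concrete, here is a minimal sketch using a simplified two-field pattern (a reduction made up for illustration, not the full log regex). The variable names and the sample string are assumptions. It shows that when a header value itself contains " ", a greedy and a lazy first group split the same input differently, and neither split can be proven right.

```php
<?php
// Simplified two-field patterns (illustration only, not the full log regex).
$greedy = '/^"(?P<referrer>.*)" "(?P<user_agent>.*)"$/';
$lazy   = '/^"(?P<referrer>.*?)" "(?P<user_agent>.*)"$/';

// Hypothetical end of a log line where a header value contains `" "`.
$tail = '"a" "b" "c"';

preg_match($greedy, $tail, $g);
preg_match($lazy, $tail, $l);

// Greedy first group: the split happens at the last `" "`.
echo $g['referrer'], ' | ', $g['user_agent'], "\n"; // a" "b | c

// Lazy first group: the split happens at the first `" "`.
echo $l['referrer'], ' | ', $l['user_agent'], "\n"; // a | b" "c
```

Both results are valid matches, which is why the separator alone cannot settle where the referrer ends.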

      $ncsa_re = '/^(?P<ip>\S+)
      \ (?P<ident>\S)
      \ (?P<auth_user>.*?) # Spaces are allowed here, can be empty.
      \ (?P<date>\[[^]]+\])
      \ "(?P<http_start_line>.+ .+)" # At least one space: HTTP 0.9
      \ (?P<status_code>[0-9]+) # Status code is _always_ an integer
      \ (?P<response_size>(?:[0-9]+|-)) # Response size can be -
      \ "(?P<referrer>.*)" # Referrer can contain anything: it is just a header
      \ "(?P<user_agent>.*)"$/x';
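
A possible way to use such a pattern with preg_match is sketched below. This is only an illustration: the group names ip, ident, date and status_code are assumed labels, the sample line is invented, and mapping a "-" response size to NULL is one design choice among others.

```php
<?php
// Pattern for the combined log format. The group names ip, ident, date and
// status_code are assumed labels chosen for this sketch.
$ncsa_re = '/^(?P<ip>\S+)
    \ (?P<ident>\S)
    \ (?P<auth_user>.*?)                  # may be empty or contain spaces
    \ (?P<date>\[[^]]+\])
    \ "(?P<http_start_line>.+ .+)"
    \ (?P<status_code>[0-9]+)
    \ (?P<response_size>(?:[0-9]+|-))
    \ "(?P<referrer>.*)"
    \ "(?P<user_agent>.*)"$/x';

// Hypothetical sample line (not taken from a real log).
$line = '192.0.2.10 - john doe [23/Dec/2020:14:34:00 +0000] '
      . '"GET /index.php?q=1 HTTP/1.1" 200 - '
      . '"http://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"';

if (preg_match($ncsa_re, $line, $m) === 1) {
    // Keep only the named groups; preg_match also returns numeric copies.
    $fields = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    // "-" is not a number: store it as NULL instead of casting it to 0.
    $fields['response_size'] = $fields['response_size'] === '-'
        ? null
        : (int) $fields['response_size'];
    print_r($fields);
} else {
    // Report unparsed lines instead of dropping them silently.
    error_log("Unparsed line: $line");
}
```

Keeping unparsed lines visible makes it possible to run the pattern over a large log and review every line it rejects.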
      

    Hope that helps.
