Processing Apache logs quickly

猫巷女王i 2021-01-03 01:21

I'm currently running an awk script to process a large (8.1GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14MB of the (1000 +- 500)MB I expect.

5 Answers
  •  日久生厌
    2021-01-03 02:15

    This little Python script handles ~400MB worth of copies of your example line in about 3 minutes on my machine, producing ~200MB of output (keep in mind your sample line was quite short, so that's a handicap):

    import time
    
    src = open('x.log', 'r')
    dest = open('x.csv', 'w')
    
    for line in src:
        # the IP address is everything up to the first space
        ip = line[:line.index(' ')]
        # the timestamp sits between '[' and ']'; slicing off the last 6
        # characters drops the ' -0700'-style timezone offset
        date = line[line.index('[') + 1:line.index(']') - 6]
        # mktime() interprets the struct_time as local time, so the zone
        # offset is effectively ignored here
        t = time.mktime(time.strptime(date, '%d/%b/%Y:%X'))
        dest.write(ip)
        dest.write(',')
        dest.write(str(int(t)))
        dest.write('\n')
    
    src.close()
    dest.close()
    

    A minor problem is that it doesn't handle timezones (strptime() problem), but you could either hardcode that or add a little extra to take care of it.
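
    For example, a minimal sketch of that "little extra" (assuming the usual Apache timestamp field, e.g. 10/Oct/2000:13:55:36 -0700, and a hypothetical to_epoch() helper) could parse the time as UTC with calendar.timegm() and apply the numeric offset by hand:

    import calendar
    import time
    
    def to_epoch(stamp):
        # stamp looks like '10/Oct/2000:13:55:36 -0700'
        date, offset = stamp[:-6], stamp[-5:]
        # first interpret the date/time as if it were UTC
        t = calendar.timegm(time.strptime(date, '%d/%b/%Y:%X'))
        # then shift by the offset: '-0700' means 7 hours behind UTC
        seconds = int(offset[1:3]) * 3600 + int(offset[3:5]) * 60
        return t - seconds if offset[0] == '+' else t + seconds

    In the loop you would then slice out the whole bracketed field (line[line.index('[') + 1:line.index(']')]) and pass it to to_epoch() instead of calling mktime().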

    But to be honest, something as simple as that should be just as easy to rewrite in C.
