How do I re.search or re.match on a whole file without reading it all into memory?

不知归路 2020-12-01 01:14

I want to be able to run a regular expression on an entire file, but I'd like to avoid reading the whole file into memory at once, as I may be working with rat

9 answers
  •  忘掉有多难
    2020-12-01 01:25

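    If this is a big deal and worth some effort, you can convert the regular expression into a finite state machine (FSM) that reads the file incrementally. A deterministic FSM does a constant amount of work per character and only has to remember its current state, so matching runs in O(n) time for a file of n characters and its memory use does not grow with the file. That keeps it fast as files get big, unlike a backtracking engine such as Python's re module, which can take far more than linear time on some patterns.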

    You will be able to efficiently match patterns that span lines in files too large to fit in memory.
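    To make the streaming idea concrete, here is a minimal Python sketch that assumes the regex has already been turned into a DFA. The transition table is written by hand for the hypothetical pattern ab+c, and dfa_search is an illustrative name, not a library function. The file is read in fixed-size chunks and only the current state is carried between them, so a match that crosses a chunk or line boundary needs no special handling (a newline is just another byte to the DFA).

```python
import io
from typing import BinaryIO, Dict, Optional, Tuple

# Hypothetical DFA, built by hand, for an unanchored search for the regex ab+c.
# States: 0 = no progress, 1 = just saw "a", 2 = saw "a" then one or more "b"s,
# 3 = match found. Any (state, byte) pair not listed goes back to state 0.
TRANSITIONS: Dict[Tuple[int, int], int] = {
    (0, ord("a")): 1,
    (1, ord("a")): 1,
    (1, ord("b")): 2,
    (2, ord("a")): 1,
    (2, ord("b")): 2,
    (2, ord("c")): 3,
}
ACCEPT = 3


def dfa_search(f: BinaryIO, chunk_size: int = 64 * 1024) -> Optional[int]:
    """Return the byte offset just past the end of the first match, or None.

    Only the current chunk and the current state are kept in memory, so a match
    may straddle a chunk or line boundary without any special handling.
    """
    state = 0
    offset = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file without a match
            return None
        for byte in chunk:  # iterating over bytes yields ints
            state = TRANSITIONS.get((state, byte), 0)
            offset += 1
            if state == ACCEPT:
                return offset


# With a real file (the path is hypothetical):
#     with open("big.log", "rb") as f:
#         end = dfa_search(f)
print(dfa_search(io.BytesIO(b"xxabbbc"), chunk_size=2))  # prints 7
```

    This reports where the first match ends rather than where it starts; finding the start as well takes extra bookkeeping. A per-byte loop in pure Python is slow, so treat this as the shape of the approach (one pass, constant memory) rather than something tuned for speed.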

    Here are two places that describe the algorithm for converting a regular expression to an FSM:

    • http://swtch.com/~rsc/regexp/regexp1.html
    • http://www.math.grin.edu/~rebelsky/Courses/CS362/98F/Outlines/outline.07.html
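    The first link (Russ Cox's article) builds an NFA from the regex with Thompson's construction and then simulates it by tracking the set of states the machine could be in. Below is a rough Python sketch of that idea, not code from either article: State, Parser, add_closure and nfa_search are hypothetical names, and the syntax is deliberately limited to literals, |, *, +, ? and parentheses (no ., character classes or escapes). The file is read in chunks, so only one chunk and the current state set are in memory. Simulating the NFA directly like this costs O(m·n) for a pattern with m states; caching each state set as a DFA state, as the article also describes, is what brings the per-character work down to a constant.

```python
import io
from dataclasses import dataclass, field
from typing import List, Optional, Set, TextIO, Tuple

@dataclass(eq=False, repr=False)  # eq=False keeps identity hashing for use in sets
class State:
    char: Optional[str] = None  # character consumed here; None means epsilon-only
    edges: List["State"] = field(default_factory=list)

class Parser:
    """Thompson's construction for a tiny syntax: literals, |, *, +, ? and ( )."""

    def __init__(self, pattern: str) -> None:
        self.p, self.i = pattern, 0

    def peek(self) -> Optional[str]:
        return self.p[self.i] if self.i < len(self.p) else None

    def alt(self) -> Tuple[State, State]:
        s1, e1 = self.concat()
        while self.peek() == "|":
            self.i += 1
            s2, e2 = self.concat()
            s, e = State(), State()
            s.edges = [s1, s2]  # branch into either alternative
            e1.edges.append(e)
            e2.edges.append(e)
            s1, e1 = s, e
        return s1, e1

    def concat(self) -> Tuple[State, State]:
        start = end = State()  # an empty sequence matches ""
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.repeat()
            end.edges.append(s2)  # chain each piece onto the previous one
            end = e2
        return start, end

    def repeat(self) -> Tuple[State, State]:
        s1, e1 = self.atom()
        while self.peek() in ("*", "+", "?"):
            op = self.peek()
            self.i += 1
            s, e = State(), State()
            s.edges = [s1] if op == "+" else [s1, e]  # * and ? may skip the atom
            e1.edges += [s1, e] if op != "?" else [e]  # * and + may loop back
            s1, e1 = s, e
        return s1, e1

    def atom(self) -> Tuple[State, State]:
        c = self.peek()
        if c is None or c in "*+?":
            raise ValueError(f"expected a character or '(' at position {self.i}")
        self.i += 1
        if c == "(":
            frag = self.alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return frag
        s, e = State(c), State()
        s.edges = [e]
        return s, e

def add_closure(state: State, out: Set[State]) -> None:
    """Add `state` and every state reachable from it by epsilon edges to `out`."""
    stack = [state]
    while stack:
        s = stack.pop()
        if s not in out:
            out.add(s)
            if s.char is None:
                stack.extend(s.edges)

def nfa_search(pattern: str, f: TextIO, chunk_size: int = 64 * 1024) -> Optional[int]:
    """Unanchored search; return the offset just past the first match, or None."""
    parser = Parser(pattern)
    start, accept = parser.alt()
    if parser.peek() is not None:
        raise ValueError(f"unexpected {parser.peek()!r} at position {parser.i}")
    current: Set[State] = set()
    add_closure(start, current)
    if accept in current:
        return 0  # the pattern matches the empty string
    offset = 0
    while chunk := f.read(chunk_size):  # only one chunk is held in memory at a time
        for ch in chunk:
            nxt: Set[State] = set()
            for s in current:
                if s.char == ch:
                    add_closure(s.edges[0], nxt)
            add_closure(start, nxt)  # a new match may begin at the next position
            current, offset = nxt, offset + 1
            if accept in current:
                return offset
    return None

print(nfa_search("(cat|dog)\n+x", io.StringIO("a dog\n\nx"), chunk_size=3))  # prints 8
```

    In the usage line the match spans two newlines and, with chunk_size=3, three separate chunks. With a file opened in text mode, f.read(chunk_size) returns up to chunk_size characters, so the offset counts characters rather than bytes.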
