python re.search not working on multiline string

问题

I have this file loaded in string:

// some preceding stuff
static char header_data[] = {
    1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,
    1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,
    1,1,0,1,0,1,0,1,1,0,1,0,1,0,1,1,
    1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,
    0,0,0,1,1,1,1,1,1,1,1,1,1,0,1,1,
    1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,
    0,1,0,0,0,1,0,0,1,1,1,1,0,0,0,0,
    0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,
    0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,0,
    0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,
    1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,
    1,1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,
    1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,
    1,1,0,1,0,1,0,1,1,1,1,0,0,0,0,1,
    1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,
    1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1
    };

I want to get only the block with ones and zeros, and then somehow process it.

I imported re, and tried:

In [11]: re.search('static char header_data(.*);', src, flags=re.M)

In [12]: re.findall('static char header_data(.*);', src, flags=re.M)
Out[12]: []

Why doesn't it match anything? How to fix this? (It's python3)

回答1:

You need to use the re.S flag, not re.M.

re.M (re.MULTILINE) controls the behavior of ^ and $ (whether they match at the start/end of the entire string or of each line).
re.S (re.DOTALL) controls the behavior of the . and is the option you need when you want to allow the dot to match newlines.

回答2:

and then somehow process it.

Here we go to get a useable list out of the file:

import re
match = re.search(r"static char header_data\[\] = {(.*?)};", src, re.DOTALL)
if match:
    header_data = "".join(match.group(1).split()).split(',')
    print header_data

.*? is a non-greedy match so you really will get just the value between this set of braces.

A more expicit way without DOTALL or MULTILINE would be

match = re.search(r"static char header_data\[\] = {([01,\s\r\n]*?)};", src)

回答3:

If the format of the file does not change, you might as well not resort to re but use slices. Something on these lines could be useful

>>> file_in_string
'\n// some preceding stuff\nstatic char header_data[] = {\n    1,1,1,1,1,1,0,0,0
,0,1,1,1,1,1,1,\n    1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,\n    1,1,0,1,0,1,0,1,1,0,1
,0,1,0,1,1,\n    1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,\n    0,0,0,1,1,1,1,1,1,1,1,1,1
,0,1,1,\n    1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,\n    0,1,0,0,0,1,0,0,1,1,1,1,0,0,0
,0,\n    0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,\n    0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,0,\
n    0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,\n    1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,\n
 1,1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,\n    1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,\n    1,1
,0,1,0,1,0,1,1,1,1,0,0,0,0,1,\n    1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,\n    1,1,1,1
,1,1,0,0,0,0,1,1,1,1,1,1\n    };\n'
>>> lines = file_in_string.split()
>>> lines[9:-1]
['1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,', '1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,', '1,1,0,
1,0,1,0,1,1,0,1,0,1,0,1,1,', '1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,', '0,0,0,1,1,1,1,
1,1,1,1,1,1,0,1,1,', '1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,', '0,1,0,0,0,1,0,0,1,1,1,
1,0,0,0,0,', '0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,', '0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,
0,', '0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,', '1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,', '1,
1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,', '1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,', '1,1,0,1,0,
1,0,1,1,1,1,0,0,0,0,1,', '1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,', '1,1,1,1,1,1,0,0,0,
0,1,1,1,1,1,1']

来源：https://stackoverflow.com/questions/27250171/python-re-search-not-working-on-multiline-string

标签

python

regex

multiline