python re.search not working on multiline string

南笙酒味 posted on 2019-12-11 07:24:12

Question


I have this file loaded into a string:

// some preceding stuff
static char header_data[] = {
    1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,
    1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,
    1,1,0,1,0,1,0,1,1,0,1,0,1,0,1,1,
    1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,
    0,0,0,1,1,1,1,1,1,1,1,1,1,0,1,1,
    1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,
    0,1,0,0,0,1,0,0,1,1,1,1,0,0,0,0,
    0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,
    0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,0,
    0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,
    1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,
    1,1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,
    1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,
    1,1,0,1,0,1,0,1,1,1,1,0,0,0,0,1,
    1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,
    1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1
    };

I want to get only the block with ones and zeros, and then somehow process it.

I imported re, and tried:

In [11]: re.search('static char header_data(.*);', src, flags=re.M)

In [12]: re.findall('static char header_data(.*);', src, flags=re.M)
Out[12]: []

Why doesn't it match anything? How can I fix this? (This is Python 3.)


Answer 1:


You need to use the re.S flag, not re.M.

  • re.M (re.MULTILINE) controls the behavior of ^ and $ (whether they match at the start/end of the entire string or of each line).
  • re.S (re.DOTALL) controls the behavior of . (the dot) and is the option you need when the dot should also match newlines. In the question, .* has to cross several line breaks to reach the ; so without re.S nothing matches.

See also the documentation for the re module.
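The difference between the two flags can be checked directly. The snippet below is a minimal sketch: the short src string is a stand-in I made up for the question's file, and only the pattern is taken from the question.

```python
import re

# Shortened stand-in for the question's `src` string (an assumption for illustration).
src = "static char header_data[] = {\n    1,0,1,\n    0,1,0\n    };\n"

pattern = r"static char header_data(.*);"

# re.M only changes ^ and $; the dot still stops at each newline,
# so the pattern cannot reach the ';' on the last line.
with_multiline = re.search(pattern, src, flags=re.M)
print(with_multiline)

# re.S lets the dot match newlines, so the match spans all lines.
with_dotall = re.search(pattern, src, flags=re.S)
print(repr(with_dotall.group(1)))
```

Note that the captured group here still includes the "[] = {" prefix and the closing brace, because the question's pattern does not exclude them.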




Answer 2:


"and then somehow process it."

Here is how to get a usable list out of the file:

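import re

# re.DOTALL lets "." match newlines; (.*?) captures up to the first "};"
match = re.search(r"static char header_data\[\] = {(.*?)};", src, re.DOTALL)
if match:
    # Remove all whitespace, then split on commas to get a list of "0"/"1" strings
    header_data = "".join(match.group(1).split()).split(',')
    print(header_data)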

.*? is a non-greedy match, so you get only the values between this pair of braces.
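To see why the non-greedy form matters, here is a small sketch. The second array, other_data, is hypothetical and exists only to give the greedy pattern something extra to swallow.

```python
import re

# Hypothetical input with two brace blocks, to show where each pattern stops.
src = (
    "static char header_data[] = {\n    1,0,\n    0,1\n    };\n"
    "static char other_data[] = {\n    1,1\n    };\n"
)

greedy = re.search(r"header_data\[\] = \{(.*)\};", src, re.DOTALL)
lazy = re.search(r"header_data\[\] = \{(.*?)\};", src, re.DOTALL)

# The greedy group runs on to the last "};" in the string.
print("other_data" in greedy.group(1))
# The non-greedy group stops at the first "};".
print("other_data" in lazy.group(1))
```

If the file only ever contains one such block, both patterns capture the same text, but the non-greedy one is the safer default.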

A more explicit way, without DOTALL or MULTILINE, would be:

match = re.search(r"static char header_data\[\] = {([01,\s\r\n]*?)};", src)
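As a possible next step for the "process it" part, the captured text can be turned into rows of integers. This is only a sketch: the short src string is an assumed stand-in with the same layout as the question's file, and the rows name is mine. The character class is written as [01,\s] because \s already covers \r and \n.

```python
import re

# Shortened stand-in for `src` (assumption: same layout as the question's file).
src = "static char header_data[] = {\n    1,0,1,\n    0,1,0\n    };\n"

rows = []
match = re.search(r"static char header_data\[\] = \{([01,\s]*?)\};", src)
if match:
    # One list of ints per source line; empty pieces from trailing commas are skipped.
    rows = [
        [int(tok) for tok in line.split(",") if tok.strip()]
        for line in match.group(1).splitlines()
        if line.strip()
    ]
print(rows)
```

With the 16-line block from the question, this should yield 16 lists of 16 integers each, which is convenient if the data is a bitmap.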



Answer 3:


If the format of the file does not change, you may not need re at all and can use slices instead. Something along these lines could be useful:

>>> file_in_string
'\n// some preceding stuff\nstatic char header_data[] = {\n    1,1,1,1,1,1,0,0,0
,0,1,1,1,1,1,1,\n    1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,\n    1,1,0,1,0,1,0,1,1,0,1
,0,1,0,1,1,\n    1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,\n    0,0,0,1,1,1,1,1,1,1,1,1,1
,0,1,1,\n    1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,\n    0,1,0,0,0,1,0,0,1,1,1,1,0,0,0
,0,\n    0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,\n    0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,0,\
n    0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,\n    1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,\n
 1,1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,\n    1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,\n    1,1
,0,1,0,1,0,1,1,1,1,0,0,0,0,1,\n    1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,\n    1,1,1,1
,1,1,0,0,0,0,1,1,1,1,1,1\n    };\n'
>>> lines = file_in_string.split()
>>> lines[9:-1]
['1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,', '1,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,', '1,1,0,
1,0,1,0,1,1,0,1,0,1,0,1,1,', '1,0,1,1,1,0,0,1,1,0,0,1,1,1,0,1,', '0,0,0,1,1,1,1,
1,1,1,1,1,1,0,1,1,', '1,0,0,0,1,1,0,1,1,1,1,1,0,1,1,1,', '0,1,0,0,0,1,0,0,1,1,1,
1,0,0,0,0,', '0,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,', '0,1,1,1,0,0,0,0,0,0,1,1,0,1,1,
0,', '0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,', '1,1,1,0,1,1,0,0,1,1,0,0,0,1,1,1,', '1,
1,0,1,1,1,1,1,1,1,1,0,0,0,1,1,', '1,0,1,1,1,0,0,1,1,0,0,0,0,0,1,1,', '1,1,0,1,0,
1,0,1,1,1,1,0,0,0,0,1,', '1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,', '1,1,1,1,1,1,0,0,0,
0,1,1,1,1,1,1']
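The slice above relies on the preamble always splitting into exactly nine tokens. A slightly less position-dependent variation (my own sketch, not part of the original answer) is to slice between the braces with str.index. It assumes the string contains a single brace-delimited block; the short file_in_string below is a made-up stand-in.

```python
# Shortened stand-in for `file_in_string` (assumption: one brace-delimited block).
file_in_string = (
    "\n// some preceding stuff\n"
    "static char header_data[] = {\n    1,0,1,\n    0,1,0\n    };\n"
)

start = file_in_string.index("{") + 1   # first character after the opening brace
end = file_in_string.index("}", start)  # position of the closing brace
body = file_in_string[start:end]

# int() tolerates surrounding whitespace, so newlines and indentation are harmless.
values = [int(tok) for tok in body.split(",") if tok.strip()]
print(values)
```

str.index raises ValueError when a brace is missing, which makes an unexpected file format fail loudly instead of returning a wrong slice.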


Source: https://stackoverflow.com/questions/27250171/python-re-search-not-working-on-multiline-string
