Question
Given the following text, I want to remove every data_augmentation_options { ... } block.
The input is:
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
The expected output is:
batch_size: 4
num_steps: 30
I tried:
import re

s='''
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
'''
print(re.sub('data_augmentation_options \{*\}','',s,flags=re.S))
It does not seem to work. What is the right way to achieve this?
Answer 1:
Rather than deleting what you don't want, you could capture what you do want:
>>> re.findall(r'batch_size: *\d+|num_steps: *\d+',s)
['batch_size: 4', 'num_steps: 30']
Or, if you want to keep the leading whitespace:
>>> re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)
['\t\t\tbatch_size: 4', '\t\t\tnum_steps: 30']
Then print the result:
>>> print('\n'.join(re.findall(r'^[ \t]*(?:batch_size:|num_steps:)[ \t]*\d+',s, flags=re.M)))
batch_size: 4
num_steps: 30
If you want to use re.sub, you can use a character class that matches any character at all, so that it consumes everything after the match. An example is [\s\S], which matches either a whitespace or a non-whitespace character and therefore also matches newlines:
>>> re.sub(r'data_augmentation_options[\s\S]*','',s)
batch_size: 4
num_steps: 30
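Since the question already passes flags=re.S, it may help to see that [\s\S]* and .* with re.S behave the same way here. The following is a minimal sketch on a shortened, hypothetical input (the x: 1 field is made up for illustration), not the full text from the question:

```python
import re

# Shortened, hypothetical stand-in for the question's text.
s = "batch_size: 4\nnum_steps: 30\ndata_augmentation_options {\n  x: 1\n}\n"

# [\s\S] matches any character, newlines included, without needing a flag.
with_class = re.sub(r'data_augmentation_options[\s\S]*', '', s)

# With re.S (DOTALL), '.' also matches newlines, so '.*' runs to the end of the string.
with_dotall = re.sub(r'data_augmentation_options.*', '', s, flags=re.S)

print(repr(with_class))
```

Both calls remove everything from the first data_augmentation_options to the end of the string.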
Perhaps even easier is to just use Python's str.partition with the string that you want to use as a separator:
>>> s.partition('data_augmentation_options')[0]
batch_size: 4
num_steps: 30
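Note that both the [\s\S]* substitution and str.partition discard everything from the first data_augmentation_options onward. That is fine for the input shown, but it would also drop any settings that come after the blocks. A rough sketch that removes only the blocks by counting braces line by line follows. It assumes braces are balanced and never appear inside quoted strings; strip_blocks and the fine_tune field are hypothetical names used only for illustration:

```python
def strip_blocks(text, name="data_augmentation_options"):
    """Drop every top-level `name { ... }` block and keep all other lines."""
    kept = []
    depth = 0  # brace nesting depth while inside a block being skipped
    for line in text.splitlines():
        if depth == 0 and line.strip().startswith(name) and "{" in line:
            # Opening line of a block: start tracking nesting and skip it.
            depth = line.count("{") - line.count("}")
            continue
        if depth > 0:
            # Still inside the block: update nesting and skip the line.
            depth += line.count("{") - line.count("}")
            continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "batch_size: 4\n"
    "num_steps: 30\n"
    "data_augmentation_options {\n"
    "  random_crop_image {\n"
    "    random_coef: 0.25\n"
    "  }\n"
    "}\n"
    "fine_tune: true\n"  # hypothetical trailing field, not in the question
)
print(strip_blocks(sample))
```

For anything beyond simple cases, parsing the file with a proper parser for its format is likely more robust than line-based brace counting.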
Answer 2:
This will work:
s = re.sub(r"\W+data_augmentation_options {(?:.|\n)*}", "", s).strip()
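For context, the pattern in the question fails because \{*\} means "zero or more { characters followed immediately by }", so it never consumes the text between the braces. The pattern above instead uses (?:.|\n)*, which greedily matches any characters, newlines included, up to the last } in the string. A small self-contained check on a shortened, hypothetical version of the input:

```python
import re

# Shortened, hypothetical version of the question's input.
s = """
batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
  }
}
data_augmentation_options {
  random_crop_image {
    random_coef: 0.25
  }
}
"""

# The question's pattern: a `}` must directly follow the optional `{` characters,
# so nothing matches and the text comes back unchanged.
unchanged = re.sub(r'data_augmentation_options \{*\}', '', s, flags=re.S)

# Greedy match from the first block name through the last `}` in the string.
cleaned = re.sub(r"\W+data_augmentation_options {(?:.|\n)*}", "", s).strip()
print(cleaned)
```

Because the match is greedy, it also removes anything that sits between the first block and the last closing brace, which is what the question asks for but may be too broad for other inputs.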
Source: https://stackoverflow.com/questions/65479201/replace-multiple-lines-with-regex