Question
Given the following text, I want to remove the entire data_augmentation_options { random_horizontal_flip { ... } } block.
(Here ... stands for other text.)
That is, the input is:
...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
The expected output is:
...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
I tried
s=''' ...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
'''
print(re.sub('data_augmentation_options \{[\s]+random_horizontal_flip[\s]+\{[\s]+(keypoint_flip_permutation: \d[\s])+[\s]+\}[\s]+\}','',s,flags=re.S))
It does not seem to work. What's the right way to achieve this?
Answer 1:
You are only matching a single line instead of all the lines.
You can repeat the line pattern keypoint_flip_permutation: \d+ and then match the two closing curly braces.
Note that you don't need re.S, as there is no dot in the pattern.
data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*
Explanation
- data_augmentation_options {          Match literally
- \s+random_horizontal_flip\s+         Match the starting line
- {                                    Match literally
- (?:                                  Non-capturing group
- \s+keypoint_flip_permutation: \d+    Match the string followed by 1+ digits
- )+                                   Close the group and repeat 1+ times
- \s*}                                 Match optional whitespace chars and }
- \s*}                                 Match optional whitespace chars and }
- \s*                                  Match optional whitespace chars
If you want to remove only the trailing newline, you can match \r?\n at the end instead of \s*
For example:
print(re.sub(r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*", "", s))
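A self-contained sketch of the substitution above, for anyone who wants to run it without the full input. The string config is a shortened, hypothetical stand-in for the question's text: the indentation and the blank line between the two blocks are assumptions, added so that the difference between a trailing \s* and a trailing \r?\n is visible. The variable names are illustrative only.

```python
import re

# Shortened stand-in for the question's input. The indentation and the blank
# line between the two blocks are assumptions made for this illustration.
config = (
    "batch_size: 4\n"
    "num_steps: 30\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "    keypoint_flip_permutation: 2\n"
    "    keypoint_flip_permutation: 10\n"
    "  }\n"
    "}\n"
    "\n"
    "data_augmentation_options {\n"
    "  random_crop_image {\n"
    "    min_aspect_ratio: 0.5\n"
    "  }\n"
    "}\n"
)

# Body of the pattern from the answer, up to the second closing brace.
block = (
    r"data_augmentation_options {\s+random_horizontal_flip\s+{"
    r"(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}"
)

# Trailing \s*: removes all whitespace after the block, blank lines included.
strip_all_ws = re.compile(block + r"\s*")
# Trailing \r?\n: removes only the single line break after the closing brace.
strip_one_newline = re.compile(block + r"\r?\n")

cleaned = strip_all_ws.sub("", config)
cleaned_nl = strip_one_newline.sub("", config)

print(cleaned)
print(cleaned_nl)
```

With this sample, the \s* version joins num_steps directly to the next block, while the \r?\n version keeps the blank line that followed the removed block.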
Answer 2:
A few modifications to your regex give:
data_augmentation_options {\s+random_horizontal_flip\s+{(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}
- replace [\s] with just \s, which is equivalent
- put the \s+ inside the capture group ()
- replace \d with \d+ to match multi-digit numbers
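A small sketch for checking this pattern, under the assumption that the real input is indented (the indentation is not visible in the text as posted). Each repetition of the group consumes one whitespace character after the digits and then needs at least one more before the next keypoint_flip_permutation, so the pattern appears to rely on that indentation. The sample strings below are hypothetical and shortened.

```python
import re

# Pattern from this answer, unchanged.
pattern = re.compile(
    r"data_augmentation_options {\s+random_horizontal_flip\s+{"
    r"(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}"
)

# Hypothetical, shortened sample; the indentation is an assumption.
indented = (
    "num_steps: 30\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "    keypoint_flip_permutation: 10\n"
    "  }\n"
    "}\n"
    "data_augmentation_options {\n"
    "  random_crop_image {\n"
    "    random_coef: 0.25\n"
    "  }\n"
    "}\n"
)
# Same text with the leading whitespace stripped from every line.
flat = "\n".join(line.strip() for line in indented.splitlines()) + "\n"

# On the indented text the block is removed; the newline after the final
# brace is not part of the match, so an empty line is left behind.
removed = pattern.sub("", indented)
print(removed)

# On the flattened text there is only a single newline between lines,
# so the pattern finds nothing to remove.
print(pattern.search(flat))
```

If the input might not be indented, the non-capturing form (?:\s+keypoint_flip_permutation: \d+)+ from Answer 1 avoids this dependency.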
Answer 3:
You can use a lookahead to stop the deletion at the second pattern:
>>> re.sub(r'^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)','\n\n',s, flags=re.M)
batch_size: 4
num_steps: 30
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
The (?=^[ \t]*data_augmentation_options) is the lookahead.
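One property of the lookahead approach that may be worth checking before using it: the pattern does not mention random_horizontal_flip, so it removes any data_augmentation_options block that is followed by another one. The sketch below uses a hypothetical three-block input (the third block, some_other_option, is invented for illustration) and an empty replacement instead of '\n\n'.

```python
import re

# Pattern from this answer, unchanged.
lookahead = re.compile(
    r"^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)",
    flags=re.M,
)

# Hypothetical input; the third block is invented for this illustration.
config = (
    "num_steps: 30\n"
    "data_augmentation_options {\n"
    "  random_horizontal_flip {\n"
    "    keypoint_flip_permutation: 0\n"
    "  }\n"
    "}\n"
    "data_augmentation_options {\n"
    "  random_crop_image {\n"
    "    random_coef: 0.25\n"
    "  }\n"
    "}\n"
    "data_augmentation_options {\n"
    "  some_other_option {\n"
    "  }\n"
    "}\n"
)

# Without a count, every block that has a successor is removed,
# so only the last block survives.
all_matches = lookahead.sub("", config)

# With count=1, only the first block is removed.
first_only = lookahead.sub("", config, count=1)

print(all_matches)
print(first_only)
```

With exactly two blocks, as in the question, both calls give the same result. Passing count=1 is one way to keep the behavior predictable if more blocks are present.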
Source: https://stackoverflow.com/questions/65479947/replace-with-multi-line-regex