问题
I have to clean up some varchar in the following manner:
Remove special characters such as:
!, @, #, $, %, ^, &, *, (, ), }, {, [, ], ",", ., ?, /, ',from a closed list. I've managed to do so with a mass use of replace\regexp_replace but I'm looking for something similar to the one in SQL server.Remove following numbers but not adjacent ones meaning:
round 1 --> roundround1 --> round1round 12345 --> roundround12345 --> round12345Remove words out of a closed list of words such as: "and", "or", "Ltd.", "international" without substrings:
more food or drinks ---> more food drinks. and not --> me food or drinks
I'm working on a big table and I'd like to do it as efficient as possible.
Should I write a function that does that or is there a more elegant way?
回答1:
I'll play along. Here's for question 2:
SELECT trim(regexp_replace(
'12 foo1 run 1457 ABC 1Foo 2456 bar 34',
'\s*\m\d+\M\s*',
' ',
'g'
));
Returns:
foo1 run ABC 1Foo bar
I updated the answer to use constraint expressions instead of the bracket expressions, because the manual informs:
The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.
\s* .. zero or more white space\m .. beginning of a word (same as [[:<:]])\d+ .. 1 or more digits\M .. end of a word (same as [[:>:]])
The 4th parameter 'g' is needed to replace "globally", not just the first occurrence.
->sqlfiddle for v9.2
->sqlfiddle for v8.4 doesn't work
Why?
standard_conforming_strings. The default changed with v9.1.
This works in both worlds, "compatibility mode" so to say. But the syntax of the modern version above (in combination with standard_conforming_strings = on) is cleaner.
SELECT trim(regexp_replace(
'12 foo1 run 1457 ABC 1Foo 2456 bar 34',
E'\\s*\\m\\d+\\M\\s*',
' ',
'g'
));
->sqlfiddle
回答2:
1. It is much better to replace character which are not in allowed list with spaces, like this:
select regexp_replace(
E'aśx€ ąsd, dsa w|adwf\n as dw dgaa[aw] asdd',
'[^a-zA-Z0-9]',
' ',
'g');
This returns
a x sd dsa w adwf as dw dgaa aw asdd
There are thousands of possible characters in Unicode — it isn't really possible to list all special characters.
Taking out multiple consecutive spaces left as exercise to the reader.
回答3:
3. I think the fastest way would be:
select regexp_replace(
'And more food or drinks at the international airport Ltd',
'[[:<:]](and|or|Ltd|international)[[:>:]]',
' ',
'gi'
);
This returns:
more food drinks at the airport
I assume that Ltd. really is Ltd, as a dot is filtered out in first answer.
来源:https://stackoverflow.com/questions/14412635/postgresql-various-clean-up-of-string-varchar