Question
Requirement:
A generic query or function to check whether the value stored in a VARCHAR column of a table is actually a number, and whether it stays within the allowed precision.
Available values:
Table_Name, Column_Name, Allowed Precision, Allowed Scale
The general advice would be to create a function and use to_number() to validate the value; however, that won't validate the allowed length (precision - scale).
My solution:
Validate the number using a regexp: NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')
Validate the length of the left component (the part before the decimal point; I have no idea what it's actually called), because for the scale Oracle automatically rounds if required. As the actual column is a VARCHAR, I will use substr and instr to find the component to the left of the decimal point.
As the above regexp allows strings like 123...123124..55, I will also validate the number of decimal points. [If > 1 then error]
Query to find invalid numbers:
Select * From Table_Name
Where
(NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')
OR
Function_To_Fetch_Left_Component(COLUMN_NAME) > (Precision-Scale)
/* Can use regexp_substr now but i already had a function for that */
OR
LENGTH(Column_Name) - LENGTH(REPLACE(Column_Name,'.','')) > 1
/* Can use regexp_count as well */)
I was happy and satisfied with my solution until a column containing only '.' escaped my checks and I saw their limitation. Adding another check for this case would solve my immediate problem, but the solution as a whole looks very inefficient to me.
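The gap is easy to reproduce outside the database. Below is a minimal Python sketch of the three checks, not the original Oracle code; the function name `is_flagged_invalid` and its parameters are hypothetical, and the left-component helper is assumed to return the text before the first decimal point with the sign removed.

```python
import re

# Pattern from the question: optional minus, then any mix of digits and dots.
NUMERIC_CHARS = re.compile(r'-?[0-9.]+')


def is_flagged_invalid(value: str, precision: int, scale: int) -> bool:
    """Return True if any of the question's three checks flags the value."""
    # Check 1: only digits and dots, with an optional leading minus.
    if NUMERIC_CHARS.fullmatch(value) is None:
        return True
    # Check 2: length of the component before the decimal point (assumed helper logic).
    left = value.lstrip('-').split('.', 1)[0]
    if len(left) > precision - scale:
        return True
    # Check 3: more than one decimal point.
    return value.count('.') > 1


# A lone '.' passes all three checks, so it is not reported as invalid.
print(is_flagged_invalid('.', 5, 2))
```

A value of '.' has valid characters, an empty left component, and exactly one decimal point, which is why none of the checks catches it.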
I would really appreciate a better solution [in any way].
Thanks in advance.
Answer 1:
The precision means that you want at most allowed_precision digits in the number (strictly speaking, not counting leading zeros, but I'll ignore that). The scale means that at most allowed_scale digits can come after the decimal point.
This suggests a regular expression such as:
^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$
You can construct the regular expression:
NOT REGEXP_LIKE(COLUMN_NAME,
REPLACE(REPLACE('^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$', '<before>', allowed_precision - allowed_scale
), '<after>', allowed_scale))
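The same template-substitution idea can be tried out in Python. This is a sketch, not Oracle code: `build_pattern` and `is_valid` are hypothetical names, and the pattern is anchored with the fractional part grouped behind the decimal point (both are assumptions of this sketch) so that the whole string must match.

```python
import re

# Template with placeholders, anchored so the entire value has to match.
TEMPLATE = r'^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$'


def build_pattern(allowed_precision: int, allowed_scale: int) -> str:
    """Substitute the digit limits into the template."""
    before = allowed_precision - allowed_scale
    return (TEMPLATE
            .replace('<before>', str(before))
            .replace('<after>', str(allowed_scale)))


def is_valid(value: str, allowed_precision: int, allowed_scale: int) -> bool:
    """Return True if the value fits the given precision and scale."""
    pattern = build_pattern(allowed_precision, allowed_scale)
    return re.fullmatch(pattern, value) is not None


print(build_pattern(5, 2))
print(is_valid('123.45', 5, 2), is_valid('1234.5', 5, 2))
```

Grouping the decimal point together with the fractional digits keeps a string such as '12345' from being accepted as three integer digits followed by two fractional digits.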
Now, variable regular expressions are highly inefficient. You can do the logic using like and other functions as well. I think the conditions are:
(column_name not like '%.%.%' and
column_name not like '_%-%' and
translate(column_name, '0123456789-.x', 'x') is null and
length(translate(column_name, '-.x', 'x')) <= allowed_precision and
length(translate(column_name, '-.x', 'x')) >= 1 and
instr(translate(column_name, '-x', 'x') || '.', '.') - 1 <= allowed_precision - allowed_scale
)
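To see what each condition is meant to do, here is a rough Python sketch that mirrors their intent with plain string operations. The function name `is_valid_without_regex` is hypothetical. Like the conditions above, it does not cap the number of fractional digits separately.

```python
ALLOWED_CHARS = set('0123456789-.')


def is_valid_without_regex(value: str, allowed_precision: int, allowed_scale: int) -> bool:
    """Mirror the LIKE/TRANSLATE/INSTR conditions using string methods."""
    # At most one decimal point (column_name not like '%.%.%').
    if value.count('.') > 1:
        return False
    # A minus sign may only appear in the first position (not like '_%-%').
    if '-' in value[1:]:
        return False
    # Only digits, '-' and '.' are allowed (the translate(...) is null test).
    if any(ch not in ALLOWED_CHARS for ch in value):
        return False
    # Between 1 and allowed_precision digits in total.
    digits = value.replace('-', '').replace('.', '')
    if not 1 <= len(digits) <= allowed_precision:
        return False
    # Digits before the decimal point must fit in precision - scale.
    integer_part = value.lstrip('-').split('.', 1)[0]
    return len(integer_part) <= allowed_precision - allowed_scale


for sample in ['123.45', '-0.5', '.', '1.2.3', '1-2', '1234.5']:
    print(sample, is_valid_without_regex(sample, 5, 2))
```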
Answer 2:
Look for:
- One-or-more digits optionally followed by a decimal point and zero-or-more digits; or
- A leading decimal point (no preceding unit digit) and then one or more (decimal) digits.
Like this:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d+(\.\d*)?|\.\d+)$')
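The pattern can be checked quickly against a handful of strings. This is a Python sketch rather than Oracle SQL; `find_invalid` is a hypothetical helper, and `re.ASCII` is used on the assumption that `\d` should match only the digits 0-9.

```python
import re

# Same pattern as in the query above; re.ASCII restricts \d to 0-9.
NUMBER = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)$', re.ASCII)


def find_invalid(values):
    """Return the values that do not look like a plain decimal number."""
    return [v for v in values if NUMBER.fullmatch(v) is None]


samples = ['12', '12.', '12.5', '.5', '+3', '-0.1', '.', '', '1.2.3', 'abc', '1e5']
print(find_invalid(samples))
```

Both alternatives require at least one digit, so a lone '.' is rejected.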
If you do not want zero-padded values in the number string then:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d*|0)(\.\d*)?|\.\d+)$')
With precision and scale (assuming it works as per a NUMBER( precision, scale ) data type and 0 < scale < precision):
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')
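The string concatenation above can be mirrored in Python to see which pattern results for a given precision and scale. This is a sketch only; `precision_scale_pattern` is a hypothetical name, and the guard assumes 0 < scale < precision as for this form of the query.

```python
import re


def precision_scale_pattern(precision: int, scale: int) -> str:
    """Build the pattern the same way the SQL concatenation does."""
    if not 0 < scale < precision:
        raise ValueError('this form assumes 0 < scale < precision')
    return (r'^[+-]?(\d{1,' + str(precision - scale) + r'}'
            r'(\.\d{0,' + str(scale) + r'})?'
            r'|\.\d{1,' + str(scale) + r'})$')


def fits(value: str, precision: int, scale: int) -> bool:
    """Return True if the value matches the generated pattern."""
    pattern = precision_scale_pattern(precision, scale)
    return re.fullmatch(pattern, value, re.ASCII) is not None


print(precision_scale_pattern(5, 2))
for sample in ['123.45', '123', '.45', '1234', '1.234', '.']:
    print(sample, fits(sample, 5, 2))
```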
or, for non-zero-padded numbers with precision and scale:
Select *
From Table_Name
Where NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d{0,'||(precision-scale-1)||'}|0)(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')
or, for any precision and scale:
Select *
From Table_Name
Where NOT REGEXP_LIKE(
  COLUMN_NAME,
  CASE
    WHEN scale <= 0
      THEN '^[+-]?(\d{1,'||precision||'}0{'||(-scale)||'})$'
    WHEN scale < precision
      THEN '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$'
    WHEN scale >= precision
      THEN '^[+-]?(0(\.0{0,'||scale||'})?|0?\.0{'||(scale-precision)||'}\d{1,'||precision||'})$'
  END
)
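The three branches of the CASE expression can be sketched as a small Python function to inspect the pattern each one generates. `number_pattern` and `matches` are hypothetical names, and Python's `re` is assumed to treat these quantifiers the same way as Oracle's regular expressions.

```python
import re


def number_pattern(precision: int, scale: int) -> str:
    """Pick the pattern body by comparing scale with 0 and with precision."""
    if scale <= 0:
        # Up to `precision` digits followed by exactly -scale zeros.
        body = r'\d{1,%d}0{%d}' % (precision, -scale)
    elif scale < precision:
        # Integer digits with an optional fraction, or a bare fraction.
        body = r'\d{1,%d}(\.\d{0,%d})?|\.\d{1,%d}' % (precision - scale, scale, scale)
    else:
        # Zero, or a fraction starting with scale - precision zeros.
        body = r'0(\.0{0,%d})?|0?\.0{%d}\d{1,%d}' % (scale, scale - precision, precision)
    return r'^[+-]?(' + body + r')$'


def matches(value: str, precision: int, scale: int) -> bool:
    """Return True if the value matches the pattern for this precision and scale."""
    return re.fullmatch(number_pattern(precision, scale), value, re.ASCII) is not None


print(number_pattern(3, -2))
print(number_pattern(5, 2))
print(number_pattern(2, 4))
```

For example, with precision 2 and scale 4 the pattern accepts '0.0012' but not '0.12', because the first two fractional digits must be zeros.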
Source: https://stackoverflow.com/questions/47711667/finding-non-numeric-values-in-varchar-column