Finding non-numeric values in varchar column

走远了吗. Submitted on 2019-12-11 06:19:14

Question


Requirement:

A generic query or function to check whether the value stored in a varchar column of a table is actually a number and does not exceed the allowed precision.

Available values:

Table_Name, Column_Name, Allowed Precision, Allowed Scale

The general advice would be to create a function and use to_number() to validate the value; however, that won't validate the allowed length (precision - scale).

My solution:

Validate that the value is a number using a regexp: NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')

Validate the length of the left component (the part before the decimal point; I have no idea what it's actually called), because for scale Oracle automatically rounds off if required. As the actual column is a varchar, I will use substr and instr to find the component to the left of the decimal point.

As the above regexp allows values like 123...123124..55, I will also validate the number of decimal points. [If > 1 then error]

Query to find invalid numbers:

Select * From Table_Name 
Where
(NOT REGEXP_LIKE(COLUMN_NAME, '^-?[0-9.]+$')
OR
Function_To_Fetch_Left_Component(COLUMN_NAME) > (Precision-Scale)
/* Could use regexp_substr now, but I already had a function for that */
OR
LENGTH(Column_Name) - LENGTH(REPLACE(Column_Name,'.','')) > 1
/* Could use regexp_count as well */)

I was happy and satisfied with my solution until a column value consisting only of '.' escaped my checks and I saw their limitation. Although adding another check for this case would solve my problem, the solution as a whole looks very inefficient to me.

I would really appreciate a better solution [in any way].

Thanks in advance.


Answer 1:


The precision means that you want at most allowed_precision digits in the number (strictly speaking, not counting leading zeros, but I'll ignore that). The scale means that at most allowed_scale of those digits can come after the decimal point.

This suggests a regular expression such as:

^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$

You can construct the regular expression:

NOT REGEXP_LIKE(COLUMN_NAME,
                REPLACE(REPLACE('^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$',
                                '<before>', allowed_precision - allowed_scale),
                        '<after>', allowed_scale))
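
As a minimal sketch of how such a constructed pattern behaves, the query below runs it against a few literal values. The sample values and the limits (allowed_precision = 5, allowed_scale = 2) are assumptions for illustration only. The pattern is written anchored with ^ and $, and with the decimal point grouped together with the fractional digits, so that the whole value has to match:

```sql
-- Hypothetical sample data; precision 5 and scale 2 are assumed limits.
WITH samples AS (
  SELECT '123.45' AS val FROM dual UNION ALL
  SELECT '-12.3'         FROM dual UNION ALL
  SELECT '1234.5'        FROM dual UNION ALL  -- 4 digits before the point (> 5 - 2)
  SELECT '1.234'         FROM dual UNION ALL  -- 3 digits after the point (> 2)
  SELECT '.'             FROM dual UNION ALL  -- no digits at all
  SELECT '12a'           FROM dual            -- non-numeric character
),
limits AS (
  SELECT 5 AS allowed_precision, 2 AS allowed_scale FROM dual
)
-- Return only the values that do NOT fit the pattern.
SELECT s.val
FROM   samples s CROSS JOIN limits l
WHERE  NOT REGEXP_LIKE(
         s.val,
         REPLACE(REPLACE('^[-]?[0-9]{1,<before>}([.][0-9]{0,<after>})?$',
                         '<before>', l.allowed_precision - l.allowed_scale),
                 '<after>', l.allowed_scale));
```

With these assumed values, the query should return '1234.5', '1.234', '.' and '12a', and leave out '123.45' and '-12.3'.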

Now, regular expressions built at run time like this can be quite inefficient. You can express the same logic using LIKE and other string functions as well. I think the conditions for a valid value are:

(column_name not like '%.%.%' and   -- at most one decimal point
 column_name not like '_%-%' and    -- '-' only as the first character
 translate(column_name, 'x0123456789-.', 'x') is null and   -- only digits, '-' and '.'
 nvl(length(translate(column_name, 'x-.', 'x')), 0) between 1 and allowed_precision and
 instr(translate(column_name, 'x-', 'x') || '.', '.') - 1 <= allowed_precision - allowed_scale
)
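
To see what the translate() and instr() building blocks return, the following sketch evaluates them for a few literal values. The sample values and column aliases are assumptions for illustration. The from-strings are written with 'x' first, which is the usual translate() idiom for deleting the remaining characters:

```sql
-- Hypothetical sample values, for illustration only.
WITH samples AS (
  SELECT '123.45' AS val FROM dual UNION ALL
  SELECT '-12.3'         FROM dual UNION ALL
  SELECT '1.2.3'         FROM dual UNION ALL
  SELECT '.'             FROM dual UNION ALL
  SELECT '12a'           FROM dual
)
SELECT val,
       -- NULL when val contains only digits, '-' and '.'
       TRANSLATE(val, 'x0123456789-.', 'x')              AS leftover,
       -- number of characters left after deleting '-' and '.'
       NVL(LENGTH(TRANSLATE(val, 'x-.', 'x')), 0)        AS char_count,
       -- characters before the first '.', with '-' removed
       INSTR(TRANSLATE(val, 'x-', 'x') || '.', '.') - 1  AS chars_before_point
FROM   samples;
```

For the value '.', leftover is NULL while char_count is 0, which is the case the checks in the question missed.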



Answer 2:


Look for:

  • One-or-more digits optionally followed by a decimal point and zero-or-more digits; or
  • A leading decimal point (no preceding unit digit) and then one or more (decimal) digits.

Like this:

Select *
From   Table_Name 
Where  NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d+(\.\d*)?|\.\d+)$')
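
A quick way to check this pattern is to run it against literal values instead of a real table. The sample values below are assumptions chosen to cover the edge cases from the question, including the lone '.':

```sql
-- Hypothetical sample values; the query returns the ones the pattern rejects.
WITH samples AS (
  SELECT '42' AS val FROM dual UNION ALL
  SELECT '+3.14'     FROM dual UNION ALL
  SELECT '1.'        FROM dual UNION ALL  -- trailing decimal point is allowed
  SELECT '.5'        FROM dual UNION ALL  -- leading decimal point is allowed
  SELECT '.'         FROM dual UNION ALL  -- no digits at all
  SELECT '1..2'      FROM dual UNION ALL  -- two decimal points
  SELECT '-'         FROM dual            -- sign without digits
)
SELECT val
FROM   samples
WHERE  NOT REGEXP_LIKE(val, '^[+-]?(\d+(\.\d*)?|\.\d+)$');
```

With these values, the query should return only '.', '1..2' and '-'.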

If you do not want zero-padded values in the number string then:

Select *
From   Table_Name 
Where  NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d*|0)(\.\d*)?|\.\d+)$')

With precision and scale (assuming it works as per a NUMBER(precision, scale) data type and 0 < scale < precision):

Select *
From   Table_Name 
Where  NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')

or, for non-zero-padded numbers with precision and scale:

Select *
From   Table_Name 
Where  NOT REGEXP_LIKE(COLUMN_NAME, '^[+-]?(([1-9]\d{0,'||(precision-scale-1)||'}|0)(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$')

or, for any precision and scale:

Select *
From   Table_Name 
Where  NOT REGEXP_LIKE(
             COLUMN_NAME,
             CASE
               WHEN scale <= 0
               THEN '^[+-]?(\d{1,'||precision||'}0{'||(-scale)||'})$'
               WHEN scale < precision
               THEN '^[+-]?(\d{1,'||(precision-scale)||'}(\.\d{0,'||scale||'})?|\.\d{1,'||scale||'})$'
               WHEN scale >= precision
               THEN '^[+-]?(0(\.0{0,'||scale||'})?|0?\.0{'||(scale-precision)||'}\d{1,'||precision||'})$'
             END
           )
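
The three branches can be tried out with a sketch like the one below, which pairs each sample value with its own limits. The sample rows and the column names prec and scl are assumptions for illustration, standing in for the precision and scale placeholders:

```sql
-- Hypothetical sample rows: a value plus the precision/scale it is checked against.
WITH samples AS (
  SELECT '123.45' AS val, 5 AS prec,  2 AS scl FROM dual UNION ALL
  SELECT '1234.5',        5,          2        FROM dual UNION ALL  -- too many integer digits
  SELECT '12300',         3,         -2        FROM dual UNION ALL
  SELECT '12345',         3,         -2        FROM dual UNION ALL  -- last 2 digits not zero
  SELECT '0.0012',        2,          4        FROM dual UNION ALL
  SELECT '0.012',         2,          4        FROM dual            -- needs 2 zeros after the point
)
SELECT val, prec, scl
FROM   samples
WHERE  NOT REGEXP_LIKE(
         val,
         CASE
           WHEN scl <= 0
           THEN '^[+-]?(\d{1,'||prec||'}0{'||(-scl)||'})$'
           WHEN scl < prec
           THEN '^[+-]?(\d{1,'||(prec-scl)||'}(\.\d{0,'||scl||'})?|\.\d{1,'||scl||'})$'
           WHEN scl >= prec
           THEN '^[+-]?(0(\.0{0,'||scl||'})?|0?\.0{'||(scl-prec)||'}\d{1,'||prec||'})$'
         END
       );
```

With these rows, the query should return '1234.5', '12345' and '0.012', one rejected value per branch.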


Source: https://stackoverflow.com/questions/47711667/finding-non-numeric-values-in-varchar-column
