Regex to parse international floating-point numbers

Submitted by 放荡痞女 on 2019-12-18 04:56:05

Question


I need a regex to match numeric values that can take any of these forms:

111.111,11

111,111.11

111,111

I also need to separate the integer and decimal portions so I can store the value in a DB with the correct syntax.

I tried ([0-9]{1,3}[,.]?)+([,.][0-9]{2})? with no success, since it doesn't detect the second part :(

The result should look like:

111.111,11 -> $1 = 111111; $2 = 11

Answer 1:


First Answer:

This matches #,###,##0.00:

^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$

And this matches #.###.##0,00:

^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$

Joining the two (there are smarter/shorter ways to write it, but it works):

(?:^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$)
|(?:^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$)

You can also add a capturing group around the last comma (or dot) to check which one was used.
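The capturing-group idea can be sketched in Python as follows. This is only an illustration: the names `US_STYLE`, `EU_STYLE` and `decimal_separator` are made up here, and the two patterns are the ones above with a group added around the decimal mark.

```python
import re

# The two patterns from this answer, each with a capturing group
# around the decimal mark so we can tell which one was used.
US_STYLE = re.compile(r"^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:(\.)[0-9]{2})?$")
EU_STYLE = re.compile(r"^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:(,)[0-9]{2})?$")

def decimal_separator(text):
    """Return '.', ',' or '' (no decimals); None if neither pattern matches."""
    for pattern in (US_STYLE, EU_STYLE):
        match = pattern.match(text)
        if match:
            return match.group(1) or ""
    return None

for sample in ("111,111.11", "111.111,11", "111,111", "abc"):
    print(sample, "->", repr(decimal_separator(sample)))
```

Trying the two patterns one after the other, instead of joining them with |, keeps the group number the same in both cases.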


Second Answer:

As pointed out by Alan M, my previous solution could fail to reject a value like 11,111111.00, where one comma is present but another is missing. After some tests I arrived at the following regex, which avoids this problem:

^[+-]?[0-9]{1,3}
(?:(?<comma>\,?)[0-9]{3})?
(?:\k<comma>[0-9]{3})*
(?:\.[0-9]{2})?$

This deserves some explanation:

  • ^[+-]?[0-9]{1,3} matches the first (1 to 3) digits;

  • (?:(?<comma>\,?)[0-9]{3})? matches an optional comma followed by 3 more digits, and captures the comma (or its absence) in a group called 'comma';

  • (?:\k<comma>[0-9]{3})* matches zero-to-any repetitions of the comma used before (if any) followed by 3 digits;

  • (?:\.[0-9]{2})?$ matches optional "cents" at the end of the string.

Of course, that will only cover #,###,##0.00 (not #.###.##0,00), but you can always join the regexes like I did above.
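For readers who want to try the backreference trick in Python, here is a rough equivalent. It assumes Python's spelling of named groups, (?P<comma>...) and (?P=comma), and the name `STRICT_US` is only a label chosen for this sketch.

```python
import re

# Python writes the named group as (?P<comma>...) and the
# backreference as (?P=comma); re.VERBOSE allows the layout below.
STRICT_US = re.compile(r"""
    ^[+-]?[0-9]{1,3}
    (?:(?P<comma>,?)[0-9]{3})?   # first group decides: comma or no comma
    (?:(?P=comma)[0-9]{3})*      # later groups must repeat that choice
    (?:\.[0-9]{2})?$
""", re.VERBOSE)

for sample in ("1,111,111.00", "1111111.00", "11,111111.00"):
    print(sample, "->", bool(STRICT_US.match(sample)))
```

The first two samples are consistent (all commas or none) and match; the third mixes both styles and is rejected.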


Final Answer:

Now, a complete solution. Indentations and line breaks are there for readability only.

^[+-]?[0-9]{1,3}
(?:
    (?:\,[0-9]{3})*
    (?:\.[0-9]{2})?
|
    (?:\.[0-9]{3})*
    (?:\,[0-9]{2})?
|
    [0-9]*
    (?:[\.\,][0-9]{2})?
)$

And this variation captures the separators used:

^[+-]?[0-9]{1,3}
(?:
    (?:(?<thousand>\,)[0-9]{3})*
    (?:(?<decimal>\.)[0-9]{2})?
|
    (?:(?<thousand>\.)[0-9]{3})*
    (?:(?<decimal>\,)[0-9]{2})?
|
    [0-9]*
    (?:(?<decimal>[\.\,])[0-9]{2})?
)$
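To get the integer and decimal portions the question asks for, one possible Python sketch is shown below. Instead of reusing the group names <thousand> and <decimal> inside a single pattern, it tries one pattern per branch, in the same order as the alternation above. The names `PATTERNS` and `split_number` are hypothetical and exist only for this example.

```python
import re

# One pattern per branch of the regex above, tried in the same order.
# Group 1 is the signed integer part (still containing its thousands
# separators); group 2 is the two decimal digits, if present.
PATTERNS = [
    re.compile(r"^([+-]?[0-9]{1,3}(?:,[0-9]{3})*)(?:\.([0-9]{2}))?$"),
    re.compile(r"^([+-]?[0-9]{1,3}(?:\.[0-9]{3})*)(?:,([0-9]{2}))?$"),
    re.compile(r"^([+-]?[0-9]+)(?:[.,]([0-9]{2}))?$"),
]

def split_number(text):
    """Return (integer_digits, decimal_digits), or None if text is invalid."""
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            integer = re.sub(r"[.,]", "", match.group(1))  # drop separators
            return integer, match.group(2) or ""
    return None

for sample in ("111.111,11", "111,111.11", "111,111", "11,111111.00"):
    print(sample, "->", split_number(sample))
```

With this ordering, 111,111 is read as the integer 111111, because the comma-as-thousands pattern is tried first.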

edit 1: "cents" are now optional; edit 2: text added; edit 3: second solution added; edit 4: complete solution added; edit 5: headings added; edit 6: capturing added; edit 7: last answer broke in two versions;




Answer 2:


I would first use this regex to determine whether a comma or a dot is used as the decimal delimiter (it captures whichever of the two appears last):

[0-9,\.]*([,\.])[0-9]*

I would then strip all occurrences of the other sign (the one the previous regex did not capture). If there was no match, you already have an integer and can skip the next steps. Removing that sign can easily be done with a regex, but there are also many other functions that can do it faster or better.

You are then left with an integer, possibly followed by a comma or a dot and then the decimals. The integer and decimal parts can easily be separated from each other with the following regex:

([0-9]+)[,\.]?([0-9]*)

Good luck!

Edit:

Here is an example in Python. The code should be self-explanatory; if it is not, just ask.

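import re

# Python 3: input() replaces raw_input(), and print is a function
value = input()
# The greedy prefix makes the group capture the LAST ',' or '.' in the string
delimiterRegex = re.compile(r'[0-9,.]*([,.])[0-9]*')
# For the later split step: integer part, optional delimiter, decimals
splitRegex = re.compile(r'([0-9]+)[,.]?([0-9]*)')

delimiter = delimiterRegex.findall(value)

# An empty list means there is no delimiter at all, so nothing to strip
if delimiter and delimiter[0] == ',':
    value = value.replace('.', '')
elif delimiter and delimiter[0] == '.':
    value = value.replace(',', '')

print(value)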

With this code, the following inputs give these outputs:

  • 111.111,11

    111111,11

  • 111,111.11

    111111.11

  • 111,111

    111,111

After this step, you can easily modify the string to match your needs.
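The remaining split step could look roughly like this. It is a sketch rather than part of the original answer: `split_parts` is a made-up helper name, and using `fullmatch` (so that the whole string must fit) is an assumption on my side.

```python
import re

DELIMITER_RE = re.compile(r"[0-9,.]*([,.])[0-9]*")
SPLIT_RE = re.compile(r"([0-9]+)[,.]?([0-9]*)")

def split_parts(text):
    """Return (integer_digits, decimal_digits), or None if text does not fit."""
    found = DELIMITER_RE.fullmatch(text)
    if found:
        # The last separator is taken as the decimal mark; strip the other one.
        other = "." if found.group(1) == "," else ","
        text = text.replace(other, "")
    match = SPLIT_RE.fullmatch(text)
    return (match.group(1), match.group(2)) if match else None

for sample in ("111.111,11", "111,111.11", "111,111", "1234"):
    print(sample, "->", split_parts(sample))
```

Because the last separator is always treated as the decimal mark, 111,111 comes out as integer 111 with decimals 111. If that input should mean 111111, an extra rule (for example, exactly three trailing digits means thousands) would be needed.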




Answer 3:


How about

/(\d{1,3}(?:,\d{3})*)(\.\d{2})?/

if you care about validating that the commas separate every 3 digits exactly, or

/(\d[\d,]*)(\.\d{2})?/

if you don't.
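A small Python comparison of the two patterns, as a sketch only. The patterns are unanchored as written, so this example assumes `fullmatch` to require that the whole string fits; `STRICT`, `LOOSE` and `parts` are names invented for the illustration.

```python
import re

STRICT = re.compile(r"(\d{1,3}(?:,\d{3})*)(\.\d{2})?")  # commas every 3 digits
LOOSE = re.compile(r"(\d[\d,]*)(\.\d{2})?")             # commas anywhere

def parts(pattern, text):
    """Return (integer_digits, decimal_digits), or None if text does not fit."""
    match = pattern.fullmatch(text)
    if match is None:
        return None
    integer = match.group(1).replace(",", "")
    decimals = (match.group(2) or "")[1:]  # drop the leading '.'
    return integer, decimals

print(parts(STRICT, "111,111.11"))
print(parts(STRICT, "1,11,1.00"))
print(parts(LOOSE, "1,11,1.00"))
```

The strict pattern rejects the oddly grouped 1,11,1.00, while the loose one accepts it and simply drops the commas.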




Answer 4:


If I'm interpreting your question correctly, and the result SHOULD look like what you say it "would" look like, then I think you just need to leave the comma out of the character class, since it is used as a separator and is not part of what is to be matched.

So get rid of the "." first, then match the two parts.

$value = "111.111,11";  # '.' groups thousands, ',' marks the decimals
$value =~ s/\.//g;
$value =~ m/(\d+)(?:,(\d+))?/;

$1 = the leading integer digits, with periods removed.

$2 = undef if there was no comma part, otherwise the digits after the comma.
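The same two steps, sketched in Python for comparison. `split_european` is a hypothetical helper name, and the sketch assumes input where '.' groups thousands and ',' marks the decimals.

```python
import re

def split_european(value):
    """Strip the periods, then capture the digits before and after the comma."""
    value = value.replace(".", "")                 # same effect as s/\.//g
    match = re.match(r"(\d+)(?:,(\d+))?", value)
    if match is None:
        return None
    # Group 2 is None when there is no comma part (Perl's undef).
    return match.group(1), match.group(2)

print(split_european("111.111,11"))
print(split_european("111.111"))
```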




Answer 5:


See Perl's Regexp::Common::number.



Source: https://stackoverflow.com/questions/1295327/regex-to-parse-international-floating-point-numbers
