I have a string:
$string = 'Five People';
I want to replace all number words with numbers, so that the result is:
$strin
I ported the text2num Python library to PHP, combined it with a regex for matching English spelled-out numbers, and extended it up to decillions. Here is the result:
function text2num($s) {
// Enhanced the regex at http://www.rexegg.com/regex-trick-numbers-in-english.html#english-number-regex
$reg = <<<REGEX
(?x) # free-spacing mode
(?(DEFINE)
# Within this DEFINE block, we'll define many subroutines
# They build on each other like lego until we can define
# a "big number"
(?<one_to_9>
# The basic regex:
# one|two|three|four|five|six|seven|eight|nine
# We'll use an optimized version:
# Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)|
# s(?:ix|even)
# Option 2:
(?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight)
) # end one_to_9 definition
(?<ten_to_19>
# The basic regex:
# ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
# eighteen|nineteen
# We'll use an optimized version:
# Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r|
# s(?:ix|even))tee)n
# Option 2:
(?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en|
t(?:(?:hirte)?en|welve))
) # end ten_to_19 definition
(?<two_digit_prefix>
# The basic regex:
# twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety
# We'll use an optimized version:
# Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty
# Option 2:
(?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty
) # end two_digit_prefix definition
(?<one_to_99>
(?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)|
(?&one_to_9)
) # end one_to_99 definition
(?<one_to_999>
(?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?|
(?&one_to_99)
) # end one_to_999 definition
(?<one_to_999_999>
(?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?|
(?&one_to_999)
) # end one_to_999_999 definition
(?<one_to_999_999_999>
(?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?|
(?&one_to_999_999)
) # end one_to_999_999_999 definition
(?<one_to_999_999_999_999>
(?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?|
(?&one_to_999_999_999)
) # end one_to_999_999_999_999 definition
(?<one_to_999_999_999_999_999>
(?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?|
(?&one_to_999_999_999_999)
) # end one_to_999_999_999_999_999 definition
# ==== MORE ====
(?<one_to_quadrillion>
(?&one_to_999)[ ]quadrillion(?:[ ](?&one_to_999_999_999_999_999))?|
(?&one_to_999_999_999_999_999)
) # end one_to_quadrillion definition
(?<one_to_quintillion>
(?&one_to_999)[ ]quintillion(?:[ ](?&one_to_quadrillion))?|
(?&one_to_quadrillion)
) # end one_to_quintillion definition
(?<one_to_sextillion>
(?&one_to_999)[ ]sextillion(?:[ ](?&one_to_quintillion))?|
(?&one_to_quintillion)
) # end one_to_sextillion definition
(?<one_to_septillion>
(?&one_to_999)[ ]septillion(?:[ ](?&one_to_sextillion))?|
(?&one_to_sextillion)
) # end one_to_septillion definition
(?<one_to_octillion>
(?&one_to_999)[ ]octillion(?:[ ](?&one_to_septillion))?|
(?&one_to_septillion)
) # end one_to_octillion definition
(?<one_to_nonillion>
(?&one_to_999)[ ]nonillion(?:[ ](?&one_to_octillion))?|
(?&one_to_octillion)
) # end one_to_nonillion definition
(?<one_to_decillion>
(?&one_to_999)[ ]decillion(?:[ ](?&one_to_nonillion))?|
(?&one_to_nonillion)
) # end one_to_decillion definition
(?<bignumber>
zero|(?&one_to_decillion)
) # end bignumber definition
(?<zero_to_9>
(?&one_to_9)|zero
) # end zero to 9 definition
# (?<decimals>
# point(?:[ ](?&zero_to_9))+
# ) # end decimals definition
) # End DEFINE
####### The Regex Matching Starts Here ########
\b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b
REGEX;
return preg_replace_callback('~' . trim($reg) . '~i', function ($x) {
return text2num_internal($x[0]);
}, $s);
}
function text2num_internal($s) {
// Port of https://github.com/ghewgill/text2num/blob/master/text2num.py
$Small = [
'zero'=> 0,
'one'=> 1,
'two'=> 2,
'three'=> 3,
'four'=> 4,
'five'=> 5,
'six'=> 6,
'seven'=> 7,
'eight'=> 8,
'nine'=> 9,
'ten'=> 10,
'eleven'=> 11,
'twelve'=> 12,
'thirteen'=> 13,
'fourteen'=> 14,
'fifteen'=> 15,
'sixteen'=> 16,
'seventeen'=> 17,
'eighteen'=> 18,
'nineteen'=> 19,
'twenty'=> 20,
'thirty'=> 30,
'forty'=> 40,
'fifty'=> 50,
'sixty'=> 60,
'seventy'=> 70,
'eighty'=> 80,
'ninety'=> 90
];
$Magnitude = [
'thousand'=> 1000,
'million'=> 1000000,
'billion'=> 1000000000,
'trillion'=> 1000000000000,
'quadrillion'=> 1000000000000000,
'quintillion'=> 1000000000000000000,
'sextillion'=> 1000000000000000000000,
'septillion'=> 1000000000000000000000000,
'octillion'=> 1000000000000000000000000000,
'nonillion'=> 1000000000000000000000000000000,
'decillion'=> 1000000000000000000000000000000000,
];
$a = preg_split("~[\s-]+(?:and[\s-]+)?~u", $s);
$a = array_map('strtolower', $a);
$n = 0;
$g = 0;
foreach ($a as $w) {
if (isset($Small[$w])) {
$g = $g + $Small[$w];
}
else if ($w == "hundred" && $g != 0) {
$g = $g * 100;
}
else {
// Look the word up safely: unknown words have no entry in $Magnitude
if (isset($Magnitude[$w])) {
$n = $n + $g * $Magnitude[$w];
$g = 0;
}
else {
throw new Exception("Unknown number: " . $w);
}
}
}
return $n + $g;
}
echo text2num("one") . "\n"; // 1
echo text2num("twelve") . "\n"; // 12
echo text2num("seventy two") . "\n"; // 72
echo text2num("three hundred") . "\n"; // 300
echo text2num("twelve hundred") . "\n"; // 1200
echo text2num("twelve thousand three hundred four") . "\n"; // 12304
echo text2num("six million") . "\n"; // 6000000
echo text2num("six million four hundred thousand five") . "\n"; // 6400005
echo text2num("one hundred twenty three billion four hundred fifty six million seven hundred eighty nine thousand twelve") . "\n"; // 123456789012
echo text2num("four decillion") . "\n"; // 4.0E+33 (a float: the value exceeds PHP_INT_MAX)
echo text2num("five hundred and thirty-seven") . "\n"; // 537
echo text2num("five hundred and thirty seven") . "\n"; // 537
See the PHP demo.
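A note on the larger entries in $Magnitude: integer literals beyond PHP_INT_MAX are parsed as floats, so values from sextillion upward lose exactness and print in scientific notation. The following is only a sketch of one possible workaround, assuming you are happy to get a decimal string back; scale_by_power_of_ten is a hypothetical helper and not part of the code above.

```php
<?php
// A literal larger than PHP_INT_MAX silently becomes a float.
$sextillion = 1000000000000000000000;
$isFloat = is_float($sextillion); // true

// Hypothetical helper (an assumption, not from the original answer):
// store magnitudes as exponents and build the result as a decimal string.
function scale_by_power_of_ten(int $small, int $exponent): string
{
    // Zero stays "0"; otherwise append $exponent zeros to the digits.
    return $small === 0 ? '0' : $small . str_repeat('0', $exponent);
}

// "four decillion": 4 followed by 33 zeros, kept exact as a string.
$exact = scale_by_power_of_ten(4, 33);
echo $exact . "\n";
```

This only covers a single group times a single magnitude; adding several such strings exactly would need arbitrary-precision arithmetic (for example the bcmath or GMP extensions, if they are available).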
The regex matches either ordinary big numbers or numbers like "eleven hundred"; see \b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b. It can be enhanced further. For example, the word boundaries may be replaced with other boundary types, such as (?<!\S) and (?!\S), to match only between whitespace characters.
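To illustrate the difference between the two boundary types, here is a small sketch. It uses a deliberately simplified stand-in pattern ($words, an assumption for brevity) instead of the full regex, and the sample text is made up.

```php
<?php
// Simplified stand-in for the number regex (assumption: a handful of words only).
$words = '(?:one|two|three|four|five)';

$text = 'five-star hotel, five people';

// \b treats "-" as a boundary, so "five" in "five-star" also matches.
preg_match_all('~\b' . $words . '\b~i', $text, $m1);

// (?<!\S) and (?!\S) require whitespace (or the string start/end) on both sides,
// so only the standalone "five" matches.
preg_match_all('~(?<!\S)' . $words . '(?!\S)~i', $text, $m2);

$wordBoundaryCount = count($m1[0]); // 2
$whitespaceCount   = count($m2[0]); // 1

echo $wordBoundaryCount . ' vs ' . $whitespaceCount . "\n";
```

Which behaviour is preferable depends on the input: whitespace boundaries avoid converting parts of hyphenated compounds, but they also skip number words that are directly followed by punctuation.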
The decimal part of the regex is commented out because, even if it were matched, text2num_internal would not handle it.
You can use this regex:
\b(zero|a|one|tw(elve|enty|o)|th(irt(een|y)|ree)|fi(ft(een|y)|ve)|(four|six|seven|nine)(teen|ty)?|eight(een|y)?|ten|eleven|forty|hundred|thousand|(m|b)illion|and)+\b
There may be a better regex out there. Until someone posts one, you can use the following implementation:
function word_numbers_to_numbers($string) {
// Defined inside the function: a global $regex would not be visible in this scope
$regex = '/\b(zero|a|one|tw(elve|enty|o)|th(irt(een|y)|ree)|fi(ft(een|y)|ve)|(four|six|seven|nine)(teen|ty)?|eight(een|y)?|ten|eleven|forty|hundred|thousand|(m|b)illion|and)+\b/i';
return preg_replace_callback($regex, function ($m) {
return words_to_number($m[0]);
}, $string);
}
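The words_to_number function called above is not shown here. The following is a hypothetical stand-in, not the original author's implementation: a minimal sketch that assumes the input is a whitespace- or hyphen-separated run of the words the regex accepts, and that the result fits in a native integer. The name words_to_number_sketch is made up for this example.

```php
<?php
// Hypothetical stand-in for words_to_number(); an assumption, not the original code.
function words_to_number_sketch(string $words): int
{
    $small = [
        'zero' => 0, 'a' => 1, 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4,
        'five' => 5, 'six' => 6, 'seven' => 7, 'eight' => 8, 'nine' => 9,
        'ten' => 10, 'eleven' => 11, 'twelve' => 12, 'thirteen' => 13,
        'fourteen' => 14, 'fifteen' => 15, 'sixteen' => 16, 'seventeen' => 17,
        'eighteen' => 18, 'nineteen' => 19, 'twenty' => 20, 'thirty' => 30,
        'forty' => 40, 'fifty' => 50, 'sixty' => 60, 'seventy' => 70,
        'eighty' => 80, 'ninety' => 90,
    ];
    $magnitude = ['thousand' => 1000, 'million' => 1000000, 'billion' => 1000000000];

    $total = 0; // sum of completed groups (thousands, millions, ...)
    $group = 0; // value of the group currently being read (0..999)
    $tokens = preg_split('~[\s-]+~', strtolower(trim($words)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $w) {
        if ($w === 'and') {
            continue; // "and" carries no numeric value
        }
        if (isset($small[$w])) {
            $group += $small[$w];
        } elseif ($w === 'hundred') {
            $group = max($group, 1) * 100; // a bare "hundred" counts as 100
        } elseif (isset($magnitude[$w])) {
            $total += max($group, 1) * $magnitude[$w];
            $group = 0;
        } else {
            throw new InvalidArgumentException("Unknown number word: $w");
        }
    }
    return $total + $group;
}

echo words_to_number_sketch('two hundred and five') . "\n";    // 205
echo words_to_number_sketch('three million twenty-one') . "\n"; // 3000021
```

Note that the regex above also matches the standalone words "a" and "and", so ordinary text such as "a house" would be rewritten as well; you may want to filter such matches before converting.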