Parse currency into numbers in Python

鱼传尺愫 2020-12-06 12:03

I just learnt from Format numbers as currency in Python that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For instance,

2 Answers
  • 2020-12-06 12:53

    Using babel

    The babel documentation notes that number parsing is not fully implemented yet, but a lot of work has gone into getting currency information into the library. You can use get_currency_name() and get_currency_symbol() to get currency details, and the other get_... functions to get the general number details (decimal point, minus sign, etc.).
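As a minimal sketch of what those helpers return, the snippet below collects the relevant pieces for two locales. The function name currency_details and the dictionary keys are made up for illustration; only the babel.numbers calls come from the library, and the exact symbols depend on the CLDR data shipped with the installed babel version.

```python
from babel import numbers


def currency_details(currency, locale):
    """Collect the locale-specific pieces that show up in a formatted amount."""
    return {
        'name': numbers.get_currency_name(currency, locale=locale),
        'symbol': numbers.get_currency_symbol(currency, locale=locale),
        'decimal': numbers.get_decimal_symbol(locale),
        'group': numbers.get_group_symbol(locale),
    }


us = currency_details('USD', 'en_US')
br = currency_details('USD', 'pt_BR')
# The two locales swap the roles of '.' and ','
print(us['symbol'], us['decimal'], us['group'])
print(br['symbol'], br['decimal'], br['group'])
```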

    Using that information, you can strip the currency details (name, symbol) and the grouping separators (e.g. , in the US) from a currency string. Then you convert the remaining sign and decimal mark into the ones used by the C locale (- for minus, and . for the decimal point).

    This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):

    import re, os
    from babel import numbers as n
    from babel.core import default_locale
    
    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value
    
    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)
    
    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]
    
    for v,c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value    :', c, ':', info.value)
        print('Extra info  :', info.name.encode('utf-8')
                             , info.symbol.encode('utf-8'))
    

    The output looks promising (in US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value    : USD : 123456.79
    Extra info  : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value    : PLN : -123456.78
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value    : PLN : 123456.79
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value    : IDR : 123457
    Extra info  : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value    : JPY : 123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value    : JPY : -123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value    : CNY : 123456.79
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value    : CNY : -123456.78
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    

    And it still works in different locales (Brazil is notable for using the comma as a decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py 
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value    : USD : 123456.79
    Extra info  : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value    : PLN : -123456.78
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value    : PLN : 123456.79
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value    : IDR : 123457
    Extra info  : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value    : JPY : 123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value    : JPY : -123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value    : CNY : 123456.79
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value    : CNY : -123456.78
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    

    It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) use different encodings themselves. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
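babel also ships numbers.parse_decimal, which understands the grouping and decimal symbols of a given locale. A possible shortcut to the manual substitutions in parse_currency, sketched under the assumption that the input contains only the amount plus the currency symbol or ISO code, is to strip those tokens and let parse_decimal do the rest. parse_amount is a hypothetical helper name, not part of babel.

```python
from decimal import Decimal

from babel import numbers


def parse_amount(text, currency, locale):
    """Strip the currency symbol/code, then parse with the locale's separators."""
    symbol = numbers.get_currency_symbol(currency, locale=locale)
    # Remove the localized symbol first, then the bare ISO code if present
    for token in (symbol, currency):
        text = text.replace(token, '')
    # str.strip() also removes the non-breaking spaces some locales insert
    return numbers.parse_decimal(text.strip(), locale=locale)


usd = parse_amount('$123,456.79', 'USD', 'en_US')
eur = parse_amount('1.099,98 €', 'EUR', 'de_DE')
print(usd, eur)
```

The result is a decimal.Decimal, so no precision is lost to binary floating point.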

  • 2020-12-06 12:53

    Below is a general currency parser that doesn't rely on the babel library.

    import numpy as np
    import re
    
    def currency_parser(cur_str):
        # Remove any non-numerical characters
        # except for ',' '.' or '-' (e.g. EUR)
        cur_str = re.sub("[^-0-9.,]", '', cur_str)
        # Remove any 000s separators (either , or .)
        cur_str = re.sub("[.,]", '', cur_str[:-3]) + cur_str[-3:]
    
        # Any separator still left sits in the last three characters and is
        # the decimal mark; turn a decimal comma into a point before converting
        num = float(cur_str.replace(',', '.'))
    
        return np.round(num, 2)
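Since float cannot represent most decimal fractions exactly, a variant that returns decimal.Decimal may be preferable for money. The following is only a sketch of the same heuristic using the standard library; currency_parser_decimal is a hypothetical name, and it keeps the assumption that a separator within the last three characters is the decimal mark.

```python
import re
from decimal import Decimal


def currency_parser_decimal(cur_str):
    """Parse a currency string into a Decimal rounded to two places."""
    # Keep only digits, the two possible separators and the minus sign
    cleaned = re.sub(r'[^-0-9.,]', '', cur_str)
    # Separators before the last three characters are grouping marks
    head, tail = cleaned[:-3], cleaned[-3:]
    # A comma left in the tail is a decimal comma; normalize it to a point
    cleaned = re.sub(r'[.,]', '', head) + tail.replace(',', '.')
    return Decimal(cleaned).quantize(Decimal('0.01'))


print(currency_parser_decimal('25.675,26 EUR'))
print(currency_parser_decimal('£ 2,237.85'))
print(currency_parser_decimal('-540,89EUR'))
```

Like the float version, it reads an amount with three decimal places such as '1.234' as 1234, so it is not suitable for currencies that use three minor-unit digits.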
    
    

    Here is a pytest script that tests the function:

    import numpy as np
    import pytest
    import re
    
    
    def currency_parser(cur_str):
        # Remove any non-numerical characters
        # except for ',' '.' or '-' (e.g. EUR)
        cur_str = re.sub("[^-0-9.,]", '', cur_str)
        # Remove any 000s separators (either , or .)
        cur_str = re.sub("[.,]", '', cur_str[:-3]) + cur_str[-3:]
    
        # Any separator still left sits in the last three characters and is
        # the decimal mark; turn a decimal comma into a point before converting
        num = float(cur_str.replace(',', '.'))
    
        return np.round(num, 2)
    
    
    @pytest.mark.parametrize('currency_str, expected', [
        (
                '.3', 0.30
        ),
        (
                '1', 1.00
        ),
        (
                '1.3', 1.30
        ),
        (
                '43,324', 43324.00
        ),
        (
                '3,424', 3424.00
        ),
        (
                '-0.00', 0.00
        ),
        (
                'EUR433,432.53', 433432.53
        ),
        (
                '25.675,26 EUR', 25675.26
        ),
        (
                '2.447,93 EUR', 2447.93
        ),
        (
                '-540,89EUR', -540.89
        ),
        (
                '67.6 EUR', 67.60
        ),
        (
                '30.998,63 CHF', 30998.63
        ),
        (
                '0,00 CHF', 0.00
        ),
        (
                '159.750,00 DKK', 159750.00
        ),
        (
                '£ 2.237,85', 2237.85
        ),
        (
                '£ 2,237.85', 2237.85
        ),
        (
                '-1.876,85 SEK', -1876.85
        ),
        (
                '59294325.3', 59294325.30
        ),
        (
                '8,53 NOK', 8.53
        ),
        (
                '0,09 NOK', 0.09
        ),
        (
                '-.9 CZK', -0.9
        ),
        (
                '35.255,40 PLN', 35255.40
        ),
        (
                '-PLN123.456,78', -123456.78
        ),
        (
                'US$123.456,79', 123456.79
        ),
        (
                'PLN123.456,79', 123456.79
        ),
        (
                'IDR123.457', 123457
        ),
        (
                'JP¥123.457', 123457
        ),
        (
                '-JP¥123.457', -123457
        ),
        (
                'CN¥123.456,79', 123456.79
        ),
        (
                '-CN¥123.456,78', -123456.78
        ),
    ])
    def test_currency_parse(currency_str, expected):
        assert currency_parser(currency_str) == expected
    