Identifying country by IP address

盖世英雄少女心 2020-12-01 05:24

Is there a way to figure out the country name just by looking at an IP address? I mean, do countries have specific ranges of IP addresses? For example, Australia can have IP

  •  无人及你
    2020-12-01 05:39

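    Here is my solution in Python 3.x to return geolocation info for a DataFrame containing IP addresses: apply a lookup function to the IP column with pandas' Series.apply. Note that apply calls the function once per row in plain Python (it is not vectorized or parallelized), so the speed of the lookup itself is what matters.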

    I'll compare the performance of two popular libraries for looking up the location.

    TL;DR: use the geolite2 method.

    1. geolite2 module from the maxminddb-geolite2 package

    Input

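    # !pip install maxminddb-geolite2
    import time
    import numpy as np
    from geolite2 import geolite2

    geo = geolite2.reader()  # reader for the GeoLite2 City database bundled with the package
    # train_data: your own DataFrame with an 'IP_Address' column
    # .copy() avoids pandas' SettingWithCopyWarning when adding a column below
    df_1 = train_data.loc[:50, ['IP_Address']].copy()

    def IP_info_1(ip):
        """Return the English country name for ip, or NaN if it cannot be resolved."""
        try:
            x = geo.get(ip)  # dict for a known address, None if not in the database
        except ValueError:  # malformed IP string
            return np.nan
        try:
            return x['country']['names']['en'] if x is not None else np.nan
        except KeyError:  # record has no country entry
            return np.nan

    s_time = time.time()
    # map IP --> country; Series.apply calls IP_info_1 once per element
    df_1['country'] = df_1['IP_Address'].apply(IP_info_1)
    print(df_1.head(), '\n')
    print('Time:', str(time.time() - s_time) + 's \n')

    print(type(geo.get('48.151.136.76')))  # <class 'dict'>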
    

    Output

           IP_Address         country
    0   48.151.136.76   United States
    1    94.9.145.169  United Kingdom
    2   58.94.157.121           Japan
    3  193.187.41.186         Austria
    4   125.96.20.172           China 
    
    Time: 0.09906983375549316s 
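    Series.apply calls the lookup once per row, so an address that appears many times in the data is resolved many times. A minimal sketch of resolving each distinct address once and reusing the result; lookup_countries and fake_lookup are hypothetical names, and in practice the lookup argument would be a function such as IP_info_1.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def lookup_countries(ips: Iterable[Hashable],
                     lookup: Callable[[Hashable], object]) -> List[object]:
    """Return lookup(ip) for every ip, calling lookup only once per distinct value."""
    ips = list(ips)
    # dict.fromkeys drops duplicates while keeping first-seen order
    cache: Dict[Hashable, object] = {ip: lookup(ip) for ip in dict.fromkeys(ips)}
    # rebuild the full-length result from the cache
    return [cache[ip] for ip in ips]


calls = []


def fake_lookup(ip):
    # hypothetical stand-in for a real geolocation lookup such as IP_info_1
    calls.append(ip)
    return {"192.0.2.1": "A", "198.51.100.7": "B"}.get(ip)


print(lookup_countries(["192.0.2.1", "198.51.100.7", "192.0.2.1"], fake_lookup))  # ['A', 'B', 'A']
print(len(calls))  # 2
```

    With pandas, the same idea can be written by building the dictionary from Series.unique() and passing it to Series.map.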
    
    
    

    2. DbIpCity class from the ip2geotools library

    Input

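    # !pip install ip2geotools
    import time
    import numpy as np
    from ip2geotools.databases.noncommercial import DbIpCity

    # train_data: your own DataFrame with an 'IP_Address' column
    df_2 = train_data.loc[:50, ['IP_Address']].copy()

    def IP_info_2(ip):
        """Return the two-letter country code for ip, or NaN on any error."""
        try:
            # each call queries the db-ip.com web service
            return DbIpCity.get(ip, api_key='free').country
        except Exception:  # invalid IP, network error, rate limit, ...
            return np.nan

    s_time = time.time()  # time only the lookups, as in method 1
    df_2['country'] = df_2['IP_Address'].apply(IP_info_2)
    print(df_2.head())
    print('Time:', str(time.time() - s_time) + 's')

    print(type(DbIpCity.get('48.151.136.76', api_key='free')))  # an ip2geotools.models.IpLocation object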
    

    Output

           IP_Address country
    0   48.151.136.76      US
    1    94.9.145.169      GB
    2   58.94.157.121      JP
    3  193.187.41.186      AT
    4   125.96.20.172      CN
    
    Time: 80.53318452835083s 
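    IP_info_2 catches every exception, so a malformed address, a network error and an exceeded rate limit all end up as the same NaN. A minimal sketch of checking the value with the standard-library ipaddress module before any lookup, so only genuinely malformed input is skipped; normalize_ip is a hypothetical helper name.

```python
import ipaddress
from typing import Optional


def normalize_ip(value: object) -> Optional[str]:
    """Return the canonical string form of an IPv4/IPv6 address, or None if invalid."""
    if not isinstance(value, str):
        # NaN and other non-string cells from a DataFrame are treated as missing
        return None
    try:
        # ip_address accepts both IPv4 and IPv6 and normalizes the text form
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        # ipaddress raises ValueError for strings that are not valid addresses
        return None


print(normalize_ip(" 48.151.136.76 "))  # 48.151.136.76
print(normalize_ip("999.1.1.1"))  # None
```

    A lookup function can call normalize_ip first and return NaN only when it yields None, letting unexpected errors (such as network failures) surface instead of being hidden.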
    
    
    

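    The huge time difference comes mainly from where the data lives. geolite2 reads a GeoLite2 database file bundled with the package, so each lookup is a local file search, whereas DbIpCity.get(ip, api_key='free') queries the db-ip.com web service over HTTP for every IP. Subsetting a dictionary versus reading attributes of the specialized ip2geotools.models.IpLocation object makes a negligible difference by comparison.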
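    On the question's premise: IP address blocks are handed out through the regional Internet registries, and a geolocation database such as GeoLite2 is essentially a table from address blocks to locations, so a local lookup is just a search over that table with no network involved. A toy sketch of the idea using sorted IPv4 ranges and binary search; the table uses the RFC 5737 documentation blocks with made-up labels (not real country data), and RangeTable is a hypothetical name. The actual MaxMind DB format stores prefixes in a binary search tree rather than a flat list.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Toy table: RFC 5737 documentation ranges with made-up labels.
# Real data would come from a geolocation database, not this list.
RANGES: List[Tuple[str, str, str]] = [
    ("192.0.2.0", "192.0.2.255", "Country A"),
    ("198.51.100.0", "198.51.100.255", "Country B"),
    ("203.0.113.0", "203.0.113.255", "Country C"),
]


class RangeTable:
    """Map non-overlapping IPv4 ranges to labels using binary search."""

    def __init__(self, ranges: List[Tuple[str, str, str]]) -> None:
        # convert addresses to integers so ranges can be compared numerically
        self._rows = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), label)
            for lo, hi, label in ranges
        )
        self._starts = [start for start, _, _ in self._rows]

    def lookup(self, ip: str) -> Optional[str]:
        n = int(ipaddress.IPv4Address(ip))
        # index of the last range whose start is <= n (or -1 if none)
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0:
            _, end, label = self._rows[i]
            if n <= end:
                return label
        return None


table = RangeTable(RANGES)
print(table.lookup("198.51.100.42"))  # Country B
print(table.lookup("8.8.8.8"))  # None
```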

    Also, the first method returns a dictionary of geolocation data; subset it to get the fields you need:

    x = geolite2.reader().get('48.151.136.76')
    print(x)

    Output

    {'city': {'geoname_id': 5101798, 'names': {'de': 'Newark', 'en': 'Newark', 'es': 'Newark', 'fr': 'Newark', 'ja': 'ニューアーク', 'pt-BR': 'Newark', 'ru': 'Ньюарк'}},
     'continent': {'code': 'NA', 'geoname_id': 6255149, 'names': {'de': 'Nordamerika', 'en': 'North America', 'es': 'Norteamérica', 'fr': 'Amérique du Nord', 'ja': '北アメリカ', 'pt-BR': 'América do Norte', 'ru': 'Северная Америка', 'zh-CN': '北美洲'}},
     'country': {'geoname_id': 6252001, 'iso_code': 'US', 'names': {'de': 'USA', 'en': 'United States', 'es': 'Estados Unidos', 'fr': 'États-Unis', 'ja': 'アメリカ合衆国', 'pt-BR': 'Estados Unidos', 'ru': 'США', 'zh-CN': '美国'}},
     'location': {'accuracy_radius': 1000, 'latitude': 40.7355, 'longitude': -74.1741, 'metro_code': 501, 'time_zone': 'America/New_York'},
     'postal': {'code': '07102'},
     'registered_country': {'geoname_id': 6252001, 'iso_code': 'US', 'names': {'de': 'USA', 'en': 'United States', 'es': 'Estados Unidos', 'fr': 'États-Unis', 'ja': 'アメリカ合衆国', 'pt-BR': 'Estados Unidos', 'ru': 'США', 'zh-CN': '美国'}},
     'subdivisions': [{'geoname_id': 5101760, 'iso_code': 'NJ', 'names': {'en': 'New Jersey', 'es': 'Nueva Jersey', 'fr': 'New Jersey', 'ja': 'ニュージャージー州', 'pt-BR': 'Nova Jérsia', 'ru': 'Нью-Джерси', 'zh-CN': '新泽西州'}}]}
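    Some records lack a country entry, which is what the KeyError handling in IP_info_1 guards against. A small helper that walks the nested dictionary and returns a default when any step is missing makes this subsetting more robust. A minimal sketch; get_field is a hypothetical name, and the sample record is a trimmed copy of the output above.

```python
from typing import Any


def get_field(record: Any, *path: Any, default: Any = None) -> Any:
    """Follow keys/indices in path through nested dicts/lists; return default if any step fails."""
    current = record
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # missing key, too-short list, or a None record/value along the way
            return default
    return current


# Trimmed copy of the record printed above
record = {
    'country': {'iso_code': 'US', 'names': {'en': 'United States'}},
    'location': {'latitude': 40.7355, 'longitude': -74.1741, 'time_zone': 'America/New_York'},
    'subdivisions': [{'iso_code': 'NJ', 'names': {'en': 'New Jersey'}}],
}

print(get_field(record, 'country', 'iso_code'))  # US
print(get_field(record, 'subdivisions', 0, 'names', 'en'))  # New Jersey
print(get_field(record, 'city', 'names', 'en', default='?'))  # ?
print(get_field(None, 'country', 'iso_code'))  # None
```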
    
