Looping through a list of pandas dataframes

前端 未结 2 1122
栀梦
栀梦 2020-12-28 23:21

Two quick pandas questions for you.

  1. I have a list of dataframes I would like to apply a filter to.

    countries = [us, uk, france]
    for df in co         
    
    
            
相关标签:
2条回答
  • 2020-12-28 23:58

    For why

    for df in countries:
        df["Continent"] = "Europe"
    

    modifies countries, while

    for df in countries:
        df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')] 
    

    does not, see why should I make a copy of a data frame in pandas. df is a reference to the actual DataFrame in countries, and not the actual DataFrame itself, but modifications to a reference affect the original DataFrame as well. Declaring a new column is a modification. However, taking a subset is not a modification. It is just changing what the reference is referring to in the original DataFrame.

    0 讨论(0)
  • 2020-12-29 00:01

    Taking a look at this answer, you can see that for df in countries: is equivalent to something like

    for idx in range(len(countries)):
        df = countries[idx]
        # do something with df
    

    which obviously won't actually modify anything in your list. It is generally bad practice to modify a list while iterating over it in a loop like this.

    A better approach would be a list comprehension, you can try something like

     countries = [us, uk, france]
     countries = [df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
                  for df in countries] 
    

    Notice that with a list comprehension like this, we aren't actually modifying the original list - instead we are creating a new list, and assigning it to the variable which held our original list.

    Also, you might consider placing all of your data in a single DataFrame with an additional country column or something along those lines - Python-level loops are generally slower and a list of DataFrames is often much less convenient to work with than a single DataFrame, which can fully leverage the vectorized pandas methods.

    0 讨论(0)
提交回复
热议问题