data-analysis | 易学教程

how to append two or more dataframes in pandas and do some analysis

Question: I have three DataFrames:

    df1 = pd.DataFrame({"Name": ["one", "two", "three"], "value": [4, 5, 6]})
    df2 = pd.DataFrame({"Name": ["four", "one", "three"], "value": [8, 6, 2]})
    df3 = pd.DataFrame({"Name": ["one", "four", "six"], "value": [1, 1, 1]})

I can append them one by one, but I want to append all three DataFrames at once and then do some analysis. For each name, I am trying to compute the number of DataFrames that contain it divided by the total number of DataFrames. My desired output is:

    Name  value  Count
    one   11     1
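One possible approach, sketched here under the assumption that "value" should be summed per name (which matches the single visible row of the desired output): tag each frame with a hypothetical `source` column, concatenate everything in one `pd.concat` call, then aggregate per name.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["one", "two", "three"], "value": [4, 5, 6]})
df2 = pd.DataFrame({"Name": ["four", "one", "three"], "value": [8, 6, 2]})
df3 = pd.DataFrame({"Name": ["one", "four", "six"], "value": [1, 1, 1]})

frames = [df1, df2, df3]

# Tag every row with the position of the frame it came from.
# "source" is a helper column introduced for this sketch only.
combined = pd.concat(
    [frame.assign(source=i) for i, frame in enumerate(frames)],
    ignore_index=True,
)

# Sum the values per name and count the distinct frames each name appears in.
result = (
    combined.groupby("Name", sort=False)
    .agg(value=("value", "sum"), Count=("source", "nunique"))
    .reset_index()
)

# Turn the frame count into a fraction of all frames.
result["Count"] = result["Count"] / len(frames)
print(result)
```

Counting distinct `source` values rather than rows means a name repeated inside a single frame is still counted once for that frame.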

Pandas error with basemap/proj for map plotting

Question: I ran the Python code below, which is the "Plotting Maps: Visualizing Haiti Earthquake Crisis Data" example from the book Python for Data Analysis (pages 242-246). The code is supposed to plot a map of Haiti, but I got the following error:

    Traceback (most recent call last):
      File "Haiti.py", line 74, in <module>
        x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
      File "/usr/local/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 1148, in __call__
        xout,yout = self.projtran(x,y

join two tables without losing relevant values

Question: I have two tables representing a database of a customer's products and its competitors' products:

tmp_match: from_product_id and to_product_id represent a match between a customer product and a competitor product, respectively.
tmp_price_history: shows the price of each product per date.

I am trying to write a query that lists all dates from table tmp_price_history. For each date, I want to see the customer product price versus the competitor product price, according to the product match pairs in table
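A minimal sketch of one way such a query could look, run through Python's built-in `sqlite3` module. The question does not show the full schema, so the `product_id`, `date`, and `price` columns of tmp_price_history and all sample rows are assumptions made for illustration. The idea is to start from the distinct dates and use LEFT JOINs, so a date is kept even when one side has no price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema and sample data; only the table names and the two
# tmp_match columns come from the question.
conn.executescript("""
CREATE TABLE tmp_match (from_product_id INTEGER, to_product_id INTEGER);
CREATE TABLE tmp_price_history (product_id INTEGER, date TEXT, price REAL);
INSERT INTO tmp_match VALUES (1, 10);
INSERT INTO tmp_price_history VALUES
    (1, '2017-01-01', 100.0),
    (10, '2017-01-01', 95.0),
    (1, '2017-01-02', 101.0),
    (10, '2017-01-03', 90.0);
""")

query = """
SELECT d.date,
       m.from_product_id,
       m.to_product_id,
       c.price AS customer_price,
       k.price AS competitor_price
FROM (SELECT DISTINCT date FROM tmp_price_history) AS d
CROSS JOIN tmp_match AS m
LEFT JOIN tmp_price_history AS c
       ON c.product_id = m.from_product_id AND c.date = d.date
LEFT JOIN tmp_price_history AS k
       ON k.product_id = m.to_product_id AND k.date = d.date
ORDER BY d.date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With inner joins, the dates on which only one of the two products has a price would drop out of the result; the LEFT JOINs keep them and leave the missing price as NULL.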

Missing Value in Data Analysis

Question: I have a data set in which the variable GENDER, which has two levels, Male (M) and Female (F), has a lot of missing values. How do I deal with them? What are the different methods for handling missing values? Any help would be appreciated.

Answer 1: There are several techniques for estimating a missing value. I've been writing a paper on such methods for a university project. I will briefly explain 5 commonly used missing data imputation techniques. Hereinafter we will consider a
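The answer is cut off before it names its five techniques, so the sketch below does not attempt to reproduce them. It only illustrates three generic ways of handling a categorical column with missing values in pandas, on made-up sample data: dropping the incomplete rows, filling with the most frequent level, and treating "missing" as its own level.

```python
import pandas as pd

# Made-up sample; None marks a missing GENDER value.
df = pd.DataFrame({"GENDER": ["M", "F", None, "M", None, "M"]})

# Option 1: listwise deletion, keeping only rows where GENDER is known.
dropped = df.dropna(subset=["GENDER"])

# Option 2: mode imputation, replacing missing values with the most frequent level.
mode_value = df["GENDER"].mode().iloc[0]
mode_filled = df["GENDER"].fillna(mode_value)

# Option 3: keep the rows and mark missingness as an explicit extra level.
flagged = df["GENDER"].fillna("Unknown")

print(len(dropped), mode_value, flagged.value_counts().to_dict())
```

Which option is appropriate depends on why the values are missing; mode imputation in particular inflates the majority level when many values are absent.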

Matplotlib: Formatting dates on the x-axis in a 3D Bar graph

Question: Given this 3D bar graph sample code, how would you convert the numerical data on the x-axis to formatted date/time strings? I've tried the ax.xaxis_date() function without success. I also tried plot_date(), which doesn't appear to work for 3D bar graphs. Here is a modified version of the sample code to illustrate what I am trying to do:

    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.pyplot as plt
    import numpy as np
    import matplotlib.dates as dates

    dates = [dates
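Since the original sample code is cut off, the following is an independent sketch rather than a repair of it. It assumes one workable route is to plot against Matplotlib date numbers and then set the tick positions and labels explicitly; the dates and bar heights are invented for the example.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on older Matplotlib

# Invented sample data: four weekly dates and one bar height per date.
days = [dt.date(2012, 1, d) for d in (1, 8, 15, 22)]
heights = np.array([3.0, 5.0, 2.0, 4.0])

# 3D bars need numeric x positions, so convert the dates to date numbers.
xs = mdates.date2num(days)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.bar(xs, heights, zs=0, zdir="y", width=2.0)

# Place one tick per bar and label it with the formatted date.
ax.set_xticks(xs)
ax.set_xticklabels([d.strftime("%Y-%m-%d") for d in days])

fig.savefig("bars3d_dates.png")
```

Setting the labels by hand sidesteps the question of whether date locators and formatters behave on a 3D axis in a given Matplotlib version.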

R: Cross validation on a dataset with factors

Question: I often want to run a cross-validation on a dataset that contains some factor variables. After running for a while, the cross-validation routine fails with the error "factor x has new levels Y". For example, using package boot:

    library(boot)
    d <- data.frame(x=c('A', 'A', 'B', 'B', 'C', 'C'), y=c(1, 2, 3, 4, 5, 6))
    m <- glm(y ~ x, data=d)
    m.cv <- cv.glm(d, m, K=2) # Sometimes succeeds
    m.cv <- cv.glm(d, m, K=2) # Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
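To illustrate why the failure is intermittent, here is a small Python sketch (not R, and not how cv.glm is implemented) that enumerates every way of holding out 3 of the 6 rows and checks whether the held-out rows contain a factor level that is absent from the training rows. The assumption that a K=2 split holds out exactly 3 rows is made for this example.

```python
import itertools

# The factor column from the example data frame.
x = ["A", "A", "B", "B", "C", "C"]


def unseen_levels(train_idx, test_idx):
    """Return the levels that occur in the test rows but not in the training rows."""
    train_levels = {x[i] for i in train_idx}
    test_levels = {x[i] for i in test_idx}
    return test_levels - train_levels


total = 0
bad = 0
for test_idx in itertools.combinations(range(len(x)), 3):
    train_idx = [i for i in range(len(x)) if i not in test_idx]
    total += 1
    if unseen_levels(train_idx, test_idx):
        bad += 1

print(f"{bad} of {total} hold-out sets contain a level unseen in training")
```

Whenever both rows of one level land in the same fold, the model fitted on the other fold has never seen that level, which is consistent with the "new levels" error appearing only on some runs.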

how to match a word in a datacolumn with a list of values and applying ignorecase in pandas in python

Question: I have a df,

    Name
    Ram is one of the key ram
    Kumar is playing cricket
    Ravi is playing and ravi is a good player

and a list my_list=["Ram","ravi"], and my desired dataframe is desired_df,

    Name                                       Match  Count
    Ram is one of the key ram                  Ram    1
    Kumar is playing cricket
    Ravi is playing and ravi is a good player  ravi   1

I tried

    extracted = df.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)

but I get

    Match
    Ram,ram
    Ravi,ravi

and I cannot achieve my desired output,
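A possible sketch of the case-insensitive match, assuming the goal is to report each hit using the spelling from my_list and to count distinct list entries per row. The helper `matched_words` and the `canonical` lookup are names introduced here, not part of the question. Unlike the desired output, rows without a match get a Count of 0 rather than a blank.

```python
import re

import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# Whole-word, case-insensitive pattern built from the list entries.
pattern = r"\b(" + "|".join(re.escape(word) for word in my_list) + r")\b"

# Map each lower-cased entry back to its spelling in my_list.
canonical = {word.lower(): word for word in my_list}


def matched_words(text):
    """Return the distinct my_list entries found in text, in first-seen order."""
    found = re.findall(pattern, text, flags=re.IGNORECASE)
    return list(dict.fromkeys(canonical[hit.lower()] for hit in found))


matches = df["Name"].apply(matched_words)
df["Match"] = matches.str.join(",")
df["Count"] = matches.str.len()
print(df)
```

Normalising every hit through `canonical` is what collapses "Ram" and "ram" into one entry, which the `set` in the original attempt cannot do because the two strings differ.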

How to merge two large numpy arrays if slicing doesn't resolve memory error?

Question: I have two numpy arrays, container1 and container2, where container1.shape = (900,4000) and container2.shape = (5000,4000). Merging them with vstack results in a MemoryError. After searching through the old questions posted here, I tried to merge them using slicing, like this:

    mergedContainer = numpy.vstack((container1, container2[:1000]))
    mergedContainer = numpy.vstack((mergedContainer, container2[1000:2500]))
    mergedContainer = numpy.vstack((mergedContainer, container2[2500:3000]))

but after
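Each `vstack` call allocates a fresh array and copies everything accumulated so far, so stacking in slices does not lower the peak memory use. A common alternative, sketched here with much smaller stand-in arrays, is to allocate the output once and copy each input into its slice. This still needs enough memory for both inputs plus the output, so it may not be sufficient on its own.

```python
import numpy as np

# Small stand-ins for the (900, 4000) and (5000, 4000) arrays in the question.
container1 = np.ones((9, 4))
container2 = np.zeros((50, 4))

n1 = container1.shape[0]
n2 = container2.shape[0]

# Allocate the result once, then fill it block by block.
merged = np.empty((n1 + n2, container1.shape[1]), dtype=container1.dtype)
merged[:n1] = container1
merged[n1:] = container2

print(merged.shape)
```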

Plot pandas dataframe containing NaNs

Question: I have GPS data of ice speed from three different GPS receivers. The data are in a pandas dataframe with an index of Julian day (incremental from the start of 2009). This is a subset of the data (the main dataset is 3487235 rows...):

                 R2          R7          R8
    1235.000000  116.321959  100.805197  96.519977
    1235.000116  NaN         100.771133  96.234957
    1235.000231  NaN         100.584559  97.249262
    1235.000347  118.823610  100.169055  96.777833
    1235.000463  NaN         99.753551   96.598350
    1235.000579  NaN         99.338048   95.283989
    1235.000694  113
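The question is cut off before it states the actual plotting problem. Assuming the issue is that a line plot breaks wherever a column has NaN, one common workaround is to drop the NaNs per column before plotting, so each receiver is drawn through its own valid points. The sketch uses the six complete rows shown in the excerpt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The six complete rows from the excerpt, indexed by Julian day.
df = pd.DataFrame(
    {
        "R2": [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan],
        "R7": [100.805197, 100.771133, 100.584559, 100.169055, 99.753551, 99.338048],
        "R8": [96.519977, 96.234957, 97.249262, 96.777833, 96.598350, 95.283989],
    },
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347, 1235.000463, 1235.000579],
)

fig, ax = plt.subplots()
for column in df.columns:
    # Drop NaNs per column so the line connects consecutive valid points.
    series = df[column].dropna()
    ax.plot(series.index, series.values, marker="o", label=column)

ax.set_xlabel("Julian day")
ax.set_ylabel("Ice speed")
ax.legend()
fig.savefig("ice_speed.png")
```

Connecting across gaps hides where data are missing; interpolating, or plotting markers only, are alternatives when the gaps themselves matter.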