data-science

How can I create a new dataframe comparing values and getting only most recent data in R?

一个人想着一个人 submitted on 2019-12-11 06:49:56
Question: I have a data frame containing the Gini Index of countries. Many of the values are NA, so I want to create a new data frame that has, for each country, the most recent Gini Index measured for it. For example, if Brazil has values for 2012, 2013 and 2015, the new data frame will have only the value for 2015. This is what the data looks like:

       Country.Name Country.Code X2014 X2015 X2016 X2017
    8     Argentina          ARG  41.4    NA  42.4    NA
    9       Armenia          ARM  31.5  32.4  32.5    NA
    13      Austria          AUT  30.5  30.5

How to create histograms in pandas (Python) using specific rows and columns in a data frame

百般思念 submitted on 2019-12-11 05:29:20
Question: I have the data frame shown in the picture, and I want to plot a histogram showing the distribution across all countries in the world for any given year (e.g. 2010). The table is generated by the following cleaning code:

    dataSheet = pd.read_excel("http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=excel", sheet_name="Data")
    dataSheet = dataSheet.transpose()
    dataSheet = dataSheet.drop(dataSheet.columns[[0,1]], axis=1)
    dataSheet = dataSheet.drop([

Unknown label type: continuous

旧街凉风 submitted on 2019-12-11 05:29:08
Question:

       Avg.SessionLength  TimeonApp  TimeonWebsite  LengthofMembership  Yearly Amount Spent
    0          34.497268  12.655651      39.577668            4.082621           587.951054
    1          31.926272  11.109461      37.268959            2.664034           392.204933
    2          33.000915  11.330278      37.110597            4.104543           487.547505
    3          34.305557  13.717514      36.721283            3.120179           581.852344
    4          33.330673  12.795189      37.536653            4.446308           599.406092
    5          33.871038  12.026925      34.476878            5.493507           637.102448
    6          32.021596  11.366348      36.683776            4.685017           521.572175

I want to apply KNN: X = df[['Avg. Session Length',

How to impute values in a column when certain conditions are fulfilled in other columns using fillna()

强颜欢笑 submitted on 2019-12-11 05:09:45
Question: I've calculated the counts for the rows where Credit_History is NaN. Output when Credit_History is NaN:

    Self_Employed  Yes 532  No 32
    Married        No 398   Yes 21

For the numerical columns, I calculated the mean of each column. Output for the numerical values when Credit_History is NaN:

    Mean Applicant Income: 54003.1232
    LoanAmount: 35435.12
    Loan_Amount_Term: 360
    ApplicantIncome: 30000

How do I now use fillna() in these cases: Case 1: When Self_Employed = Y and Married = N; Credit_History should be 0 Case
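A minimal sketch of Case 1, assuming the column names `Self_Employed`, `Married` and `Credit_History` from the question and assuming the categories are stored as the strings "Yes" and "No" (the excerpt shows both "Yes/No" and "Y/N", so adjust as needed). The sample rows are made up for illustration. The idea is to build a boolean mask for the condition and call `fillna()` only on the rows that match it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Self_Employed": ["Yes", "No", "Yes", "Yes"],
    "Married": ["No", "No", "No", "Yes"],
    "Credit_History": [np.nan, np.nan, 1.0, np.nan],
})

# Case 1: Self_Employed is "Yes" and Married is "No".
case_1 = (df["Self_Employed"] == "Yes") & (df["Married"] == "No")

# Fill missing Credit_History with 0 on matching rows only.
# Existing (non-NaN) values and non-matching rows are left untouched.
df.loc[case_1, "Credit_History"] = df.loc[case_1, "Credit_History"].fillna(0)

print(df)
```

Further cases can be handled the same way, with one mask and one fill value per case.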

“Setting an array element with a sequence” numpy error

社会主义新天地 submitted on 2019-12-11 04:14:51
Question: I'm working on a project that involves working with preprocessed data in the following form. An explanation of the data is given above as well. The goal is to predict whether a written digit matches the audio of said digit or not. First I transform the spoken arrays of shape (N, 13) to their means over the time axis, as such: This creates a consistent shape of (1, 13) for every array within spoken. In order to test this in a simple vanilla algorithm I zip the two arrays together such that we create

How to precisely sample data with frequency of 60Hz?

£可爱£侵袭症+ submitted on 2019-12-11 04:13:35
Question: Currently I use the InvokeRepeating method to invoke another method every 1/x seconds. The problem is that the delay between invocations, and therefore the timing of the data I get, is not precise. How can I precisely sample transform.position at a frequency of 60 Hz? Here's my code:

    public class Recorder : MonoBehaviour {
        public float samplingRate = 60f; // sample rate in Hz
        public string outputFilePath;
        private StreamWriter _sw;
        private List<Data> dataList = new List<Data>();
        public void OnEnable() {

How to format date to 1900's?

落爺英雄遲暮 submitted on 2019-12-11 01:43:34
Question: I'm preprocessing data and one column represents dates such as '6/1/51'. I'm trying to convert the string to a date object, and so far what I have is:

    date = row[2].strip()
    format = "%m/%d/%y"
    datetime_object = datetime.strptime(date, format)
    date_object = datetime_object.date()
    print(date_object)
    print(type(date_object))

The problem I'm facing is changing 2051 to 1951. I tried writing format = "%m/%d/19%y" but it gives me a ValueError. ValueError: time data '6/1/51' does not match format '%m/
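One possible workaround, sketched under the assumption that every date in the column belongs to the 1900s: parse with `%y` as before, then shift any year that landed in the 2000s back by one century. Python's `%y` maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999, which is why '6/1/51' comes out as 2051. The helper name `parse_1900s_date` is hypothetical.

```python
from datetime import datetime, date


def parse_1900s_date(text: str, fmt: str = "%m/%d/%y") -> date:
    """Parse a two-digit-year date string, assuming the date lies in the 1900s."""
    parsed = datetime.strptime(text.strip(), fmt)
    # %y places 00-68 in the 2000s; move those years back one century.
    # Note: '2/29/00' would raise ValueError here, because 1900 was not a leap year.
    if parsed.year >= 2000:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.date()


print(parse_1900s_date("6/1/51"))  # 1951-06-01
```

If the data can contain dates from both centuries, a fixed cutoff like this is not enough and the century has to come from some other field.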

How to get Adjusted R Square for Linear Regression

て烟熏妆下的殇ゞ submitted on 2019-12-10 23:36:23
Question: Using sklearn.metrics I can compute R squared. How can I compute the adjusted R squared for a linear regression model? Answer 1: Scikit-Learn's LinearRegression does not return the adjusted R squared. However, you can calculate the adjusted R squared from R squared with the formula:

$$\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

where p is the number of predictors (also known as features or explanatory variables) and n is the number of data points. So if your data is in a dataframe called train and you have the r2 , the formula would
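As a sketch of that calculation, the standard adjusted R squared is 1 - (1 - R²)(n - 1)/(n - p - 1), with n data points and p predictors. The helper below is hypothetical (it is not part of scikit-learn) and takes an R squared value you have already computed, for example with `sklearn.metrics.r2_score`:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R squared from plain R squared, n data points and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R squared to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Example with made-up numbers: R squared of 0.80 from 101 rows and 5 features.
print(adjusted_r2(0.80, n=101, p=5))
```

With a dataframe, n would typically be the number of rows and p the number of feature columns used to fit the model.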

Removing multiple recurring text from pandas rows

送分小仙女□ submitted on 2019-12-10 23:36:16
Question: I have a pandas dataframe whose rows are articles scraped from websites. I have 100 thousand articles of a similar nature. Here is a glimpse of my dataset.

      text
    0 which brings not only warmer weather but also the unsettling realization that the year is more than halfway over. So
    1 which brings not only warmer weather but also the unsettling realization that the year is more than halfway over. So
    2 which brings not only warmer weather but also the unsettling realization that

Python Pandas Series if else box plot

匆匆过客 submitted on 2019-12-10 22:37:28
Question: I have a lot of data in a dictionary format and I am attempting to use pandas to print a string based on an IF ELSE statement. For my example I'll make up some data in a dict and convert it to pandas:

    df = pd.DataFrame(dict(a=[1.5,2.8,9.3],b=[7.2,3.3,4.9],c=[13.1,4.9,15.9],d=[1.1,1.9,2.9]))
    df

This returns:

         a    b     c    d
    0  1.5  7.2  13.1  1.1
    1  2.8  3.3   4.9  1.9
    2  9.3  4.9  15.9  2.9

My IF ELSE statement:

    for col in df.columns:
        if (df[col] < 4).any():
            print('Zone %s does not make setpoint' % col)
        else:
            print('Zone %s
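For reference, the same per-column check can be written without an explicit loop. This is only a sketch of the condition shown in the excerpt (any value below 4 in a column), using the question's sample data; the variable names `below_setpoint` and `failing_zones` are made up here.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1.5, 2.8, 9.3], b=[7.2, 3.3, 4.9],
                       c=[13.1, 4.9, 15.9], d=[1.1, 1.9, 2.9]))

# One boolean per column: True if any value in that column is below 4.
below_setpoint = (df < 4).any()

# Column labels where the condition holds.
failing_zones = below_setpoint[below_setpoint].index.tolist()

for col in failing_zones:
    print('Zone %s does not make setpoint' % col)
```

With this data the condition holds for columns a, b and d, and not for c, whose smallest value is 4.9.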