factors

How do I convert certain columns of a data frame to become factors? [duplicate]

China☆狼群 submitted on 2019-11-29 20:53:49
Possible Duplicate: identifying or coding unique factors using R

I'm having some trouble with R. I have a data set similar to the following, but much longer.

    A B Pulse
    1 2 23
    2 2 24
    2 2 12
    2 3 25
    1 1 65
    1 3 45

Basically, the first 2 columns are coded. A has 1, 2, which represent 2 different weights. B has 1, 2, 3, which represent 3 different times. As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors. Help?

Here's an example:

    # Create a data frame
    > d <- data.frame(a=1:3, b=2:4)
    > d
      a b
    1 1 2
    2 2 3
    3

In aggregate: sum not meaningful for factors

一个人想着一个人 submitted on 2019-11-29 15:48:58
I am trying something that should be simple; any hint about what is going on is very welcome. I have a large data frame with country imports from some municipalities. For some countries I have 2 entries. I want to sum the imports from each municipality and have a unique row for each country. I am using the aggregate function. For example (I include a small part of the data frame):

    municipalities <- c("country", 1100056, 1100106, 1100205, 1100304, 1200104, 1200252)
    c1 <- c("Afghanistan", 2, 34, 23.4, 5, 0, 0)
    c2 <- c("Afghanistan", 0, 20, 11.1, 5.4, 2, 0)
    c3 <- c("Albania", 12, 120, 11.4, 5.1, 12, 10)
    c4<-c("Albania",0,40,61

Python Pandas: how to turn a DataFrame with “factors” into a design matrix for linear regression?

ε祈祈猫儿з submitted on 2019-11-29 14:03:38
Question

If memory serves me, in R there is a data type called factor which, when used within a DataFrame, can be automatically unpacked into the necessary columns of a regression design matrix. For example, a factor containing True/False/Maybe values would be transformed into:

    1 0 0
    0 1 0
    or
    0 0 1

for the purpose of using lower level regression code. Is there a way to achieve something similar using the pandas library? I see that there is some regression support within Pandas, but since I have my own
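The unpacking described in the question can be sketched without any library, just to make the target shape concrete. This is only an illustration: `design_matrix` is a hypothetical helper name, and ordering the levels by first appearance is an assumption (R orders factor levels differently by default). In pandas itself, `pandas.get_dummies` is the usual tool for this.

```python
def design_matrix(values, levels=None):
    """Return (levels, rows); each row is a 0/1 indicator list for one value."""
    if levels is None:
        # Assumption: keep levels in order of first appearance.
        levels = list(dict.fromkeys(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for value in values:
        row = [0] * len(levels)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return levels, rows


levels, rows = design_matrix(["True", "False", "Maybe", "False"])
for row in rows:
    print(row)
```

If the regression also includes an intercept column, one indicator column is normally dropped, because the full set of indicators sums to the intercept and makes the matrix rank-deficient.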

Getting Factors of a Number

醉酒当歌 submitted on 2019-11-29 09:18:30
Question

I'm trying to refactor this algorithm to make it faster. What would be the first refactoring here for speed?

    public int GetHowManyFactors(int numberToCheck)
    {
        // we know 1 is a factor and the numberToCheck
        int factorCount = 2;
        // start from 2 as we know 1 is a factor, and less than as numberToCheck is a factor
        for (int i = 2; i < numberToCheck; i++)
        {
            if (numberToCheck % i == 0)
                factorCount++;
        }
        return factorCount;
    }

Answer 1:

The first optimization you could make is that you only need to check up
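The answer is cut off here. Assuming it refers to the familiar square-root bound (every divisor below the square root pairs with one above it), a minimal sketch of that idea might look like the following. It is written in Python rather than the question's C#, and `count_factors` is a hypothetical name.

```python
import math


def count_factors(n):
    """Count the positive divisors of n by testing candidates only up to sqrt(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 0
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            # i and n // i are both divisors; count once when they coincide.
            count += 1 if i == n // i else 2
    return count


print(count_factors(120))
```

Unlike the original loop, which starts its counter at 2, this version reports a single factor for the input 1.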

Finding factors of a given integer

匆匆过客 submitted on 2019-11-29 07:23:19
I have something like this down:

    int f = 120;
    for(int ff = 1; ff <= f; ff++){
        while (f % ff != 0){
        }

Is there anything wrong with my loop to find factors? I'm really confused as to the workings of for and while statements, so chances are they are completely wrong. After this, how would I go about assigning variables to said factors?

Sharad Dargan

    public class Solution {
        public ArrayList<Integer> allFactors(int a) {
            int upperlimit = (int)(Math.sqrt(a));
            ArrayList<Integer> factors = new ArrayList<Integer>();
            for(int i=1;i <= upperlimit; i+= 1){
                if(a%i == 0){
                    factors.add(i);
                    if(i != a/i){
                        factors
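The excerpt above breaks off partway through the loop. As a rough sketch of the same approach in Python (collect each divisor up to the square root together with its partner), with the choice to return the factors in ascending order being an assumption of this sketch and not something taken from the truncated answer:

```python
import math


def all_factors(a):
    """Return every positive divisor of a in ascending order."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    small, large = [], []
    for i in range(1, math.isqrt(a) + 1):
        if a % i == 0:
            small.append(i)
            if i != a // i:
                # The partner divisor lies above sqrt(a).
                large.append(a // i)
    # Partners were found in descending order, so reverse them.
    return small + large[::-1]


print(all_factors(120))
```

Keeping the factors in a list like this also speaks to the last part of the question: the number of factors is not known in advance, so a collection is easier to work with than a separate variable per factor.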

Basic - T-Test -> Grouping Factor Must have Exactly 2 Levels

主宰稳场 submitted on 2019-11-29 02:33:43
I am relatively new to R. For my assignment I have to start by conducting a t-test, looking at the effect of a politician's party (Conservative or Labour) on their real gross wealth and real net wealth. I have to attempt to estimate the effect of serving in office on wealth using a simple t-test. The dataset is called takehome.dta. Labour and Tory are binary, where 1 indicates that they serve for that party and 0 otherwise. The variables for wealth are lnrealgross and lnrealnet. I have imported and attached the dataset, but when I attempt to conduct a simple t-test, I get the following message


Identifying or coding unique factors

戏子无情 submitted on 2019-11-28 14:33:46
I would like to create a new variable, litter, to indicate each sow or litter in different farrowing dates (fdate). Each litter is to be numbered from 1 to N with an increment of 1, as shown in the last column.

    sow   season piglet fdate      litter
    1M521 1      5702   14/09/2009 1
    1M521 1      5703   14/09/2009 1
    1M521 2      22920  17/02/2010 2
    1M521 2      22920  17/02/2010 2
    1M521 2      22920  17/02/2010 2
    1M584 1      8516   28/09/2009 3
    1M584 1      8516   28/09/2009 3
    1M584 1      8516   28/09/2009 3
    1N312 1      6192   16/09/2009 4
    1N312 1      6193   16/09/2009 4
    1N312 1      6194   16/09/2009 4
    1N312 2      21818  11/02/2010 5
    1N312 2      21819  11/02/2010 5
    1N312 2      21820  11

How can I ensure that a partition has representative observations from each level of a factor?

时光总嘲笑我的痴心妄想 submitted on 2019-11-28 10:28:39
I wrote a small function to partition my dataset into training and testing sets. However, I am running into trouble when dealing with factor variables. In the model validation phase of my code, I get an error if the model was built on a dataset that doesn't have representation from each level of a factor. How can I fix this partition() function to include at least one observation from every level of a factor variable?

    test.df <- data.frame(a = sample(c(0,1),100, rep = T),
                          b = factor(sample(letters, 100, rep = T)),
                          c = factor(sample(c("apple", "orange"), 100, rep = T)))
    set.seed(123)
    partition
