Using R to process CSV to evaluate if (ColA != ColB) with consideration for ColC

Posted by 北城余情 on 2019-12-16 18:03:49

Question


I'm trying to achieve a simple string comparison across two columns. Sample of (mocked up) data:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

The logic I want to use is:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

So my output would look like:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012,Transfer
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013,No Change
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014,No Change
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011,Reorg
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010,Transfer
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015,No Change

Here's what I know so far:

transfers <- read.csv(file="Transfers.csv", head=TRUE,
    sep=",",colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

At this point, I assume I would implement my logic:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

I assume that here I'd write out my new CSV:

write.csv(transfers, file = "transfersprocessed.csv", row.names = FALSE)

Any advice on getting the rest of the way there?

Update:

Per answer from @josilber, I ran the following code:

transfers <- read.csv(file="Transfers.csv", head=TRUE, sep=",", colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))

View(transfers)

On the following data:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

And the ChangeType variable remained "NA".

Is the nested ifelse statement syntax correct? Any idea why the ChangeType isn't working?


Answer 1:


You can do this with a nested ifelse statement:

dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",
                         ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))
dat
#       EMPLID From_DeptCode     FromDept To_DeptCode         To_Dept TransactionTypeCode
# 1  239583290            21        Sales          43 CustomerService                  10
# 2 1230495829            21        Sales          21           Sales                  10
# 3 4059503918            93   Operations          93      Operations                  10
# 4 3040593021            19 Headquarters          23   International                  11
# 5 7029406920            15    Marketing          84     Development                  19
# 6 2039052819            19 Headquarters          19    Headquarters                  10
#   TransactionType EffectiveDate ChangeType
# 1       Promotion    12/12/2012   Transfer
# 2       Promotion      9/1/2013  No Change
# 3        Demotion    11/18/2014  No Change
# 4           Reorg    12/13/2011      Reorg
# 5    Reassignment    01/05/2010   Transfer
# 6       Promotion     4/15/2015  No Change

ifelse takes a vector of TRUE/FALSE values as its first argument and returns, element by element, the second argument where the test is TRUE and the third argument where it is FALSE. For the FALSE cases you want to run another ifelse, which is why the calls are nested here.
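As a minimal end-to-end sketch, the same nested ifelse can be applied to the data frame that was actually read in. In the update to the question the file is read into transfers while the assignment targets dat, so transfers never receives the new values; the sketch below uses the single name transfers throughout. It assumes the sample rows are supplied inline through read.csv(text = ...) instead of Transfers.csv, that the empty ChangeType column is dropped from the header, and that EMPLID is read as character to keep its leading zeros. None of these choices come from the original answer.

```r
# Inline copy of the question's sample rows (assumption: stands in for Transfers.csv).
# The empty ChangeType header field is left out so header and rows have the same width.
csv_text <- "EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015"

# Read EMPLID as character so the leading zero survives; other columns are guessed.
transfers <- read.csv(text = csv_text, header = TRUE,
                      colClasses = c(EMPLID = "character"),
                      stringsAsFactors = FALSE)

# Same department code -> "No Change"; otherwise "Reorg" if the transaction
# type says so, else "Transfer". Note that the same object is read and modified.
transfers$ChangeType <- ifelse(
  transfers$From_DeptCode == transfers$To_DeptCode, "No Change",
  ifelse(transfers$TransactionType == "Reorg", "Reorg", "Transfer"))

# Write the result using the output file name from the question.
write.csv(transfers, file = "transfersprocessed.csv", row.names = FALSE)
```

With a real file, the read.csv(text = csv_text, ...) call would be replaced by read.csv(file = "Transfers.csv", ...), keeping the same object name in every later line.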

Note that for large data frames this will be a good deal quicker than looping through your data and doing the nested if statement one row at a time.
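To make the comparison concrete, here is a hedged sketch that writes the row-by-row loop next to the vectorized form and checks that both give the same labels. The helper names change_type_loop and change_type_vec and the small data frame are made up for illustration; only the column names and the logic come from the question.

```r
# Small illustrative data frame reusing the question's column names.
dat <- data.frame(
  From_DeptCode   = c(21, 21, 93, 19, 15, 19),
  To_DeptCode     = c(43, 21, 93, 23, 84, 19),
  TransactionType = c("Promotion", "Promotion", "Demotion",
                      "Reorg", "Reassignment", "Promotion"),
  stringsAsFactors = FALSE
)

# Row-by-row version (hypothetical helper): one scalar if/else chain per row.
change_type_loop <- function(d) {
  out <- character(nrow(d))
  for (i in seq_len(nrow(d))) {
    if (d$From_DeptCode[i] == d$To_DeptCode[i]) {
      out[i] <- "No Change"
    } else if (d$TransactionType[i] == "Reorg") {
      out[i] <- "Reorg"
    } else {
      out[i] <- "Transfer"
    }
  }
  out
}

# Vectorized version (hypothetical helper): the nested ifelse from the answer.
change_type_vec <- function(d) {
  ifelse(d$From_DeptCode == d$To_DeptCode, "No Change",
         ifelse(d$TransactionType == "Reorg", "Reorg", "Transfer"))
}

loop_result <- change_type_loop(dat)
vec_result  <- change_type_vec(dat)

# Both approaches should label every row the same way.
print(identical(loop_result, vec_result))
```

To see the speed difference on your own data, each call can be wrapped in system.time() on a data frame with many more rows; the gap depends on the data size and the machine.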



Source: https://stackoverflow.com/questions/30243992/using-r-to-process-csv-to-evaluate-if-cola-colb-with-consideration-for-col
