How to identify customers who placed their next order before delivery/receipt of an earlier order, in R?

Submitted by 北慕城南 on 2020-12-15 06:26:09

Question


I have a large dataset with two date columns. For example, take the 'Orders' sheet of the Superstore data (http://www.tableau.com/sites/default/files/training/global_superstore.zip).

One column is the order date and the other is the shipping/delivery date (assume it is the delivery date). I want the details of all orders from customers who placed their next order without waiting for the shipping/delivery of any one of their previous orders.

For example, the customer with ID 'ZC-21910' placed the order with ID 'CA-2014-133928' on 12 June 2014, and it was shipped on 18 June 2014. The same customer, however, placed their next order, with ID 'IT-2014-3511710', on 13 June 2014, i.e. before 18 June 2014 (the shipping date of one of the earlier orders).

Ideally, all such orders (order IDs) would be extracted into a separate vector.

How can I do this in R, or alternatively in Tableau?

Example dataset:

> dput(df)
structure(list(customer_id = c("A", "A", "A", "B", "B", "C", 
"C"), order_id = structure(1:7, .Label = c("1", "2", "3", "4", 
"5", "6", "7"), class = "factor"), order_date = structure(c(17897, 
17901, 17912, 17901, 17902, 17903, 17905), class = "Date"), ship_date = structure(c(17926, 
17906, 17914, 17904, 17904, 17904, 17906), class = "Date")), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1:


Edit: My earlier answer did not properly handle the case where Order Date == Ship Date.

I assume that you already loaded your data in an object called df. You can use the first part of @hello_friend's code to get this.

library(tidyverse)
df %>% 
  distinct(`Customer ID`, `Order ID`, `Order Date`, `Ship Date`) %>% 
  arrange(`Customer ID`, `Order Date`, `Ship Date`) %>% 
  mutate(sort_key = row_number()) %>% 
  pivot_longer(c(`Order Date`, `Ship Date`), names_to = "Activity", names_pattern = "(.*) Date", values_to = "Date") %>% 
  mutate(Activity = factor(Activity, ordered = TRUE, levels = c("Order", "Ship")), 
         Open = if_else(Activity == "Order", 1, -1)) %>% 
  group_by(`Customer ID`) %>% 
  arrange(Date, sort_key, Activity, .by_group = TRUE) %>% 
  mutate(Open = cumsum(Open)) %>% 
  ungroup %>% 
  filter(Open > 1, Activity == "Order") %>% 
  select(`Customer ID`, `Order ID`)

First, keep only the distinct combinations of customer ID, order ID, order date and ship date; otherwise the multiple items within the same order will confuse things and cause an incorrect result. Then, pivot the data so that each order becomes two rows, each representing a distinct activity: either ordering or shipping. Next, create a running total of the number of open orders per customer. You're looking for the points where this reaches two or more.
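The running-total idea can be sketched in base R on the question's sample data. This is only an illustration under stated assumptions, not the answer's tidyverse pipeline: it uses the lowercase column names from the `dput` output, and `events`, `change`, `open` and `overlapping` are names made up for this sketch. Ties are broken in the same spirit as the code above (by date, then by each order's position, then order before ship).

```r
# Sample data from the question (order_id kept as character for simplicity).
df <- data.frame(
  customer_id = c("A", "A", "A", "B", "B", "C", "C"),
  order_id    = as.character(1:7),
  order_date  = as.Date(c(17897, 17901, 17912, 17901, 17902, 17903, 17905),
                        origin = "1970-01-01"),
  ship_date   = as.Date(c(17926, 17906, 17914, 17904, 17904, 17904, 17906),
                        origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

# Give every order a position within the sorted data (used to break ties).
df <- df[order(df$customer_id, df$order_date, df$ship_date), ]
df$sort_key <- seq_len(nrow(df))

# One row per event: +1 when an order is placed, -1 when it ships.
keys <- c("customer_id", "order_id", "sort_key")
events <- rbind(
  data.frame(df[keys], date = df$order_date, change = 1),
  data.frame(df[keys], date = df$ship_date,  change = -1)
)

# Per customer: sort by date, then order position, then order (+1) before ship (-1).
events <- events[order(events$customer_id, events$date,
                       events$sort_key, -events$change), ]

# Running number of open orders per customer.
events$open <- ave(events$change, events$customer_id, FUN = cumsum)

# Orders placed while at least one earlier order was still open.
overlapping <- events[events$change == 1 & events$open > 1,
                      c("customer_id", "order_id")]
```

On this sample, `overlapping` contains orders 2 and 3 for customer A and order 5 for customer B; customer C waited for order 6 to ship before placing order 7.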

I use an ordered factor for Activity to make sure that I always open an order before closing it. This matters when the order date and ship date are the same.

I use a special sort_key column to make sure that I close out the old order before opening a new one, in the cases when the customer orders on the same day that something else was shipped. You may want the reverse logic.
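To see what the tie-breaking choice changes, here is a small sketch for a single customer with made-up data, where the second order is placed on the day the first one ships. `flag_overlaps` and its `same_day_counts` argument are hypothetical names for this illustration; the helper compares each order date with the latest ship date of all earlier orders rather than building the event table.

```r
# Made-up orders for one customer: X2 is placed on the day X1 ships.
orders <- data.frame(
  order_id   = c("X1", "X2"),
  order_date = as.Date(c("2014-06-12", "2014-06-18")),
  ship_date  = as.Date(c("2014-06-18", "2014-06-20")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of orders placed while an earlier order was unshipped.
flag_overlaps <- function(x, same_day_counts = FALSE) {
  x <- x[order(x$order_date, x$ship_date), ]
  # Latest ship date among all earlier orders (-Inf for the first order).
  prev_ship <- c(-Inf, cummax(as.numeric(x$ship_date))[-nrow(x)])
  placed <- as.numeric(x$order_date)
  hit <- if (same_day_counts) placed <= prev_ship else placed < prev_ship
  x$order_id[hit]
}

flag_overlaps(orders)                          # character(0)
flag_overlaps(orders, same_day_counts = TRUE)  # "X2"
```

With the default, a same-day ship closes the old order first, so X2 is not flagged; with `same_day_counts = TRUE` the reverse logic applies and X2 is flagged.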

All of this assumes that each combination of Customer ID and Order ID has a single ship date, which actually isn't true in your dataset, as you can see with:

df %>% group_by(`Customer ID`, `Order ID`) %>% filter(n_distinct(`Ship Date`)> 1) %>% select(1:9)
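If such duplicates need to be collapsed before the analysis, one possible approach is to keep a single row per order. The sketch below uses made-up item rows and lowercase column names, and it assumes that the latest ship date is the one that matters for each order; that choice is an assumption, not something the dataset or the answer prescribes.

```r
# Made-up item rows: order "1" appears twice with different ship dates.
items <- data.frame(
  customer_id = c("A", "A", "A"),
  order_id    = c("1", "1", "2"),
  order_date  = as.Date(c("2019-01-01", "2019-01-01", "2019-01-10")),
  ship_date   = as.Date(c("2019-01-03", "2019-01-05", "2019-01-12")),
  stringsAsFactors = FALSE
)

# Sort so that the latest ship date comes first within each order ...
items <- items[order(items$customer_id, items$order_id,
                     -as.numeric(items$ship_date)), ]

# ... then keep only the first row per customer/order pair.
orders <- items[!duplicated(items[c("customer_id", "order_id")]), ]
```

After this step `orders` has one row per order, and order "1" carries the ship date 2019-01-05.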



Answer 2:


Here's how I would structure this workflow in R. Note that replicating this functionality in Tableau will be very difficult.

# Install packages if they are not already installed: necessary_packages => vector
necessary_packages <- c("readxl")

# Create a vector containing the names of any packages needing installation:
# new_packages => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new packages
# (and their associated dependencies):
if(length(new_packages) > 0){install.packages(new_packages, dependencies = TRUE)}

# Initialise the packages in the session:
lapply(necessary_packages, require, character.only = TRUE)

# Store a scalar of the link to the data: durl => character scalar
durl <- "http://www.tableau.com/sites/default/files/training/global_superstore.zip"

# Store the path to the temporary directory: tmpdir_path => character scalar
tmpdir_path <- tempdir()

# Store a character scalar denoting the path to the zip file
# that is to be created: zip_path => character scalar
zip_path <- paste0(tmpdir_path, "/tableau.zip")

# Store a character scalar denoting the path to the unzipped directory
# that is to be created: unzip_path => character scalar
unzip_path <- paste0(tmpdir_path, "/global_superstore")

# Download the zip file: global_superstore.zip => stdout (zip_path)
download.file(durl, zip_path)

# Unzip the file into the unzip directory: tableau.zip => stdout (global_superstore)
unzip(zipfile = zip_path, exdir = unzip_path)

# Read in the excel file: df => data.frame
df <- read_xls(normalizePath(list.files(unzip_path, full.names = TRUE)))

# Regex the vector names to fit with R convention: names(df) => character vector 
names(df) <- gsub("\\W+", "_", tolower(trimws(names(df), "both")))

# Split the data.frame into a list by the customer_id (split() creates the
# list itself, so no pre-allocation is needed): df_list => list
df_list <- with(df, split(df, customer_id))

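# Sort the data (by date) and test whether or not each customer waited for their 
# earlier orders before ordering again: orders_prior_to_delivery => data.frame
orders_prior_to_delivery <- data.frame(do.call("rbind", Map(function(x){
  # Keep one row per order (an order can span several item rows) and
  # order the data.frame by date: y => data.frame
  y <- x[!duplicated(x$order_id),]
  y <- y[order(y$order_date),]
  # Latest ship date among all earlier orders (-Inf for the first order),
  # so that an order is never compared with its own ship date:
  # prev_ship => numeric vector
  prev_ship <- c(-Inf, cummax(as.numeric(y$ship_date))[-nrow(y)])
  # Return only the observations where the customer didn't wait: 
  # data.frame => GlobalEnv()
  y[as.numeric(y$order_date) < prev_ship,]
}, 
df_list)), row.names = NULL, stringsAsFactors = FALSE)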

# Unique customers and orders that were ordered prior to shipping the 
# previous order: cust_orders_prior_to_delivery => data.frame
cust_orders_prior_to_delivery <- 
  unique(orders_prior_to_delivery[,c("order_id", "customer_id")])
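Since the question asks for the order IDs as a separate vector, here is a minimal sketch of the split-and-compare idea on the question's sample data, ending in a character vector. It assumes one row per order and compares each order date with a running maximum of the earlier ship dates; `flagged` and `order_ids` are names chosen for this illustration.

```r
# Sample data from the question (order_id kept as character for simplicity).
df <- data.frame(
  customer_id = c("A", "A", "A", "B", "B", "C", "C"),
  order_id    = as.character(1:7),
  order_date  = as.Date(c(17897, 17901, 17912, 17901, 17902, 17903, 17905),
                        origin = "1970-01-01"),
  ship_date   = as.Date(c(17926, 17906, 17914, 17904, 17904, 17904, 17906),
                        origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

# For each customer, keep orders placed before an earlier order had shipped.
flagged <- lapply(split(df, df$customer_id), function(x) {
  x <- x[order(x$order_date), ]
  # Latest ship date among all earlier orders (-Inf for the first order).
  prev_ship <- c(-Inf, cummax(as.numeric(x$ship_date))[-nrow(x)])
  x[as.numeric(x$order_date) < prev_ship, ]
})
flagged <- do.call(rbind, flagged)

# The order IDs as a plain character vector.
order_ids <- unique(as.character(flagged$order_id))
```

On the sample data, `order_ids` holds "2", "3" and "5".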


Source: https://stackoverflow.com/questions/63975054/how-to-know-customers-who-placed-next-order-before-delivery-receiving-of-earlier
