How to set flag value based on data that use one-hot-encoding

六月ゝ 毕业季﹏ 提交于 2020-01-24 21:34:10

问题


I have a database consisting of three tables like this:

I want to make a machine learning model in R using that database, and the data I need is like this:

I can use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes. But, how to set flag (yes or no) to the data value based on "result (using SQL query)" data above?


回答1:


We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:

library(dplyr) ## dplyr and tidyr loaded for wrangling
library(tidyr)
options(dplyr.width = Inf) ## we want to show all columns of result
yes_fun <- function(x) { ## helps with pivot_wider() below
    if ( length(x) > 0 ) {
        return("yes")
    }
}
sql_result %>%
    separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
    pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
                values_from = pengolahan,
                values_fill = list(pengolahan = "no"),
                values_fn = list(pengolahan = yes_fun))

Data

id_pangan  <- 1:3
kategori   <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")
batas      <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)

# A tibble: 3 x 8
  id_pangan kategori batas Penggilingan Perebusan Pengupasan
      <int> <fct>    <dbl> <chr>        <chr>     <chr>     
1         1 Daging     100 yes          yes       no        
2         2 Buah        50 no           no        yes       
3         3 Susu       200 no           no        no        
  Fermentasi Sterilisasi
  <chr>      <chr>      
1 no         no         
2 no         no         
3 yes        yes  



回答2:


This seems unclear to me. What do you mean with "how to set flag (yes or no) to the data value based on "result (using SQL query)" data"? Do you want to convert one of the column to a boolean value? If so you need to specify the decision rule. This may look like this:

SELECT (... other columns),
CASE case_expression
     WHEN when_expression_1 THEN 'yes'
     WHEN when_expression_2 THEN 'no'
     ELSE '' 
END

To help others help you: - which SQL variant do you use? (e.g. would a sqlite solution work for you?) - provide an sql script of your table creation, plus a script to "use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes"



来源:https://stackoverflow.com/questions/59426826/how-to-set-flag-value-based-on-data-that-use-one-hot-encoding

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!