number of unique values sparklyr

名媛妹妹 2020-12-20 02:13

The following example shows that you can't calculate the number of distinct values without aggregating the rows when using dplyr with sparklyr.

Is there a workaround?

2 Answers
  •  一整个雨季
    2020-12-20 02:15

    I want to link this thread, which answers this for sparklyr.

    I think using approx_count_distinct is the best solution. In my experience, dbplyr doesn't translate this function when it is used in a window, so it is better to write the SQL yourself.

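    library(sparklyr); library(dplyr)
    sc <- spark_connect(master = "local")  # assumes a local Spark installation
    mtcars_spk <- copy_to(sc, mtcars, "mtcars_spk", overwrite = TRUE)
    mtcars_spk2 <- mtcars_spk %>%
      # build a combined key, then count its distinct values within each cyl partition
      mutate(test = paste0(gear, " ", carb)) %>%
      mutate(discnt = sql("approx_count_distinct(test) OVER (PARTITION BY cyl)"))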
    

    This thread approaches the problem more generally and discusses CountDistinct vs. approxCountDistinct.
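
If an exact count is needed rather than an approximate one, one possible workaround is to compute the distinct counts per group separately and join them back, so every original row is kept. The sketch below is only an illustration: `add_exact_distinct` is a hypothetical helper name, and it is run here on the local `mtcars` data frame. It assumes dbplyr translates `distinct()`, `summarise(n())`, and `left_join()` for a sparklyr table such as `mtcars_spk` in the usual way, which has not been verified here.

```r
library(dplyr)

# Hypothetical helper (not part of sparklyr or dplyr).
# Adds an exact per-cyl distinct count of the gear/carb key without
# collapsing rows: count on a de-duplicated copy, then join back.
add_exact_distinct <- function(tbl) {
  keyed <- tbl %>%
    mutate(test = paste0(gear, " ", carb))

  counts <- keyed %>%
    distinct(cyl, test) %>%        # one row per (cyl, test) pair
    group_by(cyl) %>%
    summarise(discnt = n())        # number of distinct test values per cyl

  left_join(keyed, counts, by = "cyl")
}

# Local stand-in for the Spark table; with a live connection one would
# pass mtcars_spk instead (assumption, see above).
result <- add_exact_distinct(mtcars)
```

The join costs an extra shuffle on Spark, so for large tables the approximate window function above may still be the more practical choice.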
