By default, spark_read_jdbc()
reads an entire database table into Spark. I've used the following syntax to create these connections.
library(sparklyr)
sc <- spark_connect(master = "local")  # placeholder; use your real master/config
You can replace the dbtable table name with a parenthesized subquery:
db_tbl <- sc %>%
  spark_read_jdbc(
    sc = .,
    name = "table_name",
    options = list(
      url = "jdbc:mysql://localhost:3306/schema_name",
      user = "root",
      password = "password",
      # an aliased subquery stands in for the table name
      dbtable = "(SELECT * FROM table_name WHERE field > 1) as my_query"
    )
  )
With a simple condition like this one, though, Spark should push the predicate down to the database automatically when you filter:

db_tbl %>% filter(field > 1)
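If you want to confirm that the predicate actually reaches the database, one way is to inspect the physical plan and look for it under PushedFilters. A sketch, assuming the db_tbl and field column from above:

library(dplyr)

db_tbl %>%
  filter(field > 1) %>%
  spark_dataframe() %>%          # the underlying Spark Dataset (jobj)
  invoke("queryExecution") %>%   # its QueryExecution
  invoke("toString") %>%         # full plan text, incl. PushedFilters for JDBC scans
  cat()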
Just make sure to set memory = FALSE in spark_read_jdbc(), so the table is registered lazily instead of being cached into memory up front.
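For example, a minimal sketch with the same placeholder connection details, reading the table lazily and filtering afterwards:

db_tbl <- spark_read_jdbc(
  sc,
  name = "table_name",
  options = list(url = "jdbc:mysql://localhost:3306/schema_name",
                 user = "root",
                 password = "password",
                 dbtable = "table_name"),
  memory = FALSE  # register the table without eagerly caching it
)

db_tbl %>% filter(field > 1)  # lazy; the predicate can still be pushed down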