I have some data in a database, and I want to work with it in Spark, using sparklyr.
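I can use a DBI-based package to import the data from the database into R:

```r
library(DBI)

# Connect to the database and read the whole table into an R data frame
dbconn <- dbConnect(<some connection args>)
data_in_r <- dbReadTable(dbconn, "a table")
```

and then copy the data from R to Spark:

```r
library(sparklyr)

# Connect to Spark and copy the R data frame into a Spark DataFrame
sconn <- spark_connect(<some connection args>)
data_ptr <- copy_to(sconn, data_in_r)
```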
Copying twice is slow for big datasets.
How can I copy data directly from the database into Spark?
sparklyr has several `spark_read_*()` functions for import, but nothing database-related. `sdf_import()` looks like a possibility, but it isn't clear how to use it in this context.