Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I have a dataframe containing a column named COL which is structured in this way:

VALUE1###VALUE2

The following code works:

library(sparklyr)
library(tidyr)
library(dplyr)
mParams <- collect(filter(input_DF, TYPE == 'MIN'))
mParams <- separate(mParams, COL, c('col1', 'col2'), sep = '###', remove = FALSE)

If I remove the collect, I get this error:

Error in UseMethod("separate_") : 
  no applicable method for 'separate_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"

Is there an alternative way to achieve this without collecting everything onto my Spark driver?



1 Answer

You can use ft_regex_tokenizer followed by sdf_separate_column.

ft_regex_tokenizer splits a column into an array-type column based on a regex; sdf_separate_column then splits that array into multiple columns.

mydf %>% 
    ft_regex_tokenizer(input_col="mycolumn", output_col="mycolumnSplit", pattern=";") %>% 
    sdf_separate_column("mycolumnSplit", into=c("column1", "column2"))

UPDATE: in recent versions of sparklyr, the parameters input.col and output.col have been renamed to input_col and output_col, respectively.
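Applied to the question's data, the two answer functions can be chained directly on the Spark DataFrame, so nothing is collected to the driver. This is a sketch under the question's assumptions (a Spark connection already exists, input_DF is the Spark DataFrame, and COL holds values like "VALUE1###VALUE2"); note that ft_regex_tokenizer lowercases tokens by default, so to_lower_case = FALSE is set to preserve the original values:

library(sparklyr)
library(dplyr)

mParams <- input_DF %>%
    filter(TYPE == 'MIN') %>%
    # split COL on the literal "###" delimiter into an array column;
    # to_lower_case = FALSE keeps the original casing of the values
    ft_regex_tokenizer(input_col = "COL", output_col = "COL_split",
                       pattern = "###", to_lower_case = FALSE) %>%
    # fan the array column out into two named columns
    sdf_separate_column("COL_split", into = c("col1", "col2")) %>%
    # drop the intermediate array column
    select(-COL_split)

All of these steps translate to Spark operations, so the result stays distributed; collect() is only needed when you actually want the data locally.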

