Is there any alternative for df[100, c("column")]
in scala spark data frames. I want to select specific row from a column of spark data frame.
for example 100th
row in above R equivalent code
Is there any alternative for df[100, c("column")]
in scala spark data frames. I want to select specific row from a column of spark data frame.
for example 100th
row in above R equivalent code
Firstly, you must understand that DataFrames
are distributed, that means you can't access them in a typical procedural way, you must do an analysis first. Although, you are asking about Scala
I suggest you to read the Pyspark Documentation, because it has more examples than any of the other documentations.
However, continuing with my explanation, I would use some methods of the RDD
API cause all DataFrame
s have one RDD
as attribute. Please, see my example bellow, and notice how I take the 2nd record.
df = sqlContext.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])
myIndex = 1
values = (df.rdd.zipWithIndex()
.filter(lambda ((l, v), i): i == myIndex)
.map(lambda ((l,v), i): (l, v))
.collect())
print(values[0])
# (u'b', 2)
Hopefully, someone gives another solution with fewer steps.