I'm getting an NPE when trying to coalesce and save out an RDD.

The code works locally and on the cluster in the Scala shell, but throws the error when I submit it as a job to the cluster.

I've tried printing the RDD with take() to check whether it contains null data, but that throws the same error, which is a pain because it works fine in the shell.

I'm saving out to HDFS, with the full URL path held in a variable; the model saves fine with this method during the MLlib training phase.

Any ideas much appreciated!

Scala code (whole prediction function):

import org.apache.spark.mllib.tree.model.RandomForestModel

// Load the trained Random Forest model
val rfModel = RandomForestModel.load(sc, modelPath)

// Make the predictions - here the label is the unique ID of the point
val rfPreds = labDistVect.map(p => (p.label, rfModel.predict(p.features)))

// Collect and save (saveAsTextFile returns Unit, so there is no value to bind)
println("Done Modelling, now saving preds")
rfPreds.coalesce(1, shuffle = true).saveAsTextFile(outPreds)
println("Done Modelling, now saving coords")
coords.coalesce(1, shuffle = true).saveAsTextFile(outCoords)

Stack Trace:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 40, XX.XX.XX.XX): java.lang.NullPointerException
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

1 Answer

Spark operations are divided into lazy transformations and actions.

A lazy transformation on an RDD is only executed when an action is called on that RDD; when you apply a transformation, it is simply recorded as an operation to be performed later.

The saveAsTextFile method is an action, while the map operation is a transformation.

If there is an issue in a transformation step, it will therefore only surface at the action that triggers the transformation.
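
A minimal sketch of how this plays out, assuming an existing SparkContext sc (the RDD and names here are illustrative, not from the question):

val data = sc.parallelize(Seq("a", null, "c"))

// map is lazy: this line succeeds even though it will NPE on the null element
val lengths = data.map(s => s.length)

// the NPE only appears here, when the action forces the map to run
lengths.collect()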

So you most likely have a null value in one of the fields consumed inside the map operation, and that is what is causing your NPE.
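
As a sketch of one way to confirm this, reusing the names from the question and assuming labDistVect is an RDD[LabeledPoint] (the null checks are hypothetical additions, not part of the original code):

// Count rows that would NPE inside the map before running the real job
val badRows = labDistVect.filter(p => p == null || p.features == null)
println(s"Rows with null point/features: ${badRows.count()}")

// Predict only on clean rows so the saveAsTextFile action no longer fails
val rfPreds = labDistVect
  .filter(p => p != null && p.features != null)
  .map(p => (p.label, rfModel.predict(p.features)))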

