I want to pull data from about 1,500 remote Oracle tables with Spark, and I'd like a multi-threaded application where each thread picks up one table (or maybe 10 tables) and launches a Spark job to read from its respective tables.
From the official Spark docs https://spark.apache.org/docs/latest/job-scheduling.html it's clear that this can work:
> ...cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple "jobs" (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.
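Based on those docs, I was planning to turn on the fair scheduler with a single conf setting before submitting jobs from multiple threads (assuming I'm reading the config docs right — app name is made up):

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# FAIR scheduling lets jobs submitted from different threads share
# executors round-robin instead of queuing FIFO behind each other.
conf = SparkConf().set("spark.scheduler.mode", "FAIR")
spark = (SparkSession.builder
         .config(conf=conf)
         .appName("multi-table-pull")  # placeholder name
         .getOrCreate())
```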
However, as you might have noticed in this SO post, Concurrent job Execution in Spark, that similar question has no accepted answer, and the most upvoted answer starts with:
> This is not really in the spirit of Spark
- Everyone knows it's not in the "spirit" of Spark.
- Who cares what the "spirit" of Spark is? That doesn't actually mean anything.
Has anyone gotten something like this to work? Did you have to do anything special? I just want some pointers before I waste a lot of work hours prototyping. I would really appreciate any help on this!
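For concreteness, here is a rough sketch of the threading pattern I have in mind. Everything here is a placeholder (table names, batch size, thread count), and the per-table reader is injected as a function so the threading part can be shown on its own; in the real thing it would be a `spark.read.jdbc(url=..., table=..., properties=...)` followed by a write:

```python
from concurrent.futures import ThreadPoolExecutor

def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def pull_batch(tables, read_table):
    # Each call runs in its own thread, so each Spark action it triggers
    # becomes a separate concurrent job under FAIR scheduling.
    return [read_table(t) for t in tables]

def pull_all(tables, read_table, batch_size=10, max_threads=8):
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(pull_batch, b, read_table)
                   for b in batches(tables, batch_size)]
        results = []
        for f in futures:
            results.extend(f.result())  # re-raises any per-batch failure
        return results

# Stand-in for the real Spark read, e.g.:
#   spark.read.jdbc(url=jdbc_url, table=name, properties=props)
def fake_read(name):
    return f"pulled {name}"

tables = [f"SCHEMA.T{i:04d}" for i in range(1500)]  # made-up table names
out = pull_all(tables, fake_read)
print(len(out))  # 1500
```

The one Spark-specific caveat I'm aware of is that all threads share a single SparkSession/SparkContext, which the docs say is thread-safe for submitting jobs.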