I've been running some Spark LDA topic modeling jobs and keep hitting various issues (mainly disassociation errors at seemingly random intervals), which I think have to do with insufficient memory allocated to my executors. This seems related to problematic automatic cluster configuration. My latest attempt uses n1-standard-8 machines (8 cores, 30GB RAM) for both the master and the worker nodes (6 workers, so 48 cores total).
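For context, the cluster is created with something like the following (my reconstruction of the gcloud invocation; the cluster name is inferred from the cluster-3-m hostname in the config below, and flag spellings may vary by gcloud version):

gcloud dataproc clusters create cluster-3 \
    --master-machine-type n1-standard-8 \
    --worker-machine-type n1-standard-8 \
    --num-workers 6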

But when I look at /etc/spark/conf/spark-defaults.conf I see this:

spark.master yarn-client
spark.eventLog.enabled true
spark.eventLog.dir hdfs://cluster-3-m/user/spark/eventlog

# Dynamic allocation on YARN
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.initialExecutors 100000
spark.dynamicAllocation.maxExecutors 100000
spark.shuffle.service.enabled true
spark.scheduler.minRegisteredResourcesRatio 0.0

spark.yarn.historyServer.address cluster-3-m:18080
spark.history.fs.logDirectory hdfs://cluster-3-m/user/spark/eventlog

spark.executor.cores 4
spark.executor.memory 9310m
spark.yarn.executor.memoryOverhead 930

# Overkill
spark.yarn.am.memory 9310m
spark.yarn.am.memoryOverhead 930

spark.driver.memory 7556m
spark.driver.maxResultSize 3778m
spark.akka.frameSize 512

# Add ALPN for Bigtable
spark.driver.extraJavaOptions -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar
spark.executor.extraJavaOptions -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar

But these values don't make much sense. Why use only 4 of the 8 cores per executor? And only 9.3 of 30GB RAM? My impression was that all of this configuration was supposed to be handled automatically, but even my attempts at manual tweaking aren't getting me anywhere.
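The one pattern I can see in these numbers (my own reading, assuming the 22528MB threshold from the error below is the per-node YARN capacity, i.e. yarn.nodemanager.resource.memory-mb) is that everything is sized for two executors per machine rather than one:

22528 MB / 2 executors                  = 11264 MB budget per container
spark.executor.memory                   =  9310 MB
+ spark.yarn.executor.memoryOverhead    =   930 MB
= YARN container request                = 10240 MB   (fits, with headroom)
8 cores / 2 executors                   =     4      -> spark.executor.cores 4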

For instance, I tried launching the shell with:

spark-shell --conf spark.executor.cores=8 --conf spark.executor.memory=24g

But then this failed with:

java.lang.IllegalArgumentException: Required executor memory (24576+930 MB) is above the max threshold (22528 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
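The arithmetic in the message checks out: YARN is asked for spark.executor.memory plus spark.yarn.executor.memoryOverhead, i.e. 24576 + 930 = 25506MB, which exceeds the 22528MB cap. Presumably a request that stays just under the cap would be accepted, e.g. (21504 + 930 = 22434MB):

spark-shell --conf spark.executor.cores=8 --conf spark.executor.memory=21g

But that still leaves a large chunk of each 30GB machine unused.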

I tried changing the associated value in /etc/hadoop/conf/yarn-site.xml, to no effect. Even when I try a different cluster setup (e.g. worker machines with 60+GB RAM) I end up with the same problem: for some reason the max threshold remains at 22528MB.
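Specifically, the property I edited was (the value here is just illustrative):

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>26624</value>
</property>

My understanding (an assumption on my part, not something I've verified) is that this setting is read by the ResourceManager on the master at startup, so an edit only takes effect after restarting it, e.g. with something like sudo service hadoop-yarn-resourcemanager restart on cluster-3-m; yarn.nodemanager.resource.memory-mb on the workers would presumably also have to grow for a larger container to actually be schedulable.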

Is there something I'm doing wrong here, or is this a problem with Google's automatic configuration?



1 Answer

Waiting for answers
