Spark: fixing OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes

spark | 2021-05-06 17:13:01

1. Analysis

Spark throws the exception `java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes`.

Intuition and approach matter when troubleshooting. At first glance this is obviously an out-of-memory error, yet other jobs I run on the same cluster don't always hit it.

The exception itself suggests two workarounds: set `spark.sql.autoBroadcastJoinThreshold` to -1, or increase driver memory via `spark.driver.memory`.

I didn't want to blanket-disable broadcast joins by setting `spark.sql.autoBroadcastJoinThreshold` to -1, and raising `spark.driver.memory` didn't help either.
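For reference, the two workarounds the exception message proposes look roughly like this (a minimal sketch: the app name and the `4g` value are illustrative, and note that `spark.driver.memory` generally cannot be changed from inside an already-running driver JVM; in practice it must be set via `spark-submit` or `spark-defaults.conf`):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-oom-demo")                       // illustrative app name
  // Workaround 1: disable automatic broadcast joins entirely.
  .config("spark.sql.autoBroadcastJoinThreshold", -1)
  // Workaround 2: give the driver more headroom (illustrative value;
  // effective only when set before the driver JVM starts, e.g. at submit time).
  .config("spark.driver.memory", "4g")
  .getOrCreate()
```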

 

The real cause: I run jobs on multiple threads through a wrapped call chain, and the concurrently executing base computations have a high probability of triggering joins at the same time. Many broadcasts are therefore built at the same moment, and driver memory runs out.
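The problematic pattern looks roughly like this (a hypothetical sketch of the multi-threaded driver; `computeCollection` and the output path are invented, while the collection names come from the log below):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

// Each collection is computed on its own thread, and every computation
// joins against the same small base DataFrame. Spark may plan a broadcast
// join per job, so many broadcast tables are built on the driver at once.
val collections = Seq("student_objective_score", "school_difficulty_score",
                      "student_difficulty_score", "school_point_score")
val jobs = collections.map { name =>
  Future { computeCollection(name).write.save(s"/out/$name") } // hypothetical
}
```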

 

2. Solution

Once the cause is known, the fix is simple.

Call `.persist().checkpoint()` on the base DataFrame.

This prevents multiple parent DataFrames from each re-computing the base lineage and triggering the join (and its broadcast) at the same time.
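A minimal sketch of the fix (names and the checkpoint path are illustrative; `Dataset.checkpoint()` requires a checkpoint directory to be set on the SparkContext first, and returns a new DataFrame with its lineage truncated):

```scala
// Reliable checkpointing needs a checkpoint directory (HDFS or local path).
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints") // illustrative path

// Materialize the shared base DataFrame once and cut its lineage, so
// concurrent downstream jobs reuse the computed result instead of each
// re-running the plan and re-triggering a broadcast join.
val baseDF = computeBaseDataFrame() // hypothetical builder for the shared DataFrame
  .persist()    // cache the rows for reuse across jobs
  .checkpoint() // eager by default: materializes and truncates the lineage

// Downstream jobs (possibly on different threads) all join against baseDF.
```

The key design point is that `persist()` alone only caches partitions; `checkpoint()` additionally severs the logical plan, so the optimizer no longer sees the upstream joins when planning each child job.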

 

3. Exception log

21/05/06 15:36:10 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:36:18 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:36:41 ERROR computingspark.JsonDataFrameProvider$: Failed when calculating collection student_objective_score :
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/05/06 15:36:20 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:36:20 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:36:18 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:36:52 ERROR computingspark.JsonDataFrameProvider$: Failed when calculating collection school_difficulty_score :
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/05/06 15:37:13 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:37:05 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:55 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:37:52 ERROR computingspark.JsonDataFrameProvider$: Failed when calculating collection student_difficulty_score :
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/05/06 15:37:46 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:46 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:42 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:39 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:36 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:37:58 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:38:00 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:15 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
21/05/06 15:39:00 ERROR computingspark.JsonDataFrameProvider$: Failed when calculating collection school_point_score :
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:122)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withExecutionId$1.apply(SQLExecution.scala:101)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:75)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/05/06 15:38:58 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:57 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:55 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:52 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:50 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:38:50 WARN memory.TaskMemoryManager: Failed to allocate a page (4194304 bytes), try again.
21/05/06 15:39:02 ERROR spark.ContextCleaner: Error cleaning broadcast 68
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
	at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:155)
	at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
	at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
	at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:185)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:185)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1350)
	at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178)
	at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	... 
