Fixing the Spark exception: Caused by: java.util.concurrent.TimeoutException: Futures timed out

2019-09-13 10:02:39

A Spark job fails intermittently with the following exception:


org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(4) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1465L])
  ...
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
...
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(subject_score_model_SUBJECT_NAMEsort#1118 ASC NULLS FIRST, 200)
...
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
...
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 40 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:136)
at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:367)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:144)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:140)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:135)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:280)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:103)
at org.apache.spark.sql.execution.CodegenSupport$class.constructDoConsumeFunction(WholeStageCodegenExec.scala:208)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:179)
...
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 62 more


Exception analysis

Near the bottom of the stack trace we can see:

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]

at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:136)

In other words, a broadcast exchange (used here by a broadcast hash join) did not finish within 300 seconds, which is the default value of spark.sql.broadcastTimeout.


Solutions


1. Increase the broadcastTimeout setting

spark.sql.broadcastTimeout 1000
spark.network.timeout 10000000
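For reference, a minimal sketch of passing the same two properties at submit time; the jar name is a placeholder, not from the original job:

```shell
# Sketch: raise the broadcast wait above the 300 s default.
# spark.sql.broadcastTimeout is in seconds; values here mirror the ones above.
spark-submit \
  --conf spark.sql.broadcastTimeout=1000 \
  --conf spark.network.timeout=10000000 \
  your_job.jar
```

The same lines can instead be placed in conf/spark-defaults.conf so they apply to every submission.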


This is the first fix that comes to mind from reading the exception, but it is clearly not a good one: how large a value is large enough, and what is actually making the broadcast take so long?

That question leads to the next approach.


2. Allocate more CPU and memory

Long computation times come down to either large data volumes or complex algorithms. Large data volumes are unavoidable, and there is only so much a complex algorithm can be optimized, so the remaining lever is to add more compute resources.

So I increased both memory and CPU at the same time:

spark.executor.cores        10
spark.executor.instances    10
spark.driver.memory         8g
spark.executor.memory       8g
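On YARN, the equivalent resources can also be requested with spark-submit flags; a sketch using the same values as above, with the jar name as a placeholder:

```shell
# Sketch: command-line equivalents of the resource properties above.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 10 \
  --driver-memory 8g \
  --executor-memory 8g \
  your_job.jar
```

With more executors and cores, the broadcast side of the join is built and shipped faster, so the 300-second wait is less likely to be exceeded in the first place.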


