spark on yarn exception: spark_shuffle does not exist

2019-09-13 10:02:39

1. Spark use case

My application is a Spark Streaming program running in spark on yarn mode.

I had enabled Spark dynamic resource allocation:

spark-defaults.conf

spark.executor.cores              10
spark.dynamicAllocation.enabled               true
spark.shuffle.service.enabled                 true
spark.dynamicAllocation.minExecutors        1
spark.dynamicAllocation.maxExecutors        8
spark.dynamicAllocation.initialExecutors    1
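In Spark 2.x, dynamic allocation refuses to start unless spark.shuffle.service.enabled is true, because executors can be removed while other stages still need their shuffle files; the external shuffle service serves those files instead. A minimal sketch, assuming the whitespace-separated "key value" format shown above; the helper names are hypothetical and not part of Spark:

```python
# Hypothetical helpers (not part of Spark): parse spark-defaults.conf text
# and sanity-check the dynamic allocation settings used in this post.


def parse_spark_defaults(text):
    """Parse 'key value' lines (whitespace-separated); skip blanks and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def is_true(conf, key):
    """Treat a missing key as false, like Spark's defaults for these flags."""
    return conf.get(key, "false").strip().lower() == "true"


def check_dynamic_allocation(conf):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if not is_true(conf, "spark.dynamicAllocation.enabled"):
        return problems
    if not is_true(conf, "spark.shuffle.service.enabled"):
        problems.append("dynamic allocation needs spark.shuffle.service.enabled=true")
    low = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))  # Spark default: 0
    high = conf.get("spark.dynamicAllocation.maxExecutors")
    if high is not None and low > int(high):
        problems.append("minExecutors must not exceed maxExecutors")
    return problems


if __name__ == "__main__":
    sample = """
spark.executor.cores              10
spark.dynamicAllocation.enabled               true
spark.shuffle.service.enabled                 true
spark.dynamicAllocation.minExecutors        1
spark.dynamicAllocation.maxExecutors        8
spark.dynamicAllocation.initialExecutors    1
"""
    conf = parse_spark_defaults(sample)
    print(check_dynamic_allocation(conf) or "dynamic allocation settings look consistent")
```

This check passes for the configuration in this post, which is the point: spark.shuffle.service.enabled=true only tells Spark to use the service. The NodeManagers must actually run it, and that is the part that was missing here.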


2. Exception details

Shortly after starting, the Streaming program died. In the YARN web UI (port 8088), the application page showed:

Diagnostics:
reason: Max number of executor failures (16) reached
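The limit of 16 is not arbitrary. As far as I can tell from the Spark 2.x YARN ApplicationMaster source, when spark.yarn.max.executor.failures is not set, the limit defaults to max(3, 2 × N), where N is spark.dynamicAllocation.maxExecutors if dynamic allocation is enabled and spark.executor.instances otherwise. With maxExecutors = 8 that gives 16, matching the Diagnostics line. A small sketch of that rule; the function is a hypothetical re-implementation for illustration, not a Spark API:

```python
# Hypothetical re-implementation (for illustration) of how Spark 2.x on YARN
# derives the default for spark.yarn.max.executor.failures.
INT_MAX = 2**31 - 1  # Scala Int.MaxValue


def max_executor_failures(conf):
    explicit = conf.get("spark.yarn.max.executor.failures")
    if explicit is not None:
        return int(explicit)  # an explicit setting always wins
    if conf.get("spark.dynamicAllocation.enabled", "false").lower() == "true":
        # maxExecutors defaults to Int.MaxValue when unset
        n = int(conf.get("spark.dynamicAllocation.maxExecutors", str(INT_MAX)))
    else:
        n = int(conf.get("spark.executor.instances", "0"))
    doubled = INT_MAX if n > INT_MAX // 2 else 2 * n  # guard against Int overflow
    return max(3, doubled)


if __name__ == "__main__":
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": "8",
    }
    print(max_executor_failures(conf))  # → 16
```

Since every container launch failed for the same reason, raising this limit would only delay the crash; the real fix is on the NodeManager side.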


So I downloaded the YARN logs:

yarn logs -applicationId application_1560 > application_1560

They contained many copies of the same exception:

19/06/12 11:03:18 ERROR YarnAllocator: Failed to launch executor 23 on container container_1558494893857_0406_02_000031
org.apache.spark.SparkException: Exception while starting container container_1558494893857_0406_02_000031 on host slave2
at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:534)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
at sun.reflect.GeneratedConstructorAccessor24.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
... 5 more
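With hundreds of identical stack traces, a short summary of which hosts failed and why is easier to read. A minimal sketch, assuming the log was saved to a local file as above; the regexes are written against the exact lines quoted in this post and may need adjusting for other Spark versions, and the script name in the usage comment is hypothetical:

```python
import re
import sys
from collections import Counter

# Patterns written against the log lines quoted in this post (Spark 2.3).
START_FAILED = re.compile(r"Exception while starting container (\S+) on host (\S+)")
MISSING_AUX = re.compile(r"The auxService:(\S+) does not exist")


def summarize(log_text):
    """Return (Counter of failed container launches per host, set of missing aux services)."""
    per_host = Counter(m.group(2) for m in START_FAILED.finditer(log_text))
    missing = set(MISSING_AUX.findall(log_text))
    return per_host, missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py application_1560  (the file saved by `yarn logs`)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        per_host, missing = summarize(f.read())
    for host, count in per_host.most_common():
        print(f"{host}: {count} failed container launches")
    if missing:
        print("aux services not found on NodeManagers:", ", ".join(sorted(missing)))
```

If the failures cluster on a few hosts, those are the NodeManagers whose configuration differs from the rest.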


3. Solution

On every node, configure the aux service in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
     <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
     <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
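The NodeManager starts each name listed in yarn.nodemanager.aux-services and loads its class from yarn.nodemanager.aux-services.<name>.class, so the service name and the key must match exactly. A minimal sketch that checks a yarn-site.xml for the pieces this fix needs, assuming the standard Hadoop <configuration>/<property>/<name>/<value> layout; the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.nodemanager.aux-services."
SPARK_SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def load_site_xml(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_aux_services(props):
    """Return a list of problems with the aux-service setup (empty if none)."""
    problems = []
    raw = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle is not listed in yarn.nodemanager.aux-services")
    else:
        cls = props.get(PREFIX + "spark_shuffle.class")
        if cls != SPARK_SHUFFLE_CLASS:
            problems.append(f"{PREFIX}spark_shuffle.class should be {SPARK_SHUFFLE_CLASS}, got {cls!r}")
    # A *.class key whose service name is not listed is dead config, usually a typo.
    for key in props:
        if key.startswith(PREFIX) and key.endswith(".class"):
            svc = key[len(PREFIX):-len(".class")]
            if svc not in services:
                problems.append(f"{key} configures '{svc}', which is not in yarn.nodemanager.aux-services")
    return problems
```

For example, a key spelled mapreduce.shuffle.class (dot instead of underscore) is reported, because the listed service is mapreduce_shuffle.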


Then, on every node, copy spark-2.3.0-yarn-shuffle.jar from the Spark distribution into Hadoop's YARN directory so the NodeManager can load it:

cp /opt/hadoop/spark-2.3.0-bin-hadoop2.7/yarn/spark-2.3.0-yarn-shuffle.jar /opt/hadoop/hadoop-2.7.7/share/hadoop/yarn
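The Spark documentation says the spark-<version>-yarn-shuffle.jar (under yarn/ in a pre-built distribution) must be on the classpath of every NodeManager. A minimal per-node check, assuming the Hadoop path used above; the default directory and helper names are assumptions to adjust for your install, and treating more than one jar as suspicious (for example, leftovers from an upgrade) is my own heuristic:

```python
import fnmatch
import os
import sys

# Assumed install path from this post; pass a different directory as argv[1].
DEFAULT_DIR = "/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn"


def select_shuffle_jars(file_names):
    """Return the sorted names that look like spark-<version>-yarn-shuffle.jar."""
    return sorted(n for n in file_names if fnmatch.fnmatch(n, "spark-*-yarn-shuffle.jar"))


def check_node(directory):
    """Return a problem description, or None if exactly one shuffle jar is present."""
    if not os.path.isdir(directory):
        return f"{directory} does not exist"
    jars = select_shuffle_jars(os.listdir(directory))
    if not jars:
        return f"no spark-*-yarn-shuffle.jar in {directory}"
    if len(jars) > 1:
        return f"several shuffle jars in {directory}: {', '.join(jars)}"
    return None


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DIR
    problem = check_node(target)
    print(problem or f"OK: {select_shuffle_jars(os.listdir(target))[0]}")
```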


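Note: this must be done on every node! Restart the NodeManager on each node afterwards, since aux services are only loaded when the NodeManager starts.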
