1. Application scenario
My job is a Spark Streaming program running in spark-on-yarn mode.
I enabled Spark dynamic resource allocation in spark-defaults.conf:

spark.executor.cores                      10
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      1
spark.dynamicAllocation.maxExecutors      8
spark.dynamicAllocation.initialExecutors  1
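On YARN, dynamic allocation only works when the external shuffle service is enabled alongside it, because released executors would otherwise take their shuffle files with them. A quick sanity check of the two flags is sketched below; the /tmp path is only for the demo, and on a real cluster you would point CONF at your actual spark-defaults.conf.

```shell
# Demo: write the settings from above to a scratch file, then verify
# the two flags that must be enabled together on YARN.
CONF=/tmp/spark-defaults.conf          # scratch path for the demo only
cat > "$CONF" <<'EOF'
spark.executor.cores                      10
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      1
spark.dynamicAllocation.maxExecutors      8
spark.dynamicAllocation.initialExecutors  1
EOF
# spark.dynamicAllocation.enabled requires spark.shuffle.service.enabled;
# if only the first is set, executor launches fail as shown below.
grep -q '^spark.dynamicAllocation.enabled *true' "$CONF" \
  && grep -q '^spark.shuffle.service.enabled *true' "$CONF" \
  && echo "dynamic allocation prerequisites OK"
```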
2. Error message
Shortly after startup, the Streaming application died. On the YARN web UI (port 8088), the application page showed:

Diagnostics: reason: Max number of executor failures (16) reached

So I downloaded the YARN logs:

yarn logs -applicationId application_1560...

They contained large numbers of the same exception:
19/06/12 11:03:18 ERROR YarnAllocator: Failed to launch executor 23 on container container_1558494893857_0406_02_000031
org.apache.spark.SparkException: Exception while starting container container_1558494893857_0406_02_000031 on host slave2
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:534)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.GeneratedConstructorAccessor24.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
3. Solution
Register the spark_shuffle aux-service in yarn-site.xml on every node:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
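The exception above is thrown precisely when the aux-service name Spark requests ("spark_shuffle") is missing from yarn.nodemanager.aux-services on the NodeManager that launches the container. A minimal check is sketched below; it writes the snippet to a scratch file for the demo, but on a real node you would point SITE at $HADOOP_CONF_DIR/yarn-site.xml instead.

```shell
# Demo: confirm spark_shuffle is registered in the aux-services list.
SITE=/tmp/yarn-site-demo.xml           # scratch path for the demo only
cat > "$SITE" <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
EOF
grep -q 'spark_shuffle' "$SITE" \
  && echo "spark_shuffle registered" \
  || echo "spark_shuffle missing: expect InvalidAuxServiceException"
```

After editing yarn-site.xml, each NodeManager must be restarted so the aux-service is actually loaded (on Hadoop 2.7: `sbin/yarn-daemon.sh stop nodemanager` then `sbin/yarn-daemon.sh start nodemanager`).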
Then, again on every node, copy spark-2.3.0-yarn-shuffle.jar from the Spark distribution into Hadoop's YARN library directory:
cp /opt/hadoop/spark-2.3.0-bin-hadoop2.7/yarn/spark-2.3.0-yarn-shuffle.jar /opt/hadoop/hadoop-2.7.7/share/hadoop/yarn
Note: remember, every node must be configured!
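To avoid missing a node, the copy can be scripted. The loop below is only a sketch: the host names are placeholders for your NodeManager hosts, passwordless scp is assumed, and DRY_RUN=1 just prints the commands instead of running them.

```shell
# Sketch: push the shuffle-service jar to every NodeManager host.
JAR=/opt/hadoop/spark-2.3.0-bin-hadoop2.7/yarn/spark-2.3.0-yarn-shuffle.jar
DEST=/opt/hadoop/hadoop-2.7.7/share/hadoop/yarn/
DRY_RUN=1                              # set to 0 to actually copy
for host in slave1 slave2; do          # placeholder host list
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: scp $JAR $host:$DEST"
  else
    scp "$JAR" "$host:$DEST"
  fi
done
```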