I'm running Spark on YARN in cluster mode, submitting with the following spark-submit command:
./spark-submit --class net.itxw.Test --master yarn --deploy-mode cluster hdfs://hadoopMaster:9000/tmp/test.jar /tmp/config.json hdfs://hadoopMaster:9000/
The full error output:
2018-12-04 11:24:23 INFO Client:54 - Application report for application_1543453363145_0003 (state: FAILED)
2018-12-04 11:24:23 INFO Client:54 -
	 client token: N/A
	 diagnostics: Application application_1543453363145_0003 failed 2 times due to AM Container for appattempt_1543453363145_0003_000002 exited with exitCode: 13
For more detailed output, check application tracking page: http://master:8088/cluster/app/application_1543453363145_0003 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1543453363145_0003_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
	at org.apache.hadoop.util.Shell.run(Shell.java:482)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.root
	 start time: 1543893639052
	 final status: FAILED
	 tracking URL: http://master:8088/cluster/app/application_1543453363145_0003
	 user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1543453363145_0003 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1159)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Error analysis:
Different exit codes correspond to different failure causes; exitCode=13 indicates a cluster-mode conflict: the master hard-coded in the application contradicts the one passed to spark-submit.
Some careless programmer had previously run this job in standalone mode and hard-coded the master when creating the SparkSession, via .master(MASTER) (or .setMaster(MASTER) on the SparkConf), while I'm now submitting in YARN mode.
Make it a habit: don't hard-code the master in your program. Always set it in spark-defaults.conf or on the spark-submit command line instead; that's far more flexible and much less error-prone.
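For example, the master and deploy mode can be set once in conf/spark-defaults.conf instead of in code (the values below match this cluster; adjust for yours):

# conf/spark-defaults.conf
spark.master              yarn
spark.submit.deployMode   cluster

Anything set here is overridden by spark-submit flags, so the same jar can still be run locally with --master local[*] for testing.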
Solution:
Remove the .master(MASTER) call from the SparkSession builder in the program, i.e. change

SparkSession.builder().appName(APP_NAME).master(MASTER)

to

SparkSession.builder().appName(APP_NAME)
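A minimal sketch of the corrected entry point (the object name and app name here are illustrative, not taken from the original jar); the master is deliberately left to spark-submit:

```scala
import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    // No .master(...) here: the master and deploy mode come from
    // spark-submit (--master yarn --deploy-mode cluster) or spark-defaults.conf
    val spark = SparkSession.builder()
      .appName("Test")
      .getOrCreate()

    // ... job logic using args(0) (config path) and args(1) (HDFS root) ...

    spark.stop()
  }
}
```

With the builder written this way, the same jar runs unchanged under YARN, standalone, or local mode depending only on what spark-submit passes in.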