Exception message:
A Spark job throws the following exception while writing to a Hive table:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Can not create the managed table('`original_exam_question_score`'). The associated location('hdfs://master122:8020/opt/cdh/hive/warehouse/h_431000_923a0d14be0244eb8c509ce0a57684bb.db/original_exam_question_score') already exists.;
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
Cause:
Most likely a previous Spark job writing to this table was forcibly killed mid-write. Since the job drops and recreates the table on every run, the table entry was already gone from the Hive metastore, but its files were still left behind on HDFS, which is why the next attempt to create the managed table failed with this exception.
This problem can occur when:
- The cluster is terminated while a write is in progress.
- A transient network issue occurs.
- The job is interrupted.
You can confirm the metastore/HDFS mismatch with the check shown right after this list.
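Here is a minimal spark-shell sketch for that check (it assumes the SparkSession spark is in scope; the database and path are copied from the exception above):

import org.apache.hadoop.fs.{FileSystem, Path}

// Is the table still registered in the Hive metastore?
val inMetastore = spark.catalog.tableExists(
  "h_431000_923a0d14be0244eb8c509ce0a57684bb", "original_exam_question_score")

// Does the table directory still exist on HDFS?
val location = new Path("hdfs://master122:8020/opt/cdh/hive/warehouse/" +
  "h_431000_923a0d14be0244eb8c509ce0a57684bb.db/original_exam_question_score")
val fs = FileSystem.get(location.toUri, spark.sparkContext.hadoopConfiguration)
val onHdfs = fs.exists(location)

// inMetastore == false together with onHdfs == true is exactly the broken state described above.
println(s"in metastore: $inMetastore, on HDFS: $onHdfs")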
Solution 1:
Set the property spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true. This deletes the _STARTED directory and returns the process to its original state:
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
Alternatively, set it in spark-defaults.conf:
spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true
I have tested this myself and it works.
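For context, here is a minimal sketch of how the flag fits into a job that rewrites the table (a sketch only: it assumes a SparkSession named spark and a placeholder DataFrame df; the table name is taken from the exception above). Note that this legacy flag was introduced in Spark 2.4 and, as far as I know, removed again in Spark 3.0:

// Allow creating the managed table even though its HDFS location is non-empty.
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")

// The write that previously failed with the AnalysisException should now go through.
df.write
  .mode("overwrite")
  .saveAsTable("h_431000_923a0d14be0244eb8c509ce0a57684bb.original_exam_question_score")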
Solution 2:
Delete the directory named in the exception message directly on HDFS (for example with hdfs dfs -rm -r <path>). This is the approach I ended up taking.
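The same cleanup can be done from a spark-shell session via the Hadoop FileSystem API (a sketch only; the path is copied verbatim from the exception, so double-check it before deleting anything):

import org.apache.hadoop.fs.{FileSystem, Path}

// Leftover table directory, copied from the exception message.
val stale = new Path("hdfs://master122:8020/opt/cdh/hive/warehouse/" +
  "h_431000_923a0d14be0244eb8c509ce0a57684bb.db/original_exam_question_score")

val fs = FileSystem.get(stale.toUri, spark.sparkContext.hadoopConfiguration)
if (fs.exists(stale)) {
  fs.delete(stale, true) // recursive delete of the stale directory
}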