Spark/Hive: Can not create the managed table('`xxx`'). The associated location('xxx') already exists

2020-12-07 09:28:01

Exception:

The following exception was thrown while a Spark job was writing its results to a Hive table:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can not create the managed table('`original_exam_question_score`'). The associated location('hdfs://master122:8020/opt/cdh/hive/warehouse/h_431000_923a0d14be0244eb8c509ce0a57684bb.db/original_exam_question_score') already exists.;
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)


Cause:

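Most likely an earlier Spark job was forcibly stopped while it was writing this table. The job drops the table before every write. As a result, the table was already gone from the Hive metastore, but its files were still on HDFS, and that is what triggers this exception.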

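The problem can occur in any of these situations:

  • The cluster is terminated while a write is in progress.
  • A transient network problem occurs.
  • The job is interrupted.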
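The following sketch models why leftover files trip the check. It is a simplified local-filesystem model in Python, not Spark's code. It is based on my understanding of Spark 2.4's SessionCatalog.validateTableLocation, the first frame in the stack trace. In that check, a managed table's location must either not exist or be empty. Otherwise the exception is thrown, unless the legacy flag from Solution 1 is enabled. The names validate_table_location and allow_nonempty are my own.

```python
from pathlib import Path


def validate_table_location(location: Path, allow_nonempty: bool = False) -> None:
    """Simplified model of the managed-table location check (not Spark's code).

    allow_nonempty stands in for
    spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation.
    """
    if allow_nonempty:
        return  # legacy flag on: the check is skipped entirely
    # A missing or empty directory is accepted; anything inside it is rejected.
    non_empty = location.exists() and (location.is_file() or any(location.iterdir()))
    if non_empty:
        raise RuntimeError(
            f"Can not create the managed table('{location.name}'). "
            f"The associated location('{location}') already exists."
        )
```

In the real job the directory lives on HDFS rather than local disk. Presumably the interrupted run had already written files there but never registered the table. The next run therefore finds a non-empty location with no table behind it.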


Solution 1:

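Set the property spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true. This disables the check that a managed table's location must be empty, so the table can be created over the leftover files. The _STARTED directory is deleted and the process returns to its original state.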

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

Or set it in spark-defaults.conf:

spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true
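Note that this flag only exists in some Spark versions. The Spark 2.4 migration guide introduced it and said it would be removed in Spark 3.0. As far as I know, Spark 3.x rejects it when it is set to true. The following PySpark sketch therefore sets the flag only on 2.4. The names legacy_flag_conf and build_session are illustrative, not part of any library.

```python
from typing import Dict

LEGACY_FLAG = "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation"


def legacy_flag_conf(spark_version: str) -> Dict[str, str]:
    """Return the extra Spark config to apply for the given Spark version.

    Assumption: the flag is only meaningful on Spark 2.4 (no check before 2.4,
    flag removed in 3.0), so every other version gets an empty dict.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) == (2, 4):
        return {LEGACY_FLAG: "true"}
    return {}


def build_session(app_name: str):
    """Create a Hive-enabled SparkSession, adding the legacy flag when it applies."""
    import pyspark  # imported lazily so legacy_flag_conf() works without PySpark
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in legacy_flag_conf(pyspark.__version__).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

On Spark 3.x the returned dict is empty, so deleting the leftover directory (Solution 2) is the remaining option there.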

I tested this myself and it works.


Solution 2:

Delete the directory named in the exception message directly on HDFS. This is the approach I ended up taking.
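If this happens often, you can extract the path from the exception message instead of copying it by hand. The following minimal Python sketch assumes the hdfs CLI is on the PATH of the machine running it. The names leftover_location and remove_leftover_location are hypothetical. The sketch defaults to a dry run, so you can check the path before anything is deleted, and confirm that no live table still uses it.

```python
import re
import subprocess
from typing import List, Optional

# Captures the path inside location('...') in Spark's AnalysisException message.
_LOCATION_RE = re.compile(r"The associated location\('([^']+)'\) already exists")


def leftover_location(error_message: str) -> Optional[str]:
    """Return the table location reported in the exception message, or None."""
    match = _LOCATION_RE.search(error_message)
    return match.group(1) if match else None


def remove_leftover_location(error_message: str, dry_run: bool = True) -> List[str]:
    """Build (and, if dry_run is False, run) `hdfs dfs -rm -r` for the reported path."""
    path = leftover_location(error_message)
    if path is None:
        raise ValueError("no table location found in the error message")
    # -rm -r deletes recursively; with HDFS trash enabled the files go to .Trash first.
    command = ["hdfs", "dfs", "-rm", "-r", path]
    if not dry_run:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command
```

Only pass dry_run=False after reviewing the command returned by the dry run.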
