Spark on Hive exception: `HiveFileFormat` doesn't match `ParquetFileFormat`

spark | 2020-03-06 16:25:47

I had previously configured the Parquet format, so data writes used Parquet (see my earlier post on fixing the "hive on spark ParquetDecodingException").

However, after operations like copying the table (create/replace), the table was no longer in Parquet format. When I then wrote a Spark Dataset into this existing table, the exception below occurred.

I found this exception by inspecting the job's logs with the yarn logs command.
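For reference, pulling the aggregated logs of a finished Spark-on-YARN application looks roughly like this (`<application_id>` is a placeholder you take from the job listing, e.g. something of the form application_XXXXXXXXXXXXX_NNNN):

```shell
# List recently finished/failed applications to find the one that threw.
yarn application -list -appStates FINISHED,FAILED

# Dump its aggregated logs and search for the analyzer error.
yarn logs -applicationId <application_id> | grep -A 3 "AnalysisException"
```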

Exception message:

org.apache.spark.sql.AnalysisException: The format of the existing table datacenter.subject_total_score is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:117)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:76)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:72)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:123)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:117)


Solution:

After various operations, some of your Hive tables may end up as HiveFileFormat while others are ParquetFileFormat, so you need to specify the format explicitly when writing data.

dataset.write.mode(SaveMode.Append).format("hive").saveAsTable(table)
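Expanded into a minimal self-contained sketch (the SparkSession setup and the source Dataset name are illustrative; the target table name is taken from the exception above). Writing with `format("hive")` defers to the existing table's own Hive SerDe instead of forcing Spark's `ParquetFileFormat`, so the append works regardless of which format the table ended up with:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hive support must be enabled for saveAsTable to target the Hive metastore.
val spark = SparkSession.builder()
  .appName("append-to-hive-table")
  .enableHiveSupport()
  .getOrCreate()

// Illustrative source; any Dataset/DataFrame with a matching schema works.
val dataset = spark.table("datacenter.some_source_table")

dataset.write
  .mode(SaveMode.Append)
  .format("hive")  // let the existing table's Hive format decide the on-disk layout
  .saveAsTable("datacenter.subject_total_score")
```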


