I had originally configured the table to use the Parquet format, so data was written as Parquet.
However, after copying the table (via CREATE OR REPLACE and similar operations) it was no longer in Parquet format, and when I wrote a Spark Dataset into the existing table, the exception below was thrown.
I found this exception by inspecting the application logs with the `yarn logs` command.
Exception details:
org.apache.spark.sql.AnalysisException: The format of the existing table datacenter.subject_total_score is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:117)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:76)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:76)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:72)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:123)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:117)
Solution:
After various DDL operations, some of your Hive tables may end up registered with the `HiveFileFormat` provider while others use `ParquetFileFormat`, so the write must declare a format that matches the existing table. Writing with format("hive") tells Spark to use the table's own Hive serde instead of forcing Parquet:
dataset.write.mode(SaveMode.Append).format("hive").saveAsTable(table)
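In context, the one-liner above could look like the sketch below. This is a minimal, hedged example: the table name datacenter.subject_total_score comes from the exception message, while the source view name and the SparkSession setup are assumptions for illustration. It also shows insertInto as an alternative, which resolves columns by position against the existing table definition and therefore sidesteps the provider-mismatch check that saveAsTable performs.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendToHiveTable {
  def main(args: Array[String]): Unit = {
    // Hive support is required so saveAsTable/insertInto resolve
    // against the Hive metastore rather than Spark's in-memory catalog.
    val spark = SparkSession.builder()
      .appName("append-to-hive-table")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source of the rows to append.
    val dataset = spark.table("staging_scores")

    // Option 1: declare the "hive" format so the write matches a table
    // whose provider is HiveFileFormat instead of ParquetFileFormat.
    dataset.write
      .mode(SaveMode.Append)
      .format("hive")
      .saveAsTable("datacenter.subject_total_score")

    // Option 2: insertInto appends by column position into the existing
    // table definition, so no format needs to be specified at all.
    dataset.write
      .mode(SaveMode.Append)
      .insertInto("datacenter.subject_total_score")

    spark.stop()
  }
}
```

Both options require that the target table already exists; insertInto ignores column names, so the Dataset's column order must match the table's schema exactly.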