I had originally configured the table to use the Parquet format, so data was written as Parquet.
However, after copying the table (via CREATE OR REPLACE and similar operations) it was no longer in Parquet format, and when I wrote a Spark Dataset into the existing table, the exception below was thrown.
I found this exception by inspecting the application logs with the `yarn logs` command.
Exception details:
org.apache.spark.sql.AnalysisException: The format of the existing table datacenter.subject_total_score is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:117)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:76)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:76)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:72)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:123)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:117)
Solution:
After various DDL operations, some of your Hive tables may end up registered with the `HiveFileFormat` provider while others use `ParquetFileFormat`, so the write must declare a format that matches the existing table. Writing with format("hive") tells Spark to use the table's own Hive serde instead of forcing Parquet:
dataset.write.mode(SaveMode.Append).format("hive").saveAsTable(table)
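In context, the one-liner above could look like the sketch below. This is a minimal, hedged example: the table name datacenter.subject_total_score comes from the exception message, while the source view name and the SparkSession setup are assumptions for illustration. It also shows insertInto as an alternative, which resolves columns by position against the existing table definition and therefore sidesteps the provider-mismatch check that saveAsTable performs.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendToHiveTable {
  def main(args: Array[String]): Unit = {
    // Hive support is required so saveAsTable/insertInto resolve
    // against the Hive metastore rather than Spark's in-memory catalog.
    val spark = SparkSession.builder()
      .appName("append-to-hive-table")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source of the rows to append.
    val dataset = spark.table("staging_scores")

    // Option 1: declare the "hive" format so the write matches a table
    // whose provider is HiveFileFormat instead of ParquetFileFormat.
    dataset.write
      .mode(SaveMode.Append)
      .format("hive")
      .saveAsTable("datacenter.subject_total_score")

    // Option 2: insertInto appends by column position into the existing
    // table definition, so no format needs to be specified at all.
    dataset.write
      .mode(SaveMode.Append)
      .insertInto("datacenter.subject_total_score")

    spark.stop()
  }
}
```

Both options require that the target table already exists; insertInto ignores column names, so the Dataset's column order must match the table's schema exactly.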