JDBC connection to Hive through Spark Thrift Server fails with "Unable to move source"

spark | 2019-09-13 10:02:39

Java JDBC to Hive through Spark Thrift Server: HiveException: Unable to move source

I started the Spark Thrift Server and then inserted data from Java over JDBC, which threw the following exception:

error
java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://master:9000/user/hive/warehouse/datacenter.db/test/.hive-staging_hive_2019-01-21_09-30-22_299_3322687924153036286-9/-ext-10000/part-00000-82fd3ed3-2734-4044-a779-9405d97caeaa-c000 to destination hdfs://master:9000/user/hive/warehouse/datacenter.db/test/part-00000-82fd3ed3-2734-4044-a779-9405d97caeaa-c000;
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:406)
at net.itxw.example.HiveTest.run(HiveTest.java:24)
at net.itxw.example.HiveTest.main(HiveTest.java:47)
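
For reference, a client along the following lines could produce a trace like the one above. This is only a sketch, not the original HiveTest class: the class name, the port 10000, the user name "hive", and the two-column layout of the test table are all assumptions made for illustration. It also assumes the Hive JDBC driver is on the classpath. The host master and the datacenter database are taken from the HDFS paths in the trace.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

class ThriftServerInsertRepro {

    // Builds a HiveServer2-style JDBC URL, the scheme Spark Thrift Server accepts.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a fresh connection, runs one INSERT, then closes the connection again.
    static void insertOnce(String url, String sql) throws SQLException {
        // User name "hive" and empty password are assumptions.
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(sql);
        }
    }

    public static void main(String[] args) {
        // Port 10000 is assumed; adjust to the port the Thrift Server listens on.
        String url = jdbcUrl("master", 10000, "datacenter");
        // Hypothetical schema: the real columns of the test table are not known.
        String sql = "INSERT INTO test VALUES (1, 'a')";

        // Two separate connections, one INSERT each.
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                insertOnce(url, sql);
                System.out.println("insert " + attempt + " succeeded");
            } catch (SQLException e) {
                System.out.println("insert " + attempt + " failed: " + e.getMessage());
            }
        }
    }
}
```

Running two inserts over two separate connections is deliberate: it makes it easy to see whether the failure depends on an earlier connection having been closed.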

I then checked the log on the cluster, HiveThriftServer2-1-master.out:

org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://master:9000/user/hive/warehouse/datacenter.db/test/.hive-staging_hive_2019-01-21_09-31-01_327_3623876701993665565-10/-ext-10000/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000 to destination hdfs://master:9000/user/hive/warehouse/datacenter.db/test/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:827)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:260)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://master:9000/user/hive/warehouse/datacenter.db/test/.hive-staging_hive_2019-01-21_09-31-01_327_3623876701993665565-10/-ext-10000/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000 to destination hdfs://master:9000/user/hive/warehouse/datacenter.db/test/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:847)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:757)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:756)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:829)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 27 more
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
... 46 more
2019-01-21 09:31:01 ERROR SparkExecuteStatementOperation:179 - Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://master:9000/user/hive/warehouse/datacenter.db/test/.hive-staging_hive_2019-01-21_09-31-01_327_3623876701993665565-10/-ext-10000/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000 to destination hdfs://master:9000/user/hive/warehouse/datacenter.db/test/part-00000-d7c2d2de-13cf-4cb1-9c56-5842bec7dacf-c000;
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-01-21 09:31:02 INFO  ThriftCLIService:107 - Session disconnected without closing properly, close it now

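Solution:

The exception is odd: after every restart of the Spark Thrift Server, the first INSERT over JDBC succeeds, but the second one fails with the exception above.

I eventually found a report of the same problem in the Apache issue tracker: https://issues.apache.org/jira/browse/SPARK-21725

Add the following property to spark/conf/hive-site.xml: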

  <property>
      <name>fs.hdfs.impl.disable.cache</name>
      <value>true</value>
  </property>

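That resolves the problem.

Cause: Spark and HDFS go through the same underlying API implementation, so they end up sharing one HDFS FileSystem client. After an insert completes, the JDBC connection.close() call closes the connection and, with it, that HDFS FileSystem. Because the Thrift Server itself is using the same FileSystem, its handle is closed as well. That is why the first insert after starting the Thrift Server succeeds, while the second one fails and the Thrift Server log shows "Caused by: java.io.IOException: Filesystem closed". With fs.hdfs.impl.disable.cache set to true, Hadoop no longer hands out a cached, shared FileSystem instance for hdfs:// URIs, so closing one handle does not affect the others.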
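
The effect of a shared, cached handle can be illustrated with a small toy model. This is not Hadoop's code: ToyFileSystem, the CACHE map, and secondInsertSucceeds are hypothetical names invented for this sketch. It only mimics the behaviour described above, where a handle looked up by URI is reused unless caching is disabled.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class FsCacheDemo {

    // Stand-in for an HDFS client handle; NOT the real org.apache.hadoop.fs.FileSystem.
    static class ToyFileSystem {
        private boolean open = true;

        // Pretends to move a file; fails once the handle has been closed.
        void rename(String src, String dst) throws IOException {
            if (!open) {
                throw new IOException("Filesystem closed");
            }
        }

        void close() {
            open = false;
        }
    }

    // One cached handle per URI, shared by every caller while caching is on.
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    static ToyFileSystem get(String uri, boolean disableCache) {
        if (disableCache) {
            return new ToyFileSystem(); // every caller gets a private handle
        }
        return CACHE.computeIfAbsent(uri, u -> new ToyFileSystem());
    }

    // Simulates two inserts; the first one closes its handle when it is done.
    static boolean secondInsertSucceeds(boolean disableCache) {
        CACHE.clear();
        String uri = "hdfs://master:9000";
        try {
            ToyFileSystem first = get(uri, disableCache);
            first.rename("staging/part-00000", "test/part-00000");
            first.close(); // corresponds to the close triggered by the first session

            ToyFileSystem second = get(uri, disableCache);
            second.rename("staging/part-00000", "test/part-00000");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("cache enabled:  second insert ok = " + secondInsertSucceeds(false));
        System.out.println("cache disabled: second insert ok = " + secondInsertSucceeds(true));
    }
}
```

With caching enabled the second lookup returns the handle that was already closed, so the move fails. With caching disabled each lookup creates a fresh handle and the second insert goes through, which matches what the hive-site.xml setting achieves.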
