1. The exception
I used saveAsNewAPIHadoopDataset to write RDD data into HBase. The code is as follows:
    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("hbase.zookeeper.quorum", "master,slave1,slave2,slave3,slave4")
    sc.hadoopConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "hb_SUBJECT_TOTAL_SCORE_MODEL")

    lazy val job = Job.getInstance(sc.hadoopConfiguration)
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    // For writes the value type is Put (a Mutation); Result is a read-side type.
    job.setOutputValueClass(classOf[Put])
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    ...
    rdd.saveAsNewAPIHadoopDataset(job.getConfiguration)
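For TableOutputFormat to consume the RDD, the elided part ("...") has to produce pairs of (ImmutableBytesWritable, Put). A minimal sketch of that mapping step, assuming hypothetical (rowkey, score) string pairs and a column family "cf" (both invented for illustration):

    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.util.Bytes

    // Hypothetical source data: (rowkey, score) string pairs.
    val rows = sc.parallelize(Seq(("stu_001", "89"), ("stu_002", "95")))

    // TableOutputFormat expects (ImmutableBytesWritable, Mutation) pairs,
    // so each record becomes a Put keyed by its rowkey.
    val rdd = rows.map { case (rowkey, score) =>
      val put = new Put(Bytes.toBytes(rowkey))
      // Column family "cf" and qualifier "score" are assumptions for this sketch.
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("score"), Bytes.toBytes(score))
      (new ImmutableBytesWritable(Bytes.toBytes(rowkey)), put)
    }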
Running it raised the following exception:
    scala> data.saveAsNewAPIHadoopDataset(job.getConfiguration)
    java.lang.NullPointerException
      at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:122)
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
      at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177)
      at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:387)
      at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
      at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
      at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
      ... 63 elided
2. The fix
Suggestions found online include packaging hbase-site.xml with the code, and claims that client mode fails while cluster mode works. None of these helped me, and not everyone runs into this error in the first place.
It seems to be version-related; in other words, a bug. Judging from the stack trace, TableOutputFormat.checkOutputSpecs appears to build its HBase connection from the output format's own configuration, which Spark never injects via setConf before validating the output, so createConnection is handed a null configuration and UserProvider.instantiate throws the NPE. My eventual workaround was simply to skip output validation by setting spark.hadoop.validateOutputSpecs to false:
    val spark = SparkSession.builder()
      .appName("testSpark")
      .config("spark.some.config.option", "some-value")
      .config("spark.hadoop.validateOutputSpecs", false)
      .enableHiveSupport()
      .getOrCreate()
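The same setting can also be passed at launch time instead of in code, which is handy when reproducing this in spark-shell as in the trace above:

    spark-shell --conf spark.hadoop.validateOutputSpecs=false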