Spark RDD write to HBase throws NullPointerException

spark | 2019-09-13 10:02:39

1. The exception

I was using saveAsNewAPIHadoopDataset to write RDD data into HBase.

The code is as follows:

    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.mapreduce.Job

    val sc = spark.sparkContext
    // Note: the key must have no trailing space ("hbase.zookeeper.quorum " would set a different, ignored property)
    sc.hadoopConfiguration.set("hbase.zookeeper.quorum", "master,slave1,slave2,slave3,slave4")
    sc.hadoopConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "hb_SUBJECT_TOTAL_SCORE_MODEL")
    lazy val job = Job.getInstance(sc.hadoopConfiguration)
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setOutputValueClass(classOf[Result])
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    ...
    rdd.saveAsNewAPIHadoopDataset(job.getConfiguration)
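
The elided part builds the pair RDD that TableOutputFormat consumes: the key is an ImmutableBytesWritable row key and the value is a Put. A minimal sketch, assuming hypothetical source data and an assumed column family "cf":

    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.util.Bytes

    // Hypothetical input: (rowKey, score) pairs; "cf" and "score" are assumed names.
    val rdd = sc.parallelize(Seq(("row1", 90), ("row2", 85)))
      .map { case (rowKey, score) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("score"), Bytes.toBytes(score))
        (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
      }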

Running it throws the following exception:

scala>     data.saveAsNewAPIHadoopDataset(job.getConfiguration)
java.lang.NullPointerException
  at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:122)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
  at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
  at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177)
  at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:387)
  at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
  ... 63 elided
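
Note where the NPE occurs: inside TableOutputFormat.checkOutputSpecs, which Spark calls from HadoopMapReduceWriteConfigUtil.assertConf while validating the output specification, before any data is written. The workaround below takes advantage of exactly this.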


2. Solution

Advice found online says to add hbase-site.xml to the application's classpath, or claims that client mode fails while cluster mode works. None of those suggestions helped, and not everyone runs into this error in the first place.

It seems to be version-related; this is simply a bug. My eventual workaround was to skip output-spec validation entirely by setting spark.hadoop.validateOutputSpecs to false:


    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("testSpark")
      .config("spark.some.config.option", "some-value")
      .config("spark.hadoop.validateOutputSpecs", false)
      .enableHiveSupport()
      .getOrCreate()
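
The same setting can also be passed at submit time instead of being hard-coded, e.g. spark-submit --conf spark.hadoop.validateOutputSpecs=false. Keep in mind that this disables output-spec validation for every saveAs*Hadoop* call in the application, so checks such as "output already exists" are skipped as well.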



