To save data to HBase from Spark, you first need to create the HBase table from Spark itself; this works differently from connecting to a MySQL database and running SQL statements.
Here is the Spark code:
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HConstants, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableOutputFormat}
import org.apache.spark.sql.SparkSession

// Build the SparkSession
val spark = SparkSession.builder()
  .appName("testSpark")
  .config("spark.some.config.option", "some-value")
  .config("spark.hadoop.validateOutputSpecs", false)
  .enableHiveSupport()
  .getOrCreate()
val sc = spark.sparkContext

/*
 Another way to create the admin:
 val hbaseConf = HBaseConfiguration.create()
 hbaseConf.set(HConstants.ZOOKEEPER_QUORUM, "master,slave1,slave2,slave3,slave4")
 hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "hb_itxw")
 hbaseConf.set(TableInputFormat.INPUT_TABLE, "hb_itxw")
 val hbaseConn = ConnectionFactory.createConnection(hbaseConf)
 val admin = hbaseConn.getAdmin
*/

// HBase properties
sc.hadoopConfiguration.set(HConstants.ZOOKEEPER_QUORUM, "master,slave1,slave2,slave3,slave4")
sc.hadoopConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "hb_itxw")
sc.hadoopConfiguration.set(TableInputFormat.INPUT_TABLE, "hb_itxw")

// Connect to HBase
val hbaseConn = ConnectionFactory.createConnection(sc.hadoopConfiguration)
val admin = hbaseConn.getAdmin

// Create the table if it does not exist
if (!admin.tableExists(TableName.valueOf("hb_itxw"))) {
  val desc = new HTableDescriptor(TableName.valueOf("hb_itxw"))
  // Only the column family needs to be declared; columns (qualifiers) are created when data is written
  val hcd = new HColumnDescriptor("cf")
  desc.addFamily(hcd)
  admin.createTable(desc)
}
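Once the table exists, the same hadoopConfiguration (which already carries TableOutputFormat.OUTPUT_TABLE) can be reused to actually save data into it. The sketch below is not part of the original code: the sample rows and the column qualifiers "name" and "url" are hypothetical, and it assumes the SparkContext sc and the "cf" column family created above.

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Build a Hadoop Job so the output format and output table name travel in the job configuration
val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Put])

// Hypothetical sample data: (rowKey, name, url)
val rows = sc.parallelize(Seq(
  ("row1", "itxw", "http://itxw.net"),
  ("row2", "spark", "http://spark.apache.org")
))

// Turn each record into a Put against the "cf" column family created above
val puts = rows.map { case (rowKey, name, url) =>
  val put = new Put(Bytes.toBytes(rowKey))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(name))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("url"), Bytes.toBytes(url))
  (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
}

// Write the pairs to HBase through the MapReduce output format
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)

saveAsNewAPIHadoopDataset only needs the job's Configuration; since hb_itxw was already set as the output table on sc.hadoopConfiguration, no extra table wiring is required here.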