I started the Spark Thrift Server and connected to Hive through a client. Creating a table succeeded, but inserting data raised an exception.
1. Start beeline (shipped under spark/bin) and connect to Hive
./beeline -u jdbc:hive2://master:10000
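For reference, a sketch of the full startup sequence, assuming a standard Spark distribution under `$SPARK_HOME` (the host name `master` and port 10000 are from the connection string above; adjust for your cluster):

```
# Start the Thrift JDBC/ODBC server (listens on port 10000 by default)
$SPARK_HOME/sbin/start-thriftserver.sh

# Connect with the beeline CLI shipped with Spark
$SPARK_HOME/bin/beeline -u jdbc:hive2://master:10000
```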
2. Create the table
CREATE TABLE t2 (id INT, name STRING)
PARTITIONED BY (country STRING, state STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
The CLUSTERED BY ... INTO 8 BUCKETS clause is there because Hive ACID (transactional) tables must be bucketed; bucketing is what allows UPDATE and DELETE to modify Hive table data.
3. Insert data
beeline> INSERT INTO TABLE t2 PARTITION (country, state) VALUES (5,'刘','DD','DD');
The insert fails with the following exception:
Error: org.apache.spark.sql.AnalysisException: Output Hive table `default`.`t2` is bucketed but Spark currently does NOT populate bucketed output which is compatible with Hive.; (state=,code=0)
4. Workaround
beeline> set hive.enforce.bucketing=false;
beeline> set hive.enforce.sorting=false;
Re-running the INSERT statement then succeeds.
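Putting steps 3 and 4 together, the working beeline session looks like this (a sketch using the table and values from the steps above):

```
-- relax bucketing enforcement for this session
set hive.enforce.bucketing=false;
set hive.enforce.sorting=false;

-- the insert now succeeds
INSERT INTO TABLE t2 PARTITION (country, state) VALUES (5, '刘', 'DD', 'DD');
```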
5. Persist the settings in hive-site.xml
Variables set in beeline only last for the current session. To make the change permanent, add the properties to hive-site.xml:
<property>
  <name>hive.enforce.bucketing</name>
  <value>false</value>
</property>
<property>
  <name>hive.enforce.sorting</name>
  <value>false</value>
</property>