In earlier posts I walked through building Hadoop step by step, then integrating Spark, then Hive, and also looked into Hive on Spark; the final setup uses HDFS for storage, YARN for managing Spark's resources, and Hive used from within Spark. For reference:
This time, on top of the previously built cluster of 1 master and 4 slave nodes (64 GB RAM, 48 cores each), I will install Phoenix on HBase, though I will still start from the Hadoop installation. The reason for adding Phoenix is that it gives HBase SQL support plus secondary indexes, which makes HBase far more convenient to use!
1. Hadoop cluster environment preparation
1.0 Version selection
Reference:
My choices: jdk1.8.0, hadoop-2.7.7, hbase-1.4.9, apache-phoenix-4.14.1-HBase-1.4-bin
1.1 Servers
5 Linux CentOS 7 servers
CPU: 8 x 6 cores, RAM: 64 GB, disk: 500 GB
1.2 Set the hostname of each node
Reference:
The hostnames are: master, slave1, slave2, slave3, slave4
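On CentOS 7 the hostname can be set with hostnamectl; for example, on the master node (run the corresponding command on each slave):
hostnamectl set-hostname master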
1.3 Passwordless SSH between master and slaves
Reference:
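A minimal sketch of the usual approach, assuming the same account (e.g. root) is used on every node: generate a key pair on master and push the public key to every node, then repeat from each node that needs passwordless access to the others.
ssh-keygen -t rsa
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
ssh-copy-id slave3
ssh-copy-id slave4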
1.4 Date/time synchronization across the cluster
Reference:
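One simple option, assuming the nodes can reach a public NTP server (swap in your own time server otherwise), is to install ntpdate on every node and sync against the same server, for example:
yum install -y ntpdate
ntpdate ntp.aliyun.com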
1.5 Install the Java JDK on each node
Reference:
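Roughly, assuming the JDK tarball (e.g. jdk-8u77-linux-x64.tar.gz, matching the jdk1.8.0_77 path used later) has been downloaded to /opt/hadoop on each node:
tar -zxvf jdk-8u77-linux-x64.tar.gz -C /opt/hadoop/
JAVA_HOME is then added to /etc/profile together with the Hadoop variables in section 2.1 below.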
1.6 Disable the firewall on all nodes
Reference:
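On CentOS 7:
systemctl stop firewalld
systemctl disable firewalld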
1.7 Configure hosts
Map each node's IP to its hostname in /etc/hosts:
192.168.1.1 master
192.168.1.2 slave1
.....
2. Hadoop cluster configuration and installation
2.1 Download Hadoop
https://archive.apache.org/dist/hadoop/common/
I downloaded hadoop-2.7.7.tar.gz
Aliyun mirror (faster inside China): https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/
After downloading, unpack it to /opt/hadoop/hadoop-2.7.7 on every node, and make sure /opt is mounted on the largest disk; if it is not, remount it.
Then configure the environment variables
vi /etc/profile
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SPARK_HOME=/opt/hadoop/spark-2.3.0-bin-hadoop2.7
export JAVA_HOME=/opt/hadoop/jdk1.8.0_77
export SCALA_HOME=/opt/hadoop/scala-2.12.2
export HIVE_HOME=/opt/hadoop/apache-hive-3.0.0-bin
export PATH=$PATH:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${JAVA_HOME}/bin:${HIVE_HOME}/bin
source /etc/profile
2.2 Configure Hadoop
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/data/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Trash retention -->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>60</value>
  </property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/opt/hadoop/jdk1.8.0_77
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Reserve 20 GB of disk space for non-HDFS use -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>21474836480</value>
  </property>
</configuration>
mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
slaves
slave1
slave2
slave3
slave4
The file contains only localhost by default; list the worker hostnames or IPs from /etc/hosts here, one per line.
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
  </property>
</configuration>
2.3 Start Hadoop
On the very first start (or if HDFS fails to come up), format the namenode first:
hadoop namenode -format
./sbin/start-all.sh
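To confirm the daemons came up, run jps on every node:
jps
Master should roughly show NameNode, SecondaryNameNode and ResourceManager; each slave should show DataNode and NodeManager.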
2.4 Test the Hadoop web UI
Visit the HDFS web UI at http://master:50070/
3. HBase installation and configuration
3.1 Download HBase
I chose hbase-1.4.9. There is an official compatibility matrix for Hadoop/HBase/Phoenix versions; I'll dig it up and share it another time.
Aliyun mirror:
https://mirrors.aliyun.com/apache/hbase/
After downloading, extract it to /opt/hadoop/hbase-1.4.9
3.2 Modify the configuration
3.2.1 Environment variables
vi /etc/profile
export HBASE_HOME=/opt/hadoop/hbase-1.4.9
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
3.2.2 hbase-env.sh
export JAVA_HOME=/opt/hadoop/jdk1.8.0_77
Use the ZooKeeper that ships with HBase
export HBASE_MANAGES_ZK=true
3.2.3 hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase-1.4.9-data</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2,slave3</value>
  </property>
</configuration>
3.2.4 regionservers
slave1
slave2
slave3
Finally, scp the fully configured HBase directory to the other nodes.
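For example, from master (hostnames as configured above; add slave4 if it should also run a regionserver):
for h in slave1 slave2 slave3; do
  scp -r /opt/hadoop/hbase-1.4.9 $h:/opt/hadoop/
done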
3.3 Start HBase
Run start-hbase.sh on master to start the whole cluster.
Check the processes on master
# jps
2560 HMaster
2370 HQuorumPeer
2877 HRegionServer
Check the processes on the slaves
# jps
7478 HQuorumPeer
8429 HRegionServer
3.4 Test HBase
Open the HBase shell
./hbase shell
2019-06-14 15:22:53,658 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018
hbase(main):001:0>
Create a table
hbase(main):001:0> create 'hbase_test', {NAME=>'name'},{NAME=>'age'}
0 row(s) in 2.8730 seconds
=> Hbase::Table - hbase_test
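As a quick sanity check you can also write and read back a cell (the row key and values here are just made-up examples):
hbase(main):002:0> put 'hbase_test', 'row1', 'name:cn', 'zhangsan'
hbase(main):003:0> put 'hbase_test', 'row1', 'age:years', '20'
hbase(main):004:0> scan 'hbase_test'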
View it in the web UI
At this point HBase is up and running.
4. Phoenix installation and configuration
4.1 Download apache-phoenix-4.14.1-HBase-1.4-bin
Download from: https://mirrors.aliyun.com/apache/phoenix/
Extract it on master to /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin
4.2 Copy phoenix-4.14.1-HBase-1.4-server.jar
Copy phoenix-4.14.1-HBase-1.4-server.jar from the Phoenix directory into HBase's lib directory on every node.
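For example, from master (node names as configured in regionservers; adjust if yours differ):
cp /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-server.jar /opt/hadoop/hbase-1.4.9/lib/
for h in slave1 slave2 slave3; do
  scp /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-server.jar $h:/opt/hadoop/hbase-1.4.9/lib/
done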
Then add the following to hbase-site.xml in HBase's conf directory:
<!-- Data volume limits -->
<property>
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>1800000</value>
</property>
<property>
  <name>phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs</name>
  <value>1800000</value>
</property>
<property>
  <name>phoenix.mutate.batchSize</name>
  <value>5000000</value>
</property>
<property>
  <name>phoenix.mutate.maxSize</name>
  <value>50000000</value>
</property>
<!-- Timeouts -->
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>1200000</value>
</property>
<property>
  <name>phoenix.query.keepAliveMs</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.client.operation.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>1200000</value>
</property>
<!-- Secondary indexes -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
4.3 Copy phoenix-4.14.1-HBase-1.4-client.jar to every Phoenix client
The Phoenix package you downloaded onto master is itself a client and already contains this jar, so where else does it need to go?
If Spark is going to use Phoenix, the Spark environment needs this jar on its classpath; if your Java applications are going to use it, they are clients too.
So I copied it both into the jar location referenced by my YARN config and into the extraClassPath used by spark-shell (set in Spark's conf/spark-defaults.conf):
spark.yarn.jars hdfs://master:9000/sparkJars/*.jar
spark.executor.extraClassPath /opt/hadoop/spark-2.3.0-bin-hadoop2.7/external_jars/*
spark.driver.extraClassPath /opt/hadoop/spark-2.3.0-bin-hadoop2.7/external_jars/*
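Concretely, that means something like the following (external_jars and the sparkJars HDFS directory are the locations from my config above; use your own paths):
cp /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-client.jar /opt/hadoop/spark-2.3.0-bin-hadoop2.7/external_jars/
hdfs dfs -put /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-client.jar hdfs://master:9000/sparkJars/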
4.4 Restart HBase
$./stop-hbase.sh
$./start-hbase.sh
4.5 Test Phoenix
Connect with the client
[root@master spark-2.3.0-bin-hadoop2.7]# /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/bin/sqlline.py
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix: none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
19/06/14 16:14:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)... 578/578 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:>
List the tables and run a quick query
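A minimal smoke test from sqlline (the table name and data are just examples): create a table, upsert a row, query it, then list the tables with !tables; the new table should also appear as TEST_PHOENIX in the HBase web UI.
0: jdbc:phoenix:> CREATE TABLE IF NOT EXISTS test_phoenix (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
0: jdbc:phoenix:> UPSERT INTO test_phoenix VALUES (1, 'hello');
0: jdbc:phoenix:> SELECT * FROM test_phoenix;
0: jdbc:phoenix:> !tables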
The Hadoop + HBase + Phoenix big data environment is up and running!
Thanks to the excellent official site: http://phoenix.apache.org/