Installing and configuring a Hadoop + HBase + Phoenix big data cluster

2020-03-06 16:25:47

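In earlier posts I walked step by step through setting up Hadoop, then integrating Spark, then integrating Hive, and I also looked into Hive on Spark. The setup I ended up with uses HDFS for storage, YARN to manage resources for Spark, and Hive accessed from within Spark. See: Setting up a Hadoop + Spark environment on Linux; Setting up a Hive on Spark cluster; Setting up Hive on Spark (building from the official source). This time I will install Phoenix on HBase on top of that existing cluster of 1 master and 4 slave nodes (64 GB, 45 cores), but I will still start the walkthrough from the basic Hadoop installation. The reason for adding Phoenix is that it gives HBase SQL support as well as secondary indexes, which makes HBase much more convenient for everyone to use!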


1. Hadoop cluster environment preparation

1.0 Choosing versions

See: hadoop hbase phoenix jdk version compatibility

My choices: jdk1.8.0, hadoop-2.7.7, hbase-1.4.9, apache-phoenix-4.14.1-HBase-1.4-bin


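1.1 Servers

    5 Linux CentOS 7 servers

    CPU: 8x6 cores    Memory: 64 GB    Disk: 500 GB


1.2 Set each node's hostname

    See: Permanently changing the hostname on Linux

    The hostnames are: master, slave1, slave2, slave3, slave4


1.3 Passwordless SSH between the slaves and master

    See: Configuring passwordless SSH access between Linux servers


1.4 Synchronize date and time across the cluster

    See: Synchronizing time on all nodes of a Linux cluster


1.5 Install the Java JDK on every node

    See: Installing the Java JDK and configuring environment variables on Linux CentOS


1.6 Turn off the firewall on all nodes

    See: Commands for turning off firewalld and iptables on Linux CentOS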


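1.7 Configure hosts

In /etc/hosts, map each node's IP to its hostname (the addresses below are placeholders; use each node's real IP)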

192.168.1.0 master

192.168.1.0 slave1

.....
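
To double-check the file afterwards, a small sketch like the one below lists which hostnames still have no entry. The function name missing_hosts and the demo addresses are my own placeholders, not part of any tool, and the demo runs against a throwaway file rather than the real /etc/hosts.

```shell
#!/usr/bin/env bash
# Report which of the given hostnames have no entry in a hosts file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
  local file="$1"; shift
  local name
  for name in "$@"; do
    # Look for the name as a whole field (after the IP) on a non-comment line.
    if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$file"; then
      echo "$name"
    fi
  done
}

# Demo against a temporary file (addresses are made-up placeholders).
demo_hosts="$(mktemp)"
printf '%s\n' '192.168.1.10 master' '192.168.1.11 slave1' > "$demo_hosts"
missing_hosts "$demo_hosts" master slave1 slave2 slave3 slave4
rm -f "$demo_hosts"
```

On a real node you would call it as missing_hosts /etc/hosts master slave1 slave2 slave3 slave4 and expect no output.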


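2. Hadoop cluster installation and configuration

2.1 Download Hadoop

https://archive.apache.org/dist/hadoop/common/

I downloaded hadoop-2.7.7.tar.gz

Aliyun mirror (inside China): https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/

After downloading, put it under /opt/hadoop/hadoop-2.7.7 on every node. Make sure /opt is mounted on the largest main disk; if it is not, remount it.


Then configure the environment variables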

vi /etc/profile

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SPARK_HOME=/opt/hadoop/spark-2.3.0-bin-hadoop2.7
export JAVA_HOME=/opt/hadoop/jdk1.8.0_77
export SCALA_HOME=/opt/hadoop/scala-2.12.2
export HIVE_HOME=/opt/hadoop/apache-hive-3.0.0-bin
export PATH=$PATH:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${JAVA_HOME}/bin:${HIVE_HOME}/bin

source /etc/profile


2.2 Configure Hadoop

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/data/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Trash: keep deleted files for 10080 minutes (7 days) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>60</value>
    </property>
</configuration>

hadoop-env.sh

export JAVA_HOME=/opt/hadoop/jdk1.8.0_77

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/data/hadoop/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/data/hadoop/datanode</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Reserve 20 GB of disk space for non-HDFS use (value is in bytes) -->
    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>21474836480</value>
    </property>
</configuration>
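
The dfs.datanode.du.reserved value is given in bytes, so the 20 GB figure has to be converted. A quick way to work it out in the shell (the variable names are just mine):

```shell
# 20 GiB expressed in bytes, the unit dfs.datanode.du.reserved expects.
reserved_gib=20
reserved_bytes=$(( reserved_gib * 1024 * 1024 * 1024 ))
echo "$reserved_bytes"   # prints 21474836480
```

Change reserved_gib if you want to keep more or less space free on each DataNode volume.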

mapred-site.xml 

cp mapred-site.xml.template mapred-site.xml
<configuration>
    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>
</configuration>

slaves

localhost

Replace this with the other nodes' hostnames (as configured in hosts) or their IPs, one per line
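
For the five-node cluster described here, the file would probably list the four slaves. A minimal sketch, assuming all four slaves should run DataNode and NodeManager; it writes to a temporary file so nothing is overwritten by accident:

```shell
# Write one worker hostname per line. A temp file is used here; on the real
# cluster the target would be $HADOOP_CONF_DIR/slaves.
slaves_file="$(mktemp)"
printf '%s\n' slave1 slave2 slave3 slave4 > "$slaves_file"
cat "$slaves_file"
```

Once the contents look right, copy the file over etc/hadoop/slaves under the Hadoop directory on master.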

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
       <name>yarn.resourcemanager.scheduler.class</name>
       <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
       <description>Whether to enable log aggregation</description>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
    </property>
    <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
    </property>
</configuration>


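2.3 Start Hadoop

Before the very first start (or if the NameNode will not come up because it was never formatted), format the NameNode. Formatting wipes the existing HDFS metadata, so do not run it on a cluster that already holds data: hadoop namenode -format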

./sbin/start-all.sh
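
To confirm the daemons actually came up, you can compare the jps listing with the names you expect. This is only a sketch: missing_daemons is a helper I made up, and the list NameNode, SecondaryNameNode, ResourceManager is what I would expect on master with the configuration above (slaves should show DataNode and NodeManager instead).

```shell
#!/usr/bin/env bash
# Print the daemons from the expected list that are absent from a jps listing.
# The listing is read from stdin; the expected names are the arguments.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the second field exactly.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { ok = 1 } END { exit ok ? 0 : 1 }'; then
      echo "$name"
    fi
  done
}

# On master, after start-all.sh (skipped when jps is not on PATH).
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons NameNode SecondaryNameNode ResourceManager
fi
```

No output means every expected daemon is running; any name printed is one to look up in the logs directory.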


2.4 Test the Hadoop web UI

Open the Hadoop web UI at master:50070/

[Screenshot: 1.png]



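3. HBase installation and configuration

3.1 Download HBase

I chose hbase-1.4.9. There is an official compatibility table for Hadoop, HBase and Phoenix versions; I will track it down and share it next time.

Aliyun mirror:

https://mirrors.aliyun.com/apache/hbase/

After downloading, extract it to /opt/hadoop/hbase-1.4.9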


3.2 Edit the configuration

3.2.1 Environment variables

vi /etc/profile

export HBASE_HOME=/opt/hadoop/hbase-1.4.9
export PATH=$PATH:$HBASE_HOME/bin

source  /etc/profile


3.2.2 hbase-env.sh

export JAVA_HOME=/opt/hadoop/jdk1.8.0_77

Use the ZooKeeper bundled with HBase

export HBASE_MANAGES_ZK=true


3.2.3 hbase-site.xml

    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase-1.4.9-data</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master,slave1,slave2,slave3</value>
    </property>


3.2.4 regionservers

slave1
slave2
slave3


Finally, scp the whole configured hbase directory to the other nodes
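
A sketch of that copy step as a loop. It only prints the commands so you can review them first. The root user and the node list are assumptions on my part (root is what my shell prompt shows; adjust both to your cluster), and print_copy_cmds is just a name I picked.

```shell
#!/usr/bin/env bash
# Print (not run) the scp command for each remaining node.
print_copy_cmds() {
  local hbase_dir=/opt/hadoop/hbase-1.4.9
  local node
  for node in slave1 slave2 slave3 slave4; do
    echo "scp -r $hbase_dir root@${node}:/opt/hadoop/"
  done
}

print_copy_cmds
```

Piping the output to sh (print_copy_cmds | sh) would run the copies once the passwordless SSH from step 1.3 is in place.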


3.3 Start HBase

Run start-hbase.sh to start the whole cluster


Check the processes on master

# jps
2560 HMaster
2370 HQuorumPeer
2877 HRegionServer

Check the processes on a slave

# jps
7478 HQuorumPeer
8429 HRegionServer


3.4 Test HBase

Open the HBase shell

./hbase shell
2019-06-14 15:22:53,658 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018
hbase(main):001:0>


Create a table

hbase(main):001:0> create 'hbase_test',  {NAME=>'name'},{NAME=>'age'}
0 row(s) in 2.8730 seconds
=> Hbase::Table - hbase_test
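
To go one step further than creating the table, you could write and read back a row. The sketch below puts a few HBase shell commands into a script file and hands it to hbase shell. The row key and the column qualifiers (first, years) are made up for illustration; only the table and its two column families come from the create statement above.

```shell
#!/usr/bin/env bash
# Write a small HBase shell script: insert one row, read it back, then quit.
hbase_cmds="${TMPDIR:-/tmp}/hbase_smoke_test.txt"
cat > "$hbase_cmds" <<'EOF'
put 'hbase_test', 'row1', 'name:first', 'Tom'
put 'hbase_test', 'row1', 'age:years', '30'
get 'hbase_test', 'row1'
scan 'hbase_test'
exit
EOF

# Run it only where the hbase command is available.
if command -v hbase >/dev/null 2>&1; then
  hbase shell "$hbase_cmds"
else
  echo "hbase not on PATH; script written to $hbase_cmds"
fi
```

The trailing exit keeps the shell from dropping into interactive mode after the script finishes.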


Check in the web UI

[Screenshot: 1.jpg]

[Screenshot: 2.jpg]


HBase is now set up successfully


4. Phoenix installation and configuration

4.1 Download apache-phoenix-4.14.1-HBase-1.4-bin

Download from: https://mirrors.aliyun.com/apache/phoenix/

Extract it on master to /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin


4.2 Copy phoenix-4.14.1-HBase-1.4-server.jar

Copy phoenix-4.14.1-HBase-1.4-server.jar from the Phoenix directory into the lib directory of HBase on every node


Add the following to hbase-site.xml in the HBase conf directory

    <!-- Data volume limits -->
    <property>
        <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
        <value>1800000</value>
    </property>
    <property>
        <name>phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs</name>
        <value>1800000</value>
    </property>
    <property>
        <name>phoenix.mutate.batchSize</name>
        <value>5000000</value>
    </property>
    <property>
        <name>phoenix.mutate.maxSize</name>
        <value>50000000</value>
    </property>
    <!-- Timeouts -->
    <property>
        <name>phoenix.query.timeoutMs</name>
        <value>1200000</value>
    </property>
    <property>
        <name>phoenix.query.keepAliveMs</name>
        <value>1200000</value>
    </property>
    <property>
        <name>hbase.rpc.timeout</name>
        <value>1200000</value>
    </property>
    <property>
        <name>hbase.regionserver.lease.period</name>
        <value>1200000</value>
    </property>
    <property>
        <name>hbase.client.operation.timeout</name>
        <value>1200000</value>
    </property>
    <property>
        <name>hbase.client.scanner.timeout.period</name>
        <value>1200000</value>
    </property>
    <!-- Secondary indexes -->
    <property>
        <name>hbase.regionserver.wal.codec</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
    </property>


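4.3 Copy phoenix-4.14.1-HBase-1.4-client.jar to every Phoenix client

The Phoenix you downloaded onto master is itself a client and already has the jar, so where else does it need to go?

If Spark is going to use Phoenix, the Spark environment needs this jar; if your Java application uses it, that application is a client too.

So I copied it into both the jars location configured for YARN and the extraClassPath used by spark-shell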

spark.yarn.jars                 hdfs://master:9000/sparkJars/*.jar
spark.executor.extraClassPath   /opt/hadoop/spark-2.3.0-bin-hadoop2.7/external_jars/*
spark.driver.extraClassPath     /opt/hadoop/spark-2.3.0-bin-hadoop2.7/external_jars/*


4.4 Restart HBase

$ ./stop-hbase.sh
$ ./start-hbase.sh


4.5 Test Phoenix

Connect with the client

# vi conf/spark-defaults.conf 
[root@master spark-2.3.0-bin-hadoop2.7]# /opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/bin/sqlline.py 
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix: none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/phoenix-4.14.1-HBase-1.4-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
19/06/14 16:14:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
578/578 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:>
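
If you would rather run a quick smoke test than type at the prompt, sqlline.py also accepts a ZooKeeper quorum and a SQL file as arguments. The sketch below assumes ZooKeeper is reachable through the host master and uses a table name (phoenix_test) that I invented for the test.

```shell
#!/usr/bin/env bash
# Write a small Phoenix SQL script: create a table, upsert a row, query it.
sql_file="${TMPDIR:-/tmp}/phoenix_smoke_test.sql"
cat > "$sql_file" <<'EOF'
CREATE TABLE IF NOT EXISTS phoenix_test (
    id   BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR
);
UPSERT INTO phoenix_test VALUES (1, 'hello');
SELECT * FROM phoenix_test;
EOF

# Path from section 4.1; the quorum host "master" is an assumption.
sqlline=/opt/hadoop/apache-phoenix-4.14.1-HBase-1.4-bin/bin/sqlline.py
if [ -x "$sqlline" ]; then
  "$sqlline" master "$sql_file"
else
  echo "sqlline.py not found; SQL written to $sql_file"
fi
```

Phoenix uses UPSERT rather than INSERT, which is why the row is written that way.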

List the tables

[Screenshot: 1.jpg]


The Hadoop + HBase + Phoenix big data environment is now up and running!

Thanks to the excellent official site: http://phoenix.apache.org/



