
Big Data - NoSQL Databases - HBase Framework: Phoenix (a SQL engine framework, written in Java, that operates on HBase databases through the JDBC API; low latency, transactional, supports SQL statements, exposes a JDBC interface)

Date: 2018-10-08 16:57:21


一、Phoenix Overview

1、What Phoenix Is

Phoenix started as an open-source project at Salesforce and later became a top-level Apache project. It is an open-source SQL layer built on top of HBase: it lets us create tables, insert data, and query HBase through the standard JDBC API, avoiding the native HBase client API. Putting Phoenix between our application and HBase does not degrade performance, and it saves us a lot of code.

2、Phoenix Features

- Compiles SQL queries into HBase scans
- Determines the optimal start and stop rowkeys for each scan
- Executes scans in parallel
- Pushes WHERE-clause predicates down to server-side filters
- Performs aggregations through coprocessors
- Full support for HBase secondary indexes
- DML commands, plus DDL commands for creating and altering tables with versioned, incremental schema changes
- Easy integration with Spark, Hive, Pig, Flume, and MapReduce

3、Phoenix Architecture

4、Phoenix Data Storage

Phoenix maps HBase's data model into the relational world.
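The idea of that mapping can be sketched in a few lines of plain Python (a toy illustration only, not Phoenix's actual implementation): each sparse HBase cell, addressed by `(rowkey, family:qualifier)`, becomes one column value in a flat relational row.

```python
def cells_to_row(rowkey, cells):
    """Collapse the sparse cells of one HBase row into a flat, SQL-style row.
    `cells` maps (column_family, qualifier) -> value."""
    row = {"ROWKEY": rowkey}
    for (cf, qualifier), value in cells.items():
        # Phoenix exposes each family:qualifier pair as one SQL column.
        row[f"{cf}.{qualifier}"] = value
    return row

# Cells as HBase would store them for one rowkey:
cells = {("cf_user", "firstname"): "Nick",
         ("cf_user", "lastname"): "Lee",
         ("cf_company", "name"): "HUAWEI"}
row = cells_to_row("1001", cells)
```

The names `cf_user`, `firstname`, etc. anticipate the `whx_table` example used later in this article.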

二、Installing Phoenix

1、Download Phoenix

/dist/phoenix/apache-phoenix-4.14.2-HBase-1.3/

2、Extract the tarball on any node (e.g., hadoop102)

Install Phoenix on whichever server you want to use it from.

[whx@hadoop102 soft]$ tar -zxvf apache-phoenix-4.14.2-HBase-1.3-bin.tar.gz -C ../module/

3、Rename the directory

[whx@hadoop102 module]$ mv apache-phoenix-4.14.2-HBase-1.3-bin/ phoenix
[whx@hadoop102 module]$ ll
total 32
drwxrwxr-x.  9 whx whx 4096 Jan 31 14:45 flume
drwxr-xr-x. 11 whx whx 4096 Jan 31 10:43 hadoop-2.7.2
drwxrwxr-x.  8 whx whx 4096 Feb  2 10:48 hbase
drwxrwxr-x.  9 whx whx 4096 Jan 30 19:27 hive
drwxr-xr-x.  8 whx whx 4096 Dec 13 jdk1.8.0_121
drwxr-xr-x.  8 whx whx 4096 Feb  1 16:32 kafka
drwxrwx---.  5 whx whx 4096 May 24 phoenix
drwxr-xr-x. 11 whx whx 4096 Jan 29 22:01 zookeeper-3.4.10
[whx@hadoop102 module]$

4、Copy the Phoenix jars into the HBase directory

Copy the two jars HBase needs, server and client, into /opt/module/hbase/lib:

[whx@hadoop102 phoenix]$ cp phoenix-4.14.2-HBase-1.3-server.jar /opt/module/hbase/lib
[whx@hadoop102 phoenix]$ cp phoenix-4.14.2-HBase-1.3-client.jar /opt/module/hbase/lib
[whx@hadoop102 phoenix]$

5、Distribute the Phoenix directory to the hadoop101 and hadoop103 nodes

[whx@hadoop102 module]$ xsync.sh phoenix/

6、Distribute the Phoenix jars inside HBase to the hadoop101 and hadoop103 nodes

The two jars copied above must also be distributed to the other HBase nodes:

[whx@hadoop102 hbase]$ xsync.sh lib/

7、Configure environment variables on the hadoop102 node

[whx@hadoop102 ~]$ vim /etc/profile

JAVA_HOME=/opt/module/jdk1.8.0_121
HADOOP_HOME=/opt/module/hadoop-2.7.2
ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
HIVE_HOME=/opt/module/hive
FLUME_HOME=/opt/module/flume
HBASE_HOME=/opt/module/hbase
PHOENIX_HOME=/opt/module/phoenix
PHOENIX_CLASSPATH=$PHOENIX_HOME
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin:$PHOENIX_HOME/bin
export PATH JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME HIVE_HOME FLUME_HOME HBASE_HOME PHOENIX_HOME PHOENIX_CLASSPATH

[whx@hadoop102 ~]$ source /etc/profile

三、Starting and Stopping Phoenix

1、First start Hadoop, ZooKeeper, and HBase

[whx@hadoop102 ~]$ start-dfs.sh

[whx@hadoop102 ~]$ xcall.sh /opt/module/zookeeper-3.4.10/bin/zkServer.sh start

[whx@hadoop102 ~]$ /opt/module/hbase/bin/start-hbase.sh

2、Start Phoenix

[whx@hadoop102 ~]$ sqlline.py hadoop101,hadoop102,hadoop103:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/phoenix/phoenix-4.14.2-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See /codes.html#multiple_bindings for an explanation.
21/02/04 08:52:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

四、Using Phoenix

1、List all tables

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

When Phoenix starts, it first creates a few system tables of its own in HBase: CATALOG, FUNCTION, LOG, SEQUENCE, and STATS. In the listing above, TABLE_SCHEM is the schema name and TABLE_NAME the table name. The tables Phoenix created can also be seen from HBase:

[whx@hadoop102 ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/phoenix-4.14.2-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See /codes.html#multiple_bindings for an explanation.
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.1, r930b9a55528fe45d8edce7af42fef2d35e77677a, Thu Apr  6 19:36:54 PDT
hbase(main):001:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.LOG
SYSTEM.MUTEX
SYSTEM.SEQUENCE
SYSTEM.STATS
6 row(s) in 0.1220 seconds

=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS"]
hbase(main):002:0>

2、Create a table

CREATE TABLE IF NOT EXISTS us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT whx_pk PRIMARY KEY (state, city))
    column_encoded_bytes=0;

- CHAR columns must declare a length; VARCHAR columns need not.
- The primary key maps to the HBase rowkey. With a composite primary key, the values of the key columns are concatenated to form the rowkey.
- Phoenix folds table and column names to upper case by default. To keep them lowercase, wrap them in double quotes.
- column_encoded_bytes=0 disables column encoding.
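The case-folding rule can be sketched with a small helper (hypothetical, for illustration only, not Phoenix code): an unquoted identifier is folded to upper case, while a double-quoted one keeps its case with the quotes stripped.

```python
def normalize_identifier(ident: str) -> str:
    """Mimic Phoenix's identifier rule: double-quoted identifiers keep
    their case; anything else is folded to upper case."""
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        return ident[1:-1]   # strip the quotes, preserve the case
    return ident.upper()

assert normalize_identifier('us_population') == 'US_POPULATION'
assert normalize_identifier('"whx_table"') == 'whx_table'
```

This is why the table created below shows up in HBase as US_POPULATION, while the quoted "whx_table" used later stays lowercase.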

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> CREATE TABLE IF NOT EXISTS us_population (
. . . . . . . . . . . . . . . . . . . . . . .> state CHAR(2) NOT NULL,
. . . . . . . . . . . . . . . . . . . . . . .> city VARCHAR NOT NULL,
. . . . . . . . . . . . . . . . . . . . . . .> population BIGINT
. . . . . . . . . . . . . . . . . . . . . . .> CONSTRAINT whx_pk PRIMARY KEY (state, city))
. . . . . . . . . . . . . . . . . . . . . . .> column_encoded_bytes=0;
No rows affected (1.244 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

Viewing the us_population table just created from Phoenix, in HBase:

hbase(main):001:0> describe 'US_POPULATION'
Table US_POPULATION is ENABLED
US_POPULATION, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1760 seconds
hbase(main):002:0>

3、Drop a table

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop table us_population;

4、Insert / update records (upsert, not insert)

upsert into us_population values('NY','NewYork',8143197);
upsert into us_population values('CA','Los Angeles',3844829);
upsert into us_population values('IL','Chicago',2842518);

Inserting the records in Phoenix:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('NY','NewYork',8143197);
1 row affected (0.027 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('CA','Los Angeles',3844829);
1 row affected (0.011 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('IL','Chicago',2842518);
1 row affected (0.006 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>
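UPSERT behaves as insert-or-update keyed on the primary key, which is why Phoenix has no separate INSERT statement. A toy model in plain Python (not Phoenix internals) shows the semantics: repeating an UPSERT with the same key overwrites the row instead of duplicating it.

```python
table = {}  # rowkey (state, city) -> population

def upsert(state, city, population):
    """Insert-or-update: the primary key is the dict key, so writing the
    same key again overwrites the existing row rather than adding one."""
    table[(state, city)] = population

upsert('NY', 'NewYork', 8143197)
upsert('NY', 'NewYork', 8500000)   # hypothetical second write, same PK

assert len(table) == 1                         # still one row
assert table[('NY', 'NewYork')] == 8500000     # value updated in place
```

The second population value is made up purely to demonstrate the overwrite.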

Querying from the Phoenix console:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from us_population;
+--------+--------------+-------------+
| STATE  | CITY         | POPULATION  |
+--------+--------------+-------------+
| CA     | Los Angeles  | 3844829     |
| IL     | Chicago      | 2842518     |
| NY     | NewYork      | 8143197     |
+--------+--------------+-------------+
3 rows selected (0.035 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

Checking from the HBase shell:

hbase(main):002:0> scan 'US_POPULATION'
ROW                COLUMN+CELL
 CALos Angeles     column=0:POPULATION, timestamp=1612401978719, value=\x80\x00\x00\x00\x00:\xAA\xDD
 CALos Angeles     column=0:_0, timestamp=1612401978719, value=x
 ILChicago         column=0:POPULATION, timestamp=1612401979579, value=\x80\x00\x00\x00\x00+_\x96
 ILChicago         column=0:_0, timestamp=1612401979579, value=x
 NYNewYork         column=0:POPULATION, timestamp=1612401978697, value=\x80\x00\x00\x00\x00|A]
 NYNewYork         column=0:_0, timestamp=1612401978697, value=x
3 row(s) in 0.0530 seconds
hbase(main):003:0>
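The byte strings in this scan can be decoded by hand. Phoenix stores a BIGINT as 8 big-endian bytes with the sign bit flipped (so that unsigned byte-wise ordering matches signed numeric ordering), and it builds the rowkey by concatenating the primary-key columns: the fixed-width CHAR(2) state followed by the city. A sketch of the decoding, assuming that serialization rule:

```python
def decode_phoenix_bigint(raw: bytes) -> int:
    """Phoenix BIGINT: 8 big-endian bytes with the sign bit flipped, so
    that byte-wise comparison sorts signed values correctly."""
    flipped = bytes([raw[0] ^ 0x80]) + raw[1:]   # un-flip the sign bit
    return int.from_bytes(flipped, "big", signed=True)

# value printed for the 'CALos Angeles' row in the scan above:
raw = b"\x80\x00\x00\x00\x00\x3a\xaa\xdd"        # \x3a=':', \xaa, \xdd
assert decode_phoenix_bigint(raw) == 3844829     # Los Angeles population

# The rowkey is the concatenation of the PK columns: CHAR(2) state + city,
# which is why the scan shows 'CALos Angeles' rather than two fields.
assert b"CA" + b"Los Angeles" == b"CALos Angeles"
```

The same rule explains the other two rows: `\x80...\x2b\x5f\x96` is 2842518 and `\x80...\x7c\x41\x5d` is 8143197.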

5、Delete records (delete)

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> delete from us_population where state='CA' and city='Los Angeles';
1 row affected (0.01 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from us_population;
+--------+----------+-------------+
| STATE  | CITY     | POPULATION  |
+--------+----------+-------------+
| IL     | Chicago  | 2842518     |
| NY     | NewYork  | 8143197     |
+--------+----------+-------------+
2 rows selected (0.023 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

6、Mapping existing HBase tables into Phoenix

By default, a table created directly in HBase is not visible to Phoenix.

To operate from Phoenix on a table created directly in HBase, you must first map it in Phoenix.

There are two kinds of mapping: 1. view mapping; 2. table mapping.

First create a test table in HBase: whx_table

hbase(main):019:0> create 'whx_table','cf_user','cf_company'
0 row(s) in 1.2170 seconds

=> Hbase::Table - whx_table
hbase(main):020:0> desc 'whx_table'
Table whx_table is ENABLED
whx_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf_company', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf_user', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0100 seconds
hbase(main):021:0>

Insert data into whx_table:

hbase(main):027:0> put 'whx_table','1001','cf_user:firstname','Nick'
0 row(s) in 0.0470 seconds

hbase(main):029:0> put 'whx_table','1001','cf_user:lastname','Lee'
0 row(s) in 0.0150 seconds

hbase(main):030:0> put 'whx_table','1001','cf_company:name','HUAWEI'
0 row(s) in 0.0140 seconds

hbase(main):031:0> put 'whx_table','1001','cf_company:address','changanjie10hao'
0 row(s) in 0.0080 seconds

hbase(main):033:0> get 'whx_table','1001'
COLUMN                CELL
 cf_company:address   timestamp=1612408142513, value=changanjie10hao
 cf_company:name      timestamp=1612408141461, value=HUAWEI
 cf_user:firstname    timestamp=1612408054676, value=Nick
 cf_user:lastname     timestamp=1612408141421, value=Lee
1 row(s) in 0.0200 seconds
hbase(main):034:0>

6.1 View mapping

A view created in Phoenix is read-only: it can only be used for queries, and data cannot be modified through it.

Create a whx_table view in Phoenix that maps the whx_table table in HBase:

create view "whx_table"(
    empid_pk varchar primary key,
    "cf_user"."firstname" varchar,
    "cf_user"."lastname" varchar,
    "cf_company"."name" varchar,
    "cf_company"."address" varchar);

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create view "whx_table"(empid_pk varchar primary key,"cf_user"."firstname" varchar,"cf_user"."lastname" varchar,"cf_company"."name" varchar,"cf_company"."address" varchar) column_encoded_bytes=0;
1 row affected (5.913 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
|            |              | whx_table      | VIEW          |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+---------+------------------+
| EMPID_PK  | firstname  | lastname  | name    | address          |
+-----------+------------+-----------+---------+------------------+
| 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
+-----------+------------+-----------+---------+------------------+
1 row selected (0.075 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

A view may not share a name with an existing table. So if you later want to create a whx_table *table* in Phoenix mapping the HBase whx_table, you must first drop the whx_table view in Phoenix.

Dropping the whx_table view in Phoenix does not affect the whx_table table in HBase:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop view "whx_table";
No rows affected (0.034 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

6.2 Table mapping

There are two ways to create a Phoenix table mapping for HBase:

When the table already exists in HBase, create the mapped table the same way as the view above, simply replacing create view with create table:

create table "whx_table"(
    empid_pk varchar primary key,
    "cf_user"."firstname" varchar,
    "cf_user"."lastname" varchar,
    "cf_company"."name" varchar,
    "cf_company"."address" varchar)
    column_encoded_bytes=0;

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create table "whx_table"(empid_pk varchar primary key,"cf_user"."firstname" varchar,"cf_user"."lastname" varchar,"cf_company"."name" varchar,"cf_company"."address" varchar) column_encoded_bytes=0;
1 row affected (5.913 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME     | TABLE_TYPE    | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
|            |              | whx_table      | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+---------+------------------+
| EMPID_PK  | firstname  | lastname  | name    | address          |
+-----------+------------+-----------+---------+------------------+
| 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
+-----------+------------+-----------+---------+------------------+
1 row selected (0.061 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

Notes: with the column_encoded_bytes=0 parameter, data added from the HBase side is also queryable from Phoenix; without it, rows added through HBase cannot be queried from Phoenix. When the table does not yet exist in HBase, create table simply creates it: the whx_table table is created in both Phoenix and HBase, with its structure initialized from the parameters in the statement.

Dropping the whx_table table in Phoenix also drops the whx_table table in HBase.

The whx_table table supports the full set of insert, delete, update, and query operations from Phoenix.

Insert: the table name must be double-quoted to preserve its case; string values take single quotes.

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into "whx_table" values ('1002','Tom','Lee','LIANXIANG','changanjie11hao');
1 row affected (0.03 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+------------+------------------+
| EMPID_PK  | firstname  | lastname  | name       | address          |
+-----------+------------+-----------+------------+------------------+
| 1001      | Nick       | Lee       | HUAWEI     | changanjie10hao  |
| 1002      | Tom        | Lee       | LIANXIANG  | changanjie11hao  |
+-----------+------------+-----------+------------+------------------+
2 rows selected (0.031 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

Delete: the table name must be double-quoted to preserve its case.

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> delete from "whx_table" where EMPID_PK='1002';
1 row affected (0.009 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+---------+------------------+
| EMPID_PK  | firstname  | lastname  | name    | address          |
+-----------+------------+-----------+---------+------------------+
| 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
+-----------+------------+-----------+---------+------------------+
1 row selected (0.029 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

7、Creating HBase Secondary Indexes from Phoenix

7.1 Configure HBase to support Phoenix secondary indexes (on the hadoop102 node)

7.1.1 Edit the HBase configuration file /opt/module/hbase/conf/hbase-site.xml

<configuration>
    <!-- Shared directory each RegionServer uses to persist HBase data; defaults to /tmp/hbase -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop101:9000/HBase</value>
    </property>
    <!-- Cluster mode: false = standalone HBase, true = distributed -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- External ZooKeeper quorum that HBase depends on -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
    </property>
    <!-- Directory where the external ZooKeeper stores its data on each Linux node -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/module/zookeeper-3.4.10/datas</value>
    </property>
</configuration>

Change it to:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Shared directory each RegionServer uses to persist HBase data; defaults to /tmp/hbase -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop101:9000/HBase</value>
    </property>
    <!-- Cluster mode: false = standalone HBase, true = distributed -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- External ZooKeeper quorum that HBase depends on. To let Phoenix create
         secondary indexes, do NOT append port 2181 here -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop101,hadoop102,hadoop103</value>
    </property>
    <!-- Directory where the external ZooKeeper stores its data on each Linux node -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/module/zookeeper-3.4.10/datas</value>
    </property>
    <!-- Phoenix secondary-index support: add these to hbase-site.xml on the HMaster node -->
    <property>
        <name>hbase.master.loadbalancer.class</name>
        <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
    </property>
    <property>
        <name>hbase.coprocessor.master.classes</name>
        <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
    </property>
    <!-- Phoenix secondary-index support: add these to hbase-site.xml on the RegionServer nodes -->
    <property>
        <name>hbase.regionserver.wal.codec</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
    </property>
    <property>
        <name>hbase.region.server.rpc.scheduler.factory.class</name>
        <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
        <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
    </property>
    <property>
        <name>hbase.rpc.controllerfactory.class</name>
        <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
        <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
    </property>
</configuration>

7.1.2 Distribute /opt/module/hbase/conf/hbase-site.xml from the hadoop102 node

[whx@hadoop102 conf]$ xsync.sh hbase-site.xml

7.1.3 Restart HBase and Phoenix

7.2 Testing the index

7.2.1 Before any index exists:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick';
+---------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN                                                                | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER whx_table  | null            | null           | null         |
|     SERVER FILTER BY cf_user."firstname" = 'Nick'                   | null            | null           | null         |
+---------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.032 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

A FULL SCAN in a Phoenix plan means the secondary index was not used and the whole table was scanned.

7.2.2 Create an index

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create index idx_firstname on "whx_table"("cf_user"."firstname");
2 rows affected (6.383 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

Test:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick';
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN                                                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_FIRSTNAME ['Nick']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                               | null            | null           | null         |
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.054 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

RANGE SCAN OVER IDX_FIRSTNAME in the plan means the IDX_FIRSTNAME index was used and only a range was scanned.

Note: a query cannot be served by the index if it is written as select *.
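The reason: a global index is itself just a table keyed by the indexed column(s) and carrying only those columns plus the data rowkey, so a query can be answered from the index alone only when every column it references is covered by the index. A toy model with plain Python dicts (an illustration, not Phoenix internals) makes the distinction:

```python
# Data table: rowkey -> full row (all columns live here).
data = {"1001": {"firstname": "Nick", "lastname": "Lee", "name": "HUAWEI"}}

# Global index on firstname: it holds ONLY the indexed value -> data rowkey.
idx_firstname = {"Nick": "1001"}
INDEXED_COLUMNS = {"firstname"}

def plan(columns):
    """Pick a plan the way Phoenix's optimizer (roughly) does: use the index
    only when it covers every referenced column, otherwise fall back to a
    full scan of the data table."""
    if set(columns) <= INDEXED_COLUMNS:
        return "RANGE SCAN OVER IDX_FIRSTNAME"
    return "FULL SCAN"

assert plan(["firstname"]) == "RANGE SCAN OVER IDX_FIRSTNAME"
# select * references lastname and name too, which the index does not hold:
assert plan(["firstname", "lastname", "name"]) == "FULL SCAN"
```

This matches the two EXPLAIN outputs above: the single-column query range-scans the index, while a select * must go back to the data table.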

7.2.3 Drop the index

drop index <index_name> on <table_name>

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop index idx_firstname on "whx_table";

7.2.4 Composite index

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create index idx_firstname_lastname on "whx_table"("cf_user"."firstname","cf_user"."lastname");
2 rows affected (6.26 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+------------+------------------+
| EMPID_PK  | firstname  | lastname  | name       | address          |
+-----------+------------+-----------+------------+------------------+
| 1001      | Nick       | Lee       | HUAWEI     | changanjie10hao  |
| 1002      | Tom        | Lee       | LIANXIANG  | changanjie11hao  |
+-----------+------------+-----------+------------+------------------+
2 rows selected (0.053 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick' and "lastname"='Lee';
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN                                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_FIRSTNAME_LASTNAME ['Nick','Lee']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                                              | null            | null           | null         |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.049 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>

RANGE SCAN OVER IDX_FIRSTNAME_LASTNAME in the plan means the composite IDX_FIRSTNAME_LASTNAME index was used and only a range was scanned.

7.3 Global vs. local indexes

Creating a global index:

CREATE INDEX my_index ON my_table (my_col)

Creating a local index (the same syntax plus the LOCAL keyword):

CREATE LOCAL INDEX my_index ON my_table (my_col)

7.3.1 Global index

A global index is a distributed index: it can be used directly to locate the server and region that hold the data, so reads are fast. But because it is distributed, every change to the data must be synchronized to the index across servers, which costs a lot of network traffic. Global indexes are therefore a poor fit for columns that are written frequently.

- After creation, a global index is stored as a separate table in HBase that holds the index data.

- Suited to read-heavy, write-light workloads.

- Every write must update not only the data but also the index.

- For example: if the data table lives on RegionServer01 and the index table on RegionServer02, every put must first go to RegionServer01 and then to RegionServer02 before the update completes. The network overhead is large and adds load to the RegionServer cluster.

7.3.2 Local index

With a local index, data and index live on the same server, so the index alone cannot tell you which server and region hold the target data: a query must first locate the region and only then use the index within it.

A local index is stored, after creation, as an extra column family inside the data table itself, holding the index data. Suited to write-heavy, read-light workloads: because index and data sit on the same RegionServer, frequent writes only need to talk to the current RegionServer.

