大象教程
首页
Spark
Hadoop
HDFS
MapReduce
Hive
Hive 教程
Hive 教程
Hive 安装(基于Ubuntu系统)
Hive 架构
Hive 内置函数
Hive UDF 简介
Hive DDL 命令
Hive 视图
Hive 索引
Hive Metastore 的三种配置方式
Hive 数据模型
Hive 数据类型
Hive 操作符
Hive SerDe(序列化与反序列化)
Hive 数据分区
Hive 分桶
Hive 分区与分桶的比较
Hive Join 的原理与机制
Hive map Join
Hive bucket map join
#Hive DDL 命令 DDL 即数据定义语言,比如创建或者删除数据库命令,创建或删除表命令等。下面来详细看一下 Hive DDL 相关命令,并给出示例。 ##库操作 ###创建库 **语法:** ```sql CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] //关于数据块的描述 [LOCATION hdfs_path] //指定数据库在HDFS上的存储位置 [WITH DBPROPERTIES (property_name=property_value, ...)]; //指定数据块属性 ``` 默认地址:`/user/hive/warehouse/db_name.db/table_name/partition_name/…` **示例:** 创建普通数据库 ```sql hive> create database t1; OK Time taken: 0.057 seconds hive> show databases; OK t1 default Time taken: 0.018 seconds, Fetched: 21 row(s) hive> ``` 创建库的时候检查存在与否 ```sql hive> create database if not exists tmp; OK Time taken: 0.014 seconds hive> ``` 创建库的时候带注释 ```sql hive> create database if not exists t2 comment 'learning hive'; OK Time taken: 0.039 seconds hive> ``` 利用 desc 查看数据库信息 ```sql hive > desc database t2; OK t2 learning hive hdfs://myhive/user/hive/warehouse/t2.db root USER Time taken: 0.025 seconds, Fetched: 1 row(s) ``` 创建带属性的库 ```sql hive> create database if not exists t3 with dbproperties('creator'='hadoop','date'='2019-06-25'); OK Time taken: 0.047 seconds hive> ``` ###查看库 **语法:** `show databases;` **示例:** 查看有哪些数据库 ```sql hive> show databases; OK t1 t2 t3 default Time taken: 0.015 seconds, Fetched: 4 row(s) hive> ``` 查看数据库的详细属性信息 **语法:** ``` desc database [extended] dbname; ``` **示例:** ```sql hive> desc database extended t3; OK t3 hdfs://myhive/user/hive/warehouse/t3.db root USER {date=2019-06-25, creator=hadoop} Time taken: 0.017 seconds, Fetched: 1 row(s) hive> ``` 查看正在使用哪个库 ```sql hive> select current_database(); OK default Time taken: 0.722 seconds, Fetched: 1 row(s) hive> ``` ###删除库 语法: ```sql drop database dbname; drop database if exists dbname; ``` 默认情况下,hive 不允许删除包含表的数据库,有两种解决办法: 1、 手动删除库下所有表,然后删除库。 2、 使用 cascade 关键字 ```sql drop database if exists dbname cascade; ``` ###切换库 **语法:** ``` use database_name ``` **示例:** ```sql hive> select current_database(); OK default Time taken: 0.092 seconds, Fetched: 1 row(s) hive> use t3; OK Time taken: 0.019 seconds hive> select current_database(); OK t3 Time taken: 0.089 seconds, Fetched: 1 row(s) hive> ``` ##表操作 **语法:** ```sql CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path] ``` 各个关键词含义如下: **CREATE TABLE** 创建一个指定名字的表。如果相同名字的表已经存在,则抛出异常;用户可以用 IF NOT EXIST 选项来忽略这个异常。 **EXTERNAL** 关键字可以让用户创建一个外部表,在建表的同时指定一个指向实际数据的路径(LOCATION) **LIKE** 允许用户复制现有的表结构,但是不复制数据 **COMMENT** 可以为表与字段增加描述 **PARTITIONED BY** 指定分区 **ROW FORMAT** DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)] 用户在建表的时候可以自定义 SerDe 或者使用自带的 SerDe。如果没有指定 ROW FORMAT 或者 ROW FORMAT DELIMITED,将会使用自带的 SerDe。在建表的时候,用户还需要为表指定列,用户在指定表的列的同时也会指定自定义的 SerDe,Hive 通过 SerDe 确定表的具体的列的数据。 **STORED AS** SEQUENCEFILE //序列化文件 \| TEXTFILE //普通的文本文件格式 \| RCFILE //行列存储相结合的文件 \| parquet | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname //自定义文件格式 如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCE **LOCATION** 指定表在HDFS的存储路径 **示例:** 创建默认的内部表 ```sql hive> create table student(id int, name string, sex string, age int,department string) row format delimited fields terminated by ","; OK Time taken: 0.218 seconds hive> ``` ###外部表 ```sql hive> create external table student_ext > (id int, name string, sex string, age int,department string) row format delimited fields terminated by "," location "/hive/student"; OK Time taken: 0.161 seconds hive> ``` ###分区表 ```sql hive> create external table student_ptn(id int, name string, sex string, age int,department string) > partitioned by (city string) > row format delimited fields terminated by "," > location "/hive/student_ptn"; OK Time taken: 0.161 seconds hive> ``` ###添加分区 ```sql hive> alter table student_ptn add partition(city="beijing"); OK Time taken: 0.222 seconds hive> alter table student_ptn add partition(city="shenzhen"); OK Time taken: 0.169 seconds ``` ###分桶表 ```sql hive> create external table student_bck(id int, name string, sex string, age int,department string) > clustered by (id) sorted by (id asc, name desc) into 4 buckets > row format delimited fields terminated by "," > location "/hive/student_bck"; OK Time taken: 0.174 seconds hive> ``` ###使用 create table as select 创建表 ```sql hive > load data local inpath "/home/hadoop/student.txt" into table student; Time taken 0.715 seconds hive > select * from student; +-------------+---------------+--------------+--------------+---------------------+ | student.id | student.name | student.sex | student.age | student.department | +-------------+---------------+--------------+--------------+---------------------+ | 95002 | 刘晨 | 女 | 19 | IS | | 95017 | 王风娟 | 女 | 18 | IS | | 95018 | 王一 | 女 | 19 | IS | | 95013 | 冯伟 | 男 | 21 | CS | | 95014 | 王小丽 | 女 | 19 | CS | | 95019 | 邢小丽 | 女 | 19 | IS | | 95020 | 赵钱 | 男 | 21 | IS | | 95003 | 王敏 | 女 | 22 | MA | | 95004 | 张立 | 男 | 19 | IS | | 95012 | 孙花 | 女 | 20 | CS | | 95010 | 孔小涛 | 男 | 19 | CS | | 95005 | 刘刚 | 男 | 18 | MA | | 95006 | 孙庆 | 男 | 23 | CS | | 95007 | 易思玲 | 女 | 19 | MA | | 95008 | 李娜 | 女 | 18 | CS | | 95021 | 周二 | 男 | 17 | MA | | 95022 | 郑明 | 男 | 20 | MA | | 95001 | 李勇 | 男 | 20 | CS | | 95011 | 包小柏 | 男 | 18 | MA | | 95009 | 梦圆圆 | 女 | 18 | MA | | 95015 | 王君 | 男 | 18 | MA | +-------------+---------------+--------------+--------------+---------------------+ 21 rows selected (0.342 seconds) hive > create table student_ctas as select * from student where id < 95012; Time taken 34.514 seconds hive > select * from student_ctas; +------------------+--------------------+-------------------+-------------------+--------------------------+ | student_ctas.id | student_ctas.name | student_ctas.sex | student_ctas.age | student_ctas.department | +------------------+--------------------+-------------------+-------------------+--------------------------+ | 95002 | 刘晨 | 女 | 19 | IS | | 95003 | 王敏 | 女 | 22 | MA | | 95004 | 张立 | 男 | 19 | IS | | 95010 | 孔小涛 | 男 | 19 | CS | | 95005 | 刘刚 | 男 | 18 | MA | | 95006 | 孙庆 | 男 | 23 | CS | | 95007 | 易思玲 | 女 | 19 | MA | | 95008 | 李娜 | 女 | 18 | CS | | 95001 | 李勇 | 男 | 20 | CS | | 95011 | 包小柏 | 男 | 18 | MA | | 95009 | 梦圆圆 | 女 | 18 | MA | +------------------+--------------------+-------------------+-------------------+--------------------------+ 11 rows selected (0.445 seconds) ``` ###查看表 查看当期使用的数据库有哪些表 ```sql hive > show tables; OK ss student student_bck student_ext student_ptn Time taken: 0.019 seconds, Fetched: 5 row(s) hive> ``` 查看非当前使用的数据库中有哪些表 ```sql hive> show tables in default; OK customers Time taken: 0.02 seconds, Fetched: 1 row(s) hive> ``` 查看数据库中以xxx开头的表 ```sql hive> show tables like 'student*'; OK student student_bck student_ext student_ptn Time taken: 0.018 seconds, Fetched: 4 row(s) hive> ``` 查看表信息 ```sql hive> desc student_bck; OK id int name string sex string age int department string Time taken: 0.079 seconds, Fetched: 5 row(s) hive> ``` 查看表详细信息(格式不友好) ```sql hive> desc extended student; OK id int name string sex string age int department string Detailed Table Information Table(tableName:student, dbName:t3, owner:root, createTime:1561444295, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:sex, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:department, type:string, comment:null)], location:hdfs://cctvdbaapp00:8020/user/hive/warehouse/t3.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim=,, serialization.format=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1561444295}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.076 seconds, Fetched: 7 row(s) ``` 查看表详细信息(格式友好) ```sql hive> desc formatted student; OK # col_name data_type comment id int name string sex string age int department string # Detailed Table Information Database: t3 Owner: root CreateTime: Tue Jun 25 14:31:35 CST 2019 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://myhive/user/hive/warehouse/t3.db/student Table Type: MANAGED_TABLE Table Parameters: transient_lastDdlTime 1561444295 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim , serialization.format , Time taken: 0.394 seconds, Fetched: 31 row(s) hive> ``` 查看表分区信息 ```sql hive> show partitions student_ptn; OK city=beijing city=shenzhen Time taken: 0.079 seconds, Fetched: 2 row(s) hive> ``` 查看表的详细建表语句 ```sql hive> show create table student_ptn; OK CREATE EXTERNAL TABLE `student_ptn`( `id` int, `name` string, `sex` string, `age` int, `department` string) PARTITIONED BY ( `city` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'field.delim'=',', 'serialization.format'=',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://cctvdbaapp00:8020/hive/student_ptn' TBLPROPERTIES ( 'transient_lastDdlTime'='1561444415') Time taken: 0.121 seconds, Fetched: 21 row(s) hive> ``` ###修改表 ####修改表名 ```sql hive> alter table student rename to new_student; ``` ####增加字段 ```sql alter table new_student add columns (score int); ``` ####修改一个字段的定义 ```sql alter table new_student change name new_name string; ``` ####删除字段 hive 不支持删除字段 ####替换所有字段 ```sql alter table new_student replace columns (id int, name string, address string); ``` ####添加分区 新增一个分区 ```sql alter table student_ptn add partition(city="chongqing"); ``` 新增多个分区 ```sql alter table student_ptn add partition(city="chongqing2") partition(city="chongqing3") partition(city="chongqing4"); ``` ####添加动态分区 select 的最后一个字段会被作为动态分区字段 ```sql insert overwrite table student_ptn_age partition(age) select id,name,sex,department,age from student_ptn; ``` ####修改分区 一般来说,修改分区一般都是指修改分区的数据存储目录。 在添加分区的时候,直接指定当前分区的数据存储目录 ```sql alter table student_ptn add if not exists partition(city='beijing') location '/student_ptn_beijing' partition(city='cc') location '/student_cc'; ``` 修改已经指定好的分区的数据存储目录 ```sql alter table student_ptn partition (city='beijing') set location '/student_ptn_beijing'; ``` 此时原先的分区文件夹仍存在,但是在往分区添加数据时,只会添加到新的分区目录 ####删除分区 ```sql alter table student_ptn drop partition (city='beijing'); ``` ###删除表 ```sql drop table new_student; ``` ###清空表 ```sql truncate table student_ptn; ``` ##其他辅助命令 ```sql show databases;show databases like 'my*' --查看数据库列表 show tables;show tables in db_name --查看数据表 show craete table table_name --查看数据表的建表语句 show functions;-- 查看 hive 函数列表 show partitions table_name;show partitions table_name partition (city='bj') --查看 hive 表分区 desc table_name;desc extended table_name;desc formatted table_name; --查看表的详细信息(元数据信息) desc database db_name;desc database extended db_name --查看数据库的详细属性信息 truncate table table_name --清空数据表 ```
加我微信交流吧