Hive Spark 数据类型总结
一. 基本类型
类型 |
描述 |
示例 |
boolean |
true/false |
TRUE |
tinyint |
1字节的有符号整数,-128~127 |
1 |
smallint |
2个字节的有符号整数,-32768~32767 |
1 |
int |
4个字节的带符号整数,-2,147,483,648 ~ 2,147,483,647 |
1 |
bigint |
8字节带符号整数,-9,223,372,036,854,775,808 ~ 9,223,372,036,854,775,807 |
1 |
float |
4字节单精度浮点数 |
1.0 |
double |
8字节双精度浮点数 |
1.0 |
decimal |
任意精度的带符号小数 |
1.0 |
string |
字符串,变长 |
'go to school' |
varchar |
变长字符串 |
'wang' |
char |
固定长度字符串 |
'F' 'M' |
timestamp |
时间 纳秒精度 |
2021-06-29 20:38:22.044000000 |
date |
日期 |
'2021-06-28' |
CREATE TABLE IF NOT EXISTS test.type_example
(
id_deleted boolean,
order_num tinyint,
scan_num smallint,
click_num int,
show_num bigint,
salary float,
balance decimal(20, 4),
user_id int,
username string,
address varchar(255),
gender char(1),
create_at timestamp,
update_at date
);
INSERT INTO test.type_example(id_deleted,
order_num,
scan_num,
click_num,
show_num,
salary,
balance,
user_id,
username,
address,
gender,
create_at,
update_at)
VALUES (FALSE,
2,
1209,
1390,
2340,
DOUBLE(6500.2),
123456.2345,
202105,
"zhang",
"北京回龙观",
"F",
CURRENT_TIMESTAMP,
CURRENT_DATE);
1.1 decimal
- HIVE:
decimal(m,n)
表示数字总长度为 m 位,小数位为 n 位,那么整数位就只有 m-n 位了,在 hive 中默认值为 decimal(10,0)
- MySQL:
decimal(m,n)
表示整数位长度为 m 位,小数位为 n 位
二. 复杂类型
类型 |
描述 |
示例 |
ARRAY<data_type> |
数组类型 |
["a","b","c","d"] |
MAP<primitive_type, data_type> |
字典类型 |
{"a":1, "b":2} |
STRUCT<col_name : data_type> |
struct类型 |
类似面向对象的实例 |
CREATE TABLE IF NOT EXISTS test.type_example_complex
(
username string,
teacher_list ARRAY<string>,
class_info MAP<string, string>,
score_info STRUCT<math:int, english:int, chinese:int>
);
INSERT INTO test.type_example_complex
(username, teacher_list, class_info, score_info)
VALUES ("zhang",
ARRAY('zhang', "wang", '李'),
MAP('leader', 'xie', 'student_num', '55'),
named_struct('math', 90, 'english', 91, 'chinese', 92));
2.1 array 结构
SELECT username,teacher_list[1] from type_example_complex -- 获取第二个字符
-- zhang,wang
SELECT username,teacher_list[10] from type_example_complex -- 获取超范围的字符
-- zhang,NULL
SELECT array_contains(teacher_list, 'zhang') from type_example_complex -- 是否包含某个元素
-- true
2.2 map 结构
SELECT class_info['student_num'] FROM type_example_complex -- 获取某个key
-- 55
SELECT class_info['not_exists'] FROM type_example_complex --获取不存在的key
-- NULL
-- map_keys(Map)函数:可得map中所有的key; 返回值类型: array
SELECT map_keys(class_info) FROM type_example_complex
-- ["leader","student_num"]
select str_to_map('name:zhang,age:12,address:beijing gu gong') -- 字符串转换成map
-- {"name":"zhang","age":"12","address":"beijing gu gong"}
2.3 struct 结构
struct 相对于map 来说,它有固定的字段及类型。map可以存放任何数量的key-value,而struct只能存放提前定义好的字段,类似其他编程语言的对象的。
SELECT score_info.math,score_info.english,score_info.chinese FROM type_example_complex
-- 90,91,92
SELECT score_info.not_exists FROM type_example_complex
-- 会报错字段不存在
select named_struct('math', 90, 'english', 91, 'chinese', 92) -- 创建一个struct
-- {"math":90,"english":91,"chinese":92}
2.4 复杂类型初始化
Constructor Function |
Operands |
Description |
map |
(key1, value1, key2, value2, ...) |
Creates a map with the given key/value pairs. |
struct |
(val1, val2, val3, ...) |
Struct field names will be col1, col2, .... |
named_struct |
(name1, val1, name2, val2, ...) |
Creates a struct with the given field names and values. |
array |
(val1, val2, ...) |
Creates an array with the given elements. |
参考资料