已复制
全屏展示
复制代码

Hive Spark 数据类型总结


· 3 min read

一. 基本类型

类型 描述 示例
boolean true/false TRUE
tinyint 1字节的有符号整数,-128~127 1
smallint 2个字节的有符号整数,-32768~32767 1
int 4个字节的带符号整数,-2,147,483,648 ~ 2,147,483,647 1
bigint 8字节带符号整数,-9,223,372,036,854,775,808 ~ 9,223,372,036,854,775,807 1
float 4字节单精度浮点数 1.0
double 8字节双精度浮点数 1.0
decimal 任意精度的带符号小数 1.0
string 字符串,变长 'go to school'
varchar 变长字符串 'wang'
char 固定长度字符串 'F' 'M'
timestamp 时间 纳秒精度 2021-06-29 20:38:22.044000000
date 日期 '2021-06-28'
  • 示例
CREATE TABLE IF NOT EXISTS test.type_example
(
    id_deleted     boolean,
    order_num      tinyint,
    scan_num       smallint,
    click_num      int,
    show_num       bigint,
    salary         float,
    balance        decimal(20, 4),
    user_id        int,
    username       string,
    address        varchar(255),
    gender         char(1),
    create_at      timestamp,
    update_at      date
);

INSERT INTO test.type_example(id_deleted,
                              order_num,
                              scan_num,
                              click_num,
                              show_num,
                              salary,
                              balance,
                              user_id,
                              username,
                              address,
                              gender,
                              create_at,
                              update_at)
VALUES (FALSE,
        2,
        1209,
        1390,
        2340,
        DOUBLE(6500.2),
        123456.2345,
        202105,
        "zhang",
        "北京回龙观",
        "F",
        CURRENT_TIMESTAMP,
        CURRENT_DATE);

1.1 decimal

  • HIVE: decimal(m,n) 表示数字总长度为 m 位,小数位为 n 位,那么整数位就只有 m-n 位了,在 hive 中默认值为 decimal(10,0)
  • MySQL:decimal(m,n)表示整数位长度为 m 位,小数位为 n 位

二. 复杂类型

类型 描述 示例
ARRAY<data_type> 数组类型 ["a","b","c","d"]
MAP<primitive_type, data_type> 字典类型 {"a":1, "b":2}
STRUCT<col_name : data_type> struct类型 类似面向对象的实例
  • 示例
CREATE TABLE IF NOT EXISTS test.type_example_complex
(
    username     string,
    teacher_list ARRAY<string>,
    class_info   MAP<string, string>,
    score_info   STRUCT<math:int, english:int, chinese:int>
);

INSERT INTO test.type_example_complex
    (username, teacher_list, class_info, score_info)
VALUES ("zhang",
        ARRAY('zhang', "wang", '李'),
        MAP('leader', 'xie', 'student_num', '55'),
        named_struct('math', 90, 'english', 91, 'chinese', 92));
        

2.1 array 结构

SELECT username,teacher_list[1] from type_example_complex  -- 获取第二个字符
-- zhang,wang

SELECT username,teacher_list[10] from type_example_complex -- 获取超范围的字符
-- zhang,NULL

SELECT array_contains(teacher_list, 'zhang') from type_example_complex -- 是否包含某个元素
-- true

2.2 map 结构

SELECT class_info['student_num'] FROM type_example_complex -- 获取某个key
-- 55

SELECT class_info['not_exists'] FROM type_example_complex --获取不存在的key
-- NULL

-- map_keys(Map)函数:可得map中所有的key; 返回值类型: array
SELECT map_keys(class_info) FROM type_example_complex
-- ["leader","student_num"]

select str_to_map('name:zhang,age:12,address:beijing gu gong') -- 字符串转换成map
-- {"name":"zhang","age":"12","address":"beijing gu gong"}

2.3 struct 结构

struct 相对于map 来说,它有固定的字段及类型。map可以存放任何数量的key-value,而struct只能存放提前定义好的字段,类似其他编程语言的对象的。

SELECT score_info.math,score_info.english,score_info.chinese FROM type_example_complex
-- 90,91,92

SELECT score_info.not_exists FROM type_example_complex
-- 会报错字段不存在

select named_struct('math', 90, 'english', 91, 'chinese', 92) -- 创建一个struct
-- {"math":90,"english":91,"chinese":92}

2.4 复杂类型初始化

Constructor Function Operands Description
map (key1, value1, key2, value2, ...) Creates a map with the given key/value pairs.
struct (val1, val2, val3, ...) Struct field names will be col1, col2, ....
named_struct (name1, val1, name2, val2, ...) Creates a struct with the given field names and values.
array (val1, val2, ...) Creates an array with the given elements.

参考资料


文章推荐