hive中文乱码解决方法

hive的列分隔符和行分隔符的使用

hive使用gbk等非utf8字符集-ag真人游戏

阅读 : 558

说明

hive默认是所有文件都是utf8的。hive将按照utf8编码格式对数据文件进行解析和查询。

如果数据文件不是utf8，则需要serde支持指定编码格式。对于常用的lazysimpleserde是支持指定字符集的。

serde is a short name for “serializer and deserializer.”
hive uses serde (and !fileformat) to read and write table rows.
hdfs files –> inputfileformat –> –> deserializer –> row object
row object –> serializer –> –> outputfileformat –> hdfs files

使用

指定serde和字符集。

create external table student8(id string, name string) 
row format serde 'org.apache.hadoop.hive.serde2.lazy.lazysimpleserde' 
with serdeproperties("field.delim"=',',"serialization.encoding"='gbk')
location '/data/student8/';

注意：为指定字符集，必须显式指定serde的类。指定serde类后，则不允许使用"fields terminated by"，而是要显式通过"field.delim"属性指定分隔符。

hive中文乱码解决方法

hive的列分隔符和行分隔符的使用