set names的含义
2017-12-27
背景
set names命令会同事设置3个session变量。
SET character_set_client = charset_name;
SET character_set_result = charset_name;
SET character_set_connection = charset_name;
含义
通过官方文档看:
1 character_set_client是指客户端发送过来的语句的编码
2 character_set_connection是指mysqld收到客户端的语句后,要转换到的编码
3 character_set_result是指server执行语句后,返回给客户端的数据的编码
set names设置的3个变量就是设置mysqld和客户端通信时,mysqld应该如何解读client发来的字节,以及返回给客户端什么样的编码。
实验
环境如下:
mysql> show variables like 'character%';
+--------------------------+-------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
实验环境,server端的3个编码设置都是utf8。另外,客户端是标准mysql client,使用的是utf8,和server端编码是一致的。
床架一张测试表:
CREATE TABLE t1(id INT, name VARCHAR(200) CHARSET utf8) engine=InnoDB;
INSERT INTO t1 VALUES(0, '你好');
mysql> SELECT id, name, hex(name) FROM t1;
+------+--------+--------------+
| id | name | hex(name) |
+------+--------+--------------+
| 0 | 你好 | E4BDA0E5A5BD |
+------+--------+--------------+
下面分别改变这3个值,看下结果会发生什么变化:
- Case1 只改变character_set_client
mysql> SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id | name | hex(name) |
+------+-----------+--------------------+
| 0 | 你好 | E4BDA0E5A5BD |
| 0 | 浣犲ソ | E6B5A3E78AB2E382BD |
+------+-----------+--------------------+
3 rows in set (0.00 sec)
客户端发送的是UTF8编码的,但是MySQL确认为客户端字符编码是GBK,所以就按照GBK来解析,因此将UTF8原本的6个字节认为是3个字符,然后转成了UTF8存储,虽然UTF8是变长的,但是这里的3个GBK字符按值都是要占3个字节的,转出来一共9个字节,比原来的大。在返回时,是按UTF8返回的,因为存了3个UTF8字符,所以客户端看到的就是3个字符。
- Case2 值改变character_set_connection
SET names utf8;
SET character_set_connection = gbk;
INSERT INTO t1 VALUES(2, '你好');
mysql> SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id | name | hex(name) |
+------+-----------+--------------------+
| 0 | 你好 | E4BDA0E5A5BD |
| 2 | 你好 | E4BDA0E5A5BD |
+------+-----------+--------------------+
3 rows in set (0.00 sec)
由于character_set_client是正确的,所以转换的源不会识别错,转换成GBK自然也不会错,后面存储成UTF8时,再从GBK转成UTF8也没有错,因为UTF8和GBK字符集中都包含“你”“好”,所以相互转换也不会出错,只是多了2次转换。
- Case 3 只改变character_set_result
SET names utf8;
SET character_set_results = gbk;
INSERT INTO t1 VALUES(3, '你好');
mysql> select id, name, hex(name) from t1;
+------+--------+--------------------+
| id | name | hex(name) |
+------+--------+--------------------+
| 0 | | E4BDA0E5A5BD |
| 1 | 你好 | E6B5A3E78AB2E382BD |
| 2 | | E4BDA0E5A5BD |
| 3 | | E4BDA0E5A5BD |
+------+--------+--------------------+
4 rows in set (0.00 sec)
再改回原样看下结果:
SET names utf8;
mysql> SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id | name | hex(name) |
+------+-----------+--------------------+
| 0 | 你好 | E4BDA0E5A5BD |
| 1 | 浣犲ソ | E6B5A3E78AB2E382BD |
| 2 | 你好 | E4BDA0E5A5BD |
| 3 | 你好 | E4BDA0E5A5BD |
+------+-----------+--------------------+
4 rows in set (0.00 sec)
显示错误是因为返回字符集设置的和客户端不匹配。返回时,server将所有字符转成了GBK,结果客户端一根筋的认为是UTF8就解析错了。比较有意思的是第二条记录,即Case1错误插进去的,显示出来的是对的。为什么呢,因为在Case1中存的时候,是按UTF8->强制解析为GBK->然后转成UTF8这个逻辑存储下去的;而返回的时候,因为server会将存的UTF8又给转回GBK,然后客户端又拿着这个GBK误以为是UTF8解析,实际上是Case1的逆过程,虽然两个方向都是错的,最终结果是正确的。
- 分析
字符集在整个过程中是怎么变化的:
客户端发送请求的时候:
A1 客户端发送出语句(总是以utf8)--> A2 server收到语句解析(按character_set_client指定编码)
|
V
A4 数据进入mysqld内部存储 <-- A3 server判断是否需要转换编码(以character_set_connection指定编码)
server返回结果时:
B1 server返回结果(按character_set_results指定编码) --> B2 客户端解析编码显示(总是以utf8)
A3步是否需要转换编码,代码中的逻辑是这样的,在sql_yacc.yy文件中:
LEX_STRING tmp;
THD *thd= YYTHD;
const CHARSET_INFO *cs_con= thd->variables.collation_connection;
const CHARSET_INFO *cs_cli= thd->variables.character_set_client;
uint repertoire= thd->lex->text_string_is_7bit &&
my_charset_is_ascii_based(cs_cli) ?
MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
if (thd->charset_is_collation_connection ||
(repertoire == MY_REPERTOIRE_ASCII &&
my_charset_is_ascii_based(cs_con)))
tmp= $1;
else
{
if (thd->convert_string(&tmp, cs_con, $1.str, $1.length, cs_cli))
MYSQL_YYABORT;
}
$$= new (thd->mem_root) Item_string(tmp.str, tmp.length, cs_con,
DERIVATION_COERCIBLE,
repertoire);
if ($$ == NULL)
MYSQL_YYABORT;
总结
character_set_client和character_set_results是一定要和客户端一致。character_set_connection和character_set_client不一致,有丢失数据的风险,所以尽量也一致,总之这三个值应该一样,且和客户端编码一致,所以才有了set names这个快捷命令。
character_set_*
- character_set_database
可以动态修改。指示的是连接进来的默认数据库的字符集。如果没有指示连进来的数据库,则该值的设置与character_set_server的值是一样的。
//没有指示默认数据库,则设置的与character_set_server值一样
mysql -uxxx -pxxxxxx -h127.0.0.1
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
//如果连接的时候指示数据库,则设置的与该数据库的字符集一致
mysql -uxxx -pxxxxxx -h127.0.0.1 -D wzp
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
mysql> show create database wzp;
+----------+--------------------------------------------------------------+
| Database | Create Database |
+----------+--------------------------------------------------------------+
| wzp | CREATE DATABASE `wzp` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+--------------------------------------------------------------+
//如果连接之后,进行use db操作,则该值会修改成当前db的字符集
mysql> use test;
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
mysql> show create database test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
- character_set_filesystem
把OS上文件名按照该字符集进行解析,默认binary是不做任何转换的,例如LOAD DATA INFILE/SELECT … INTO OUTFILE/LOAD_FILE等。这些文件名会从character_set_client转换成character_set_filesystem之后才进行打开操作。
- character_set_system
指示元数据的编码,不可以进行动态修改。