set names的含义

背景

set names命令会同事设置3个session变量。

SET character_set_client = charset_name;
SET character_set_result = charset_name;
SET character_set_connection = charset_name;

含义

通过官方文档看:

1 character_set_client是指客户端发送过来的语句的编码
2 character_set_connection是指mysqld收到客户端的语句后,要转换到的编码
3 character_set_result是指server执行语句后,返回给客户端的数据的编码

set names设置的3个变量就是设置mysqld和客户端通信时,mysqld应该如何解读client发来的字节,以及返回给客户端什么样的编码。

实验

环境如下:

mysql> show variables like 'character%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | utf8                                |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | utf8                                |
| character_set_system     | utf8                                |

实验环境,server端的3个编码设置都是utf8。另外,客户端是标准mysql client,使用的是utf8,和server端编码是一致的。

床架一张测试表:

CREATE TABLE t1(id INT, name VARCHAR(200) CHARSET utf8) engine=InnoDB;

INSERT INTO t1 VALUES(0, '你好');
mysql> SELECT id, name, hex(name) FROM t1;
+------+--------+--------------+
| id   | name   | hex(name)    |
+------+--------+--------------+
|    0 | 你好   | E4BDA0E5A5BD |
+------+--------+--------------+

下面分别改变这3个值,看下结果会发生什么变化:

mysql> SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id   | name      | hex(name)          |
+------+-----------+--------------------+
|    0 | 你好      | E4BDA0E5A5BD       |
|    0 | 浣犲ソ    | E6B5A3E78AB2E382BD |
+------+-----------+--------------------+
3 rows in set (0.00 sec)

客户端发送的是UTF8编码的,但是MySQL确认为客户端字符编码是GBK,所以就按照GBK来解析,因此将UTF8原本的6个字节认为是3个字符,然后转成了UTF8存储,虽然UTF8是变长的,但是这里的3个GBK字符按值都是要占3个字节的,转出来一共9个字节,比原来的大。在返回时,是按UTF8返回的,因为存了3个UTF8字符,所以客户端看到的就是3个字符。

SET names utf8;
SET character_set_connection = gbk;
INSERT INTO t1 VALUES(2, '你好');
mysql>  SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id   | name      | hex(name)          |
+------+-----------+--------------------+
|    0 | 你好      | E4BDA0E5A5BD       |
|    2 | 你好      | E4BDA0E5A5BD       |
+------+-----------+--------------------+
3 rows in set (0.00 sec)

由于character_set_client是正确的,所以转换的源不会识别错,转换成GBK自然也不会错,后面存储成UTF8时,再从GBK转成UTF8也没有错,因为UTF8和GBK字符集中都包含“你”“好”,所以相互转换也不会出错,只是多了2次转换。

SET names utf8;
SET character_set_results = gbk;
INSERT INTO t1 VALUES(3, '你好');

mysql> select id, name, hex(name) from t1;
+------+--------+--------------------+
| id   | name   | hex(name)          |
+------+--------+--------------------+
|    0 |        | E4BDA0E5A5BD       |
|    1 | 你好   | E6B5A3E78AB2E382BD |
|    2 |        | E4BDA0E5A5BD       |
|    3 |        | E4BDA0E5A5BD       |
+------+--------+--------------------+
4 rows in set (0.00 sec)

再改回原样看下结果:

SET names utf8;
mysql>  SELECT id, name, hex(name) FROM t1;
+------+-----------+--------------------+
| id   | name      | hex(name)          |
+------+-----------+--------------------+
|    0 | 你好      | E4BDA0E5A5BD       |
|    1 | 浣犲ソ    | E6B5A3E78AB2E382BD |
|    2 | 你好      | E4BDA0E5A5BD       |
|    3 | 你好      | E4BDA0E5A5BD       |
+------+-----------+--------------------+
4 rows in set (0.00 sec)

显示错误是因为返回字符集设置的和客户端不匹配。返回时,server将所有字符转成了GBK,结果客户端一根筋的认为是UTF8就解析错了。比较有意思的是第二条记录,即Case1错误插进去的,显示出来的是对的。为什么呢,因为在Case1中存的时候,是按UTF8->强制解析为GBK->然后转成UTF8这个逻辑存储下去的;而返回的时候,因为server会将存的UTF8又给转回GBK,然后客户端又拿着这个GBK误以为是UTF8解析,实际上是Case1的逆过程,虽然两个方向都是错的,最终结果是正确的。

字符集在整个过程中是怎么变化的:

客户端发送请求的时候:

A1 客户端发送出语句(总是以utf8)-->  A2 server收到语句解析(按character_set_client指定编码)
                                      |
                                      V
A4 数据进入mysqld内部存储        <--  A3 server判断是否需要转换编码(以character_set_connection指定编码)

server返回结果时:

B1 server返回结果(按character_set_results指定编码)  -->  B2 客户端解析编码显示(总是以utf8)

A3步是否需要转换编码,代码中的逻辑是这样的,在sql_yacc.yy文件中:

  LEX_STRING tmp;
  THD *thd= YYTHD;
  const CHARSET_INFO *cs_con= thd->variables.collation_connection;
  const CHARSET_INFO *cs_cli= thd->variables.character_set_client;
  uint repertoire= thd->lex->text_string_is_7bit &&
                   my_charset_is_ascii_based(cs_cli) ?
                   MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  if (thd->charset_is_collation_connection ||
      (repertoire == MY_REPERTOIRE_ASCII &&
       my_charset_is_ascii_based(cs_con)))
     tmp= $1;
  else
  {
    if (thd->convert_string(&tmp, cs_con, $1.str, $1.length, cs_cli))
        MYSQL_YYABORT;
  }
  $$= new (thd->mem_root) Item_string(tmp.str, tmp.length, cs_con,
                                      DERIVATION_COERCIBLE,
                                      repertoire);
  if ($$ == NULL)
     MYSQL_YYABORT;

总结

character_set_client和character_set_results是一定要和客户端一致。character_set_connection和character_set_client不一致,有丢失数据的风险,所以尽量也一致,总之这三个值应该一样,且和客户端编码一致,所以才有了set names这个快捷命令。

character_set_*

可以动态修改。指示的是连接进来的默认数据库的字符集。如果没有指示连进来的数据库,则该值的设置与character_set_server的值是一样的。

//没有指示默认数据库,则设置的与character_set_server值一样
mysql -uxxx -pxxxxxx -h127.0.0.1
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |

//如果连接的时候指示数据库,则设置的与该数据库的字符集一致
mysql -uxxx -pxxxxxx -h127.0.0.1 -D wzp
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | utf8                                |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
mysql> show create database wzp;
+----------+--------------------------------------------------------------+
| Database | Create Database                                              |
+----------+--------------------------------------------------------------+
| wzp      | CREATE DATABASE `wzp` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+--------------------------------------------------------------+

//如果连接之后,进行use db操作,则该值会修改成当前db的字符集
mysql> use test;
mysql> show variables like '%character%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
mysql> show create database test;
+----------+-----------------------------------------------------------------+
| Database | Create Database                                                 |
+----------+-----------------------------------------------------------------+
| test     | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+

把OS上文件名按照该字符集进行解析,默认binary是不做任何转换的,例如LOAD DATA INFILE/SELECT … INTO OUTFILE/LOAD_FILE等。这些文件名会从character_set_client转换成character_set_filesystem之后才进行打开操作。

指示元数据的编码,不可以进行动态修改。

参考