'MySQL' 카테고리의 글 목록

MySQL

MySQL 테이블 단편화 2021.09.02
Recursive CTE(Common Table Expression) 활용 2021.08.31
MySQL 8.0 신기능 CTE(Common Table Expression) 활용 2021.08.26
MySQL Shell Upgrade Checker Utility 2021.08.24

MySQL 테이블 단편화

타마마임팩트_쫀 2021. 9. 2. 15:19

2021. 9. 2. 15:19

일반적으로 데이터베이스 사용자는 DELETE 문을 사용한 테이블 단편화에 대하여 알고 있습니다. 대량 삭제를 수행할 때마다 대부분의 경우 디스크 공간을 회수하기 위해 항상 테이블을 다시 작성합니다. 그러나 DELETE만 테이블 단편화를 일으킬 수 있지 않습니다. INSERT 문으로 테이블 조각화가 어떻게 발생하는지 확인 해 보겠습니다.

MySQL에는 두 가지 종류의 단편화가 있습니다.

테이블 내에서 일부 InnoDB 페이지가 완전히 비어 있는 단편화.
일부 InnoDB 페이지가 완전히 채워지지 않은 조각화(페이지에 여유 공간이 있음).

INSERT를 사용한 테이블 조각화에는 세 가지 사례가 있습니다.

INSERT ROLLBACK
INSERT 실패
페이지 분할을 통한 단편화

테스트 환경

아래와 같이 테스트 환경을 만들었습니다.

DB: test
Tables : frag, ins_frag, frag_page_spl
Table Size: 2G

사례 1: INSERT ROLLBACK

처음에는 " ins_frag " 라는 새 테이블을 만듭니다 . 그런 다음 BEGIN을 사용하여 트랜잭션을 생성하고 아래와 같이 " frag " 테이블에서 " ins_frag " 테이블로 데이터를 복사합니다 .

mysql> create table ins_frag like frag;
Query OK, 0 rows affected (0.01 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into ins_frag select * from frag;
Query OK, 47521280 rows affected (3 min 7.45 sec)
Records: 47521280  Duplicates: 0  Warnings: 0

#Linux shell
sakthi-3.2# ls -lrth
total 8261632
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 03:00 ins_frag.ibd

위에서 INSERT가 실행되었지만 INSERT를 커밋/롤백하지 않았습니다. 두 테이블 모두 2GB 의 디스크 공간을 차지했습니다 .

이제 INSERT를 ROLLBACK 하겠습니다.

mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
| 47521280 |
+----------+
1 row in set (1.87 sec)

mysql> rollback;
Query OK, 0 rows affected (5 min 45.21 sec)

mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)


#Linux shell
sakthi-3.2# ls -lrth
total 8261632
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 03:09 ins_frag.ibd

INSERT를 롤백한 후 " ins_frag " 테이블 은 여전히 동일한 2GB 디스크 공간을 차지합니다 . MySQL 클라이언트를 통해 단편화 공간을 찾아봅니다.

mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 * 1024 ), 2), 'G') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 * 1024 ), 2), 'G') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='test' and table_name='ins_frag';
+----------+----------+-------+----------+
| DATABASE | TABLE.   | TOTAL | DATAFREE |
+----------+----------+-------+----------+
| test     | ins_frag | 0.00G | 1.96G    |
+----------+----------+-------+----------+
1 row in set (0.01 sec)

INSERT를 롤백하면 단편화가 됩니다. 디스크 공간을 회수하려면 테이블을 다시 작성해야 합니다.

mysql> alter table ins_frag engine=innodb;
Query OK, 0 rows affected (2.63 sec)
Records: 0  Duplicates: 0  Warnings: 0

#Linux shell

sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 03:11 ins_frag.ibd

사례 2: INSERT 실패

테스트를 위해 두 개의 MySQL 클라이언트 세션(세션 1 및 세션 2)을 만들었습니다.

세션 1에서 트랜잭션 내에서 동일한 INSERT 문을 실행하고 있습니다. 그러나 이번에는 세션 2에서 INSERT 쿼리를 중단하고 종료했습니다.

세션 1

#Linux shell

sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 04:02 ins_frag.ibd

#MySQL shell

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into ins_frag select * from frag;   #is running

세션 2

mysql> pager grep -i insert ; show processlist;
PAGER set to 'grep -i insert'
| 33 | root            | localhost | percona | Query   |    14 | executing              | insert into ins_frag select * from frag |
4 rows in set (0.00 sec)

mysql> kill 33;
Query OK, 0 rows affected (0.00 sec)

INSERT가 중단되어 실패했습니다.

세션 1

mysql> insert into ins_frag select * from frag;
ERROR 2013 (HY000): Lost connection to MySQL server during query

#Linux shell

sakthi-3.2# ls -lrth
total 4591616
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   212M Jun 17 04:21 ins_frag.ibd

#MySQL shell

mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.10 sec)

INSERT가 완료되지 않았으며 테이블에 데이터가 없습니다. 그러나 여전히 테이블 .ibd 파일은 2GB 까지 늘어났습니다. MySQL 클라이언트를 통해 단편화 공간을 살펴보자.

mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 ), 2), 'M') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), 'M') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='test' and table_name='ins_frag';
+----------+----------+---------+----------+
| DATABASE | TABLE    | TOTAL   | DATAFREE |
+----------+----------+---------+----------+
| test     | ins_frag | 0.03M   | 210.56M  |
+----------+----------+---------+----------+
1 row in set (0.01 sec)

테이블에 단편화된 공간이 있고 공간을 회수하기 위해 테이블을 다시 작성해야 함을 보여줍니다.

mysql> alter table ins_frag engine='innodb';
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

#Linux shell

sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 04:32 ins_frag.ibd

사례 3: 페이지 분할을 통한 단편화

우리는 내부적으로 InnoDB 레코드가 InnoDB 페이지에 저장된다는 것을 알고 있습니다. 각 페이지 크기는 기본적으로 16K이지만 페이지 크기를 변경할 수 있는 옵션이 있습니다.

InnoDB 페이지에 새 레코드 또는 인덱스 항목을 수용할 공간이 충분하지 않은 경우 두 페이지로 분할되어 각각 약 50%가 차게 됩니다. 즉, rollback이나 delete가 없는 insert 전용 워크로드의 경우에도 평균 페이지 사용률이 75%에 불과할 수 있으므로 이러한 종류의 내부 페이지 조각화에서는 25%의 손실이 발생할 수 있습니다.

인덱스가 정렬 방식으로 구축되면 혼잡이 더 심해지며 테이블에 인덱스의 임의 위치로 이동하는 삽입이 많으면 페이지 분할이 발생합니다.

내림차순 인덱스가 있는 테이블을 만들었습니다.

mysql> show create table frag_page_spl\G
*************************** 1. row ***************************
Table: frag_page_spl
Create Table: CREATE TABLE `frag_page_spl` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(16) DEFAULT NULL,
`messages` varchar(600) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_spl` (`messages` DESC)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.07 sec)

INFORMATION_SCHEMA.INNODB_METRICS 테이블에서 페이지 분할 활동을 모니터링할 수 있습니다. 이를 위해서는 InnoDB 모니터를 활성화해야 합니다.

mysql> SET GLOBAL innodb_monitor_enable=all;
Query OK, 0 rows affected (0.09 sec)

그런 다음 6개의 병렬 스레드로 INSERT를 무작위로 진행하는 스크립트를 수행 하였습니다.

mysql> select name,count,type,status,comment from information_schema.innodb_metrics where name like '%index_page_spl%'\G
*************************** 1. row ***************************
name: index_page_splits
count: 52186
type: counter
status: enabled
comment: Number of index page splits
1 row in set (0.05 sec)

mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 ), 2), 'M') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), 'M') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='test' and table_name='frag_page_spl';
+----------+---------------+----------+----------+
| DATABASE | TABLE.        | TOTAL    | DATAFREE |
+----------+---------------+----------+----------+
| test     | frag_page_spl | 2667.55M | 127.92M  |
+----------+---------------+----------+----------+
1 row in set (0.00 sec)

INNODB_METRICS에서 페이지 분할 카운터가 증가한 것을 볼 수 있습니다. 127.92MB 의 단편화를 생성한 52186 개의 페이지 분할 작업이 발생 했음을 보여줍니다 .

분할 페이지가 생성되면 뒤로 이동할 수 있는 유일한 방법은 생성된 페이지를 병합 임계값 아래로 떨어뜨리는 것입니다. 이 경우 InnoDB는 병합 작업으로 분할 페이지에서 데이터를 이동합니다. MERGE_THRESHOLD는 테이블 및 특정 인덱스에 대해 구성할 수 있습니다.

데이터를 재구성하는 다른 방법은 테이블을 OPTIMIZE하는 것입니다. 테이블 OPTIMIZE는 매우 무겁고 긴 프로세스일 수 있지만 너무 많은 페이지가 단편화 되어 있는 상황에서 복구하는 유일한 방법입니다.

'MySQL' 카테고리의 다른 글

Recursive CTE(Common Table Expression) 활용 (0)	2021.08.31
MySQL 8.0 신기능 CTE(Common Table Expression) 활용 (0)	2021.08.26
MySQL Shell Upgrade Checker Utility (0)	2021.08.24
MySQL 8 및 MySQL 5.7 메모리 소비 (0)	2021.08.24
pt-kill (0)	2021.08.19

Recursive CTE(Common Table Expression) 활용

타마마임팩트_쫀 2021. 8. 31. 14:35

2021. 8. 31. 14:35

이전에 작성한 내용(MySQL 8.0 신기능 CTE 활용) 중에 재귀 쿼리에 대한 내용을 언급했었습니다. SQL은 일반적으로 재귀 쿼리 구조에 좋지 않지만 이제 MySQL에서 재귀 쿼리를 작성할 수 있습니다. MySQL 8.0 이전에는 stored routin을 생성해야만 재귀 쿼리가 가능했습니다.

재귀 CTE 쿼리란 무엇입니까?

재귀 CTE는 자체 이름을 참조하는 하위 쿼리가 있는 CTE입니다. 다음과 같은 경우에 특히 유용합니다.

시리즈 생성
계층적 또는 트리 구조의 데이터 순회

재귀적 CTE의 주요 구성 요소를 살펴보겠습니다. 다음은 이를 생성하는 구문입니다.

WITH RECURSIVE cte AS (
   initial_query    -- "seed" member
   UNION ALL
   recursive_query    -- recusive member that references to the same CTE name
)
SELECT * FROM cte;    -- main query

먼저 RECURSIVE 절은 필수 항목이며 두 가지 필수 구성 요소가 있습니다. 시드 멤버는 첫번째 반복에서 실행 될 초기 쿼리입니다. 재귀 멤버는 동일한 CTE 이름에 대한 참조를 포함하는 쿼리입니다. 두 번째 구성 요소는 메인 쿼리의 모든 항목을 생성합니다.

반복이 행을 생성하지 않으면 프로세스가 중지됩니다. 메모리를 고갈시킬 수 있는 많은 반복을 생성하지 않도록 주의합니다.

재귀 CTE의 경우 재귀 멤버가 재귀를 종료하는 조건을 포함하는 것이 중요합니다. 개발 기술로 실행 시간을 제한하여 강제 종료할 수 있습니다.

cte_max_recursion_depth 시스템 변수는 CTE를 위한 재귀 수준의 수에 제한을 적용합니다. 서버는 이 변수의 값보다 더 많은 수준을 반복하는 모든 CTE의 실행을 종료합니다. 기본값은 1000입니다.
max_execution_time 시스템 변수는 현재 세션 내에서 SELECT 문에 대한 실행 시간 제한을 적용합니다.
MAX_EXECUTION_TIME optimizer hint가 나타나는 SELECT 문에 대한 쿼리 실행 시간 제한을 적용합니다.

시리즈 생성

이제 시리즈를 생성하기 위해 재귀 CTE를 사용하는 몇 가지 간단한 사용법을 살펴보겠습니다.

1단계 시퀀스

먼저 1에서 10까지의 간단한 정수 시리즈를 만듭니다. N+1 값이 이전 N의 함수이기 때문에 이것은 1레벨 시퀀스입니다.

WITH RECURSIVE natural_sequence AS
  ( SELECT 1 AS n       -- seed member: our sequence starts from 1
    UNION ALL
    SELECT n + 1 FROM natural_sequence    -- recursive member: reference to itself
    WHERE n < 10                          -- stop condition
  )
SELECT * FROM natural_sequence;           -- main query
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+

# let's see what happen if we miss the stop condition
mysql> WITH RECURSIVE natural_sequence AS ( SELECT 1 AS n   UNION ALL SELECT n + 1 FROM natural_sequence   ) SELECT * FROM natural_sequence;
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

또 다른 전형적인 예는 factorial 을 계산하는 것 입니다.

mysql> WITH RECURSIVE factorial(n, fact) AS ( 
          SELECT 0, 1 
          UNION ALL  
          SELECT n + 1, fact * (n+1)  
          FROM factorial 
          WHERE n < 20 ) 
       SELECT * from factorial;
+------+---------------------+
| n    | fact                |
+------+---------------------+
|    0 |                   1 |
|    1 |                   1 |
|    2 |                   2 |
|    3 |                   6 |
|    4 |                  24 |
|    5 |                 120 |
|    6 |                 720 |
|    7 |                5040 |
|    8 |               40320 |
|    9 |              362880 |
|   10 |             3628800 |
|   11 |            39916800 |
|   12 |           479001600 |
|   13 |          6227020800 |
|   14 |         87178291200 |
|   15 |       1307674368000 |
|   16 |      20922789888000 |
|   17 |     355687428096000 |
|   18 |    6402373705728000 |
|   19 |  121645100408832000 |
|   20 | 2432902008176640000 |
+------+---------------------+

2단계 시퀀스

N+2 값이 이전 두 값 N+1 및 N의 함수인 2단계 시퀀스를 만들고 싶습니다.

전형적인 예는 피보나치 수열입니다. 각 숫자는 0과 1부터 시작하여 앞의 두 숫자의 합입니다. 피보나치 수열의 처음 20개 항목을 계산해 보겠습니다.

mysql> WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS (   
          SELECT 1, 0, 1   
          UNION ALL   
          SELECT n + 1, next_fib_n, fib_n + next_fib_n     
          FROM fibonacci 
          WHERE n < 20 ) 
       SELECT * FROM fibonacci;
+------+-------+------------+
| n    | fib_n | next_fib_n |
+------+-------+------------+
|    1 |     0 |          1 |
|    2 |     1 |          1 |
|    3 |     1 |          2 |
|    4 |     2 |          3 |
|    5 |     3 |          5 |
|    6 |     5 |          8 |
|    7 |     8 |         13 |
|    8 |    13 |         21 |
|    9 |    21 |         34 |
|   10 |    34 |         55 |
|   11 |    55 |         89 |
|   12 |    89 |        144 |
|   13 |   144 |        233 |
|   14 |   233 |        377 |
|   15 |   377 |        610 |
|   16 |   610 |        987 |
|   17 |   987 |       1597 |
|   18 |  1597 |       2584 |
|   19 |  2584 |       4181 |
|   20 |  4181 |       6765 |
+------+-------+------------+

날짜 순서

다음과 같이 상점의 매출을 포함하는 간단한 테이블이 있다고 가정해 보겠습니다.

CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
order_date DATE,
product VARCHAR(20),
price DECIMAL(10,2));

# populate the table
INSERT INTO sales(order_date, product, price) 
VALUES('2020-02-01','DVD PLAYER',100.50),('2020-02-01','TV',399.99),('2020-02-02','LAPTOP',1249.00),
('2020-02-04','DISHWASHER',500.00),('2020-02-04','TV',699.00),('2020-02-06','LAPTOP',990.50),('2020-02-06','HAIRDRYER',29.90),
('2020-02-06','GAME CONSOLE',299.00),('2020-02-07','BOOK',9.00),('2020-02-07','REFRIGERATOR',600.00);

# let's run a query to generate the sales report by day
SELECT order_date, SUM(price) AS sales
FROM sales 
GROUP BY order_date;
+------------+---------+
| order_date | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

그러나 판매 보고서에 누락된 날짜(2월 3일 및 2월 5일)가 있습니다. 판매가 없는 날짜까지 포함하는 보고서를 생성하고 싶습니다.

WITH RECURSIVE dates(date) AS (
   SELECT '2020-02-01' 
   UNION ALL
   SELECT date + INTERVAL 1 DAY 
   FROM dates
   WHERE date < '2020-02-07' )
SELECT dates.date, COALESCE(SUM(price), 0) sales
FROM dates LEFT JOIN sales ON dates.date = sales.order_date
GROUP BY dates.date;
+------------+---------+
| date       | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-03 |    0.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-05 |    0.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

계층적 데이터 순회

이제 재귀적 CTE의 다른 사용 사례를 살펴보겠습니다. 다음 그림의 조직도에 대한 간단한 트리, 가족 계보에 대한 더 복잡한 트리 및 기차 경로에 대한 그래프입니다.

간단한 트리: 조직도

# create the table
CREATE TABLE orgchart(
id INT PRIMARY KEY,
name VARCHAR(20),
role VARCHAR(20),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES orgchart(id));

# insert the rows
INSERT INTO orgchart VALUES(1,'Matthew','CEO',NULL), 
(2,'Caroline','CFO',1),(3,'Tom','CTO',1),
(4,'Sam','Treasurer',2),(5,'Ann','Controller',2),
(6,'Anthony','Dev Director',3),(7,'Lousie','Sys Admin',3),
(8,'Travis','Senior DBA',3),(9,'John','Developer',6),
(10,'Jennifer','Developer',6),(11,'Maria','Junior DBA',8);

# let's see the table, The CEO has no manager, so the manager_id is set to NULL
SELECT * FROM orgchat;
+----+----------+--------------+------------+
| id | name     | role         | manager_id |
+----+----------+--------------+------------+
|  1 | Matthew  | CEO          |       NULL |
|  2 | Caroline | CFO          |          1 |
|  3 | Tom      | CTO          |          1 |
|  4 | Sam      | Treasurer    |          2 |
|  5 | Ann      | Controller   |          2 |
|  6 | Anthony  | Dev Director |          3 |
|  7 | Lousie   | Sys Admin    |          3 |
|  8 | Travis   | Senior DBA   |          3 |
|  9 | John     | Developer    |          6 |
| 10 | Jennifer | Developer    |          6 |
| 11 | Maria    | Junior DBA   |          8 |
+----+----------+--------------+------------+

재귀적 CTE를 사용하여 이러한 종류의 계층 쿼리를 실행해 보겠습니다.

# find the reporting chain for all the employees
mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, CAST(name AS CHAR(100))  
          FROM orgchart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN orgchart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
+------+----------+---------------------------------------+
| id   | name     | path                                  |
+------+----------+---------------------------------------+
|    1 | Matthew  | Matthew                               |
|    2 | Caroline | Matthew -> Caroline                   |
|    3 | Tom      | Matthew -> Tom                        |
|    4 | Sam      | Matthew -> Caroline -> Sam            |
|    5 | Ann      | Matthew -> Caroline -> Ann            |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |
|    8 | Travis   | Matthew -> Tom -> Travis              |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |
+------+----------+---------------------------------------+

CTE의 "seed" 멤버에 대한 CAST 함수의 사용에 유의하십시오. CAST 기능을 사용하지 않는 경우 어떻게 되는지 살펴보겠습니다.

mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, name 
          FROM orgchart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN orgchart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
ERROR 1406 (22001): Data too long for column 'path' at row 1

쿼리는 이론상 정확하지만 문제는 path열의 유형이 비재귀적 SELECT에서만 결정되므로 CHAR(7)이라는 것입니다. CTE의 재귀 부분에서는 문자 잘림이 발생하므로 오류가 발생합니다.

트리를 탐색하고 조직도에서 직원의 수준을 계산하는 쿼리를 살펴보겠습니다.

mysql> WITH RECURSIVE reporting_chain(id, name, path, level) AS ( 
          SELECT id, name, CAST(name AS CHAR(100)), 1  
          FROM orgchart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name), rc.level+1 
          FROM reporting_chain rc JOIN orgchart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain ORDER BY level;
+------+----------+---------------------------------------+-------+
| id   | name     | path                                  | level |
+------+----------+---------------------------------------+-------+
|    1 | Matthew  | Matthew                               |     1 |
|    2 | Caroline | Matthew -> Caroline                   |     2 |
|    3 | Tom      | Matthew -> Tom                        |     2 |
|    4 | Sam      | Matthew -> Caroline -> Sam            |     3 |
|    5 | Ann      | Matthew -> Caroline -> Ann            |     3 |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |     3 |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |     3 |
|    8 | Travis   | Matthew -> Tom -> Travis              |     3 |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |     4 |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |     4 |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |     4 |
+------+----------+---------------------------------------+-------+

복잡한 트리: 족보

조부모, 부모, 아들이 있는 다음 족보를 나타내는 테이블을 만듭니다.

CREATE TABLE genealogy(
id INT PRIMARY KEY,
name VARCHAR(20),
father_id INT,
mother_id INT,
FOREIGN KEY(father_id) REFERENCES genealogy(id),
FOREIGN KEY(mother_id) REFERENCES genealogy(id));

# populate the table
INSERT INTO genealogy VALUES(1,'Maria',NULL,NULL),
(2,'Tom',NULL,NULL),(3,'Robert',NULL,NULL),
(4,'Claire',NULL,NULL),(5,'John',2,1),
(6,'Jennifer',2,1),(7,'Sam',3,4),
(8,'James',7,6);

SELECT * FROM genealogy;
+----+----------+-----------+-----------+
| id | name     | father_id | mother_id |
+----+----------+-----------+-----------+
|  1 | Maria    |      NULL |      NULL |
|  2 | Tom      |      NULL |      NULL |
|  3 | Robert   |      NULL |      NULL |
|  4 | Claire   |      NULL |      NULL |
|  5 | John     |         2 |         1 |
|  6 | Jennifer |         2 |         1 |
|  7 | Sam      |         3 |         4 |
|  8 | James    |         7 |         6 |
+----+----------+-----------+-----------+

James의 모든 조상과 관계를 찾아 보겠습니다.

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('son' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy  
          WHERE name='James' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                       END,
                       level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    8 | James    |         7 |         6 | son          |     0 |
|    6 | Jennifer |         2 |         1 | mother       |     1 |
|    7 | Sam      |         3 |         4 | father       |     1 |
|    1 | Maria    |      NULL |      NULL | grandmother  |     2 |
|    2 | Tom      |      NULL |      NULL | grandfather  |     2 |
|    3 | Robert   |      NULL |      NULL | grandfather  |     2 |
|    4 | Claire   |      NULL |      NULL | grandmother  |     2 |
+------+----------+-----------+-----------+--------------+-------+

동일한 쿼리를 사용하지만 초기 조건을 변경하면 계층 구조에 있는 모든 사람(예: Jennifer)의 조상을 찾을 수 있습니다.

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('daughter' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy 
          WHERE name='Jennifer' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                      END, 
                      level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    6 | Jennifer |         2 |         1 | daughter     |     0 |
|    1 | Maria    |      NULL |      NULL | mother       |     1 |
|    2 | Tom      |      NULL |      NULL | father       |     1 |
+------+----------+-----------+-----------+--------------+-------+

그래프: 기차 노선

아래 이미지에서 더 중요한 도시에 대한 이탈리아의 기차 경로를 나타내는 그래프를 만들어 보겠습니다.

단방향 및 양방향 연결에 유의하세요. 각 연결에는 km 단위의 거리도 있습니다.

CREATE TABLE train_route(
id INT PRIMARY KEY,
origin VARCHAR(20),
destination VARCHAR(20),
distance INT);

# populate the table
INSERT INTO train_route VALUES(1,'MILAN','TURIN',150),
(2,'TURIN','MILAN',150),(3,'MILAN','VENICE',250),
(4,'VENICE','MILAN',250),(5,'MILAN','GENOA',200),
(6,'MILAN','ROME',600),(7,'ROME','MILAN',600),
(8,'MILAN','FLORENCE',380),(9,'TURIN','GENOA',160),
(10,'GENOA','TURIN',160),(11,'FLORENCE','VENICE',550),
(12,'FLORENCE','ROME',220),(13,'ROME','FLORENCE',220),
(14,'GENOA','ROME',500),(15,'ROME','NAPLES',210),
(16,'NAPLES','VENICE',800);

SELECT * FROM train_route;
+----+----------+-------------+----------+
| id | origin   | destination | distance |
+----+----------+-------------+----------+
|  1 | MILAN    | TURIN       |      150 |
|  2 | TURIN    | MILAN       |      150 |
|  3 | MILAN    | VENICE      |      250 |
|  4 | VENICE   | MILAN       |      250 |
|  5 | MILAN    | GENOA       |      200 |
|  6 | MILAN    | ROME        |      600 |
|  7 | ROME     | MILAN       |      600 |
|  8 | MILAN    | FLORENCE    |      380 |
|  9 | TURIN    | GENOA       |      160 |
| 10 | GENOA    | TURIN       |      160 |
| 11 | FLORENCE | VENICE      |      550 |
| 12 | FLORENCE | ROME        |      220 |
| 13 | ROME     | FLORENCE    |      220 |
| 14 | GENOA    | ROME        |      500 |
| 15 | ROME     | NAPLES      |      210 |
| 16 | NAPLES   | VENICE      |      800 |
+----+----------+-------------+----------+

밀라노를 출발지로 하는 모든 기차 목적지 반환:

mysql> WITH RECURSIVE train_destination AS ( 
          SELECT origin AS dest 
          FROM train_route 
          WHERE origin='MILAN'  
          UNION  
          SELECT tr.destination 
          FROM train_route tr 
          JOIN train_destination td ON td.dest=tr.origin) 
       SELECT * from train_destination;
+----------+
| dest     |
+----------+
| MILAN    |
| TURIN    |
| VENICE   |
| GENOA    |
| ROME     |
| FLORENCE |
| NAPLES   |
+----------+

기본적으로 어느 도시에서나 이탈리아에서는 원하는 곳으로 갈 수 있지만 다른 경로가 있습니다. 따라서 쿼리를 실행하여 밀라노와 나폴리에서 시작하여 가능한 모든 경로와 각 경로의 총 길이를 알아보겠습니다.

mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
           AND  NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-------------------------------------------------------+----------+--------------+
| cur_path                                              | cur_dest | tot_distance |
+-------------------------------------------------------+----------+--------------+
| MILAN                                                 | MILAN    |            0 |
| MILAN -> TURIN                                        | TURIN    |          150 |
| MILAN -> VENICE                                       | VENICE   |          250 |
| MILAN -> GENOA                                        | GENOA    |          200 |
| MILAN -> ROME                                         | ROME     |          600 |
| MILAN -> FLORENCE                                     | FLORENCE |          380 |
| MILAN -> TURIN -> GENOA                               | GENOA    |          310 |
| MILAN -> GENOA -> TURIN                               | TURIN    |          360 |
| MILAN -> GENOA -> ROME                                | ROME     |          700 |
| MILAN -> ROME -> FLORENCE                             | FLORENCE |          820 |
| MILAN -> ROME -> NAPLES                               | NAPLES   |          810 |
| MILAN -> FLORENCE -> VENICE                           | VENICE   |          930 |
| MILAN -> FLORENCE -> ROME                             | ROME     |          600 |
| MILAN -> TURIN -> GENOA -> ROME                       | ROME     |          810 |
| MILAN -> GENOA -> ROME -> FLORENCE                    | FLORENCE |          920 |
| MILAN -> GENOA -> ROME -> NAPLES                      | NAPLES   |          910 |
| MILAN -> ROME -> FLORENCE -> VENICE                   | VENICE   |         1370 |
| MILAN -> ROME -> NAPLES -> VENICE                     | VENICE   |         1610 |
| MILAN -> FLORENCE -> ROME -> NAPLES                   | NAPLES   |          810 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE           | FLORENCE |         1030 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES             | NAPLES   |         1020 |
| MILAN -> GENOA -> ROME -> FLORENCE -> VENICE          | VENICE   |         1470 |
| MILAN -> GENOA -> ROME -> NAPLES -> VENICE            | VENICE   |         1710 |
| MILAN -> FLORENCE -> ROME -> NAPLES -> VENICE         | VENICE   |         1610 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE   |         1580 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES -> VENICE   | VENICE   |         1820 |
+-------------------------------------------------------+----------+--------------+

mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='NAPLES'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
            AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-----------------------------------------------------------------+----------+--------------+
| cur_path                                                        | cur_dest | tot_distance |
+-----------------------------------------------------------------+----------+--------------+
| NAPLES                                                          | NAPLES   |            0 |
| NAPLES -> VENICE                                                | VENICE   |          800 |
| NAPLES -> VENICE -> MILAN                                       | MILAN    |         1050 |
| NAPLES -> VENICE -> MILAN -> TURIN                              | TURIN    |         1200 |
| NAPLES -> VENICE -> MILAN -> GENOA                              | GENOA    |         1250 |
| NAPLES -> VENICE -> MILAN -> ROME                               | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> FLORENCE                           | FLORENCE |         1430 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA                     | GENOA    |         1360 |
| NAPLES -> VENICE -> MILAN -> GENOA -> TURIN                     | TURIN    |         1410 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME                      | ROME     |         1750 |
| NAPLES -> VENICE -> MILAN -> ROME -> FLORENCE                   | FLORENCE |         1870 |
| NAPLES -> VENICE -> MILAN -> FLORENCE -> ROME                   | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME             | ROME     |         1860 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME -> FLORENCE          | FLORENCE |         1970 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME -> FLORENCE | FLORENCE |         2080 |
+-----------------------------------------------------------------+----------+--------------+

이제 한 출발지에서 최종 목적지까지의 최단 경로를 찾는 것이 매우 쉽습니다. 기본 쿼리를 필터링하고 정렬하기만 하면 됩니다. 여기 몇 가지 예가 있어요.

# shortest path from MILAN to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1
+-------------------------+----------+--------------+
| cur_path                | cur_dest | tot_distance |
+-------------------------+----------+--------------+
| MILAN -> ROME -> NAPLES | NAPLES   |          810 |
+-------------------------+----------+--------------+

# shortest path from VENICE to GENOA
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='GENOA' 
       ORDER BY tot_distance ASC LIMIT 1;
+--------------------------+----------+--------------+
| cur_path                 | cur_dest | tot_distance |
+--------------------------+----------+--------------+
| VENICE -> MILAN -> GENOA | GENOA    |          450 |
+--------------------------+----------+--------------+

# shortest path from VENICE to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1;
+-----------------------------------+----------+--------------+
| cur_path                          | cur_dest | tot_distance |
+-----------------------------------+----------+--------------+
| VENICE -> MILAN -> ROME -> NAPLES | NAPLES   |         1060 |
+-----------------------------------+----------+--------------+

제한 사항

실행 시간과 반복 횟수를 제한하기 위해 이미 본 제한 사항 외에도 알고 있어야 하는 다른 기본 제공 제한 사항이 있습니다.

재귀적 SELECT에는 다음 구문이 포함되어서는 안 됩니다.

SUM() 과 같은 집계 함수
GROUP BY
ORDER BY
DISTINCT
Window functions

이러한 제한은 비재귀적 CTE에는 유효하지 않습니다. 또한 재귀 SELECT 부분은 서브 쿼리가 아닌 FROM 절에서 CTE를 한 번만 참조해야 합니다.

결론

재귀 CTE 표현식은 MySQL 8.0을 사용하여 애플리케이션에 대한 쿼리를 구현하는 흥미로운 기능입니다. 재귀는 과거에 이미 stored routin을 생성하여 가능했지만 지금은 더 간단합니다. 또한 재귀 쿼리를 생성하기 위해 추가 권한이 필요하지 않습니다.

일반적으로 재귀적 CTE는 매우 간단하지만 비재귀적 CTE에 비해 조금 더 복잡합니다. 물론 구문의 문제가 아니라 재귀적으로 생각이 필요해서 발생한 문제일 뿐입니다.

'MySQL' 카테고리의 다른 글

MySQL 테이블 단편화 (0)	2021.09.02
MySQL 8.0 신기능 CTE(Common Table Expression) 활용 (0)	2021.08.26
MySQL Shell Upgrade Checker Utility (0)	2021.08.24
MySQL 8 및 MySQL 5.7 메모리 소비 (0)	2021.08.24
pt-kill (0)	2021.08.19

MySQL 8.0 신기능 CTE(Common Table Expression) 활용

타마마임팩트_쫀 2021. 8. 26. 14:46

2021. 8. 26. 14:46

CTE란?

CTE(Common Table Expression)는 서브 쿼리로 쓰이는 파생테이블(derived table)과 비슷한 개념으로 사용됩니다.
CTE는 ANSI SQL 99의 일부이며 MySQL 8.0.1에서 도입되었습니다.
CTE와 비교 대상으로는 VIEW가 있습니다. VIEW는 만들기 위해 권한이 필요하고 사전에 정의를 해야합니다. 반면, CTE는 권한이 필요 없고 하나의 쿼리문이 끝날때까지만 지속되는 일회성 테이블입니다.
CTE는 주로 복잡한 쿼리문에서 코드의 가독성과 재사용성을 위해 파생테이블 대신 사용하기에 유용합니다.

CTE를 사용하는 주요 이유는 다음과 같습니다.

쿼리의 가독성 향상
동일한 쿼리에서 여러 번 참조 가능
향상된 성능
사용자가 VIEW를 생성할 수 없는 경우 VIEW의 유효한 대안
재귀 쿼리 생성 가능: 계층적 데이터를 처리할 때 매우 유용할 수 있습니다.

SELECT , UPDATE 및 DELETE 문은 CTE를 참조할 수 있습니다.

CTE 생성 및 사용 방법

CTE는 하위 쿼리를 사용할 때 파생 테이블과 비슷하지만 기본 쿼리보다 먼저 선언합니다. 'WITH'라는 새로운 전용 절이 필요합니다.

유럽에서 가장 인구가 많은 국가를 찾기 위해 파생 테이블이 있는 서브 쿼리를 표현 합니다.

mysql> SELECT Name, Population 
    -> FROM (SELECT * FROM country WHERE continent='Europe') AS derived_t 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

CTE를 사용하여 다시 작성해 보겠습니다.

mysql> WITH cte AS (SELECT * FROM country WHERE continent='Europe')
    -> SELECT Name, Population 
    -> FROM cte 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

구문은 매우 간단합니다. 쿼리 전에 WITH를 사용하여 CTE 또는 여러 개의 CTE를 정의할 수 있습니다. 그런 다음 쿼리에서 필요한 만큼 모든 CTE를 참조할 수 있습니다. CTE는 기본 쿼리 범위에 대한 임시 테이블로 사용할 수 있도록 미리 구체화된 쿼리로 생각할 수 있습니다.

CTE 이름 뒤에 괄호로 묶인 이름 목록이 오는 경우 열 이름을 지정할 수도 있습니다.

mysql> WITH cte(eur_name, eur_population) AS (SELECT Name, Population FROM country WHERE continent='Europe')
    -> SELECT eur_name, eur_population
    -> FROM cte
    -> ORDER BY eur_opulation DESC LIMIT 5;
+--------------------+----------------+
| eur_name           | eur_population |
+--------------------+----------------+
| Russian Federation |      146934000 |
| Germany            |       82164700 |
| United Kingdom     |       59623400 |
| France             |       59225700 |
| Italy              |       57680000 |
+--------------------+----------------+

CTE는 다음 예와 같이 다른 테이블을 업데이트하기 위한 데이터 원본으로 사용할 수도 있습니다.

# 새 테이블 생성 
mysql> CREATE TABLE country_2020 ( Code char(3), Name char(52), Population_2020 int, PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.10 sec)

# 원본 데이터 복사
mysql> INSERT INTO country_2020 SELECT Code, Name, Population FROM country;
Query OK, 239 rows affected (0.01 sec)
Records: 239  Duplicates: 0  Warnings: 0

# 유럽 인구를 10% 증가
mysql> WITH cte(eur_code, eur_population) AS (SELECT Code, Population FROM country WHERE continent='Europe')  
    -> UPDATE country_2020, cte 
    -> SET Population_2020 = ROUND(eur_population*1.1) 
    -> WHERE Code=cte.eur_code;
Query OK, 46 rows affected (0.01 sec)
Rows matched: 46  Changed: 46  Warnings: 0

# 유럽이 아닌 데이터를 삭제
mysql> WITH cte AS (SELECT Code FROM country WHERE continent <> 'Europe') 
    -> DELETE country_2020 
    -> FROM country_2020, cte 
    -> WHERE country_2020.Code=cte.Code;
Query OK, 193 rows affected (0.02 sec)

mysql> SELECT * FROM country_2020 ORDER BY Population_2020 DESC LIMIT 5;
+------+--------------------+-----------------+
| Code | Name               | Population_2020 |
+------+--------------------+-----------------+
| RUS  | Russian Federation |       161627400 |
| DEU  | Germany            |        90381170 |
| GBR  | United Kingdom     |        65585740 |
| FRA  | France             |        65148270 |
| ITA  | Italy              |        63448000 |
+------+--------------------+-----------------+

CTE는 다음과 같은 INSERT … SELECT 쿼리 에도 사용할 수 있습니다 .

mysql> CREATE TABLE largest_countries (Code char(3), Name char(52), SurfaceArea decimal(10,2), PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO largest_countries
    -> WITH cte AS (SELECT Code, Name, SurfaceArea FROM country ORDER BY SurfaceArea DESC LIMIT 10)
    -> SELECT * FROM cte;
Query OK, 10 rows affected (0.02 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM largest_countries;
+------+--------------------+-------------+
| Code | Name               | SurfaceArea |
+------+--------------------+-------------+
| ARG  | Argentina          |  2780400.00 |
| ATA  | Antarctica         | 13120000.00 |
| AUS  | Australia          |  7741220.00 |
| BRA  | Brazil             |  8547403.00 |
| CAN  | Canada             |  9970610.00 |
| CHN  | China              |  9572900.00 |
| IND  | India              |  3287263.00 |
| KAZ  | Kazakstan          |  2724900.00 |
| RUS  | Russian Federation | 17075400.00 |
| USA  | United States      |  9363520.00 |
+------+--------------------+-------------+
10 rows in set (0.00 sec)

기본 쿼리 전에 미리 계산되거나 구체화된 임시 테이블과 같은 CTE를 만들 수 있습니다. 그리고 쿼리에서 필요한 만큼 해당 임시 테이블을 참조할 수 있습니다.

여러 개의 CTE를 만들 수 있으며 모두 기본 쿼리에서 사용할 수 있습니다. 구문은 다음과 같습니다.

WITH cte1 AS (SELECT ... FROM ... WHERE ...),
     cte2 AS (SELECT ... FROM ... WHERE ...)
SELECT ...
FROM table1, table1, cte1, cte2 ....
WHERE .....

CTE의 범위

CTE는 서브쿼리에서도 사용할 수 있지만 이 경우 범위를 인지해야 한다. CTE는 단일 문의 범위에 대해 존재합니다.

다음과 같은 유효한 쿼리를 고려하십시오.

WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT * FROM city WHERE city.CountryCode IN 
(SELECT Code FROM cte);   # Scope: "cte" is visible to top SELECT

SELECT * FROM city WHERE city.CountryCode IN 
(WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT Code from cte);   # Scope: "cte" is not visible to top SELECT

범위에 문제가 발생하지 않도록 하려면 CTE를 사용하는 가장 좋은 방법은 최상위 쿼리의 시작 부분에서 모든 CTE를 만드는 것입니다. 이러한 방식으로 모든 CTE는 필요한 곳 어디에서나 여러 번 사용할 수 있습니다.

여러 CTE 연결

쿼리에 대해 여러 CTE를 생성할 때 또 다른 흥미로운 기능은 chaining 입니다. 체인은 이전 CTE에 대한 하나 이상의 참조를 포함하는 모든 CTE를 정의할 수 있습니다.

다음 예는 체인을 사용하는 방법을 보여줍니다. 우리는 세계에서 가장 높은 인구 밀도와 가장 낮은 인구 밀도를 가진 국가를 찾고 싶습니다. 3개의 CTE로 구성된 체인을 만듭니다. 마지막 두 개는 첫 번째 것에 대한 참조를 포함합니다.

mysql> WITH density_by_country(country,density) AS 
       (SELECT Name, Population/SurfaceArea 
       FROM country 
       WHERE Population>0 and surfacearea>0), 
       max_density(country,maxdensity,label) AS 
       (SELECT country, density, 'max density' 
       FROM density_by_country 
       WHERE density=(SELECT MAX(density) FROM density_by_country)), 
       min_density(country,mindensity,label) AS 
       (SELECT country, density, 'min density' 
       FROM density_by_country 
       WHERE density=(SELECT MIN(density) FROM density_by_country)) 
       SELECT * FROM max_density UNION ALL SELECT * FROM min_density;
+-----------+------------+-------------+
| country   | maxdensity | label       |
+-----------+------------+-------------+
| Macao     | 26277.7778 | max density |
| Greenland | 0.0259     | min density |
+-----------+------------+-------------+

이제 파생 테이블을 대신 사용하여 동일한 쿼리를 다시 작성할 수 있는 방법을 생각해 보았습니다. 'density_by_country' 의 정의를 (여러 번) 복사해야 합니다 . 최종 쿼리는 실제로 매우 커서 읽기가 어려울 수 있습니다.

VIEW 대신 CTE 사용

데이터베이스 사용자에게 VIEW를 생성할 권한이 없을 수 있습니다. CTE는 일반 SELECT와 마찬가지로 테이블과 열을 읽는 기능을 제외하고 특정 권한을 요구하지 않습니다.

그런 다음 VIEW 대신 CTE를 사용할 수 있습니다. 보조금 옵션을 제외하고 CTE는 일반적으로 나중에 보여주겠지만 VIEW보다 성능이 더 좋습니다.

뷰를 만들고 이를 사용하여 쿼리를 실행해 보겠습니다.

mysql> CREATE VIEW city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode);

mysql> SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+

# VIEW를 사용한 쿼리의 실행 계획 확인
mysql> EXPLAIN SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived3> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
|  2 | SUBQUERY    | <derived4> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  4 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

이제 CTE를 사용하여 동일한 쿼리를 다시 작성해 보겠습니다.

mysql> WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
       SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+

# CTE 사용한 쿼리의 실행 계획 확인
mysql> EXPLAIN WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | SUBQUERY    | <derived2> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  2 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

두 가지 실행 계획을 살펴보면 쿼리에 DERIVED 단계가 있음을 알 수 있습니다 . VIEW의 구체화는 뷰가 참조될 때마다 더 많은 시간이 필요합니다.

샘플 데이터베이스는 작지만 두 쿼리의 실행 시간을 비교하기 위해 프로파일링을 활성화할 수 있습니다.

mysql> SET profiling=1;

# 쿼리를 여러번 수행

mysql> show profiles;
+----------+------------+-----------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                         |
+----------+------------+-----------------------------------------------------------------------------------------------+
...
...                                                                                                                                                                                                                                                                                             |
|       35 | 0.00971925 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       36 | 0.00963100 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       37 | 0.00976900 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       38 | 0.00963875 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       39 | 0.00971200 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       40 | 0.00546550 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       41 | 0.00546975 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       42 | 0.00550325 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       43 | 0.00548000 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       44 | 0.00545675 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
+----------+------------+-----------------------------------------------------------------------------------------------+

뷰가 있는 쿼리의 실행 시간은 약 0.0097초인 반면 CTE의 경우 약 0.0054초입니다. 따라서 CTE가 더 빠릅니다. 더 큰 테이블을 사용하고 보기에 대한 더 많은 참조가 있다는 것은 쿼리 간의 차이가 더 관련성이 있음을 의미합니다.

VIEW 대신 CTE를 사용하는 것이 단일 materialization만 필요하고(테이블이 cache처럼 임시로 저장된다는 의미) 생성된 임시 테이블을 주 쿼리에서 여러 번 참조할 수 있기 때문에 더 효율적입니다.
단, 이로 인해 CTE를 무분별하게 사용할 경우, Query performance가 오히려 더 떨어질 수 있습니다.
Query Optimizer가 execution plan을 계산할 때 CTE의 최적화를 고려하지 않고 결과를 그대로 이용합니다.
테이블에 설정한 index가 CTE에서는 적용되지 않게 됩니다.
CTE를 생성할 때 필요한 데이터를 미리 필터링하는 등 최적화를 고려한 쿼리가 필요할 수 있다.

결론

MySQL 8.0에서 사용할 수 있는 새로운 공통 테이블 표현식 기능을 도입했습니다. CTE를 사용하면 대부분의 경우 쿼리의 가독성을 단순화할 수 있고 VIEW 대신 CTE를 사용하여 전체 성능을 향상시킬 수도 있습니다.

'MySQL' 카테고리의 다른 글

MySQL 테이블 단편화 (0)	2021.09.02
Recursive CTE(Common Table Expression) 활용 (0)	2021.08.31
MySQL Shell Upgrade Checker Utility (0)	2021.08.24
MySQL 8 및 MySQL 5.7 메모리 소비 (0)	2021.08.24
pt-kill (0)	2021.08.19

MySQL Shell Upgrade Checker Utility

타마마임팩트_쫀 2021. 8. 24. 14:45

2021. 8. 24. 14:45

MySQL Shell은 훌륭한 소프트웨어입니다. 또 mysql 클라이언트일 뿐만 아니라 JavaScript 및 Python용 스크립팅 기능을 제공하는 도구이기도 합니다. 그리고 그것으로 할 수 있는 다른 일 중 하나는 MySQL 5.7 서버가 업그레이드할 준비가 되었는지 확인하는 것입니다.

MySQL Shell Upgrade Checker Utility는 MySQL 5.7 인스턴스에서 호환성 오류 및 업그레이드 문제를 확인하는 스크립트입니다. "확인"이라는 단어에 주목하는 것이 중요합니다. 자동으로 해결해 주지 않습니다.

mysqlchk 프로그램과 –check-upgrade 매개변수는 유사한 작업을 수행합니다. CHECK TABLE … FOR UPGRADE 명령을 호출합니다. 자세한 내용은 https://dev.mysql.com/doc/refman/8.0/en/upgrade-prerequisites.html 에서 확인할 수 있습니다.

먼저 MySQL Shell 패키지를 설치해야 합니다.

yum install -y mysql-shell.x86_64
….
….
Transaction test succeeded
Running transaction
  Installing : mysql-shell-8.0.17-1.el7.x86_64                                                                      1/1
  Verifying  : mysql-shell-8.0.17-1.el7.x86_64                                                                      1/1
 
Installed:
  mysql-shell.x86_64 0:8.0.17-1.el7

이제 검사를 수행할 준비가 되었습니다. 아래 명령을 실행 합니다.

mysqlsh root@localhost -e "util.checkForServerUpgrade();"

[root@mysql2 ~]# mysqlsh root@localhost -e "util.checkForServerUpgrade();"
The MySQL server at /var%2Flib%2Fmysql%2Fmysql.sock, version 5.7.27-log - MySQL
Community Server (GPL), will now be checked for compatibility issues for
upgrade to MySQL 8.0.17...

1) Usage of old temporal type
  No issues found

2) Usage of db objects with names conflicting with reserved keywords in 8.0
  No issues found

3) Usage of utf8mb3 charset
  No issues found

4) Table names in the mysql schema conflicting with new tables in 8.0
  No issues found

5) Partitioned tables using engines with non native partitioning
  No issues found

6) Foreign key constraint names longer than 64 characters
  No issues found

7) Usage of obsolete MAXDB sql_mode flag
  No issues found

8) Usage of obsolete sql_mode flags
  No issues found

9) ENUM/SET column definitions containing elements longer than 255 characters
  No issues found

10) Usage of partitioned tables in shared tablespaces
  No issues found

11) Circular directory references in tablespace data file paths
  No issues found

12) Usage of removed functions
  No issues found

13) Usage of removed GROUP BY ASC/DESC syntax
  No issues found

14) Removed system variables for error logging to the system log configuration
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:
    https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-13.html#mysqld-8-0-13-logging

15) Removed system variables
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:
    https://dev.mysql.com/doc/refman/8.0/en/added-deprecated-removed.html#optvars-removed

16) System variables with new default values
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:
    https://mysqlserverteam.com/new-defaults-in-mysql-8-0/

17) Schema inconsistencies resulting from file removal or corruption
  No issues found

18) Issues reported by 'check table x for upgrade' command
  No issues found

19) New default authentication plugin considerations
  Warning: The new default authentication plugin 'caching_sha2_password' offers
    more secure password hashing than previously used 'mysql_native_password'
    (and consequent improved client connection authentication). However, it also
    has compatibility implications that may affect existing MySQL installations.
    If your MySQL installation must serve pre-8.0 clients and you encounter
    compatibility issues after upgrading, the simplest way to address those
    issues is to reconfigure the server to revert to the previous default
    authentication plugin (mysql_native_password). For example, use these lines
    in the server option file:

    [mysqld]
    default_authentication_plugin=mysql_native_password

    However, the setting should be viewed as temporary, not as a long term or
    permanent solution, because it causes new accounts created with the setting
    in effect to forego the improved authentication security.
    If you are using replication please take time to understand how the
    authentication plugin changes may impact you.
  More information:
    https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-compatibility-issues
    https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-replication

Errors:   0
Warnings: 1
Notices:  0

업그레이드를 방해하는 치명적인 오류는 발견되지 않았지만 몇 가지 잠재적인 문제가 감지되었습니다. 업그레이드하기 전에 보고된 문제가 중요하지 않은지 확인하십시오.

매개변수가 부족하여 일부 항목이 누락되었습니다.

util.checkForServerUpgrade('root@localhost:3306', {"password":"password", "targetVersion":"8.0.11", "configPath":"/etc/my.cnf"})

이제 하나가 아닌 22개의 경고가 있습니다!

13) System variables with new default values
  Warning: Following system variables that are not defined in your
    configuration file will have new default values. Please review if you rely on
    their current values and if so define them before performing upgrade.
  More information:
    https://mysqlserverteam.com/new-defaults-in-mysql-8-0/

  back_log - default value will change
  character_set_server - default value will change from latin1 to utf8mb4
  collation_server - default value will change from latin1_swedish_ci to
    utf8mb4_0900_ai_ci
  event_scheduler - default value will change from OFF to ON
  explicit_defaults_for_timestamp - default value will change from OFF to ON
  innodb_autoinc_lock_mode - default value will change from 1 (consecutive) to
    2 (interleaved)
  innodb_flush_method - default value will change from NULL to fsync (Unix),
    unbuffered (Windows)
  innodb_flush_neighbors - default value will change from 1 (enable) to 0
    (disable)
  innodb_max_dirty_pages_pct - default value will change from 75 (%)  90 (%)
  innodb_max_dirty_pages_pct_lwm - default value will change from_0 (%) to 10
    (%)
  innodb_undo_log_truncate - default value will change from OFF to ON
  innodb_undo_tablespaces - default value will change from 0 to 2
  log_error_verbosity - default value will change from 3 (Notes) to 2 (Warning)
  max_allowed_packet - default value will change from 4194304 (4MB) to 67108864
    (64MB)
  max_error_count - default value will change from 64 to 1024
  optimizer_trace_max_mem_size - default value will change from 16KB to 1MB
  performance_schema_consumer_events_transactions_current - default value will
    change from OFF to ON
  performance_schema_consumer_events_transactions_history - default value will
    change from OFF to ON
  slave_rows_search_algorithms - default value will change from 'INDEX_SCAN,
    TABLE_SCAN' to 'INDEX_SCAN, HASH_SCAN'
  table_open_cache - default value will change from 2000 to 4000
  transaction_write_set_extraction - default value will change from OFF to
    XXHASH64

MySQL 다운로드에서 Shell을 얻었습니다.

wget https://dev.mysql.com/get/Downloads/MySQL-Shell/mysql-shell-8.0.17-linux-glibc2.12-x86-64bit.tar.gz 
tar -xvzf mysql-shell-8.0.17-linux-glibc2.12-x86-64bit.tar.gz
cd mysql-shell-8.0.17-linux-glibc2.12-x86-64bit/bin/
./mysqlsh

MySQL 5.7에서 MySQL 8 로 이동하는 데 필요한 사전 검사를 MySQL Shell Upgrade Checker 유틸리티를 사용하면 그 어느 때보다 쉽게 수행할 수 있습니다.

'MySQL' 카테고리의 다른 글

Recursive CTE(Common Table Expression) 활용 (0)	2021.08.31
MySQL 8.0 신기능 CTE(Common Table Expression) 활용 (0)	2021.08.26
MySQL 8 및 MySQL 5.7 메모리 소비 (0)	2021.08.24
pt-kill (0)	2021.08.19
ONLY_FULL_GROUP_BY SQL 쿼리 실패 (0)	2021.08.12

PREV 이전 1 2 3 4 ···6 NEXT 다음

JJON's StorY

MySQL

MySQL 테이블 단편화

테스트 환경

사례 1: INSERT ROLLBACK

사례 2: INSERT 실패

사례 3: 페이지 분할을 통한 단편화

'MySQL' 카테고리의 다른 글

Recursive CTE(Common Table Expression) 활용

재귀 CTE 쿼리란 무엇입니까?

시리즈 생성

1단계 시퀀스

2단계 시퀀스

날짜 순서

계층적 데이터 순회

간단한 트리: 조직도

복잡한 트리: 족보

그래프: 기차 노선

제한 사항

결론

'MySQL' 카테고리의 다른 글

MySQL 8.0 신기능 CTE(Common Table Expression) 활용

CTE란?

CTE 생성 및 사용 방법

CTE의 범위

여러 CTE 연결

VIEW 대신 CTE 사용

결론

'MySQL' 카테고리의 다른 글

MySQL Shell Upgrade Checker Utility

'MySQL' 카테고리의 다른 글

+ Recent posts

티스토리툴바