Mysql 索引原理

内容隐藏

5 5. 使用 EXPLAIN 进行查询分析

1. 基础概念

什么是索引及其用途：
- 索引是数据库表的一种数据结构，用于提高查询速度。它类似于书籍的目录，可以帮助数据库快速定位数据。
不同类型的索引：
- B-tree索引：树形结构，使用广泛，适合范围查询。
- 哈希索引：适合精确查找（==），但不支持范围查询。
- 全文索引：用于对文本类型字段进行快速搜索，支持自然语言的搜索。
索引的基本原理：
- 索引通过减少数据库在查找数据时需要扫描的行数，从而降低查询的时间复杂度。

2. 创建与管理索引

创建索引：sqlCREATE INDEX index_name ON table_name(column_name);
删除索引：sqlDROP INDEX index_name ON table_name;
唯一索引和复合索引：
- 唯一索引：保证数据的唯一性。
- 复合索引：由多个列组成的索引，适用于多条件查询。

3. 索引的种类

主键索引：每个表有且只有一个，自动唯一性。
唯一索引：保证某列数据的唯一性，但允许 NULL 值。
普通索引：允许重复值。
全文索引：用于快速查找文本相关性。
空间索引：用于地理信息数据的检索。

4. 索引的影响

索引可以加速查询，但会延缓 DML 操作（插入、更新、删除）的速度，因为每次变更数据时需要维护相关索引。
适当的索引可以带来性能优化，过多或不适合的索引则可能导致性能下降。

5. 使用 EXPLAIN 进行查询分析

使用 EXPLAIN：sqlEXPLAIN SELECT * FROM table_name WHERE column_name = value;
输出字段的含义：
- type：访问类型（例如：ALL, index, range）。
- possible_keys：可能使用的索引。
- key：实际使用的索引。
- rows：数据库扫描的行数。
- extra：额外的信息。

6. 索引优化

定期审视和优化索引，分析执行计划，寻找可能的瓶颈。
添加或删除索引时要考虑查询频率及数据特性。

验证索引的执行效率

步骤

创建表并插入数据。
创建索引。
使用 EXPLAIN 检查查询计划。
执行查询并测量性能。

import pymysql
import random
import time

# 数据库连接参数
db_config = {
    'host': 'localhost',
    'user': 'your_username',  # 替换为你的 MySQL 用户名
    'password': 'your_password',  # 替换为你的 MySQL 密码
    'database': 'your_database',  # 替换为你的数据库名
}

# 创建数据库连接
connection = pymysql.connect(**db_config)

try:
    with connection.cursor() as cursor:
        # 创建表
        cursor.execute("""
        CREATE TABLE IF NOT EXISTS employees (
            id INT AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(100),
            department VARCHAR(100),
            salary DECIMAL(10, 2),
            hire_date DATE
        )
        """)
        connection.commit()
        
        # 插入大量数据
        departments = ['Sales'] * 800000 + ['IT'] * 100000 + ['HR'] * 50000 + ['Marketing'] * 50000 + ['Finance'] * 50000
        random.shuffle(departments)
        start_time = time.time()
        
        for i in range(1000000):  # 插入 1,000,000 条数据
            name = f'Employee_{i}'
            department = departments[i]  # 使 Sales 部门有更多的记录
            salary = round(random.uniform(40000, 120000), 2)
            hire_date = f'2020-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}'
            cursor.execute("INSERT INTO employees (name, department, salary, hire_date) VALUES (%s, %s, %s, %s)",
                           (name, department, salary, hire_date))

        connection.commit()
        end_time = time.time()
        
        print(f"插入 1,000,000 条数据所需时间：{end_time - start_time:.2f} 秒")

        # 创建索引
        cursor.execute("CREATE INDEX idx_department ON employees(department);")
        connection.commit()

        # 使用 EXPLAIN 检查查询计划（带索引）
        cursor.execute("EXPLAIN SELECT * FROM employees WHERE department = 'Sales';")
        explain_result = cursor.fetchall()
        print("EXPLAIN 查询结果（带索引）：")
        for row in explain_result:
            print(row)  # 输出查询计划

        # 执行查询（带索引）
        start_time = time.time()
        cursor.execute("SELECT * FROM employees WHERE department = 'Sales';")
        results = cursor.fetchall()
        end_time = time.time()
        
        print(f"查询部门 Sales 的员工（带索引）所需时间：{end_time - start_time:.2f} 秒")
        print(f"查询结果数量（带索引）：{len(results)}")  # 输出查询结果的数量

        # 删除索引
        cursor.execute("DROP INDEX idx_department ON employees;")
        connection.commit()

        # 再次使用 EXPLAIN 检查查询计划（无索引）
        cursor.execute("EXPLAIN SELECT * FROM employees WHERE department = 'Sales';")
        explain_result = cursor.fetchall()
        print("EXPLAIN 查询结果（无索引）：")
        for row in explain_result:
            print(row)  # 输出查询计划

        # 执行查询（无索引）
        start_time = time.time()
        cursor.execute("SELECT * FROM employees WHERE department = 'Sales';")
        results = cursor.fetchall()
        end_time = time.time()
        
        print(f"查询部门 Sales 的员工（无索引）所需时间：{end_time - start_time:.2f} 秒")
        print(f"查询结果数量（无索引）：{len(results)}")  # 输出查询结果的数量

finally:
    connection.close()

代码说明

数据插入：我们插入 1,000,000 条员工记录，其中 Sales 部门占据 800,000 条，这会造成这个部门的数据量远远高于其他部门。通过这种方法，我们可以在查询 Sales 部门员工时看到更明显的索引效果。
查询执行：
- 首先创建索引，然后使用 EXPLAIN 命令检查查询计划，并执行查询以获取带索引的结果。
- 删除索引后，再次执行 EXPLAIN 和查询以获取无索引的结果。