关系型数据库高级特性、查询优化与性能调优
PostgreSQL是世界上最先进的开源关系型数据库,以其卓越的数据完整性、丰富的数据类型和强大的扩展能力著称。从Instagram到Notion,众多知名产品都在使用PostgreSQL。本指南将帮助你掌握PostgreSQL的核心特性、高级查询能力和性能优化技巧。
| 类型 | 用途 | 示例 |
|---|---|---|
| INTEGER / BIGINT | 整数 | 1, 999999 |
| NUMERIC(p,s) | 精确小数 | 99.99 |
| VARCHAR(n) | 变长字符串 | 'hello' |
| TEXT | 无限长文本 | '任意长度' |
| BOOLEAN | 布尔值 | TRUE/FALSE |
| DATE / TIMESTAMP | 日期时间 | '2026-04-06' |
| UUID | 全局唯一标识 | 'a0ee-...' |
| JSONB | 二进制JSON | '{"key": "value"}' |
| ARRAY | 数组 | '{1,2,3}' |
| INET | IP地址 | '192.168.1.1' |
窗口函数在不合并行的情况下执行跨行计算,是数据分析的利器。
-- 计算每个部门内的薪资排名
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;
-- 计算环比增长率
SELECT date, revenue,
LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY date)) * 100.0 /
LAG(revenue) OVER (ORDER BY date), 2) AS growth_rate
FROM monthly_sales;
-- 移动平均
SELECT date, amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_sales;
-- 普通CTE
WITH active_users AS (
SELECT id, name, email FROM users WHERE status = 'active'
),
user_orders AS (
SELECT u.name, COUNT(o.id) AS order_count,
SUM(o.amount) AS total_spent
FROM active_users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.name
)
SELECT * FROM user_orders WHERE total_spent > 1000;
-- 递归CTE(组织架构树)
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 1 AS level
FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, t.level + 1
FROM employees e
JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree ORDER BY level, name;
-- 创建JSONB列的表
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
attributes JSONB
);
-- 插入JSONB数据
INSERT INTO products (name, attributes) VALUES
('手机', '{"brand": "Apple", "specs": {"ram": "8GB", "storage": "256GB"}}'),
('笔记本', '{"brand": "Lenovo", "specs": {"ram": "16GB", "storage": "512GB"}}');
-- 查询JSONB字段
SELECT name, attributes->>'brand' AS brand
FROM products
WHERE attributes->'specs'->>'ram' = '16GB';
-- JSONB索引(GIN索引加速查询)
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
| 类型 | 用途 |
|---|---|
| B-tree | 默认类型,支持等值和范围查询 |
| Hash | 仅等值查询,比B-tree更快 |
| GIN | JSONB、数组、全文搜索 |
| GiST | 地理数据、范围类型 |
| BRIN | 大表的块范围索引,占用空间小 |
| 部分索引 | 只索引满足条件的行 |
| 表达式索引 | 对计算结果建索引 |
-- 复合索引
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);
-- 部分索引(只索引活跃用户)
CREATE INDEX idx_active_users ON users (email) WHERE status = 'active';
-- 表达式索引
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- 查看执行计划
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
# postgresql.conf 核心参数
shared_buffers = '4GB' # 共享内存,通常为总内存的25%
work_mem = '256MB' # 排序/哈希操作的内存
maintenance_work_mem = '1GB' # 维护操作(VACUUM、索引创建)
effective_cache_size = '12GB' # 操作系统缓存估计值
random_page_cost = 1.1 # SSD设为1.1,HDD设为4
max_connections = 200 # 最大连接数
wal_level = 'replica' # 支持逻辑复制
PostgreSQL每个连接占用较多内存,推荐使用PgBouncer作为连接池:
# PgBouncer配置
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
| 扩展 | 功能 |
|---|---|
| PostGIS | 地理空间数据处理 |
| pg_vector | 向量相似度搜索(AI/ML应用) |
| pg_stat_statements | 查询性能统计 |
| pg_cron | 定时任务调度 |
| pg_trgm | 模糊搜索和相似度匹配 |
| TimescaleDB | 时序数据自动分区 |
访问 ToolSnap 获取SQL格式化、JSON处理等在线开发者工具。
PostgreSQL在复杂查询、数据完整性、扩展性方面更强,支持窗口函数、CTE、JSONB等高级特性;MySQL更简单易用,读性能通常更好,社区生态更丰富。
窗口函数是在结果集的"窗口"上执行计算的函数,不需要像GROUP BY那样合并行。常用窗口函数包括ROW_NUMBER()、RANK()、LAG()、LEAD()、SUM() OVER()等。
主要方法包括:创建合适的索引、使用EXPLAIN ANALYZE分析执行计划、调整postgresql.conf参数、使用VACUUM维护表健康、合理使用连接池。