A recent discussion about SQL query optimization has sparked debate among database developers over the performance cost of OR clauses and the available workarounds. The conversation centers on a practical example in which an OR query runs far slower than an AND-based alternative, and it has led to broader discussion of schema design patterns and query optimization strategies.
The Core Performance Problem
The original example demonstrates a striking performance difference in PostgreSQL. A query that uses OR to find applications where a given user is either the submitter or the reviewer takes over 100 milliseconds against a table of one million applications. Rewriting the same logic as separate AND-based queries brings execution time under 1 millisecond, a speedup of more than 100x.
This dramatic difference occurs even when proper indexes exist on the filtered columns. The issue stems from how database query planners handle OR operations, which often require either merging separate index lookups or performing full table scans, both of which are computationally expensive compared to direct index access.
Performance Comparison:
- OR Query: >100ms execution time
- AND Query Alternative: <1ms execution time
- Performance Improvement: >100x faster
- Test Environment: 1,000,000 applications, 1,000 users, PostgreSQL
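The article does not reproduce the exact queries, so the following is only a rough sketch of the kind of comparison described. It assumes the application table from the schema example later in this piece; the primary key, the index names, the random data, and user id 42 are all illustrative assumptions. It also uses a UNION of two single-column lookups as the rewrite, which is one common workaround and not necessarily the one the original author measured.

```sql
-- Sketch only: run in a scratch database. Column names follow the article's
-- schema example; the primary key and index names are assumptions.
create table application (
    application_id int8 primary key,
    submitter_id   int8 not null,
    reviewer_id    int8 not null
);

-- One single-column index per filtered column.
create index application_submitter_id_idx on application (submitter_id);
create index application_reviewer_id_idx  on application (reviewer_id);

-- Synthetic data at roughly the quoted scale:
-- 1,000,000 applications spread across 1,000 user ids.
insert into application (application_id, submitter_id, reviewer_id)
select i,
       1 + floor(random() * 1000)::int8,
       1 + floor(random() * 1000)::int8
from generate_series(1, 1000000) as i;

-- Refresh planner statistics after the bulk load.
analyze application;

-- OR form: states the intent directly.
select count(*)
from application
where submitter_id = 42 or reviewer_id = 42;

-- Rewrite: two independent indexed lookups combined with UNION.
-- UNION (not UNION ALL) removes applications where the same user is both
-- submitter and reviewer, so the count matches the OR form.
select count(*)
from (
    select application_id from application where submitter_id = 42
    union
    select application_id from application where reviewer_id = 42
) as matched;
```

Whether the second form is actually faster depends on the PostgreSQL version, the data distribution, and the plan chosen, so it is worth timing both on real data before adopting the rewrite.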
Community Perspectives on Query Optimization
Database professionals in the discussion highlight several important considerations. Some argue that while performance optimizations are valuable, they shouldn't come at the expense of code clarity and maintainability. The original OR query better expresses the developer's intent and communicates more clearly with future programmers who need to understand the code.
Others point out that modern query optimizers are becoming more sophisticated. There's ongoing development in PostgreSQL and other database systems to automatically optimize these types of queries, potentially making manual rewrites unnecessary in future versions.
The Extension Table Pattern
A popular solution in the discussion restructures the schema using what developers call the extension pattern. Instead of keeping several foreign key columns for the same kind of entity in one table, this approach moves those relationships into a separate junction table with one row per relationship.
For the application example, this means creating an application_user table that links users to applications with a type indicator (submitter or reviewer). This design allows queries to follow a linear path through indexes rather than requiring complex merge operations.
One commenter put it this way: "I really like the extension pattern. I wish more of the tables at my company used it."
Extension Pattern Schema Example:
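-- Original problematic structure
create table application (
    application_id int8 not null,
    submitter_id   int8 not null,
    reviewer_id    int8 not null
);
-- Extension pattern solution
-- (PostgreSQL has no inline enum column syntax, so a CHECK constraint
-- restricts user_type here; CREATE TYPE ... AS ENUM is an alternative)
create table application_user (
    user_id        int8 not null,
    application_id int8 not null,
    user_type      text not null check (user_type in ('submitter', 'reviewer'))
);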
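To illustrate the "linear path through indexes" idea, here is a minimal sketch of how the junction table might be indexed and queried. The index name, the sample rows, and user id 42 are assumptions made for illustration, and the user_type column is written as text with a CHECK constraint so that the statement runs in PostgreSQL.

```sql
-- Sketch only: same shape as the application_user table in the schema example.
create table if not exists application_user (
    user_id        int8 not null,
    application_id int8 not null,
    user_type      text not null check (user_type in ('submitter', 'reviewer'))
);

-- Composite index led by user_id (index name is hypothetical): every
-- application a user touches, in either role, sits in one index range.
create index if not exists application_user_user_id_idx
    on application_user (user_id, application_id);

-- A few hand-written sample rows.
insert into application_user (user_id, application_id, user_type) values
    (42, 1, 'submitter'),
    (42, 1, 'reviewer'),
    (42, 2, 'reviewer'),
    (7,  2, 'submitter');

-- "Applications where user 42 is submitter or reviewer" becomes a single
-- equality lookup with no OR. DISTINCT collapses the case where the same
-- user holds both roles on one application.
select count(distinct application_id)
from application_user
where user_id = 42;
-- with only the sample rows above: 2
```

The trade-off is that constraints such as "exactly one submitter per application" are no longer enforced by a NOT NULL column and would need a separate unique index or application-level check.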
Broader Implications for Database Design
The discussion reveals that schema design decisions have far-reaching impacts beyond simple query performance. Developers note that the extension pattern also simplifies integration with search systems like Elasticsearch and reduces the need for complex denormalization strategies.
However, experienced database professionals caution against over-generalizing these optimization techniques. The effectiveness of different approaches depends heavily on specific database systems, data distributions, and query patterns. What works well for PostgreSQL might not apply to other database engines, and solutions that help with simple cases can become unwieldy with complex multi-table joins.
The conversation also touches on the fundamental challenge of query optimization: database systems must make execution decisions without complete knowledge of result set sizes, making it difficult to choose optimal strategies automatically.
Practical Recommendations
For developers facing similar performance issues, the community suggests several approaches. First, understanding execution plans is crucial for diagnosing performance problems. Different database systems provide tools to visualize how queries are executed, helping identify bottlenecks.
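In PostgreSQL, the usual starting point for reading an execution plan is EXPLAIN. The sketch below assumes the application table from the schema example and uses 42 as a placeholder user id; note that the ANALYZE option actually executes the statement in order to report real timings.

```sql
-- Sketch only: creates the article's application table if it is missing.
create table if not exists application (
    application_id int8 not null,
    submitter_id   int8 not null,
    reviewer_id    int8 not null
);

-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds how many pages were read from cache or disk.
explain (analyze, buffers)
select count(*)
from application
where submitter_id = 42 or reviewer_id = 42;
```

In the output, a Seq Scan node means the whole table was read, while a BitmapOr node over two Bitmap Index Scan nodes means the planner merged two index lookups. Comparing the estimated and actual row counts on each node often shows where the planner's guess went wrong.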
Second, the choice between OR optimization techniques and schema restructuring should consider the specific use case. For applications that frequently need to query across multiple relationship types, the extension pattern offers clear benefits. For simpler cases or systems where schema changes are difficult, query rewrites might be more practical.
The discussion emphasizes that effective database design requires understanding access patterns, read versus write workloads, and potential contention issues. These factors often matter more than following general optimization rules.
Reference: A SQL Heuristic: ORs Are Expensive
