SQL NULLs: When Nothing Really Matters

SQL NULLs: When Nothing Really Matters

SQL NULLs: When Nothing Really Matters

Your no-BS weekly brief on software engineering.
Join 100,000+ developers

Imagine debugging a side project leaderboard where scores aren’t sorting as expected. NULL values in a key column are creating chaos, and now you’re in the weeds trying to figure out why your SQL query doesn’t behave consistently across databases. Whether you’re building tools for fun or optimizing production systems, understanding how SQL NULLs impact indexing and ordering can save you a lot of headaches. In this article, we’ll explore how NULL values create unexpected challenges, illustrate this with a real-world example, and offer strategies to take control of your queries.


Understanding NULLs in Indexing

Indexes are designed to speed up data retrieval, but their behavior with NULL values isn’t uniform across databases. Some key points to consider:

  • NULLs in Indexes:

    • In PostgreSQL, B-tree indexes include NULLs by default.
    • In MySQL’s InnoDB engine, NULLs are also indexed, but their behavior in compound indexes (multiple columns) can introduce quirks.
    • SQLite allows NULLs in unique indexes but treats them as non-equal to other NULLs, meaning you can have multiple rows with NULL in a column with a unique constraint.
  • Impact on Queries

Queries that involve conditions like IS NULL or IS NOT NULL can behave differently depending on whether the column is indexed. For instance, in a table tracking game scores, a query like SELECT * FROM scores WHERE score IS NULL might be necessary to identify incomplete games or uninitialized rows. However, such queries often do not leverage an index effectively, especially if the column has a high density of NULLs.


NULLs and Ordering

When ordering rows, the position of NULLs is database-dependent:

  • PostgreSQL: NULLs are treated as larger than any non-NULL value by default, so they appear last in ascending order and first in descending order. You can explicitly control this using NULLS FIRST or NULLS LAST.
  • MySQL: NULLs are treated as smaller than non-NULL values, appearing first in ascending order.
  • SQLite: Similar to MySQL, NULLs are treated as the smallest values.

This inconsistency can lead to surprises if your application depends on the order of NULLs.


A Real-World Scenario: Leaderboard Quirks

Let’s ground this discussion in a real-world scenario. Imagine you’re building a retro gaming leaderboard. Players’ scores are stored in a table, and some rows have NULL in the score column because their games haven’t been completed yet. Your schema might look like this:

CREATE TABLE leaderboard (
    id SERIAL PRIMARY KEY,
    player_name TEXT NOT NULL,
    score INTEGER,
    game_completed_at TIMESTAMP
);

CREATE INDEX idx_score ON leaderboard (score);

The business logic involves three scenarios:

  1. High Scores: Players with non-NULL score, sorted in descending order.

  2. Incomplete Games: Players where score is NULL.

  3. Recent Completers: Players sorted by game_completed_at with NULLs appearing last.

You want a query to prioritize incomplete games first, followed by completed games sorted by their scores.


Strategies to Address the Problem

To ensure consistent behavior and optimal performance, you can use the following strategies:

  • Explicitly Define NULL Placement: Modify the query to explicitly prioritize NULLs using SQL's NULLS FIRST or NULLS LAST options, which are supported by most databases, though SQLite does not natively support these options:
SELECT * FROM leaderboard
ORDER BY score DESC NULLS LAST;

This approach allows you to clearly dictate whether NULL values should appear at the beginning or end of your result set, ensuring consistent behavior across database systems.

  • Custom Logic with Boolean Expressions: Use expressions like IS NULL to manually control the order of NULLs relative to non-NULL values:
SELECT * FROM leaderboard
ORDER BY score IS NULL DESC, score DESC;

This ensures that incomplete games appear first, followed by high scores in descending order, and is particularly useful when NULLS FIRST or NULLS LAST is unavailable.

  • Partial Indexing: Improve query performance by creating an index that excludes NULLs:
CREATE INDEX idx_score_nonnull ON leaderboard (score)
WHERE score IS NOT NULL;

This type of index speeds up queries that target only non-NULL values and reduces the overhead of indexing sparse columns.

  • Default Values: If appropriate, set a default score for new rows to avoid NULLs altogether:
ALTER TABLE leaderboard
ALTER COLUMN score SET DEFAULT 0;
  • Application-Level Handling: Adjust application logic to explicitly fetch rows with NULL values first and merge results from multiple queries:
incomplete_games = Leaderboard.where(score: nil).order(:id)
completed_games = Leaderboard.where.not(score: nil).order(score: :desc)
prioritized_leaderboard = incomplete_games + completed_games

Knowing is Half the Battle

NULLs in indexing and ordering can create unexpected behavior, especially in scenarios where business logic relies on the absence of data. By understanding the nuances of your database’s behavior and applying strategies like explicit NULL placement, partial indexing, or application-level adjustments, you can ensure consistent and predictable results.