Querying Hierarchical Data with Postgres

Wietsche Calitz

Query hierarchical data in Postgres using recursive CTEs. Navigate up/down trees, track depth, and aggregate—great for parent-child data.

Hierarchical data is prevalent and simple to store, but querying it can be challenging. This post will guide you through the process of navigating hierarchical data in Postgres. It’s worth noting that these instructions are also applicable to other database systems that support Common Table Expressions, as introduced in SQL:1999.

Sample dataset

The dataset we’ll use is a good introduction for those unfamiliar with hierarchical data, also known as parent-child relationships. The example should be self-explanatory.

WITH dataset(id, parent_id, name) AS (  VALUES   (1::int ,NULL::int, 'Earth')  ,(2, 1, 'Europe')  ,(3, 1, 'Asia')  ,(4, 2, 'Germany')  ,(5, 2, 'France')  ,(6, 3, 'China')  ,(7, 3, 'India')  ,(8, 4, 'Berlin')  ,(9, 4, 'Munich')  ,(10, 5, 'Paris')  ,(11, 5, 'Lyon')  ,(12, 6, 'Beijing')  ,(13, 6, 'Shanghai')  ,(14, 7, 'Mumbai')  ,(15, 7, 'Delhi')  ,(16, 2, 'Belgium')  ,(17, 16, 'Brussels'))SELECT * FROM dataset;
WITH dataset(id, parent_id, name) AS (  VALUES   (1::int ,NULL::int, 'Earth')  ,(2, 1, 'Europe')  ,(3, 1, 'Asia')  ,(4, 2, 'Germany')  ,(5, 2, 'France')  ,(6, 3, 'China')  ,(7, 3, 'India')  ,(8, 4, 'Berlin')  ,(9, 4, 'Munich')  ,(10, 5, 'Paris')  ,(11, 5, 'Lyon')  ,(12, 6, 'Beijing')  ,(13, 6, 'Shanghai')  ,(14, 7, 'Mumbai')  ,(15, 7, 'Delhi')  ,(16, 2, 'Belgium')  ,(17, 16, 'Brussels'))SELECT * FROM dataset;
WITH dataset(id, parent_id, name) AS (  VALUES   (1::int ,NULL::int, 'Earth')  ,(2, 1, 'Europe')  ,(3, 1, 'Asia')  ,(4, 2, 'Germany')  ,(5, 2, 'France')  ,(6, 3, 'China')  ,(7, 3, 'India')  ,(8, 4, 'Berlin')  ,(9, 4, 'Munich')  ,(10, 5, 'Paris')  ,(11, 5, 'Lyon')  ,(12, 6, 'Beijing')  ,(13, 6, 'Shanghai')  ,(14, 7, 'Mumbai')  ,(15, 7, 'Delhi')  ,(16, 2, 'Belgium')  ,(17, 16, 'Brussels'))SELECT * FROM dataset;
 id | parent_id |   name   ----+-----------+----------  1 |           | Earth  2 |         1 | Europe  3 |         1 | Asia  4 |         2 | Germany  5 |         2 | France  6 |         3 | China  7 |         3 | India  8 |         4 | Berlin  9 |         4 | Munich 10 |         5 | Paris 11 |         5 | Lyon 12 |         6 | Beijing 13 |         6 | Shanghai 14 |         7 | Mumbai 15 |         7 | Delhi 16 |         2 | Belgium 17 |        16 | Brussels
 id | parent_id |   name   ----+-----------+----------  1 |           | Earth  2 |         1 | Europe  3 |         1 | Asia  4 |         2 | Germany  5 |         2 | France  6 |         3 | China  7 |         3 | India  8 |         4 | Berlin  9 |         4 | Munich 10 |         5 | Paris 11 |         5 | Lyon 12 |         6 | Beijing 13 |         6 | Shanghai 14 |         7 | Mumbai 15 |         7 | Delhi 16 |         2 | Belgium 17 |        16 | Brussels
 id | parent_id |   name   ----+-----------+----------  1 |           | Earth  2 |         1 | Europe  3 |         1 | Asia  4 |         2 | Germany  5 |         2 | France  6 |         3 | China  7 |         3 | India  8 |         4 | Berlin  9 |         4 | Munich 10 |         5 | Paris 11 |         5 | Lyon 12 |         6 | Beijing 13 |         6 | Shanghai 14 |         7 | Mumbai 15 |         7 | Delhi 16 |         2 | Belgium 17 |        16 | Brussels

Develop the Query

Let’s start with a question: “In which continent is Brussels?” Instead of providing the SQL immediately, join me in developing a strategy that can serve as a foundation for answering more complex questions.

Step 1: Identify the starting node


WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')SELECT * FROM start_node;
WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')SELECT * FROM start_node;
WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')SELECT * FROM start_node;
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels


Step 2: Include the immediate parent (or children if you’re navigating in the opposite direction)


WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  ,ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, start_node    WHERE start_node.parent_id = dataset.id    )
WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  ,ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, start_node    WHERE start_node.parent_id = dataset.id    )
WITH dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  ,ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, start_node    WHERE start_node.parent_id = dataset.id    )
id | parent_id |   name   ----+-----------+---------- 16 |         2 | Belgium 17 |        16 | Brussels
id | parent_id |   name   ----+-----------+---------- 16 |         2 | Belgium 17 |        16 | Brussels
id | parent_id |   name   ----+-----------+---------- 16 |         2 | Belgium 17 |        16 | Brussels

Step 3: Make the query recursive

Postgres allows the creation of recursive queries, which are queries that execute on their own results. In this scenario, our goal is to tweak the ancestor common table expression so it references itself. This adjustment will prompt the query to repeat the procedure described in Step 2 until no more rows are returned.

WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT * FROM ancestor;
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT * FROM ancestor;
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT * FROM start_node    UNION    SELECT dataset.* FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT * FROM ancestor;
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels 16 |         2 | Belgium  2 |         1 | Europe  1 |           | Earth
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels 16 |         2 | Belgium  2 |         1 | Europe  1 |           | Earth
id | parent_id |   name   ----+-----------+---------- 17 |        16 | Brussels 16 |         2 | Belgium  2 |         1 | Europe  1 |           | Earth

Make sure you understand this step before continuing. It can be a bit of a mind-bending!

Step 4: Add a depth counter

The depth counter begins at zero and increases with each added ancestor to the output.

...ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )...
...ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )...
...ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )...
 id | parent_id |   name   | depth ----+-----------+----------+---------- 17 |        16 | Brussels |        0 16 |         2 | Belgium  |        1  2 |         1 | Europe   |        2  1 |           | Earth
 id | parent_id |   name   | depth ----+-----------+----------+---------- 17 |        16 | Brussels |        0 16 |         2 | Belgium  |        1  2 |         1 | Europe   |        2  1 |           | Earth
 id | parent_id |   name   | depth ----+-----------+----------+---------- 17 |        16 | Brussels |        0 16 |         2 | Belgium  |        1  2 |         1 | Europe   |        2  1 |           | Earth


With a little more code, you can set a filter to return the continent based on the second-highest depth:

WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT name as continent FROM ancestorWHERE depth = (SELECT MAX(depth)-1 FROM ancestor);
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT name as continent FROM ancestorWHERE depth = (SELECT MAX(depth)-1 FROM ancestor);
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Brussels')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.parent_id = dataset.id    )SELECT name as continent FROM ancestorWHERE depth = (SELECT MAX(depth)-1 FROM ancestor);
 continent ----------- Europe
 continent ----------- Europe
 continent ----------- Europe


Top Down

Now, let’s consider a reverse navigation, for example, returning cities in Asia. If we skip to step 3 from the above instructions, it would appear as follows. Pay attention to how the direction of navigation changes by reversing the join conditions. The condition ancestor.parent_id = dataset.id navigates upwards, while ancestor.id = dataset.parent_id navigates downwards.


WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Asia')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.id = dataset.parent_id    )SELECT name as cities_in_asia FROM ancestor WHERE depth = 2;
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Asia')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.id = dataset.parent_id    )SELECT name as cities_in_asia FROM ancestor WHERE depth = 2;
WITH RECURSIVE dataset(id, parent_id, name) AS (  VALUES-- ommitting values for brevity), start_node AS (  SELECT id, parent_id, name FROM dataset  WHERE name = 'Asia')  , ancestor AS (    SELECT *, 0 AS depth FROM start_node    UNION    SELECT dataset.*, depth+1 FROM dataset, ancestor    WHERE ancestor.id = dataset.parent_id    )SELECT name as cities_in_asia FROM ancestor WHERE depth = 2;
 cities_in_asia ---------------- Beijing
 cities_in_asia ---------------- Beijing
 cities_in_asia ---------------- Beijing


Aggregation

For example, calculating the sum of populations in European cities:

WITH  RECURSIVE dataset(id, parent_id, name, population) AS (    VALUES     (1::int ,NULL::int, 'Earth', NULL::int)    ,(2, 1, 'Europe', NULL)    ,(3, 1, 'Asia', NULL)    ,(4, 2, 'Germany', NULL)    ,(5, 2, 'France', NULL)    ,(6, 3, 'China', NULL)    ,(7, 3, 'India', NULL)    ,(8, 4, 'Berlin', 3600000)    ,(9, 4, 'Munich', 1500000)    ,(10, 5, 'Paris', 2200000)    ,(11, 5, 'Lyon', 500000)    ,(12, 6, 'Beijing', 21000000)    ,(13, 6, 'Shanghai', 24000000)    ,(14, 7, 'Mumbai', 20000000)    ,(15, 7, 'Delhi', 30000000)    ,(16, 2, 'Belgium', 11000000)    ,(17, 16, 'Brussels', 2000000)    ), start_node AS (    SELECT id, parent_id, name, population FROM dataset    WHERE name = 'Europe')    , ancestor AS (        SELECT * FROM start_node        UNION        SELECT dataset.* FROM dataset, ancestor        WHERE ancestor.id = dataset.parent_id      )SELECT SUM(population) AS europe_population FROM ancestor;
WITH  RECURSIVE dataset(id, parent_id, name, population) AS (    VALUES     (1::int ,NULL::int, 'Earth', NULL::int)    ,(2, 1, 'Europe', NULL)    ,(3, 1, 'Asia', NULL)    ,(4, 2, 'Germany', NULL)    ,(5, 2, 'France', NULL)    ,(6, 3, 'China', NULL)    ,(7, 3, 'India', NULL)    ,(8, 4, 'Berlin', 3600000)    ,(9, 4, 'Munich', 1500000)    ,(10, 5, 'Paris', 2200000)    ,(11, 5, 'Lyon', 500000)    ,(12, 6, 'Beijing', 21000000)    ,(13, 6, 'Shanghai', 24000000)    ,(14, 7, 'Mumbai', 20000000)    ,(15, 7, 'Delhi', 30000000)    ,(16, 2, 'Belgium', 11000000)    ,(17, 16, 'Brussels', 2000000)    ), start_node AS (    SELECT id, parent_id, name, population FROM dataset    WHERE name = 'Europe')    , ancestor AS (        SELECT * FROM start_node        UNION        SELECT dataset.* FROM dataset, ancestor        WHERE ancestor.id = dataset.parent_id      )SELECT SUM(population) AS europe_population FROM ancestor;
WITH  RECURSIVE dataset(id, parent_id, name, population) AS (    VALUES     (1::int ,NULL::int, 'Earth', NULL::int)    ,(2, 1, 'Europe', NULL)    ,(3, 1, 'Asia', NULL)    ,(4, 2, 'Germany', NULL)    ,(5, 2, 'France', NULL)    ,(6, 3, 'China', NULL)    ,(7, 3, 'India', NULL)    ,(8, 4, 'Berlin', 3600000)    ,(9, 4, 'Munich', 1500000)    ,(10, 5, 'Paris', 2200000)    ,(11, 5, 'Lyon', 500000)    ,(12, 6, 'Beijing', 21000000)    ,(13, 6, 'Shanghai', 24000000)    ,(14, 7, 'Mumbai', 20000000)    ,(15, 7, 'Delhi', 30000000)    ,(16, 2, 'Belgium', 11000000)    ,(17, 16, 'Brussels', 2000000)    ), start_node AS (    SELECT id, parent_id, name, population FROM dataset    WHERE name = 'Europe')    , ancestor AS (        SELECT * FROM start_node        UNION        SELECT dataset.* FROM dataset, ancestor        WHERE ancestor.id = dataset.parent_id      )SELECT SUM(population) AS europe_population FROM ancestor;
 europe_population -------------------          20800000
 europe_population -------------------          20800000
 europe_population -------------------          20800000

Notes on performance

Recursion depth

Limit the recursion depth to avoid potentially expensive queries. Understanding your data is crucial for this process. From the example above, we know that the maximum depth is 3. By limiting recursion, you can protect against hidden data quality issues:

ancestor AS (
SELECT *, 0 AS depth FROM start_node
UNION
SELECT dataset.*, depth+1 FROM dataset, ancestor
WHERE (ancestor.parent_id = dataset.id)
AND depth <= 3
)

Indexes

When working with persistent tables, the general approach is to create indexes for the join conditions, such as adding an index on ‘id’ and ‘parent_id’. However, for small datasets, this might not make a significant difference.

Relational vs Graph databases

Postgres, a Relational Database Management System (RDBMS), isn’t the best for navigating complex hierarchies compared to graph databases like Neo4j. However, the hierarchical data we often handle is a small part of a larger domain. This domain mostly consists of simpler data structures that are well-suited for relational databases. Therefore, unless the entire domain is conducive to graph structures, it’s likely that you’ll have to optimize the use of what the RDBMS provides.

In summary

Remember these steps the next time you encounter a hierarchical dataset in Postgres:

  1. Begin by identifying your starting point. This means determining the exact node from which you’ll begin your data retrieval process.

  2. Next, determine the immediate parent or child node, depending on the direction you intend to navigate. This is crucial for shaping the path of your navigation.

  3. Once you’ve determined your direction, modify your query to make it recursive. This will allow your query to repeat until no more rows are returned.

  4. Incorporate a depth counter into your query. This will keep track of how many layers deep into the database structure you have navigated, offering clear insight into the complexity of your data.

  5. Finally, if necessary, you can filter or aggregate the results to customize the end product of your query.

Latest

uv scripts: micro-production situationship

Portable S3 security for EU clouds

Portable S3 security for EU clouds using JWT, OPA, and temporary credentials without hyperscaler lock-in.

Data Platforms for humans

Data platforms fail when people are ignored. Why kindness and communication matter as much as architecture.

Subscribe to our monthly newsletter

Subscribe to our monthly newsletter

Subscribe to our monthly newsletter

Belgium

Vismarkt 17, 3000 Leuven - HQ
Boomgaardstraat 115, 2018 Antwerpen


Vat. BE.0667.976.246

© 2025 Dataminded. All rights reserved.


Belgium

Vismarkt 17, 3000 Leuven - HQ
Boomgaardstraat 115, 2018 Antwerpen

Vat. BE.0667.976.246

© 2025 Dataminded. All rights reserved.


Belgium

Vismarkt 17, 3000 Leuven - HQ
Boomgaardstraat 115, 2018 Antwerpen

Vat. BE.0667.976.246

© 2025 Dataminded. All rights reserved.