Organize Your Data with child-parent hierarchy and Subtotal in MSSQL

01 Introduction to child-parent hierarchy in MSSQL
02 Benefits of using child-parent hierarchy
03 Understanding the structure of child-parent hierarchy
04 Creating a table with child-parent hierarchy and inserting data into the table
Create the table
Define the foreign key constraint
Insert data into the table
Update HID column in the table
05 Querying data from the table using CTEs
06 Updating and deleting data in the child-parent hierarchy
07 Effective techniques for structuring your data using a parent-child hierarchy
08 Next steps for implementing a child-parent hierarchy and subtotals, and concluding remarks

01 Introduction to child-parent hierarchy in MSSQL

The child-parent hierarchy is a data model in Microsoft SQL Server (MSSQL) that represents hierarchical data in a table format. In this approach, each row in the table represents a node in the hierarchy, and each node is associated with a unique identification, a parent identifier, and data properties.

The parent identification field provides the node's parent in the hierarchy, with a NULL parent identifier for the root node. Organizational hierarchies, product categories, and other sorts of hierarchical relationships are frequently represented using child-parent hierarchy models.

In MSSQL, you can establish a child-parent hierarchy by using the Common Table Expression (CTE) feature, which allows you to define a recursive query that traverses the hierarchy iteratively. The recursive query begins with the root nodes (those with NULL parent identifiers) and chooses their children iteratively until all nodes have been explored.

You can use the CTE in queries to traverse the hierarchy, obtain data related to each node, and conduct other activities after you've constructed it. MSSQL includes a number of built-in functions for working with hierarchical data, including the HIERARCHYID data type, which specifies a node's position in the hierarchy, and the ISDESCENDANTOF function, which determines whether one node is a descendant of another.

Overall, the MSSQL child-parent hierarchy model is an effective method for representing and managing hierarchical data in a relational database system.

02 Benefits of using child-parent hierarchy

There are several benefits to using the child-parent hierarchy model in MSSQL:

Efficient data storage: The child-parent hierarchy architecture allows for the efficient storage of hierarchical data in a single table, making it easier to maintain and query the data.
Querying the Hierarchy: MSSQL's built-in functions and features, such as Common Table Expression (CTE), HIERARCHYID data type, and ISDESCENDANTOF function, make it simple to traverse and query the hierarchy, allowing you to obtain and manage hierarchical data more simply.
Flexibility: The child-parent hierarchy model allows for the easy addition of new nodes and modification of the hierarchy structure, making it more flexible than other models.
Improved performance: By using efficient querying techniques such as CTE, you can significantly improve the performance of queries involving hierarchical data.
Application code is made simpler when employing the child-parent hierarchy model when working with hierarchical data. This makes it simpler to maintain and update the application and to write the code and logic necessary to navigate and control the hierarchy.

Overall, employing the child-parent hierarchy model in MSSQL has a number of advantages, such as effective data management, simple querying, adaptability, enhanced performance, and simplified application code.

03 Understanding the structure of child-parent hierarchy

A row in a database with the following columns makes up each node in a child-parent hierarchy in MSSQL for each node in the hierarchy:

Node ID: A unique identifier for the node.
Node name or label: A descriptive label or name for the node.
Parent ID: The ID of the node's parent in the hierarchy.
Data columns: Optional columns that contain data associated with the node.

Each node has a parent node, with the exception of the root node(s), which have no parents, and these interactions between nodes show how the hierarchy is organized. A tree-like structure is formed of nodes, each of which can have one or more child nodes.

A single table containing all nodes is commonly used to represent the hierarchy in MSSQL, with the parent ID column connecting each node to its parent node. For root nodes, the parent ID column is normally set to NULL.

In MSSQL, you may use recursive queries with Common Table Expressions (CTEs) to iteratively explore the hierarchy from the root nodes and choose the offspring of those nodes. Both the base case, which selects the set of root nodes, and the recursive case, which chooses all child nodes of a certain node, are specified by the CTE query.

Overall, the child-parent hierarchy structure in MSSQL is simple. Each node is represented by a row in a database and connected to its parent node by a parent ID field, allowing for quick queries and effective processing of hierarchical data.

04 Creating a table with child-parent hierarchy and inserting data into the table

To create a table with child-parent hierarchy in MSSQL, you can follow these general steps:

Determine the columns you need:

You'll need columns to keep track of each node's unique identification, name or label, parent ID, and any other information related to the node

(In my table columns are NodeID, NodeName, ParentID, and Sales)

Create the table:

Use the CREATE TABLE statement to create the table with the necessary columns.

            
---- Create Table ----
CREATE TABLE [Hierarchy] (
    [NodeID] [int] NOT NULL,
    [NodeName] [varchar](100) NOT NULL,
    [ParentID] [int] NULL,
    [HID] [hierarchyid] NULL,
    [Sales] [decimal](18,2) NULL,
CONSTRAINT [PK_Hierarchy] PRIMARY KEY CLUSTERED 
(
    [NodeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

This creates a table called "Hierarchy" with columns for NodeID, NodeName, ParentID, HID, and Sale. The NodeID column is set as the primary key for the table.

Define the foreign key constraint:

The ParentID column needs to have a foreign key constraint that references the NodeID column in order to enforce the hierarchy relationship. As a result, each node can only have a parent that is a member of the same table.

            
---- Define Foreign Key  ----
ALTER TABLE [Hierarchy]
ADD CONSTRAINT [FK_Hierarchy_ParentID_NodeID]
FOREIGN KEY (ParentID)
REFERENCES [Hierarchy] (NodeID)

This adds a foreign key constraint to the table that ensures that the ParentID column references the NodeID column in the same table.

Insert data into the table:

After creating the table, you can begin constructing the hierarchy by inserting data into it. To get started, insert the root nodes, which should have a ParentID value of NULL. Next, insert the child nodes, and include the ID of the parent node in the ParentID column.

            
---- Insert data into the table ----
INSERT INTO [Hierarchy] (NodeID, NodeName, ParentID, HID, Sales)
VALUES	(1, 'Whiskey', NULL, NULL, NULL),
        (2, 'Jack Daniel', 1, NULL, NULL),
        (3, 'Jack Daniel Old No. 7', 2, NULL, 3000.00),
        (4, 'Jack Daniel Tennessee Honey', 2, NULL, 2800.00),
        (5, 'Jack Daniel Single Barrel', 2, NULL, 4500.00),
        (6, 'Jim Beam', 1, NULL, NULL),
        (7, 'Jim Beam Original', 6, NULL, 8500.00),
        (8, 'Jim Beam Black', 6, NULL, 2000.00),
        (9, 'Jameson', 1, NULL, NULL),
        (10, 'Jameson Irish Whiskey', 9, NULL, 1800.00),
        (11, 'Jameson Black Barrel', 9, NULL, 3200.00),
        (12, 'Jameson Gold Reserve', 9, NULL, 5100.00),
        (13, 'Johnnie Walker', 1, NULL, NULL),
        (14, 'Johnnie Walker Blue Label', 13, NULL, NULL),
        (15, '750ml Johnnie Walker Blue Label', 14, NULL, 1500.00),
        (16, '1000ml Johnnie Walker Blue Label', 14, NULL, 3000.00),
        (17, 'Johnnie Walker Black Label', 13, NULL, 2500.00),
        (18, 'Johnnie Walker Double Black Label', 13, NULL, 4500.00),
        (19, 'Johnnie Walker Gold Label Reserve', 13, NULL, 3670.00),
        (20, 'Johnnie Walker Green Label', 13, NULL, 2000.00),
        (21, 'Vodka', NULL, NULL, NULL),
        (22, 'Absolut', 21, NULL, NULL),
        (23, 'Absolut Original', 22, NULL, 4250.00),
        (24, 'Absolut Citron', 22, NULL, 6150.00),
        (25, 'Absolut Peppar', 22, NULL, NULL),
        (26, '500ml Absolut Peppar', 25, NULL, 3200.00),
        (27, '700ml Absolut Peppar', 25, NULL, 3200.00),
        (28, 'Absolut Juice', 22, NULL, 5200.00),
        (29, 'Smirnoff', 21, NULL, NULL),
        (30, 'Smirnoff Ice', 29, NULL, 2900.00),
        (31, 'Smirnoff Flavored Vodkas', 29, NULL, 3100.00),
        (32, 'Gin', NULL, NULL, NULL),
        (33, 'Tanqueray London Dry Gin', 32, NULL, 3000.00),
        (34, 'Hendrick Gin', 32, NULL, 5000.00),
        (35, 'Bombay Sapphire Gin', 32, NULL, 4200.00),
        (36, 'Beefeater Gin', 32, NULL, 6800.00),
        (37, 'Aviation Gin', 32, NULL, 3280.00),
        (38, 'Rum', NULL, NULL, NULL),
        (39, 'Bacardi', 38, NULL, NULL),
        (40, 'Bacardi Superior', 39, NULL, 3000.00),
        (41, 'Bacardi Gold', 39, NULL, 3200.00),
        (42, 'Bacardi 8', 39, NULL, 3180.00),
        (43, 'Bacardi Limon', 39, NULL, 3800.00)

To establish a child-parent hierarchy in MSSQL, the process typically involves creating a table, specifying the foreign key constraint, and populating the table with data to construct the hierarchical relationships.

Update HID column in the table:

Subsequently, updating the HID column becomes necessary. The HID plays a crucial role in calculating the subtotal within the Child-Parent Hierarchy table.

    
---- HID Update ---
WITH HierarchyCTE AS (
    --- Root nodes:
    SELECT  NodeID          as 'NodeID',
            ParentID        as 'ParentID',
            CAST('/' + CAST(NodeID AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) as 'h',
            HID             as 'HID'			
    FROM Hierarchy
    WHERE ParentID IS NULL
        
    UNION ALL
        
    --- Recursion:
    SELECT  H.NodeID        as 'NodeID', 
            H.ParentID      as 'ParentID',
            CAST(C.h + CAST(H.NodeID AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) as 'h',
            H.HID           as 'HID'
            FROM Hierarchy H
            INNER JOIN HierarchyCTE C ON H.ParentID = C.NodeID
)
        
---- SELECT * FROM HierarchyCTE
        
UPDATE [Hierarchy]
SET HID = T1.h
FROM HierarchyCTE T1
JOIN Hierarchy T2 ON T1.NodeID = T2.NodeID

05 Querying data from the table using CTEs

If you need to retrieve data from a table that has a child-parent hierarchy in MSSQL, one way to do so is by using Common Table Expressions (CTEs) and recursive queries. This will allow you to traverse the hierarchy and retrieve information about nodes and their relationships.

Let's say you have a Hierarchy table with the following data:

To begin, establish the CTE query that outlines the initial scenario for the recursion process. This query selects the root nodes by filtering for all nodes with ParentID assigned as NULL, indicating that they are the top-level nodes in the hierarchy. Additionally, the CTE includes a "Level" column, which will keep track of each node's position in the hierarchy. The "SortColumn" is a column that assigns a row number based on the partition of ParentID and ordered by NodeID. "SortColumn" is then utilized in the ORDER BY statement to determine the row position by sorting the values in ascending order.

                
WITH HierarchyCTE AS (
    --- Root nodes:
    SELECT  NodeID      as 'NodeID',
            NodeName    as 'NodeName',
            ParentID    as 'ParentID',
            HID         as 'HID',
            Sales       as 'Sales',
            0           as 'Level',
            CAST(STR(ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY NodeID), 4, 0) AS varchar(1024)) as 'SortColumn'
    FROM Hierarchy
    WHERE ParentID IS NULL
)

SELECT * FROM HierarchyCTE ORDER BY SortColumn ASC

The recursive case for the CTE query involves selecting all child nodes from the current level and joining them to their parent nodes from the previous level.

This is done by selecting all nodes from the current level (HierarchyCTE) and their corresponding child nodes from the Hierarchy table, based on the ParentID and NodeID columns. The Level column is incremented by 1 for each child node.

To execute the query, we need to combine the base case and the recursive case in a single SELECT statement. This will allow us to retrieve all nodes in the hierarchy.


WITH HierarchyCTE AS (
    --- Root nodes:
    SELECT  NodeID          as 'NodeID',
            NodeName        as 'NodeName',
            ParentID        as 'ParentID',
            HID             as 'HID',
            Sales           as 'Sales',
            0               as 'Level',
            CAST(STR(ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY NodeID), 4, 0) AS varchar(1024)) AS 'SortColumn'
    FROM Hierarchy
    WHERE ParentID IS NULL
    
    UNION ALL

    --- Recursion:
    SELECT  H.NodeID        as 'NodeID', 
            H.NodeName      as 'NodeName',
            H.ParentID      as 'ParentID',
            H.HID           as 'HID',
            H.Sales         as 'Sales',
            C.[Level] + 1   as 'Level',
            CAST(C.SortColumn + STR(ROW_NUMBER() OVER (PARTITION BY H.ParentID ORDER BY H.NodeID), 4, 0) AS varchar(1024)) AS 'SortColumn'
    FROM Hierarchy H
    INNER JOIN HierarchyCTE C ON H.ParentID = C.NodeID
)
                    
SELECT * FROM HierarchyCTE ORDER BY SortColumn ASC

The single CTE query integrates both the base case and recursive case, retrieving all nodes and their corresponding relationships within the hierarchy.

Indented NodeNames:

The query "REPLICATE(' ', C1.[Level]) + C1.NodeName" is utilized to apply an indentation to the NodeName based on its hierarchical position.

Querying of subtotal each nodes:

As mentioned in the HID column update, the total of each node's subtotal calculation will be provided to you.


SELECT  H1.NodeID           as 'NodeID',
        SUM(H2.Sales)       as 'TotalSales',
        COUNT(H1.NodeID)    as 'NodeCount'
FROM Hierarchy H1
JOIN Hierarchy H2 ON H2.HID.IsDescendantOf(H1.HID) = 1
GROUP BY H1.NodeID

In MSSQL, utilizing CTEs and recursive queries can enable you to effectively inquire and navigate through a table with a child-parent hierarchy. This allows you to fetch the nodes and their associations while also calculating subtotals for each node. The following is the complete query:

                
WITH HierarchyCTE AS (
    
    --- Root nodes:
    SELECT  NodeID          as 'NodeID',
            NodeName        as 'NodeName',
            ParentID        as 'ParentID',
            HID             as 'HID',
            Sales           as 'Sales',
            0               as 'Level',
            CAST(STR(ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY NodeID), 4, 0) AS varchar(1024)) as 'SortColumn'
    FROM Hierarchy
    WHERE ParentID IS NULL

    UNION ALL

    --- Recursion:
    SELECT  H.NodeID        as 'NodeID', 
            H.NodeName      as 'NodeName',
            H.ParentID      as 'ParentID',
            H.HID           as 'HID',
            H.Sales         as 'Sales',
            C.[Level] + 1   as 'Level',
            CAST(C.SortColumn + STR(ROW_NUMBER() OVER (PARTITION BY H.ParentID ORDER BY H.NodeID), 4, 0) AS varchar(1024)) as 'SortColumn'
    FROM Hierarchy H
    INNER JOIN HierarchyCTE C ON H.ParentID = C.NodeID
)

---- CTE Table ---
SELECT	C1.NodeID           as 'NodeID',
        C1.NodeName         as 'NodeName',		
        C1.ParentID         as 'ParentID',
        C1.Sales            as 'Sales',
        C1.HID              as 'HID',
        C1.[Level]          as 'Level',
        C1.SortColumn       as 'SortColumn',
        -- Indented NodeName --
        REPLICATE('    ', C1.[Level]) + C1.NodeName as 'IndentedNodeName',
        HidSum.TotalSales   as 'TotalSales',
        HidSum.NodeCount    as 'NodeCount'
FROM HierarchyCTE C1
                
LEFT JOIN ( SELECT  H1.NodeID           as 'NodeID',
                    SUM(H2.Sales)       as 'TotalSales',
                    COUNT(H1.NodeID)    as 'NodeCount'
            FROM Hierarchy H1
            JOIN Hierarchy H2 ON H2.HID.IsDescendantOf(H1.HID) = 1
            GROUP BY H1.NodeID) HidSum ON C1.NodeID = HidSum.NodeID
                
ORDER BY C1.SortColumn ASC

06 Updating and deleting data in the child-parent hierarchy

When dealing with hierarchical data of child-parent relationships in MSSQL, updating and deleting data can pose a greater level of complexity compared to handling flat data. It is important to take note of the following considerations:

Updating data:

To update a node's ParentID, ensure that the new parent node is not a child of the current node. This is crucial to avoid circular references and potential errors. Additionally, updating the HID column is necessary to ensure accurate subtotal calculations. A CTE query can be used to verify if the new parent is a descendant of the current node before updating, which can prevent any issues.
To ensure that the updated NodeName is unique within its siblings, it's important to verify that no other sibling nodes share the same name within the hierarchy. One way to do this is by utilizing a CTE query to check for any potential name conflicts before proceeding with the update.

Deleting data:

To ensure the hierarchy's integrity, it's essential to remove all descendants of a node when deleting it. One approach to accomplish this is by utilizing a recursive Common Table Expression (CTE) query to identify and delete all descendants of the node.
In case a node has child nodes, you have two options: either delete all of them, including the parent node, or assign them to a new parent node. However, if you choose to reassign the children, you must ensure that the new parent node is not a descendant of the current node to avoid creating a circular reference, which could lead to errors. To prevent this issue, you can utilize a CTE query to verify the new parent node before proceeding with the update.

07 Effective techniques for structuring your data using a parent-child hierarchy

Below are some recommended methods for structuring your data using the Child Parent Hierarchy feature in MSSQL:

Choose the appropriate hierarchy model:

There exist two primary approaches for structuring hierarchical data: the adjacency list model and the nested set model. The adjacency list model employs a straightforward parent-child relationship and is relatively uncomplicated to execute, whereas the nested set model represents each node as a range and is optimized for querying. To select the most appropriate model for your data and application needs, evaluate them against your requirements.

Use proper data types:

It's important to employ suitable data types for the columns in your hierarchy. For instance, IDs should be assigned integer types, while timestamps should use date data types. By doing so, you can maintain data consistency and enhance performance.

Enforce data integrity:

To ensure the integrity of your hierarchy data and prevent the insertion of invalid data, it's recommended to utilize foreign key constraints and check constraints. This approach guarantees that the hierarchy will remain valid over time.

Use indexes:

To enhance performance, generate indexes on the columns that are regularly used in queries. For instance, generating an index on the ParentID column can quicken queries that retrieve all children of a parent node.

Use CTE queries:

CTEs (Common Table Expressions) can be a potent resource when dealing with hierarchical data in MSSQL. By utilizing CTE queries, it is possible to navigate the hierarchy and efficiently obtain the necessary information.

Use stored procedures:

One way to simplify your application code and ensure correct updates to the hierarchy is to develop stored procedures for common tasks, including the insertion, updating, and deletion of nodes.

Optimize for performance:

When dealing with extensive hierarchies, efficiency can become a concern. To enhance performance, employ strategies such as caching and lazy loading. Additionally, contemplate utilizing materialized paths or views to precompute frequently accessed queries.

Plan for growth:

As the size of your hierarchy increases, it might be essential to divide the data or implement alternative methods to enhance efficiency. Anticipate future expansion and ensure that your hierarchy has the capability to adapt and fulfill the demands of your application.

08 Next steps for implementing a child-parent hierarchy and subtotals, and concluding remarks

To summarize, MSSQL's Child Parent Hierarchy is a versatile and robust method for structuring hierarchical data. It simplifies navigation and querying of the hierarchy and offers a variety of tools and strategies for maintaining and improving your data.

If you're considering incorporating Child Parent Hierarchy into your database, the following are the next steps to take:

the suitable hierarchy model that matches your application and data needs.
Construct a table with the appropriate columns and data types to support the hierarchy.
Fill the table with hierarchical data.
Implement CTE queries and other strategies to navigate and retrieve information from the hierarchy.
Manage and update the hierarchy using stored procedures.
Enhance performance by utilizing techniques such as indexing and caching.
Plan ahead for expansion and scalability as your system evolves.

Make sure to adhere to recommended methodologies, such as implementing measures to ensure data accuracy, utilizing appropriate data formats, and enhancing performance efficiency. By incorporating these approaches, you can establish a strong and effective hierarchy capable of managing all of your hierarchical data requirements.