Hierarchical Data: Persistence via Closure Table

Recently, I’ve been working with hierarchical data using closure tables.  The problem this technique solves for me is persisting hierarchical relationships of specific entities without any restriction on the size of the hierarchy and while providing a simple way to query the hierarchy.  Closure tables, above other solutions like path enumeration, maintain referential integrity.

In a series of posts about storing hierarchical data I’m going to walk through implementing a solution for working with hierarchical data at the persistence layer, the application layer, and the presentation layer.  We’ll see what is necessary to persist the hierarchy in SQL Server 2008, how we wrap the closure tables in the application, and how we build hierarchy in the user interface with MVC 3

First, let’s dig right in and look at what’s necessary to set-up the closure table.  In the image below you’ll see just what little is necessary to get going.  The table titled “Closure” is the closure table where the metadata about the structure of the hierarchy is stored.  The ParentID is the foreign key to the FamilyMemberID and the ChildID is the FamilyMemberID of one of the decedents.  The PathLength indicates how far removed from the node the “Child” is.  So, if we are looking at a grandfather, then themself would be PathLength = 0, their child would be PathLength = 1, and their grandchild would be PathLength = 2.

You may have also noticed that the FamilyMembers table also holds a ParentID.  This makes things convenient, but is not completely necessary.  To understand more about why the ParentID is present, let’s look at the trigger “FamilyMemberClosureTrigger”.  Everyone’s comfort level with database triggers is different but to me this is one of those appropriate cases for one.  In this case, the trigger is responsible for creating the metadata entries for the Closure table each time a record is inserted into FamilyMembers (or updated, deleted, etc).  This isn’t absolutely necessary, though.  You could definitely move this logic elsewhere and do away with the trigger altogether.

--Here, we create the trigger when an INSERT takes place in FamilyMembers table.
--Update and Delete are still necessary but left them out for this demo
CREATE TRIGGER [dbo].[FamilyMemberClosureTrigger] ON [dbo].[FamilyMembers]

--Get the FamilyMemberID of the newly inserted record
SELECT @MemberID = i.FamilyMemberID FROM INSERTED i;
--Get the ID of the new record's parent

--First, we insert the record for the newly inserted FamilyMember
--This is a self-referencing record, meaning the ParentID = ChildID
--And PathLength = 0.
INSERT INTO Closure(ParentID, ChildID,PathLength)
 VALUES(@MemberID, @MemberID, 0)
--Next, we insert metadata about how this new family member
--is related to the rest of the hierarchy.
INSERT INTO Closure(ParentID, ChildID, PathLength)
SELECT p.ParentID, c.ChildID, p.PathLength + c.PathLength+1
 FROM Closure p, Closure c
WHERE p.ChildID = @ParentId
AND c.ChildID = @MemberID

At this point, we’re all set-up for new FamilyMembers to be inserted into the FamilyMembers table. Let’s take a look at the data when these tables are populated.  Below, you can see the Closure Table (left) and the FamilyMembers table.  Take a moment to observe just what the Closure Table is storing for us.

ParentID ChildID PathLength
1 1 0
1 2 1
1 3 1
1 4 2
1 5 1
1 6 2
1 7 2
2 2 0
2 4 1
3 3 0
3 6 1
3 7 1
4 4 0
5 5 0
6 6 0
7 7 0
FamilyMemberID ParentID Name
1 0 John Doe
2 0 Billy Doe
3 1 Mary Doe
4 2 Bobby
5 1 Bobby Doe
6 3 Sammy Doe
7 3 Mae Doe

The Closure Table, by storing this metadata about the FamilyMembers enables us to directly query the relationships in the hierarchy pretty quickly.  That is, we can ask questions like “Who are all of the grandchildren of John Doe”

SELECT * FROM Closure c
WHERE c.ParentID = 1
AND PathLength = 2

Which results in:

ParentID ChildID PathLength
1 4 2
1 6 2
1 7 2

As you can see, this makes things relatively simple for querying. Though, there are some downsides to this solution. We can see the size of the closure table will grow exponentially faster than the entity table (FamilyMembers). Also, the closure table is not simple to “debug”, that is if the table becomes corrupt i.e. failed writes, etc, then the table will need to be rebuilt as attempting to track down the broken relationship visually is not impossible but an exercise in tedium. On the other hand, the relationships are pretty simple so rebuilding part or all of the hierarchy isn’t so bad, especially if you also implement the ParentID on the entity table itself – its part of the charm.

In the next post I will be walking through how to build a custom data structure to abstract the application from the closure table and work with the hierarchy throughout the application.


11 Responses to “Hierarchical Data: Persistence via Closure Table”

  1. You’ve got a pretty amazing answer there. I wonder if you could help me with my dilemma: http://stackoverflow.com/questions/11790108/what-table-structure-to-use-for-nested-data

    It’s basically a hierarchical structure like the one you describe but a bit more complex. In your analogy of the children/parents, let’s say I can move children around.

    Let’s say that Sammy Doe (can magically) become older than May Doe and thus be the first great grandchild, how would you handle that? I can imagine adding a position column to get that out of the way.

    Okay but how would you, for example, call up the 2nd great grandchild of the 1st grandchild of the 2nd child of the grandfather? or how about only the grandchildren of Billy Doe where only Mary Doe is the parent?

    I’d greatly appreciate your help 🙂

    • Great question Antonin, that does sound like some extra complexity. Have you considered reorganizing the tree? Recently I’ve used closure tables in an architecture which all “nodes” in the tree are able to be rearranged and also exist in multiple places simultaneously. That is, the UI allows for a user to drag something in the hierarchy to a completely new location in the hierarchy.

      I’m going to propose two ways to solve this, the first with the existing schema I’ve presented above, and the second solution adds a layer of abstraction.

      You could handle this by rearranging the relationships of the nodes. So, if you move Sammy Doe above May Doe then I’d go to work in the closure table changing parent-child relationships of the two in order to switch their positions. If you think about it like a linked list, the logic is similar where you’ll want to store the parent of both May Doe and Sammy Doe so you can make changes to the relationships without loosing all the relationships under these nodes, that is all the children of these nodes you’re shuffling around.

      Second, and I think a better solution given your constraints is to create a table which abstract between the family members and the tree itself. Create a table called “FamilyMemberNodes” which simply aliases the FamilyMemberID to a “position” in the tree. Then, your closure table would be be constructed based off your FamillyMemberNodes table, not the FamilyMember table or its keys. Then, when you want to swap two family members in the tree you make the change at the FamilyMemberNodes table and you don’t need to touch the structure of the tree itself.

      Incidentally, its solution two which also allows for “FamilyMembers” to exist in multiple places (Which doesn’t make sense for family members – what a weird thought!) but would make sense for other cases like a shopping cart, etc.

      Anyway – long winded answer but does any of this help?

      • I’m thinking of doing the same. Have that drag ‘n drop UI that allows changing of the nodes. But my main problem is trying to figure out a way to call up those specific nodes (1st of top level, 2nd of mid, 5th of last).

        To see if I understand this right, you’re proposing:

        * changing around the closure table with each update/change by updating the child/parent declarations. This seems a bit complex but I don’t see a way around it.
        * I’m not entirely sure I understand your second explanation. I’m trying to wrap my head around it. It would be an in-between table which would hold position (first child, second child, first grandpa), and the id which corresponds to the id in the family table.

        I guess by implementing both of these, i could make it work. Have the Nodes table declare parent/child relationships, the in-between table declare position, and the main table have the actual data. Whenever I want to change the position of the node, I can use the in-between. If I want to change the parent/child relationship, I’ll have to use the Nodes table and update it as well as the in-between table.
        Thanks for your help. I’m still not 100% sure of how to do this all efficiently and if I’m missing something. :/ especially querying this data.

      • There is a simpler solution here than what’s coming across here, I think. I will attempt to write a new post or two this weekend with the different approach to the closure table which should make my second suggestion more clear. I’ve been meaning to revisit this topic with what I’ve learned anyway. 🙂

  2. You’re awesome. Thank you for all your help! I’ll definitely be following your blog 🙂

    I found this an incredibly intricate matter but I find it extremely useful/necessary to what I’m trying to build 🙂 It seems like no one on reddit or stackoverflow has any good ideas other than breaking this down to individual tables for each level, ditching this altogether because SQL can’t do this (it obviously can), and some other stuff.

  3. Exactly where did you acquire the suggestions to post ““Hierarchical Data: Persistence via Closure
    Table | The B^2 Brand”? I appreciate it -Alta


  1. Hierarchical Data: Applying Data Structures « The B^2 Brand - November 20, 2011

    […] my previous post I walked through persisting hierarchical data using closure tables.  While closure tables are a […]

  2. Hierarchical Data: Rendering with Razor « The B^2 Brand - November 20, 2011

    […] walked through persisting hierarchical data via closure tables and then through abstracting the closure table in the application layer.  Now, we’ll look at […]

  3. The B^2 Brand - August 4, 2012

    […] found the closure table very useful in cases where there is a high degree of interaction with hierarchy data. Recently I […]

  4. Converting the Closure Table from a Weak Entity | The B^2 Brand - August 4, 2012

    […] found the closure table very useful in cases where there is a high degree of interaction with hierarchy data. Recently I […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: