Recently, I’ve been working with hierarchical data using closure tables. The problem this technique solves for me is persisting hierarchical relationships of specific entities without any restriction on the size of the hierarchy and while providing a simple way to query the hierarchy. Closure tables, above other solutions like path enumeration, maintain referential integrity.
In a series of posts about storing hierarchical data I’m going to walk through implementing a solution for working with hierarchical data at the persistence layer, the application layer, and the presentation layer. We’ll see what is necessary to persist the hierarchy in SQL Server 2008, how we wrap the closure tables in the application, and how we build hierarchy in the user interface with MVC 3
First, let’s dig right in and look at what’s necessary to set-up the closure table. In the image below you’ll see just what little is necessary to get going. The table titled “Closure” is the closure table where the metadata about the structure of the hierarchy is stored. The ParentID is the foreign key to the FamilyMemberID and the ChildID is the FamilyMemberID of one of the decedents. The PathLength indicates how far removed from the node the “Child” is. So, if we are looking at a grandfather, then themself would be PathLength = 0, their child would be PathLength = 1, and their grandchild would be PathLength = 2.
You may have also noticed that the FamilyMembers table also holds a ParentID. This makes things convenient, but is not completely necessary. To understand more about why the ParentID is present, let’s look at the trigger “FamilyMemberClosureTrigger”. Everyone’s comfort level with database triggers is different but to me this is one of those appropriate cases for one. In this case, the trigger is responsible for creating the metadata entries for the Closure table each time a record is inserted into FamilyMembers (or updated, deleted, etc). This isn’t absolutely necessary, though. You could definitely move this logic elsewhere and do away with the trigger altogether.
--Here, we create the trigger when an INSERT takes place in FamilyMembers table. --Update and Delete are still necessary but left them out for this demo CREATE TRIGGER [dbo].[FamilyMemberClosureTrigger] ON [dbo].[FamilyMembers] FOR INSERT AS DECLARE @MemberID INT; DECLARE @ParentID INT; --Get the FamilyMemberID of the newly inserted record SELECT @MemberID = i.FamilyMemberID FROM INSERTED i; --Get the ID of the new record's parent SELECT @ParentID = i.ParentID FROM INSERTED i; --First, we insert the record for the newly inserted FamilyMember --This is a self-referencing record, meaning the ParentID = ChildID --And PathLength = 0. INSERT INTO Closure(ParentID, ChildID,PathLength) VALUES(@MemberID, @MemberID, 0) --Next, we insert metadata about how this new family member --is related to the rest of the hierarchy. INSERT INTO Closure(ParentID, ChildID, PathLength) SELECT p.ParentID, c.ChildID, p.PathLength + c.PathLength+1 FROM Closure p, Closure c WHERE p.ChildID = @ParentId AND c.ChildID = @MemberID
At this point, we’re all set-up for new FamilyMembers to be inserted into the FamilyMembers table. Let’s take a look at the data when these tables are populated. Below, you can see the Closure Table (left) and the FamilyMembers table. Take a moment to observe just what the Closure Table is storing for us.
The Closure Table, by storing this metadata about the FamilyMembers enables us to directly query the relationships in the hierarchy pretty quickly. That is, we can ask questions like “Who are all of the grandchildren of John Doe”
SELECT * FROM Closure c WHERE c.ParentID = 1 AND PathLength = 2
Which results in:
As you can see, this makes things relatively simple for querying. Though, there are some downsides to this solution. We can see the size of the closure table will grow exponentially faster than the entity table (FamilyMembers). Also, the closure table is not simple to “debug”, that is if the table becomes corrupt i.e. failed writes, etc, then the table will need to be rebuilt as attempting to track down the broken relationship visually is not impossible but an exercise in tedium. On the other hand, the relationships are pretty simple so rebuilding part or all of the hierarchy isn’t so bad, especially if you also implement the ParentID on the entity table itself – its part of the charm.
In the next post I will be walking through how to build a custom data structure to abstract the application from the closure table and work with the hierarchy throughout the application.