Hierarchical Data: Applying Data Structures

In my previous post I walked through persisting hierarchical data using closure tables.  While closure tables are a great technique for persistence we need a more approachable way to deal with the hierarchy in the application layer.  Since we are dealing with a hierarchy it stands to reason that we could benefit from a tree-like data structure and that’s just what we’re going to construct.

    //I tend to favor interfaces over abstract classes
    //However, this is definitely one of those places where
    //either/both make sense. The point of the node is to
    //represent a single record within the closure table
    //This is where the abstraction begins
    public interface IHierarchyNode
    {
        //Here are the three fields from the closure table
        Int32 ParentID { get; set; }
        Int32 ChildID { get; set; }
        Int32 PathLength { get; set; }

        //Since we are not limiting the width of the hierarchy
        //at any branch we don't have a typical "left child"
        //"right child" approach instead we prepare for n children
        IEnumerable Children { get; set; }

        //Useful convenience method
        Boolean HasChildren();
    }

The node itself is pretty straight forward.  My first implementation of the HierarchyNode used generics and was quite flexible.  Unfortunately, the solution I was working in used Entity Framework 4.1 Code First(not the unfortunate part) and as it turns out EF 4.1 doesn’t support building models which implement generics (the unfortunate part).  So, I backed up to a slightly cumbersome approach.  The takeaway here is that it doesn’t really matter what patterns you apply to this scenario, at the end of the day we just need to get a closure record out of persistence and represented in the application.

    public class FamilyMemberNode : IHierarchyNode
    {
        //Implementation of IHierarchyNode - nothing to see here
        public Int32 ParentID { get; set; }
        public Int32 ChildID { get; set; }
        public Int32 PathLength { get; set; }

        public IEnumerable Children { get; set; }

        //The major difference here is that our FamilyMemberNode
        //carries a payload of a FamilyMember.  Again, we could
        //avoid concrete classes of the node and use generics
        //or other techniques, but lets not get hung up on that
        //conversation
        public FamilyMember FamilyMember { get; set; }

        //Covers our basis for children. There is a very compelling
        //argument to move this method to an abstract class to cut
        //down on DRY violation.
        public bool HasChildren()
        {
            if (Children == null)
                return false;

            if (!Children.Any())
                return false;

            return true;
        }

        //Providing a way to compare two nodes is invaluable
        //especially for unit testing and tree traversal.
        //Another candidate for abstract class
        public override Boolean Equals(object obj)
        {
            var node = (IHierarchyNode)obj;

            //Two nodes are equaly if all three of these fields are identical
            return (ParentID == node.ParentID &&
                    ChildID == node.ChildID &&
                    PathLength == node.PathLength);
        }
    }

With the nodes out of the way, we can move on to building the hierarchy and providing an abstraction from the closure table. Prior to digging in, it may be worth having a look at the closure table results to follow along with some of the logic here. The nomenclature becomes a little challenging to navigate here. At one point I refer to the ROOT of the entire hierarchy, and other times I mean a root/parent/node to be a record in the closure table where the ParentID = ChildID and PathLength = 0. Hopefully I’ve landed on something that gets the idea across.

    public class Hierarchy where T : class, IHierarchyNode
    {
        //Think of this as the "Genetic Code" of the hierarchy, we hold
        //the metadata about the structure here.  Effectively, we hold
        //the closure table records and can query these nodes just as
        //we did in SQL against the closure table
        private readonly IEnumerable _unstructuredNodes;

        //Top most node of the entire data structure.
        private T _root;

        public Hierarchy(IEnumerable nodes)
        {
            _unstructuredNodes = nodes;

            //Builds the entire hierarchy when instantiated
            Build();
        }

        ///<summary> 
        /// Retrieves the root of a given node. This root may not be the root of the entire
        /// hierarchy, but instead is logically equivalent to the parent of the given node.
        /// Since the underlying structure of nodes may suggest there is more than one parent
        /// this method finds the actual root (Where PathLength = 0)
        /// </summary>
        public T GetRoot(Int32 nodeID)
        {
            var root = _unstructuredNodes.Single(x => x.ParentID == nodeID && x.PathLength == 0);
            return root;
        }

        ///<summary>
        /// Returns the root HierarchyNode of the entire Hierarchy structure.
        /// </summary>
        public T GetRoot()
        {
            //If we already know the root, don't figure it out again
            if (_root == null)
            {
                //The logic behind selecting the root is as follows:
                //Root(or parent) nodes are those where PathLength == 0
                //The Root of the entire structure is the parent node that is not the child of
                //any other parent nodes.  Below, the &&(AND) clause is executing the statement
                //"where the parent node is not a child of any other node"
                var root = _unstructuredNodes.Single(x => x.PathLength == 0
                                            && !_unstructuredNodes
                                                .Where(t => t.PathLength != 0).ToList()
                                                .ConvertAll(t => t.ChildID)
                                                .Contains(x.ParentID));

                return root;
            }

            return _root;
        }

        //Builds the entire hierarchy starting from the top-most root
        private void Build()
        {
            _root = GetRoot();
            _root.Children = BuildChildNodes(_root);
        }

        ///<summary>
        /// Recursively constructs the child nodes from any given node down.
        /// That is, given any node, this routine will construct the entire
        /// branch of the hierarchy starting with provided node as its root
        /// </summary>
        public List BuildChildNodes(T node)
        {
            //These are the children that will be constructed and
            //ultimately become the given node's Children field
            var parentNodes = new List();

            //These are the immediate children of the given node, but are not "Root" nodes themselves,
            //that is, the root node is such that PathLength = 0 and subsequently ParentID == ChildID
            var children = _unstructuredNodes.Where(x => x.ParentID == node.ParentID && 
                                                         x.PathLength == 1).ToList();

            //We iterate through these children to construct the appropriate set of nodes that will
            //become the given node's Children
            foreach (var child in children)
            {
                //Get the root node of this child
                var parentNode = GetRoot(child.ChildID);
                parentNodes.Add(parentNode);

                //RECURSIVE CASE:
                //Now we go find any children of this node until we find a leaf
                BuildChildNodes(parentNode);
            }

            //BASE CASE:
            //When this point is reached, we either have no more children to iterate through
            //or a leaf was found and children.Count() == 0, so return given node with its children
            node.Children = parentNodes;
            return parentNodes;
        }
    }

The data structure above is certainly not complete but I’ve accomplished what I set out to do – abstract the particulars of the closure table behind a data structure within the application. At this point, the rest of the functionality is left up to individual style. For example, traversing, searching and CRUD operations could be implemented with a fluent interface, or a set of extension methods on the base Hierarchy class without actually pushing the operations into the base class itself. I may offer an approach in a future post after coding with it a bit more.

My objective with the Hierarchy data structure above achieves the abstraction I was looking for, and in the next post we’ll work with the hierarchy by rendering a UI. However, I want to touch on a few points before moving on. The hierarchy above has some definite scaling issues. When I created this data structure, I was anticipating a reasonably small size hierarchy of approx. 100-200 nodes in the hierarchy and a bit over 1000 unstructured nodes (entries in the closure table).

The first pressure point for scaling is my use of recursion, and we are all likely aware of recursion’s performance woes. Should you find the hierarchy performing poorly perhaps re-implement the construction of the hierarchy with an iterative approach. The second pressure point for scaling is holding on to the unstructured nodes (all closure table records). It may be convenient in the short term to query the list via linq like we would the closure table using SQL. However, keep in mind that the hierarchy itself only holds a relatively small percentage of the unstructured nodes from the closure table. Releasing the list of unstructured nodes may greatly reduce the memory footprint should you find yourself working with enormous hierarchies.

Hopefully this helps get your head around working with the closure table and serves as an alternative to heavy stored procedures in the persistence layer.

Feel free to download and use any of the code from this sample project: DaHierarchy

Advertisements

2 Responses to “Hierarchical Data: Applying Data Structures”

  1. Hey, great article. Bookmarked the site 😉

Trackbacks/Pingbacks

  1. Hierarchical Data: Rendering with Razor « The B^2 Brand - November 20, 2011

    […] walked through persisting hierarchical data via closure tables and then through abstracting the closure table in the application layer.  Now, we’ll look at rendering our hierarchy in an ASP.NET MVC 3 sample application using […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: