Q. How will you represent a hierarchical structure shown below in a relational database? or How will you store a tree data structure into DB tables?
A.The hierarchical data is an example of the composite design pattern. The entity relationship diagrams (aka ER diagram) are used to represent logical and physical relationships between the database tables. The diagram below shows how the table can be designed to store tree data by maintaining the adjacency information via superior_emp_id.
as you can see the "superior_emp_id" is a foreign key that points to the emp_id in the same table. So, Peter has null as he has no superiors. John and Amanda points to Peter who is their manager or superior and so on.
The above table can be created using SQL DDL (Data Definition Language) as shown below.
CREATE TABLE employee (
emp_id NUMBER (4) CONSTRAINT emp_pk PRIMARY KEY,
emp_name VARCHAR2 (40) NOT NULL,
title VARCHAR2 (40),
dept_id NUMBER (2) NOT NULL,
superior_emp_id NUMBER (4) CONSTRAINT emp_fk REFERENCES employee(emp_id)
CONSTRAINT emp_pk
PRIMARY KEY NONCLUSTERED (emp_id)
)
This can be represented as an object model to map relational data as shown below
public class Employee {
private Long id;
private String name;
private String title;
private Employee superior;
private Set subordinates;
//getters and setters are omitted
}
Q. How will you find out the superior for an emplyee?
A. You can use a self-join to find the manager of an employee
Select e.emp_id,e.emp_name, title from
employee e, employee s
where e.superior_emp_id = s.employee_id
and e.emp_id = 3
This should return
1, Peter, cio
Q. Is there any other way to to store tree structure in a relational database?
A. Yes, it can be done using the "modified preorder tree traversal" as described below.
As shown in the diagram above, each node is marked with a left and right numbers using a modified preorder traversal as shown above. This can be represented in a database table as shown below.
As you can see the numbers indicate the relationship between each node. All left values greater than 6 and right values less than 11 are descendants of 6-11 (i.e Id: 3 Amanda). Now if you want to extract out the 2-6 sub-tree for Amanda you can write the SQL as follows
SELECT * FROM employee WHERE left_val BETWEEN 6 and 11 ORDER BY left_val ASC;
Which will return Amanda, Ralph, and Jeanne.
If you want to get ancestors to a given node say 7-8 Ralph, you can write the SQL as follows
SELECT * FROM employee WHERE left_val < 7 and right_val > 8 WHERE ORDER BY left_val ASC;
Which will return: Peter and Amanda.
If you want to find out the number of descendants for a node, all you need is the left_val and right_val of the node for which you want to find the descendants count. The formula is
No. of descendants = (right_val - left_val -1) /2
So, for 6 -11 Amanda, (11 - 6 - 1) /2 = 2 descendants
for 1-12 Peter, (12 - 1 -1 ) / 2 = 5 descendants.
for 3-4 Mary, (4 -3 - 1) / 2 = 0, means it is a child and has no descendants.
The modified preorder traversal is a little more complicated to understand, but is very useful.