Kahn’s Algorithm for Topological Sorting*

*The content is intended for self-study and sharing, the original blog is: INTERVIEW KICKSTART - Kahn’s Algorithm for Topological Sorting.

In graph theory, topological sorting, often referred to as topological ordering, is primarily employed to determine the sequence of nodes in a directed acyclic graph (DAG). In a DAG, vertices have interdependencies due to the existance of directed edges. For instance, given two vertices U and V, a directed edge (U, V) implies that V can be reached from U, while the reverse is not true - U cannot be reached from V. In this context, V is regarded to have a dependency on U. The topological order is essentially a linear arrangement of all the vertices within a DAG. It provides a systematic depiction of feasible sequences originating from certain vertices and concluding at others.

Now let's delve into some fundamental yet crucial concepts regarding a DAG:

  1. A DAG contains at least one vertex with the indegree zero.
  2. A DAG contains at least one vertex with the outdegree zero.
These two characteristics are derived from the term "acyclic" - signifying the absence of loops within the DAG. Consequently, there is always at least one origin vertex and one ending vertex. It's important to note that a DAG can comprise multiple disconnected components, meaning that there can be separate subgraphs within the graph that are not directly connected to each other. Each of these components is still a DAG, as long as they satisfy the two properties mentioned above.

Let us see an example:
alternatetext

Directed edges: {0, 1}, {0, 2}, {1, 3}, {1, 4}, {3, 4}, {3, 5}
"in" represents "indegree"
Order of removal: 0, 1, 2, 3, 4, 5
A possible topological order: 0, 1, 2, 3, 4, 5
Note that the sequence is not the only possible topological order. For instance, 0, 2, 1, 3, 4, 5 is also a valid topological order.
The process of Kahn's algorithm:
Step 1

Step 1: remove node 0 (with indegree = 0) and its outward edges. Update the indegree of the deleted edges’ destination nodes (1 and 2).

Step 2

Step 2: remove node 1 and 2, alongside the outward edges. Update the indegree of node 3 and 4.

Step 3

Step 3: remove node 3 and its outward edges. Update the indegree of node 4 and 5.

Step 4

Step 4: remove node 4 and 5.

Now, with these considerations in mind, let's delve into the implementation details. To begin, we will need to summarize specific steps in order to write pseudocode and then proceed with the actual implementation.
Given a DAG, denoted as G and represented with an adjacency list, the algorithm can be summarized as follows:
  • Calculate the indegree for all nodes in G.
  • Identify a node with an indegree of 0, indicating no incoming edges.
  • Remove the identified node from G and add it to the topological ordering.
  • Remove the outgoing edges from the removed node in G.
  • Decrement the indegree of the connected destination nodes (those connected by the edges removed in the previous step).
  • Repeat these steps until no nodes are left with zero indegree, signifying that either all nodes have been traversed or a cycle has been encountered.
After completing these steps, check if all elements are present in the sorted order. If this is the case, a valid topological ordering has been obtained. Otherwise, no topological ordering is possible.
 
// pseudo code for an example

Kahn_topological_sort(Graph G)
{	
	For each node in G{
		Find and populate indegree[node]; // get the indegree for all the nodes in G
	}

	Create a stack or queue; // auxiliary data structure - as a container

	Add nodes with indegree 0 to the queue;

	while(node U with indegree 0 exists in the queue){
		Remove U from the queue;
		Remove all its outgoing edges (U, Vi) from G;

		For destination vertices Vi which had their incoming edges removed{
			indegree[Vi] -= 1;
		}

		Add new nodes with indegree 0 to the queue;
	}

	// determine if a topological ordering has been found
	if(elements sorted = all elements)
		return true or print nodes in topologically sorted order;
	else 
		return false or prompt that no topological ordering exists (e.g. there exists rings in G)

}

Now let's implement it using C++, the very first thing is to design a data structure to store a graph.
For the purpose of simplfying the code, we might as well consider representing the graph nodes using integers, numbering from 0 to a specific value. We can then use these integers as array indices. Consider the following example:

node_array[0] = {1, 2};
In this representation, it signifies that there are two directed edges: 0 -> 1, 0 -> 2
To store our graph, we can make use of a 2D array, often referred to as an adjacencyList

vector<vector<int>> adjacencyList;
As illustrated above, each element in the adjacencyList contains a 1D array, representing the destination nodes connected by directed edges originating from the current node. For instance, adjacencyList[0] represents a 1D array containing the nodes incident to node 0.
The overall code is as follows:

#include <iostream>
#include <vector>
#include <queue>
using std::vector;
using std::queue;
		
// Graph class - store the input graph
		
class Graph{
public:
	Graph(int NumberOfNodes) : m_numberofNodes(NumberOfNodes){
		m_adjacencyList.resize(NumberOfNodes);
		m_inDegree.resize(NumberOfNodes, 0);	    
	}
			
	void add_directed_edges(int src, int dest){
		m_adjacencyList[src].emplace_back(dest);
	}

	// main function: topological sort
	bool topological_sort(vector<int>& sorted)
	{
	    auto indegree_copy = m_inDegree; // save a copy of the indegree vector
	    
	    // store all the nodes with no incoming edges
	    queue nodes_with_zero_indegree;
	    for(int i = 0; i < m_numberofNodes; ++i)
	    {
	        if(m_inDegree[i] == 0)nodes_with_zero_indegree.push(i);
	    }
	    
	    // process - while the queue is not empty
	    while(!nodes_with_zero_indegree.empty())
	    {
	        auto current_zero_node = nodes_with_zero_indegree.front();
	        nodes_with_zero_indegree.pop(); // delete the node with zero indegree
	        sorted.emplace_back(current_zero_node); // add it to the sorted list
	        
	        // alter the graph: delete the incident edges and update the indegree
	        // of the incident nodes connected to current_zero_node
	        for(auto const& dest_node : m_adjacencyList[current_zero_node])
	        {
	            // delete the directed edge: current_zero_node -> dest_node
	            // meaning the indegree of the dest node decrements by 1
	            --m_inDegree[dest_node];
	            
	            // if the updated indegree of dest node is now zero
	            // add it to the nodes_with_zero_indegree queue
	            if(m_inDegree[dest_node] == 0) nodes_with_zero_indegree.push(dest_node);
	        } // end for
	        
	    } // end while: while the queue is not empty
	    
	    // a cycle was encountered - if the sorting is done but not all input nodes have been sorted
	    if((int)sorted.size() != m_numberofNodes)return false;
	    
	    // successfully sorted
	    return true; 
	    
	}
			
	void print_graph() {
		for(int i = 0; i < m_adjacencyList.size(); ++i)
		{
			if(m_adjacencyList[i].size() != 0)
			{   
				std::cout << "Node " << i << " -> ";
				for(const auto& node : m_adjacencyList[i])
				{
					std::cout << node << " ";
				}
				std::cout << std::endl;
			}
		}
	}   
protected:
	int m_numberofNodes;
	vector<vector<int>> m_adjacencyList;
    vector<int> m_inDegree;   
};
		
int main() {
    int n = 6;
    
    Graph graph(n);
    graph.add_directed_edges(0, 1);
    graph.add_directed_edges(0, 2);
    graph.add_directed_edges(1, 2);
    graph.add_directed_edges(2, 3);
    graph.add_directed_edges(1, 4);
    graph.add_directed_edges(5, 3);
    graph.add_directed_edges(4, 5);
    
    graph.print_graph();
    
    vector<int> sorted;
    sorted.reserve(n);
    
    if(graph.topological_sort(sorted)){
        std::cout << "sorted done, the topological ordering is: " << std::endl;
        for(auto const& node : sorted)std::cout << node << " ";
        std::cout << std::endl;
    }else{
        std::cout << "topological sorting is not possible as the graph is not acyclic" << std::endl;
    }

	return 0;
}