Solution 1: Maintain a pointer to the last node
In this approach a pointer to the last node is maintained, and parent pointers are required.
When inserting, starting at the last node navigate to the node below which a new last node will be inserted. Insert the new node and remember it as the last node. Move it up the heap as needed.
When removing, starting at the last node navigate to the second-to-last node. Remove the original last node and remember the the new last node just found. Move the original last node into the place of the deleted node and then move it up or down the heap as needed.
It is possible to navigate to the mentioned nodes in O(log(n)) time and O(1) space. Here is a description of the algorithms but the code is available below:
For insert: If the last node is a left child, proceed with inserting the new node as the right child of the parent. Otherwise... Start at the last node. Move up as long as the current node is a right child. If the root was not reached, move to the sibling node at the right (which necessarily exists). Then (whether or not the root was reached), move down to the left as long as possible. Proceed by inserting the new node as the left child of the current node.
For remove: If the last node is the root, proceed by removing the root. Otherwise... Start at the last node. Move up as long as the current node is a left child. If the root was not reached, move to the sibling left node (which necessarily exists). Then (whether or not the root was reached), move down to the right as long as possible. We have arrived at the second-to-last node.
However, there are some things to be careful about:
When removing, there are two special cases: when the last node is being removed (unlink the node and change the last node pointer), and when the second-to-last node is being removed (not really special but the possibility must be considered when replacing the deleted node with the last node).
When moving nodes up or down the heap, if the move affects the last node, the last-node pointer must be corrected.
Long ago I have made an implementation of this. In case it helps someone, here is the code. Algorithmically it should be correct (has also been subjected to stress testing with verification), but there is no warranty of course.
Solution 2: Reach the last node from the root
This solution requires maintaining the node count (but not parent pointers or the last node). The last (or second-to-last) node is found by navigating from the root towards it.
Assume the nodes are numbered starting from 1, as per the typical notation for binary heaps. Pick any valid node number and represent it in binary. Ignore the first (most significant) 1 bit. The remaining bits define the path from the root to that node; zero means left and one means right.
For example, to reach node 11 (=1011b), start at the root then go left (0), right (1), right (1).
This algorithm can be used in insert to find where to place the new node (follow the path for node node_count+1), and in remove to find the second-to-last-node (follow the path for node node_count-1).
This approach is used in libuv for timer management; see their implementation of the binary heap.
Usefulness of Pointer-based Binary Heaps
Many answers here and even literature say that an array-based implementation of a binary heap is strictly superior. However I contest that because there are situations where the use of an array is undesirable, typically because the upper size of the array is not known in advance and on-demand reallocations of an array are not deemed acceptable, for example due to latency or possibility of allocation failure.
The fact that libuv (a widely used event loop library) uses a binary heap with pointers only further speaks for this.
It is worth noting that the Linux kernel uses (pointer-based) red-black trees as a priority queue in a few cases, for example for CPU scheduling and timer management (for the same purpose as in libuv). I find it likely that changing these to use a pointer-based binary heap will improve performance.
Hybrid Approach
It is possible to combine Solution 1 and Solution 2 into a hybrid approach which dynamically picks either of the algorithms (for finding the last or second-to-last node), the one with a lower cost, measured in the number of edges that need to be traversed. Assume we want to navigate to node number N, and highest_bit(X)
means the 0-based index of the highest-order bit in N (0 means the LSB).
Note that in the case of a level change the second equation will yield a wrong (too large) result, but in that case traversal from the root is more efficient anyway (for which the estimate is correct) and will be chosen, so there is no need for special handling.
Some CPUs have an instruction for highest_bit
allowing very efficient implementation of these estimates. An alternative approach is to maintain the highest bit as a bit mask and do these calculation with bit masks instead of bit indices. Consider for example that 1 followed by N zeroes squared is equal to 1 followed by 2N zeroes).
In my testing it has turned out that Solution 1 is on average faster than Solution 2, and the hybrid approach appeared to have about the same average performance as Solution 2. Therefore the hybrid approach is only useful if one needs to minimize the worst-case time, which is (twice) better in Solution 2; since Solution 1 will in the worst case traverse the entire height of the tree up and then down.
Code for Solution 1
Note that the traversal code in insert is slightly different from the algorithm described above but still correct.
struct Node {
Node *parent;
Node *link[2];
};
struct Heap {
Node *root;
Node *last;
};
void init (Heap *h)
{
h->root = NULL;
h->last = NULL;
}
void insert (Heap *h, Node *node)
{
// If the heap is empty, insert root node.
if (h->root == NULL) {
h->root = node;
h->last = node;
node->parent = NULL;
node->link[0] = NULL;
node->link[1] = NULL;
return;
}
// We will be finding the node to insert below.
// Start with the current last node and move up as long as the
// parent exists and the current node is its right child.
Node *cur = h->last;
while (cur->parent != NULL && cur == cur->parent->link[1]) {
cur = cur->parent;
}
if (cur->parent != NULL) {
if (cur->parent->link[1] != NULL) {
// The parent has a right child. Attach the new node to
// the leftmost node of the parent's right subtree.
cur = cur->parent->link[1];
while (cur->link[0] != NULL) {
cur = cur->link[0];
}
} else {
// The parent has no right child. This can only happen when
// the last node is a right child. The new node can become
// the right child.
cur = cur->parent;
}
} else {
// We have reached the root. The new node will be at a new level,
// the left child of the current leftmost node.
while (cur->link[0] != NULL) {
cur = cur->link[0];
}
}
// This is the node below which we will insert. It has either no
// children or only a left child.
assert(cur->link[1] == NULL);
// Insert the new node, which becomes the new last node.
h->last = node;
cur->link[cur->link[0] != NULL] = node;
node->parent = cur;
node->link[0] = NULL;
node->link[1] = NULL;
// Restore the heap property.
while (node->parent != NULL && value(node->parent) > value(node)) {
move_one_up(h, node);
}
}
void remove (Heap *h, Node *node)
{
// If this is the only node left, remove it.
if (node->parent == NULL && node->link[0] == NULL && node->link[1] == NULL) {
h->root = NULL;
h->last = NULL;
return;
}
// Locate the node before the last node.
Node *cur = h->last;
while (cur->parent != NULL && cur == cur->parent->link[0]) {
cur = cur->parent;
}
if (cur->parent != NULL) {
assert(cur->parent->link[0] != NULL);
cur = cur->parent->link[0];
}
while (cur->link[1] != NULL) {
cur = cur->link[1];
}
// Disconnect the last node.
assert(h->last->parent != NULL);
h->last->parent->link[h->last == h->last->parent->link[1]] = NULL;
if (node == h->last) {
// Deleting last, set new last.
h->last = cur;
} else {
// Not deleting last, move last to node's place.
Node *srcnode = h->last;
replace_node(h, node, srcnode);
// Set new last unless node=cur; in this case it stays the same.
if (node != cur) {
h->last = cur;
}
// Restore the heap property.
if (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent)) {
do {
move_one_up(h, srcnode);
} while (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent));
} else {
while (srcnode->link[0] != NULL || srcnode->link[1] != NULL) {
bool side = srcnode->link[1] != NULL && value(srcnode->link[0]) >= value(srcnode->link[1]);
if (value(srcnode) > value(srcnode->link[side])) {
move_one_up(h, srcnode->link[side]);
} else {
break;
}
}
}
}
}
Two other functions are used: move_one_up
moves a node one step up in the heap, and replace_node
replaces moves an existing node (srcnode) into the place held by the node being deleted. Both work only by adjusting the links to and from the other nodes, there is no actual moving of data involved. These functions should not be hard to implement, and the mentioned link includes my implementations.