Data structures (part 1) — Binary Heap

Benjamin Udoh
6 min readOct 2, 2024

--

Data structures are fundamental components in computer science and programming that organize and store data in a specific way to facilitate efficient operations such as insertion, deletion, and searching. They provide a means to manage and represent data in various forms, catering to different requirements and scenarios. Understanding data structures is crucial for developing efficient algorithms and optimizing software performance.

The heap data structure

The heap data structure is a binary tree-based data structure that satisfies the heap property. There are two main types of heaps: min heap and max heap.

  1. Min Heap: In a min heap, the parent node is smaller than or equal to its child nodes. This means that the smallest element is always at the root.
  2. Max Heap: In a max heap, the parent node is greater than or equal to its child nodes. This means that the largest element is always at the root.

Heaps are commonly implemented using arrays, where the root element is at index 1 (index 0 is left empty for simplicity), and for any element at index i:

  • Its left child is at index 2i + 1,
  • Its right child is at index 2i + 2,
  • Its parent is at index Math.floor(i / 2).

Heaps are particularly useful for implementing priority queues, where elements with higher priority are served before elements with lower priority. They have efficient insertion and removal of elements, with insertion taking O(log n) time and removal taking O(log n) time as well, where n is the number of elements in the heap.

Common operations on heaps include:

  • Insertion: Inserting an element into the heap while maintaining the heap property.
  • Deletion: Removing the root element from the heap while maintaining the heap property.
  • Heapify: Reordering the elements of an array so that they satisfy the heap property.
  • Heap Sort: Sorting an array in ascending or descending order using a heap.

Practical application of heap

Consider a situation where you are faced with the challenge of finding the top 3 elements in a collection or the least 3 occurrences of a word in a sentence. This is a trivial problem and there are many ways to approach the problem.

One way would be to sort the list and then get the items in the first 3 indices or last 3 indices, depending on the problem at hand. The problem with this approach is the efficiency of the solution. The bottleneck in this implementation is the sorting whose time complexity is O(nlogn).

The heap data structure, however, is tailor-made for these kinds of problems as we are guaranteed to have the largest (or lowest) element at the top of the heap, depending on whether we have a max-heap or a min-heap.

Implementation of a heap in TS

In this section, I’m going to implement the heap data structure using the Typescript language.

type HeapType = 'maxheap' | 'minheap';

abstract class Heap<T> {
heap: T[];
private type: HeapType;

constructor(data: T[] = []) {
this.heap = Heap.heapify(data) as T[];
}

static heapify(data: T[]) {
// implement heap sort
return data.sort();
}

swap(firstIndex: number, secondIndex: number) {
const tmp = this.getItem(firstIndex);
this.heap[firstIndex] = this.getItem(secondIndex);
this.heap[secondIndex] = tmp;
}

peek() {
if (!this.heap.length) {
return null;
}

return this.getItem(0);
}

remove() {
if (!this.heap.length) {
return null;
}

this.swap(0, this.heap.length - 1);
const item = this.heap.pop();
this.heapifyDown();

return item;
}

add(item: T) {
this.heap.push(item);
this.heapifyUp();
}

getItem(index: number) {
return this.heap[index];
};

abstract compare(first: T, second: T): boolean;

heapifyUp() {
let index = this.heap.length - 1;
while (this.hasParent(index) && this.compare(this.parent(index), this.getItem(index))) {
this.swap(index, this.parentIndex(index));
index = this.parentIndex(index);
}
}

heapifyDown() {
let index = 0;
let childIndex = 0;
while (this.hasLeftChild(index)) {
childIndex = this.leftChildIndex(index);
if (this.hasRightChild(index) && this.compare(this.leftChild(index), this.rightChild(index))) {
childIndex = this.rightChildIndex(index);
}

if (this.compare(this.getItem(childIndex), this.getItem(index))) {
break;
} else {
this.swap(index, childIndex)
}

index = childIndex;
}
}

displayHeap() {
console.log(this.heap);
}

hasParent(index: number) {
return this.parentIndex(index) < index;
}

parent(index: number) {
return this.getItem(this.parentIndex(index));
}

parentIndex(index: number) {
return Math.floor(index / 2)
}

hasLeftChild(index: number) {
return this.leftChildIndex(index) < this.heap.length;
}

leftChild(index: number) {
return this.getItem(this.leftChildIndex(index));
}

leftChildIndex(index: number) {
return 2 * index + 1;
}

hasRightChild(index: number) {
return this.rightChildIndex(index) < this.heap.length;
}

rightChild(index: number) {
return this.getItem(this.rightChildIndex(index));
}

rightChildIndex(index: number) {
return 2 * index + 2;
}
}

// the data type used here [string, number] enforces how the
// compare method is implemented
class MinHeap extends Heap<[string, number]> {
compare(first: [string, number], second: [string, number]): boolean {
return first[1] > second[1];
}
}

class MaxHeap extends Heap<[string, number]> {
compare(first: [string, number], second: [string, number]): boolean {
return first[1] < second[1];
}
}

This reason why this is an abstract class is to allow the heap to be flexible enough to receive various data types instead of just an array of numbers. The way the compare method is implemented in the class extending this Heap class determines if it will be a minHeap or a maxHeap.

Problem solving

Consider the following problem definition:

Write a function that, given a string of text (possibly with punctuation and line-breaks), returns an array of the top-3 most occurring words, in descending order of the number of occurrences.

Assumptions:

  • A word is a string of letters (A to Z) in ASCII.
  • Matches should be case-insensitive, and the words in the result should be lowercased.
  • If a text contains fewer than three unique words, then either the top-2 or top-1 words should be returned, or an empty array if a text contains no words.

Example:

Given the string below, our function is expected to return [a, of, on] .

In a village of La Mancha, the name of which I have no desire to call to
mind, there lived not long since one of those gentlemen that keep a lance
in the lance-rack, an old buckler, a lean hack, and a greyhound for
coursing. An olla of rather more beef than mutton, a salad on most
nights, scraps on Saturdays, lentils on Fridays, and a pigeon or so extra
on Sundays, made away with three-quarters of his income.

The steps to solve the problem are:

  • Tokenize the sentence into words.
  • Build a counter to count the occurrence of each word.
  • Return the top 3 words.

The first step can be done in O(n) time complexity where n is the number of characters in the sentence.

The second step can be done in O(m) time complexity where m is the number of words in the sentence.

The third step can be done in O(log n) time complexity if we use a heap or O(n log n) time complexity if we decide to sort and return the top 3 based on index.

It is obvious that using heap is the more efficient approach to solving step 3.

function splitSentenceToWords(sentence: string) {
let result: string[] = [];
let word = '';

let validChars = /[a-z]/;
for (let i = 0; i < sentence.length; i++) {
let char = sentence[i].toLocaleLowerCase();
if (validChars.test(char)) {
word += char;
}

if (char === ' ') {
result.push(word);
word = '';
}
}

if (word) {
result.push(word);
}

return result;
}

function countWordOccurrence(words: string[]) {
const counter = {};
for (let i = 0; i < words.length; i++) {
let word = words[i];
if (!counter[word]) {
counter[word] = 1;
} else {
counter[word] += 1;
}
}

return counter;
}

function getThreeMostUsedWords(sentence: string) {
const heap = new MaxHeap();
const words = splitSentenceToWords(sentence);
const counter = countWordOccurrence(words);
Object.keys(counter).forEach(word => {
heap.add([word, counter[word]]);
});

const result: string[] = [];
for (let i = 0; i < 3; i++) {
let word = heap.remove();
if (!word) break;

result.push(word[0]);
}

return result;
}

const sentence = 'In a village of La Mancha, the name of which ' +
'I have no desire to call to mind, there lived not long since ' +
'one of those gentlemen that keep a lance in the lance-rack, an ' +
'old buckler, a lean hack, and a greyhound for coursing. An ' +
'olla of rather more beef than mutton, a salad on most nights, ' +
'scraps on Saturdays, lentils on Fridays, and a pigeon or so ' +
'extra on Sundays, made away with three-quarters of his income.';
console.log('Response', getThreeMostUsedWords(sentence)); // => [a, of, on]

--

--

Benjamin Udoh

Flutter developer, Senior Software Engineer at Yellow Card Financial