A Guide to Python heapq and Heap in Python

By squashlabs, Last Updated: Aug. 10, 2023

Getting Started with Python heapq

Python heapq is a standard library module that provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. A heap is a binary tree that satisfies the heap property. heapq implements a min-heap, which means that for every node, the key of the node is less than or equal to the keys of its children, so the smallest item is always at the root.

The heapq module in Python provides functions to manage heaps efficiently. It allows you to easily push items onto a heap, pop items from a heap, and perform other heap-related operations.

To use the heapq module, you need to import it using the following line of code:

import heapq

Creating and Manipulating a Heap

To create a heap, you can use the heapify() function from the heapq module. This function takes a list and rearranges its elements to satisfy the heap property.

import heapq

numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
heapq.heapify(numbers)

After calling heapify(), the numbers list is transformed into a valid heap.
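
One way to convince yourself that heapify() did its job is to check the invariant directly. The sketch below assumes the usual array layout, in which the children of index k sit at 2*k + 1 and 2*k + 2. The helper name is_min_heap is hypothetical and not part of heapq.

```python
import heapq


def is_min_heap(items):
    """Return True if every parent is <= both of its children."""
    n = len(items)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and items[k] > items[child]:
                return False
    return True


numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
before = is_min_heap(numbers)  # the raw list violates the invariant
heapq.heapify(numbers)
after = is_min_heap(numbers)   # heapify() restores it
print(before, after)  # False True
```

Note that a valid heap is usually not a sorted list; only the parent/child relationship is guaranteed.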

To push an item onto a heap, you can use the heappush() function. This function takes a heap and an item, and it pushes the item onto the heap while maintaining the heap property.

import heapq

numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
heapq.heapify(numbers)
heapq.heappush(numbers, 7)

The heappush() function adds the item 7 to the heap, ensuring that the heap property is still satisfied.

To pop the smallest item from a heap, you can use the heappop() function. This function removes and returns the smallest element from the heap.

import heapq

numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
heapq.heapify(numbers)
smallest = heapq.heappop(numbers)
print(smallest)  # Output: 1

In this example, the smallest element in the heap is 1, so it is removed and assigned to the variable smallest.

Heap as a Priority Queue

In addition to the basic heap operations, the heapq module can be used to create a priority queue. A priority queue is a data structure that allows you to efficiently insert elements with a priority and retrieve the element with the highest priority. Because heapq is a min-heap, the entry with the smallest priority value is retrieved first, so lower numbers mean higher priority.

To use a heap as a priority queue, you can store items as tuples, where the first element of the tuple represents the priority. Tuples are compared element by element, so the priority decides the order, and the second element is only compared when two priorities are equal.

Here's an example of how to create a priority queue using a heap:

import heapq

queue = []
heapq.heappush(queue, (2, 'Task 1'))
heapq.heappush(queue, (1, 'Task 2'))
heapq.heappush(queue, (3, 'Task 3'))

while queue:
    priority, task = heapq.heappop(queue)
    print(task)

In this example, each task is represented by a tuple where the first element is the priority and the second element is the task itself. The tasks are added to the queue using heappush(), and then they are popped from the queue using heappop() in ascending order of their priorities.

The output of this code will be:

Task 2
Task 1
Task 3

As you can see, the tasks are printed in the order of their priorities.
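
Because tuples are compared element by element, two entries with the same priority fall back to comparing the tasks themselves, which raises a TypeError for objects that do not support <. A common workaround, sketched here, is to add an insertion counter as the second tuple element. The add_task helper and the dictionary-shaped tasks are illustrative assumptions, not part of the original example.

```python
import heapq
import itertools

queue = []
counter = itertools.count()  # monotonically increasing tie-breaker


def add_task(priority, task):
    # The counter is compared before the task, so tasks never need to be orderable.
    heapq.heappush(queue, (priority, next(counter), task))


add_task(2, {'name': 'write report'})
add_task(1, {'name': 'fix bug'})
add_task(2, {'name': 'review PR'})  # same priority as the first task

order = []
while queue:
    priority, _, task = heapq.heappop(queue)
    order.append(task['name'])

print(order)  # ['fix bug', 'write report', 'review PR']
```

Tasks with equal priority come out in the order they were added.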

Understanding Heaps in Python

A heap is a specialized tree-based data structure that satisfies the heap property. In Python, the heapq module provides functions to create and manipulate heaps.

Heaps can be implemented as binary trees or as arrays. The most commonly used type is the binary heap, which can be visualized as a binary tree with the heap property. The heap property states that for every node in the heap, the value of the node is greater than or equal to the values of its children (in a max heap), or less than or equal to the values of its children (in a min heap).

Python's heapq module provides functions to create and manipulate heaps. The most commonly used functions are:

- heapify(iterable): This function transforms the iterable into a valid heap. It rearranges the elements in the iterable so that the heap property is satisfied. The time complexity of this function is O(n), where n is the length of the iterable.

import heapq

data = [5, 3, 8, 1, 2]
heapq.heapify(data)
print(data)  # Output: [1, 2, 8, 3, 5]

- heappush(heap, item): This function adds an item to the heap while maintaining the heap property. The time complexity of this function is O(log n), where n is the number of elements in the heap.

import heapq

data = [1, 2, 3]
heapq.heappush(data, 0)
print(data)  # Output: [0, 1, 3, 2]

- heappop(heap): This function removes and returns the smallest item from the heap while maintaining the heap property. The time complexity of this function is O(log n), where n is the number of elements in the heap.

import heapq

data = [1, 2, 3]
smallest_item = heapq.heappop(data)
print(smallest_item)  # Output: 1
print(data)  # Output: [2, 3]

- heapreplace(heap, item): This function removes and returns the smallest item from the heap, and then adds the new item to it. This is equivalent to performing a heappop() followed by a heappush(), but more efficient. The time complexity of this function is O(log n), where n is the number of elements in the heap.

import heapq

data = [1, 2, 3]
smallest_item = heapq.heapreplace(data, 0)
print(smallest_item)  # Output: 1
print(data)  # Output: [0, 2, 3]

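To access the smallest item in the heap without removing it, index the list directly with heap[0]. Python's heapq module also provides heapq.nsmallest() and heapq.nlargest(), which return the n smallest or largest items of any iterable, and heapq.merge(), which merges multiple already-sorted inputs into a single sorted iterator.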
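
To make the distinction concrete, the following minimal sketch peeks at the smallest item with heap[0], asks nsmallest() for several items, and merges two already-sorted lists with merge(). The sample values are arbitrary.

```python
import heapq

heap = [5, 3, 8, 1, 2]
heapq.heapify(heap)

smallest = heap[0]                       # peek: the heap is left unchanged
two_smallest = heapq.nsmallest(2, heap)  # returns a new sorted list
merged = list(heapq.merge([1, 4, 7], [2, 5, 6]))  # inputs must already be sorted

print(smallest)      # 1
print(two_smallest)  # [1, 2]
print(merged)        # [1, 2, 4, 5, 6, 7]
print(len(heap))     # 5
```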

Understanding the basics of heaps is essential for efficiently solving problems that involve prioritization or finding the smallest or largest elements. The heapq module in Python provides a convenient and efficient way to work with heaps.

Building a Heap in Python

In this chapter, we will explore how to build a heap in Python using the heapq module. A heap is a binary tree-based data structure that satisfies the heap property. The heap property states that for a given node, its value must be greater than or equal to the values of its children (for a max heap) or less than or equal to the values of its children (for a min heap).

Python's heapq module provides functions to perform operations on heaps efficiently. To build a heap, we can use the heapify function from the heapq module. The heapify function takes a list of elements and rearranges them in-place to satisfy the heap property.

Here's an example of how to build a heap using the heapify function:

import heapq

# Create a list of elements
elements = [4, 1, 7, 3, 8, 5]

# Build a heap from the list
heapq.heapify(elements)

print(elements)

Output:

[1, 3, 5, 4, 8, 7]

In the above example, we start with a list of elements [4, 1, 7, 3, 8, 5]. After applying heapify on the list, the elements are rearranged to satisfy the heap property, resulting in a valid heap [1, 3, 5, 4, 8, 7].

It's important to note that the heapify function modifies the original list in-place. If you want to preserve the original list, make a copy of it before applying heapify.
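
A minimal sketch of that copy-first approach, assuming a shallow copy is enough (it is for a list of numbers):

```python
import heapq

original = [4, 1, 7, 3, 8, 5]
heap = list(original)  # shallow copy; the original list stays untouched
heapq.heapify(heap)

print(original)  # [4, 1, 7, 3, 8, 5]
print(heap[0])   # 1
```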

Building a heap using heapify has a time complexity of O(n), where n is the number of elements in the list. This makes it an efficient way to build a heap from an unsorted list.

In summary, to build a heap in Python, you can use the heapify function from the heapq module. This function rearranges elements in a list to satisfy the heap property. Remember to make a copy of the original list if you want to preserve it. Once the heap is built, you can perform various operations on it, such as inserting elements or removing the smallest element.

Continue to the next chapter to learn more about performing operations on heaps in Python.

Adding and Removing Elements from a Heap

In this chapter, we will explore how to add and remove elements from a heap using the Python heapq module. The heapq module provides functions to create and manipulate heaps in Python.

Adding Elements to a Heap

To add elements to a heap, we can use the heappush() function provided by the heapq module. This function takes two arguments: the heap and the element to be added. The element is added to the heap while preserving the heap property.

Here's an example of how to add elements to a heap:

import heapq

heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 2)
heapq.heappush(heap, 7)
heapq.heappush(heap, 1)

print(heap)  # Output: [1, 2, 7, 5]

In the above example, we create an empty heap and add elements to it using the heappush() function. The elements are added to the heap in such a way that the smallest element is always at the top.

Removing Elements from a Heap

To remove the smallest element from a heap, we can use the heappop() function provided by the heapq module. This function removes and returns the smallest element from the heap, while preserving the heap property.

Here's an example of how to remove elements from a heap:

import heapq

heap = [1, 2, 7, 5]

smallest = heapq.heappop(heap)
print(smallest)  # Output: 1
print(heap)  # Output: [2, 5, 7]

In the above example, we have a heap with elements [1, 2, 7, 5]. We use the heappop() function to remove the smallest element from the heap, which is 1. After removal, the heap is modified to maintain the heap property.

Replacing Elements in a Heap

The heapq module also provides a function called heapreplace() to replace the smallest element in a heap with a new element. This function is equivalent to calling both heappop() and heappush() together, but it is more efficient than doing so separately.

Here's an example of how to replace an element in a heap:

import heapq

heap = [1, 2, 7, 5]

smallest = heapq.heapreplace(heap, 3)
print(smallest)  # Output: 1
print(heap)  # Output: [2, 3, 7, 5]

In the above example, we replace the smallest element in the heap with the number 3 using the heapreplace() function. The function removes the smallest element (1) and adds the new element (3) while preserving the heap property.

By using these functions provided by the heapq module, we can easily add, remove, and replace elements in a heap in Python.
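
A closely related function is heapq.heappushpop(), which pushes first and pops second. The sketch below contrasts the two on identical heaps; the variable names are only for illustration.

```python
import heapq

heap_a = [1, 2, 7, 5]
heap_b = [1, 2, 7, 5]

# heapreplace() pops first, then pushes: the new item always stays in the heap.
popped_a = heapq.heapreplace(heap_a, 0)

# heappushpop() pushes first, then pops: the new item may come straight back out.
popped_b = heapq.heappushpop(heap_b, 0)

print(popped_a, heap_a[0])  # 1 0
print(popped_b, heap_b[0])  # 0 1
```

As a rule of thumb, heapreplace() suits a fixed-size heap where the new item must be kept, while heappushpop() never returns something larger than the item you pushed.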

Heapify and Heap Sorting

In the previous chapters, we learned about the basics of heaps and how to use the Python heapq module to perform various operations on heaps. In this chapter, we will explore two important operations: heapify and heap sorting.

Heapify

Heapify is an operation that converts a regular list into a valid heap. It rearranges the elements in the list in such a way that they satisfy the heap property. The heap property states that for every node i in the heap, the value of the parent node is less than or equal to the values of its children.

The heapify function provided by the heapq module efficiently heapifies a list in-place. Let's see an example:

import heapq

# A list of integers
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Heapify the list
heapq.heapify(data)

print(data)

Output:

[1, 1, 2, 3, 3, 9, 4, 6, 5, 5, 5]

As you can see, the heapify function transformed the list data into a valid heap. The elements are rearranged in such a way that the heap property is satisfied.

Heap Sorting

Heap sorting is a sorting algorithm that uses a heap data structure to sort elements. The idea is to first build a heap from the input list, and then repeatedly remove the root element until the heap is empty. With the min-heap that heapq provides, the root is always the smallest remaining element, so the elements come out in ascending order.

The heapq module provides the heappop function to remove the smallest element from the heap. We can utilize this function to implement heap sorting. Here's an example:

import heapq

# A list of integers
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Heapify the list
heapq.heapify(data)

sorted_data = []
while data:
    sorted_data.append(heapq.heappop(data))

print(sorted_data)

Output:

[1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]

In the above code, we first heapify the list data. Then, we repeatedly remove the smallest element from the heap using heappop and append it to the sorted_data list. This process continues until the heap is empty, resulting in a sorted list.

Heap sorting has a time complexity of O(n log n): heapify() takes O(n), and each of the n calls to heappop() takes O(log n). This version requires O(n) additional space for sorted_data and empties the original list as it runs.
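
heapq has no public max-heap option in most Python versions, so a common trick for descending order is to store negated values. The sketch below assumes numeric data, since negation does not work for strings or arbitrary objects.

```python
import heapq

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Store negated values so the largest original value becomes the smallest key.
max_heap = [-x for x in data]
heapq.heapify(max_heap)

descending = [-heapq.heappop(max_heap) for _ in range(len(max_heap))]
print(descending)  # [9, 6, 5, 5, 5, 4, 3, 3, 2, 1, 1]
```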

In this chapter, we learned about the heapify operation, which converts a regular list into a valid heap, and the heap sorting algorithm, which uses a heap to sort elements. These operations are powerful and can be used in various scenarios where sorting and manipulation of data in a heap-like structure is required.

Using heapq with Custom Objects

Python's heapq module is not limited to working with primitive data types. You can also use it to work with custom objects, as long as you provide a way to compare them. In this chapter, we will explore how to use heapq with custom objects in Python.

When working with custom objects, heapq needs a way to decide which of two objects is smaller. It compares items with the < operator, so the class should define the __lt__ method, which takes the other object as its argument and returns True when the first object should come before the second.

Let's consider an example where we have a custom object called Person with attributes name and age. We want to store a list of Person objects in a heap and retrieve them based on their age. We can achieve this by defining a comparison method called __lt__ (less than) inside the Person class.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __lt__(self, other):
        return self.age < other.age

Now, we can create a heap of Person objects and use heapq functions like heappush and heappop to add and remove objects from the heap.

import heapq

heap = []
heapq.heappush(heap, Person('Alice', 25))
heapq.heappush(heap, Person('Bob', 30))
heapq.heappush(heap, Person('Charlie', 20))

youngest_person = heapq.heappop(heap)
print(youngest_person.name)  # Output: Charlie

In the example above, when we push Person objects to the heap using heappush, the __lt__ method is called to determine their order based on their age. When we pop the smallest element from the heap using heappop, the smallest Person object based on age is returned.

It's important to note that __lt__ is the comparison heapq relies on in CPython. Depending on your use case, you may also want to define other comparison methods like __gt__ (greater than), __eq__ (equal), etc., so that the objects compare consistently elsewhere in your code.

Using heapq with custom objects allows you to easily work with complex data structures and prioritize objects based on any criteria you define. It provides a flexible and efficient way to handle sorting and retrieval operations on custom objects.
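
An alternative to writing __lt__ by hand is to let dataclasses generate the comparison methods. This is a sketch under the assumption that age is the only ordering criterion; the field order differs from the Person class above because dataclasses compare fields in declaration order.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    age: int                          # the only field used for ordering
    name: str = field(compare=False)  # excluded from comparisons


heap = []
heapq.heappush(heap, Person(25, 'Alice'))
heapq.heappush(heap, Person(30, 'Bob'))
heapq.heappush(heap, Person(20, 'Charlie'))

names = [heapq.heappop(heap).name for _ in range(3)]
print(names)  # ['Charlie', 'Alice', 'Bob']
```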

In the next chapter, we will explore some advanced techniques and use cases for heapq and Heap in Python. Stay tuned!

Priority Queues with heapq

In this chapter, we will explore the functionality of the Python heapq module and learn how to use it to implement priority queues.

A priority queue is a data structure that allows elements to be inserted with a priority and retrieves them in a specific order based on their priority. The heapq module provides a way to create and manipulate heap data structures, which can be used to implement priority queues efficiently.

To begin, we need to import the heapq module:

import heapq

Creating a Priority Queue

To create a priority queue, we can use a list and apply heap operations on it. The heapq module provides several functions to work with heaps.

Let's create a priority queue and add some elements to it:

queue = []
heapq.heappush(queue, 5)
heapq.heappush(queue, 3)
heapq.heappush(queue, 7)
print(queue)  # Output: [3, 5, 7]

The heappush() function inserts an element into the priority queue while maintaining the heap property. The smallest element will always be at the root of the heap.

Retrieving Elements

To retrieve elements from the priority queue, we can use the heappop() function. It removes and returns the smallest element from the heap.

Let's retrieve elements from our priority queue:

smallest = heapq.heappop(queue)
print(smallest)  # Output: 3
print(queue)    # Output: [5, 7]

The heappop() function removes the smallest element (3) from the heap and returns it. The remaining elements are still in the heap and maintain the heap property.

Heapifying a List

We can convert an existing list into a heap using the heapify() function. This function rearranges the elements of the list to satisfy the heap property.

Let's heapify a list and print the result:

numbers = [9, 2, 5, 1, 7]
heapq.heapify(numbers)
print(numbers)  # Output: [1, 2, 5, 9, 7]

The heapify() function converts the list [9, 2, 5, 1, 7] into a heap. The resulting heap satisfies the heap property.

Heap Sorting

The heapq module does not provide a ready-made heapsort() function, but a heap sort takes only a few lines: heapify the list, then pop every element.

Let's sort a list using heap sort:

numbers = [9, 2, 5, 1, 7]
heapq.heapify(numbers)
sorted_numbers = [heapq.heappop(numbers) for _ in range(len(numbers))]
print(sorted_numbers)  # Output: [1, 2, 5, 7, 9]

This code first converts the list into a heap using heapify(). Then, it repeatedly removes the smallest element from the heap using heappop(), resulting in a sorted list. Afterwards numbers is empty, because every element has been popped.
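
If you want this as a reusable function, it can be wrapped in a small helper. The name heapsort below is our own and is not exported by the heapq module; the helper works on a copy, so the caller's list is left intact.

```python
import heapq


def heapsort(iterable):
    """Return a new sorted list; the input is not modified."""
    heap = list(iterable)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


numbers = [9, 2, 5, 1, 7]
print(heapsort(numbers))  # [1, 2, 5, 7, 9]
print(numbers)            # [9, 2, 5, 1, 7]
```

For everyday sorting the built-in sorted() is the simpler choice; the helper is mainly useful for illustrating how a heap produces sorted output.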

Implementing Dijkstra's Algorithm with heapq

Dijkstra's algorithm is a popular graph search algorithm used to find the shortest path between nodes in a graph. It is widely used in various applications, such as finding the shortest route between two locations on a map or optimizing network routing. In this chapter, we will explore how to implement Dijkstra's algorithm using the Python heapq module.

Before we dive into the implementation, let's briefly understand the key concepts of Dijkstra's algorithm. The algorithm works by iteratively selecting the node with the smallest distance from the source node and updating the distances of its neighboring nodes. It maintains a priority queue, implemented here as a heap, to efficiently select the next node with the smallest distance. The algorithm assumes that all edge weights are non-negative.

To get started, we need a graph representation. We can use a dictionary to represent the graph, where each key represents a node, and the corresponding value is a list of tuples representing the neighbors and their edge weights. Here's an example of a graph representation:

graph = {
    'A': [('B', 5), ('C', 2)],
    'B': [('D', 4), ('E', 2)],
    'C': [('B', 8), ('E', 7)],
    'D': [('E', 6), ('F', 3)],
    'E': [('F', 1)],
    'F': []
}

Now, let's implement Dijkstra's algorithm using the heapq module:

import heapq

def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0

    heap = [(0, start)]

    while heap:
        current_distance, current_node = heapq.heappop(heap)

        if current_distance > distances[current_node]:
            continue

        for neighbor, weight in graph[current_node]:
            distance = current_distance + weight

            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(heap, (distance, neighbor))

    return distances

Let's break down the implementation:

- We initialize a dictionary called distances to store the shortest distances from the start node to all other nodes. Initially, all distances are set to infinity except for the start node, which is set to 0.

- We create a heap, heap, to store nodes and their corresponding distances. Each item in the heap is a tuple containing the distance and the node.

- We enter a while loop that continues until the heap is empty. In each iteration, we extract the node with the smallest distance from the heap using heapq.heappop().

- If the extracted distance is greater than the distance already stored in distances for the current node, we skip the rest of the iteration.

- Otherwise, we iterate over the neighbors of the current node and calculate the distance from the start node through the current node. If this distance is smaller than the current distance stored in distances for the neighbor, we update the distance and push the neighbor onto the heap using heapq.heappush().

- Finally, we return the distances dictionary containing the shortest distances from the start node to all other nodes in the graph.

To use the dijkstra() function, simply pass in the graph and the start node. Here's an example:

graph = {
    'A': [('B', 5), ('C', 2)],
    'B': [('D', 4), ('E', 2)],
    'C': [('B', 8), ('E', 7)],
    'D': [('E', 6), ('F', 3)],
    'E': [('F', 1)],
    'F': []
}

start_node = 'A'
distances = dijkstra(graph, start_node)

print(distances)

This will output the shortest distances from the start node 'A' to all other nodes in the graph. For the graph above, the result is {'A': 0, 'B': 5, 'C': 2, 'D': 9, 'E': 7, 'F': 8}.
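
Distances alone do not tell you which route to take. The following sketch extends the same idea by recording each node's predecessor so that a path can be rebuilt afterwards. The names dijkstra_with_paths, previous, and build_path are hypothetical additions and do not appear in the original implementation.

```python
import heapq


def dijkstra_with_paths(graph, start):
    """Return (distances, previous); previous maps a node to its predecessor."""
    distances = {node: float('inf') for node in graph}
    previous = {node: None for node in graph}
    distances[start] = 0
    heap = [(0, start)]

    while heap:
        current_distance, current_node = heapq.heappop(heap)
        if current_distance > distances[current_node]:
            continue  # stale heap entry, a shorter route was already found
        for neighbor, weight in graph[current_node]:
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                previous[neighbor] = current_node
                heapq.heappush(heap, (distance, neighbor))
    return distances, previous


def build_path(previous, target):
    """Walk the predecessor links backwards from target to the start node."""
    path = []
    while target is not None:
        path.append(target)
        target = previous[target]
    return path[::-1]


graph = {
    'A': [('B', 5), ('C', 2)],
    'B': [('D', 4), ('E', 2)],
    'C': [('B', 8), ('E', 7)],
    'D': [('E', 6), ('F', 3)],
    'E': [('F', 1)],
    'F': []
}

distances, previous = dijkstra_with_paths(graph, 'A')
print(distances)                  # {'A': 0, 'B': 5, 'C': 2, 'D': 9, 'E': 7, 'F': 8}
print(build_path(previous, 'F'))  # ['A', 'B', 'E', 'F']
```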

In this chapter, we explored how to implement Dijkstra's algorithm using the Python heapq module. We learned how to represent a graph using a dictionary, and how to use a heap to efficiently select the next node with the smallest distance. The implementation provided can be easily adapted to different graph representations and can be a powerful tool for solving shortest path problems.

Using heapq for Merge Sort

In this chapter, we will explore how to use the heapq module in Python to implement the merge sort algorithm. Merge sort is a popular sorting algorithm that works by dividing the input list into smaller sublists, sorting them recursively, and then merging them back together.

The heapq module in Python provides functions to create and manipulate heaps. A heap is a binary tree where each parent node is smaller than or equal to its children (in a min heap, the kind heapq implements). This property allows us to efficiently extract the smallest element from the heap.

To implement merge sort using heapq, we can follow these steps:

1. Divide the input list into smaller sublists until each sublist contains only one element. This can be done recursively using a divide-and-conquer approach.

2. Use heapq to convert each sublist into a heap. This can be achieved by using the heapify function, which rearranges the elements in the list so that it satisfies the heap property.

3. Merge the heaps back together by repeatedly comparing the two roots, extracting the smaller one using the heappop function, and appending it to the sorted list.

Let's see a code example that demonstrates the implementation of merge sort using heapq:

import heapq

def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    merged = []
    heapq.heapify(left)
    heapq.heapify(right)

    while left and right:
        if left[0] < right[0]:
            merged.append(heapq.heappop(left))
        else:
            merged.append(heapq.heappop(right))

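    # Drain whichever heap still has items. After heappop() the underlying
    # list is a heap, not a sorted list, so extend() would be incorrect.
    while left:
        merged.append(heapq.heappop(left))
    while right:
        merged.append(heapq.heappop(right))

    return merged

In the code above, we define a merge_sort function that takes an input list arr. If the length of the list is less than or equal to 1, we return the list as it is already sorted.

Otherwise, we divide the list into two halves and recursively call merge_sort on each half. We then create heaps from the two halves using heapify.

Next, we merge the heaps back together by repeatedly comparing the two roots, extracting the smaller one using heappop, and appending it to the merged list. When one heap is empty, the remaining items of the other heap are also removed with heappop, because the list behind a partially consumed heap is no longer in sorted order. Finally, we return the sorted merged list.

Each heappop costs O(log n), so this version can take up to O(n log^2 n) time, where n is the number of elements in the input list, rather than the O(n log n) of a classic merge sort. The heaps are not strictly needed here, because both halves are already sorted when they are merged.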
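
Since both halves are already sorted by the time they are merged, heapq.merge() can do the merging step directly. This is a sketch of that variant; the function name merge_sort_with_merge is hypothetical and chosen to avoid clashing with the function above.

```python
import heapq


def merge_sort_with_merge(arr):
    """Merge sort whose merge step is delegated to heapq.merge()."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort_with_merge(arr[:mid])
    right = merge_sort_with_merge(arr[mid:])
    # heapq.merge() lazily combines already-sorted inputs into sorted order.
    return list(heapq.merge(left, right))


print(merge_sort_with_merge([1, 5, 6, 7, 2, 3, 4, 4]))  # [1, 2, 3, 4, 4, 5, 6, 7]
```

With only two inputs, each merge pass does a constant amount of work per element, which keeps the overall sort at O(n log n).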

In the next chapter, we will explore another use case for the heapq module: finding the k smallest (or largest) elements in a list.

Real World Examples of heapq in Python

In this chapter, we will explore some real-world examples of using the heapq module in Python. heapq is a built-in module that provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.

The priority queue algorithm allows us to efficiently insert and remove items with the smallest or largest priority. It is commonly used in various scenarios such as task scheduling, event handling, and graph algorithms.

Let's dive into some practical examples of using heapq in Python:

Example 1: Finding the N Smallest or Largest Elements

One common use case for heapq is finding the N smallest or largest elements in a collection. Suppose we have a list of numbers and we want to find the three smallest numbers. We can achieve this using the heapq.nsmallest() function:

import heapq

numbers = [5, 9, 2, 1, 7, 3, 6]
smallest_numbers = heapq.nsmallest(3, numbers)
print(smallest_numbers)  # Output: [1, 2, 3]

Similarly, we can use the heapq.nlargest() function to find the N largest elements:

import heapq

numbers = [5, 9, 2, 1, 7, 3, 6]
largest_numbers = heapq.nlargest(3, numbers)
print(largest_numbers)  # Output: [9, 7, 6]
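
Both functions also accept a key argument, which is useful when the items are records rather than plain numbers. The employee data below is made up purely for illustration.

```python
import heapq

employees = [
    {'name': 'Ana', 'salary': 5200},
    {'name': 'Ben', 'salary': 4100},
    {'name': 'Cleo', 'salary': 6300},
    {'name': 'Dev', 'salary': 3900},
]

# key extracts the value to compare, just like the key argument of sorted().
top_two = heapq.nlargest(2, employees, key=lambda e: e['salary'])
bottom_one = heapq.nsmallest(1, employees, key=lambda e: e['salary'])

print([e['name'] for e in top_two])     # ['Cleo', 'Ana']
print([e['name'] for e in bottom_one])  # ['Dev']
```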

Example 2: Merging Multiple Sorted Iterables

Another useful application of heapq is merging multiple sorted iterables into a single sorted iterable. This can be handy when dealing with large sorted datasets that don't fit into memory.

import heapq

iter1 = [1, 4, 7]
iter2 = [2, 5, 6]
iter3 = [3, 8, 9]

merged_iter = heapq.merge(iter1, iter2, iter3)
print(list(merged_iter))  # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
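
Because heapq.merge() returns an iterator and reads its inputs lazily, it can even combine streams that never end. The sketch below uses a hypothetical multiples() helper built on itertools.count() to stand in for large sorted data sources.

```python
import heapq
import itertools


def multiples(step):
    """Endless sorted stream: step, 2*step, 3*step, ..."""
    return itertools.count(step, step)


# Only the items actually requested are pulled from the input streams.
merged = heapq.merge(multiples(2), multiples(3))
first_six = list(itertools.islice(merged, 6))
print(first_six)  # [2, 3, 4, 6, 6, 8]
```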

Example 3: Efficiently Maintaining a Dynamic Priority Queue

Heapq can also be used to efficiently maintain a dynamic priority queue. In this example, we have a list of tasks with priorities, and we want to process them in ascending order of priority:

import heapq

tasks = [(1, 'Task 1'), (3, 'Task 3'), (2, 'Task 2')]

heapq.heapify(tasks)  # Convert the list into a heap
while tasks:
    priority, task = heapq.heappop(tasks)
    print(f'Processing task "{task}" with priority {priority}')

Output:

Processing task "Task 1" with priority 1
Processing task "Task 2" with priority 2
Processing task "Task 3" with priority 3

In this example, we use the heapq.heapify() function to convert the list of tasks into a heap. Then, we repeatedly use heapq.heappop() to extract the task with the smallest priority and process it.

These are just a few examples of how heapq can be used in real-world scenarios. The heapq module provides a powerful and efficient way to handle priority queues in Python.
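
What makes the queue dynamic is that new work can be pushed while it is being drained. The following sketch assumes a made-up build pipeline in which finishing one task schedules another; the task names are illustrative only.

```python
import heapq

tasks = [(1, 'compile'), (3, 'deploy')]
heapq.heapify(tasks)

processed = []
while tasks:
    priority, task = heapq.heappop(tasks)
    processed.append(task)
    if task == 'compile':
        # New work discovered while the queue is being drained.
        heapq.heappush(tasks, (2, 'run tests'))

print(processed)  # ['compile', 'run tests', 'deploy']
```

The newly pushed task is handled before 'deploy' because its priority value is smaller, even though it was added later.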

In the next chapter, we will explore some additional tips and tricks for working with heapq and heap in Python. Stay tuned!
