Multiprocessing in Python: Process vs Pool Class

June 25, 2020

Having studied the Process and the Pool classes of the multiprocessing module, today we are going to look at the differences between them, so that, given the task at hand, you can decide which one to use.

Management

The Pool class is easier to use than the Process class because you do not have to manage the processes yourself. It creates the worker processes, splits the input data among them, and returns the results in a list. It also waits for the workers to finish their tasks, i.e., you do not have to call the join() method explicitly.
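As a minimal sketch of that management, pool.map() creates the workers, splits the iterable among them, and collects the results into a list (square() here is just a made-up example function):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # The with-block creates the workers and joins them on exit for us
    with Pool(processes=4) as pool:
        # map() splits range(8) across the workers and gathers the results
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```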

Memory

While the Process class creates one process per task and keeps all of them in memory, the Pool keeps only a fixed number of worker processes alive and queues the remaining tasks. Therefore, if you have a large number of tasks, and each of them carries a lot of data, using the Process class might waste a lot of memory.
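A small sketch of this behaviour (worker_pid() is a hypothetical task function): even with 100 tasks, a Pool of two workers never has more than two task processes alive at once, which we can see from the number of distinct worker PIDs:

```python
import os
from multiprocessing import Pool

def worker_pid(_):
    # Report which worker process ran this task
    return os.getpid()

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # 100 tasks are queued, but only 2 worker processes ever exist
        pids = pool.map(worker_pid, range(100))
    print(f"tasks: {len(pids)}, distinct workers: {len(set(pids))}")
```

With the Process class, the same 100 tasks would mean 100 Process objects, each with its own memory footprint.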

Creating a Pool, however, has more overhead than creating a single Process. Therefore, when there are only a few tasks and they are not repetitive, it is advisable to use the Process class instead.
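You can get a rough feel for this overhead with a sketch like the following, which times a single trivial task run via one Process versus via a freshly created Pool (noop() is a made-up placeholder; the exact numbers will vary by machine and start method):

```python
import time
from multiprocessing import Pool, Process

def noop():
    # Trivial task so that only process-creation overhead is measured
    pass

if __name__ == "__main__":
    t0 = time.perf_counter()
    p = Process(target=noop)
    p.start()
    p.join()
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    with Pool() as pool:  # forks a whole set of workers first
        pool.apply(noop)
    t3 = time.perf_counter()

    print(f"single Process: {t1 - t0:.4f}s, Pool setup + task: {t3 - t2:.4f}s")
```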

I/O operations

The Pool hands tasks to its workers in FIFO (First In, First Out) order from an internal task queue. With the Process class, every task gets its own process, so while one process waits for an I/O operation, the operating system simply schedules another; their I/O waits overlap. A Pool worker, on the other hand, does not pick up the next task from the queue until its current task, including its I/O, has finished. So when the tasks outnumber the workers, the waiting adds up and the total execution time can increase. Process is therefore preferred over Pool when your task is I/O bound (a program is I/O bound if it spends most of its time waiting for I/O operations to complete).

Consider the following example where we create a file, write to it, and close it using the test() function.

Using the Process class:

import time
from multiprocessing import Process

def test(fname):
    # Open the file, write to it a few times, and close it (I/O-bound work)
    f = open(fname, "w")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.close()

if __name__ == "__main__":
    starttime = time.time()
    p1 = Process(target=test, args=("sample1.txt",))
    p2 = Process(target=test, args=("sample2.txt",))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")


Output

Time taken 0.021641016006469727 seconds

Let’s do the same using the Pool class:

import time
from multiprocessing import Pool

def test(fname):
    # Open the file, write to it a few times, and close it (I/O-bound work)
    f = open(fname, "w")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.close()

if __name__ == "__main__":
    starttime = time.time()
    pool = Pool()
    a = pool.apply_async(test, args=("sample1.txt",))
    b = pool.apply_async(test, args=("sample2.txt",))
    a.wait()
    b.wait()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")
    pool.close()
    pool.join()


Output

Time taken 0.022391319274902344 seconds

As you can observe, the Pool class takes slightly longer, mostly because of the overhead of creating the pool of workers.

In short, when there is a lot of data and the tasks are repetitive, prefer the Pool class. If the task is I/O bound, use the Process class.
