Multiprocessing in Python: Process vs Pool Class

June 25, 2020

Having studied the Process and the Pool classes of the multiprocessing module, today we are going to look at the differences between them, so that, given the task at hand, you can decide which one to use.

Management

The Pool class is easier to use than the Process class because you do not have to manage the processes yourself. It creates the worker processes, splits the input data among them, and returns the results in a list. It also waits for the workers to finish their tasks, i.e., you do not have to call the join() method explicitly.
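As a minimal sketch of that management, pool.map() creates the workers, splits the iterable among them, and collects the results into a list (square() here is just a made-up example function):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # The with-block creates the workers and joins them on exit for us
    with Pool(processes=4) as pool:
        # map() splits range(8) across the workers and gathers the results
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```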

Memory

While the Process class creates one process per task and keeps all of them in memory, the Pool keeps only a fixed number of worker processes alive and queues the remaining tasks. Therefore, if you have a large number of tasks, and each of them carries a lot of data, using the Process class might waste a lot of memory.
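A small sketch of this behaviour (worker_pid() is a hypothetical task function): even with 100 tasks, a Pool of two workers never has more than two task processes alive at once, which we can see from the number of distinct worker PIDs:

```python
import os
from multiprocessing import Pool

def worker_pid(_):
    # Report which worker process ran this task
    return os.getpid()

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # 100 tasks are queued, but only 2 worker processes ever exist
        pids = pool.map(worker_pid, range(100))
    print(f"tasks: {len(pids)}, distinct workers: {len(set(pids))}")
```

With the Process class, the same 100 tasks would mean 100 Process objects, each with its own memory footprint.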

Creating a Pool, however, has more overhead than creating a single Process. Therefore, when there are only a few tasks and they are not repetitive, it is advisable to use the Process class instead.
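You can get a rough feel for this overhead with a sketch like the following, which times a single trivial task run via one Process versus via a freshly created Pool (noop() is a made-up placeholder; the exact numbers will vary by machine and start method):

```python
import time
from multiprocessing import Pool, Process

def noop():
    # Trivial task so that only process-creation overhead is measured
    pass

if __name__ == "__main__":
    t0 = time.perf_counter()
    p = Process(target=noop)
    p.start()
    p.join()
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    with Pool() as pool:  # forks a whole set of workers first
        pool.apply(noop)
    t3 = time.perf_counter()

    print(f"single Process: {t1 - t0:.4f}s, Pool setup + task: {t3 - t2:.4f}s")
```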

I/O operations

The Pool hands tasks to its workers in FIFO (First In, First Out) order from an internal task queue. With the Process class, every task gets its own process, so while one process waits for an I/O operation, the operating system simply schedules another; their I/O waits overlap. A Pool worker, on the other hand, does not pick up the next task from the queue until its current task, including its I/O, has finished. So when the tasks outnumber the workers, the waiting adds up and the total execution time can increase. Process is therefore preferred over Pool when your task is I/O bound (a program is I/O bound if it spends most of its time waiting for I/O operations to complete).

Consider the following example where we create a file, write to it, and close it using the test() function.

Using the Process class:

import time
from multiprocessing import Process

def test(fname):
    # Open the file, write to it a few times, and close it (I/O-bound work)
    f = open(fname, "w")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.close()

if __name__ == "__main__":
    starttime = time.time()
    p1 = Process(target=test, args=("sample1.txt",))
    p2 = Process(target=test, args=("sample2.txt",))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")


Output

Time taken 0.021641016006469727 seconds

Let’s do the same using the Pool class:

import time
from multiprocessing import Pool

def test(fname):
    # Open the file, write to it a few times, and close it (I/O-bound work)
    f = open(fname, "w")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.write("hi")
    f.close()

if __name__ == "__main__":
    starttime = time.time()
    pool = Pool()
    a = pool.apply_async(test, args=("sample1.txt",))
    b = pool.apply_async(test, args=("sample2.txt",))
    a.wait()
    b.wait()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")
    pool.close()
    pool.join()


Output

Time taken 0.022391319274902344 seconds

As you can observe, the Pool class takes slightly longer, mostly because of the overhead of creating the pool of workers.

In short, when there is a lot of data and the tasks are repetitive, prefer the Pool class. If the task is I/O bound, use the Process class.
