Having studied the Process and Pool classes of the multiprocessing module, we can now look at the differences between them, so that you can decide which one to use for the task at hand.
Management
The Pool class is easier to use than the Process class because you do not have to manage the processes yourself. It creates the worker processes, splits the input data among them, and returns the results in a list. With blocking methods such as map(), it also waits for the workers to finish their tasks, i.e., you do not have to call join() on each process yourself.
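As a minimal sketch of this difference, assume a hypothetical square() task (the function names below are illustrative, not from the original examples). With Pool, map() splits the inputs and returns a list; with Process, you start and join every process yourself and need something like a Queue to get the results back.

```python
from multiprocessing import Pool, Process, Queue


def square(n):
    # Hypothetical task used only for illustration.
    return n * n


def square_into_queue(n, queue):
    # The return value of a Process target is discarded,
    # so the result is passed back through a queue instead.
    queue.put((n, square(n)))


def squares_with_pool(numbers):
    # map() distributes the inputs and blocks until every result is ready.
    with Pool(processes=2) as pool:
        return pool.map(square, numbers)


def squares_with_processes(numbers):
    queue = Queue()
    processes = [Process(target=square_into_queue, args=(n, queue)) for n in numbers]
    for p in processes:
        p.start()
    # Collect one result per task before joining the processes.
    results = dict(queue.get() for _ in numbers)
    for p in processes:
        p.join()
    return [results[n] for n in numbers]


if __name__ == "__main__":
    print(squares_with_pool([1, 2, 3, 4]))
    print(squares_with_processes([1, 2, 3, 4]))
```

Both calls return the same list; the Process version simply needs more bookkeeping to get there.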
Memory
With the Process class, every process you start exists in memory at the same time, one per task. A Pool keeps only a fixed number of worker processes alive and queues the remaining tasks until a worker is free. Therefore, if you have a large number of tasks and each one handles a lot of data, creating one Process per task might waste a lot of memory.
Creating a Pool has a higher up-front overhead, however. Therefore, when there are only a few tasks and they are not repetitive, it is advisable to use the Process class.
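A small sketch can make the memory point concrete. It assumes a hypothetical worker_pid() task that only reports which process ran it, and counts how many distinct worker processes a pool uses for many tasks.

```python
import os
from multiprocessing import Pool


def worker_pid(_task):
    # Report which worker process handled this task.
    return os.getpid()


def distinct_workers(num_tasks, num_workers):
    # Run num_tasks tasks on a pool of num_workers processes and
    # count how many different processes actually did the work.
    with Pool(processes=num_workers) as pool:
        pids = pool.map(worker_pid, range(num_tasks))
    return len(set(pids))


if __name__ == "__main__":
    # 100 tasks, but the pool only ever holds 3 worker processes.
    print(distinct_workers(100, 3))
```

The printed count can never exceed the number of workers, whereas starting one Process per task would have created 100 processes.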
I/O operations
A Pool hands tasks to its workers in FIFO (First In, First Out) order. Each worker runs one task at a time, so a worker that is waiting on an I/O operation does not pick up another task until the current one has finished; if all workers are blocked, the remaining tasks wait in the queue and the total execution time might increase. With the Process class there is no task queue: each task runs in its own process, and while one process is waiting on I/O, the operating system can schedule another one. For this reason, Process is often preferred over Pool when your task is I/O bound (a program is I/O bound if it spends most of its time waiting for I/O operations to complete).
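The Pool side of this behaviour can be observed with a rough sketch. It assumes a single-worker pool and a hypothetical slow_io() task that uses time.sleep() as a stand-in for blocking I/O, then records when each task started.

```python
import time
from multiprocessing import Pool


def slow_io(delay):
    # Stand-in for a blocking I/O call; returns the time the task started.
    started = time.monotonic()
    time.sleep(delay)
    return started


def start_times(num_tasks, delay):
    # One worker only, so queued tasks must wait for the current one.
    with Pool(processes=1) as pool:
        pending = [pool.apply_async(slow_io, (delay,)) for _ in range(num_tasks)]
        return [result.get() for result in pending]


if __name__ == "__main__":
    times = start_times(3, 0.2)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    print([round(gap, 2) for gap in gaps])
```

Each gap between consecutive start times should be roughly the delay or more, because the only worker cannot begin the next task while it is still blocked on the current one.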
Consider the following example where we create a file, write to it, and close it using the test() function.
Using the Process class:
import time
from multiprocessing import Process


def test(fname):
    # Create the file, write to it four times, and close it on exit.
    with open(fname, "w") as f:
        f.write("hi")
        f.write("hi")
        f.write("hi")
        f.write("hi")


if __name__ == "__main__":
    starttime = time.time()
    # One process per file.
    p1 = Process(target=test, args=("sample1.txt",))
    p2 = Process(target=test, args=("sample2.txt",))
    p1.start()
    p2.start()
    # Wait for both processes to finish before stopping the clock.
    p1.join()
    p2.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")
Output
Time taken 0.021641016006469727 seconds
Let’s do the same using the Pool class.
import time
from multiprocessing import Pool


def test(fname):
    # Create the file, write to it four times, and close it on exit.
    with open(fname, "w") as f:
        f.write("hi")
        f.write("hi")
        f.write("hi")
        f.write("hi")


if __name__ == "__main__":
    starttime = time.time()
    # By default, Pool() starts as many workers as there are CPUs.
    with Pool() as pool:
        a = pool.apply_async(test, args=("sample1.txt",))
        b = pool.apply_async(test, args=("sample2.txt",))
        # Wait for both tasks to finish before stopping the clock.
        a.wait()
        b.wait()
        endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")
Output
Time taken 0.022391319274902344 seconds
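As you can observe, the Pool version takes slightly longer here (by less than a millisecond), which is consistent with the extra cost of setting up the pool. A single run with such a small workload is only a rough indication, though.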
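The file writes above finish almost immediately, so they say little about tasks that really wait on I/O. The following rough sketch is not the benchmark used above: it assumes a hypothetical fake_io() task that simulates slow I/O with time.sleep(), and compares four such tasks on a two-worker pool against four separate processes.

```python
import time
from multiprocessing import Pool, Process


def fake_io(delay):
    # Stand-in for a blocking I/O call such as a slow disk or network read.
    time.sleep(delay)


def time_with_pool(num_tasks, delay, num_workers):
    # Time how long a fixed-size pool needs to work through all tasks.
    start = time.perf_counter()
    with Pool(processes=num_workers) as pool:
        pool.map(fake_io, [delay] * num_tasks)
    return time.perf_counter() - start


def time_with_processes(num_tasks, delay):
    # Time the same tasks with one dedicated process per task.
    start = time.perf_counter()
    processes = [Process(target=fake_io, args=(delay,)) for _ in range(num_tasks)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"Pool (2 workers): {time_with_pool(4, 0.5, 2):.2f} seconds")
    print(f"4 Processes:      {time_with_processes(4, 0.5):.2f} seconds")
```

With two workers the pool needs at least two rounds of waiting, while the four processes can wait at the same time. The exact numbers depend on the platform and on how processes are started, and a pool with more workers narrows the gap.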
In short, when there is a lot of data and the tasks are repetitive, prefer the Pool class. If the task is I/O bound, use the Process class.