
Multithreading VS Multiprocessing in Python

Dec. 15, 2020

Multithreading and multiprocessing are two central concepts in concurrent programming, and they are easy to confuse. In this tutorial, we will go through the differences between them in Python.

Parallel and Concurrent Execution


In multiprocessing, processes run in parallel, whereas multithreading allows threads spawned by a process to run concurrently. In parallel execution, tasks execute on multiple CPUs at the same instant. In concurrent execution, only one task runs at any instant on a single CPU, but the CPU switches between the tasks so that all of them make progress over the same period.

Memory


Each process created by a parent process has its own separate memory, whereas threads spawned by a process share its memory space. Each thread has its own thread ID, program counter, register set, and stack, but it shares the code section, data section, and other OS resources with the other threads of the process. Hence, threads consume less memory than processes do.
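The difference in memory can be seen with a small sketch. The names `shared`, `append_item`, `run_in_thread`, and `run_in_process` are made up for this illustration. A thread appends to the parent's list directly, while a child process only changes its own copy:

```python
import threading
from multiprocessing import Process

shared = []  # module-level list, lives in the memory of the current process


def append_item(value):
    shared.append(value)


def run_in_thread():
    # The thread runs inside this process, so it appends to the same list.
    t = threading.Thread(target=append_item, args=("from thread",))
    t.start()
    t.join()
    return list(shared)


def run_in_process():
    # The child process appends to its own copy of the list;
    # the parent's list is left unchanged.
    p = Process(target=append_item, args=("from process",))
    p.start()
    p.join()
    return list(shared)


if __name__ == "__main__":
    print("after thread :", run_in_thread())
    print("after process:", run_in_process())
```

After both calls, the parent's list should contain only the item added by the thread.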

Inter-Process Communication and Locks


Since processes have separate memory, a communication channel is required to pass data between them, which makes the code more complicated. Threads, on the other hand, don't require one. However, because threads share resources, we need to use locks to avoid race conditions. A race condition occurs when shared data is accessed and modified by several threads concurrently, so that the result depends on the order in which they happen to run. Using a lock makes sure that only one thread manipulates the data at a time. To manipulate the shared data, a thread has to acquire the lock. If some other thread tries to acquire the lock during that time, it has to wait until the lock is released.
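Here is a minimal sketch of the locking pattern described above, using `threading.Lock`. The names `counter`, `increment`, and `run` are only for illustration. Every thread has to acquire the lock before it updates the shared counter:

```python
import threading

counter = 0
lock = threading.Lock()


def increment(n):
    global counter
    for _ in range(n):
        # Only one thread at a time can be inside this block;
        # the others wait here until the lock is released.
        with lock:
            counter += 1


def run(num_threads=4, n=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

With the lock in place, the final value equals `num_threads * n`. Without it, `counter += 1` is a read-modify-write sequence that two threads can interleave, so some updates may be lost.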

GIL


In CPython, the standard Python implementation, only one thread can execute Python code at a time because of the Global Interpreter Lock, or GIL. It is a lock that a thread must hold in order to control the Python interpreter, so even a multithreaded Python program effectively uses a single CPU for its Python code. Therefore, threads alone cannot spread CPU-bound work across multiple cores. The multiprocessing module gets around this limitation by running separate processes instead of threads.

In multiprocessing, each process has its own Python interpreter and therefore its own GIL. In multithreading, all the threads of a process share a single interpreter and a single GIL, so only one of them can execute at a time.
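One way to see that threads live inside a single process (and interpreter) while child processes are separate is to compare process IDs. This is only a sketch, and the helper names `record_pid`, `thread_pids`, and `child_pids` are invented for it:

```python
import os
import threading
from multiprocessing import Process


def record_pid(results):
    # Runs inside a thread: reports the ID of the process the thread belongs to.
    results.append(os.getpid())


def thread_pids(n=3):
    results = []
    threads = [threading.Thread(target=record_pid, args=(results,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def child_pids(n=3):
    # Process.pid is available in the parent once start() has been called.
    procs = [Process(target=os.getpid) for _ in range(n)]
    for p in procs:
        p.start()
    pids = [p.pid for p in procs]
    for p in procs:
        p.join()
    return pids


if __name__ == "__main__":
    print("main process:", os.getpid())
    print("threads     :", thread_pids())
    print("processes   :", child_pids())
```

All the threads report the ID of the main process, while each child process has an ID of its own.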

Performance


Let's now compare their performance for two types of tasks: I/O bound and CPU bound. An I/O-bound task spends most of its time waiting for I/O operations, while a CPU-bound task spends most of its time computing on the CPU.

We will use the multiprocessing and threading modules.

CPU Bound Task


Multiprocessing

import time
from multiprocessing import Process


def test(x):
    # Pure CPU work: count up to x**x (9**9 = 387,420,489 iterations for x = 9).
    count = 0
    for i in range(0, x ** x):
        count += 1


if __name__ == "__main__":
    starttime = time.time()
    processlist = []
    for i in range(0, 4):
        process = Process(target=test, args=(9,))
        processlist.append(process)
        process.start()

    for process in processlist:
        process.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 138.53867363929749 seconds

Serial Execution (Single Process, Single Thread)

import time


def test(x):
    count = 0
    for i in range(0, x ** x):
        count += 1


starttime = time.time()
for i in range(0, 4):
    test(9)
endtime = time.time()
print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 237.88586354255676 seconds

Multithreading

import time
import threading


def test(x):
    count = 0
    for i in range(0, x ** x):
        count += 1


if __name__ == "__main__":
    starttime = time.time()
    threads = []
    for i in range(0, 4):
        thread = threading.Thread(target=test, args=(9,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 246.3547739982605 seconds

As the results above show, multiprocessing performs better than serial execution, which in turn performs slightly better than multithreading. Why is this so? In multiprocessing, the processes run in parallel on multiple CPUs, and since our code needs the CPU almost all the time, splitting it across CPUs gives the best result. In multithreading, however, only one thread runs at a time because of the GIL. Because no thread ever blocks on I/O, there is no idle CPU time for another thread to fill, so multithreading gives almost the same performance as serial execution. On top of that, it pays the overhead of switching between threads, which serial execution does not have. Thus, serial execution performs slightly better than multithreading.
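If you want to try the comparison without waiting for minutes, the three variants can be put into one scaled-down script. This is a sketch and not the benchmark used above: the workload `N` and the helper names `count_up`, `run_serial`, `run_parallel`, and `timed` are assumptions made for illustration, and the timings will differ from machine to machine.

```python
import threading
import time
from multiprocessing import Process

N = 2_000_000  # far smaller than 9**9, so each variant finishes quickly


def count_up(n):
    count = 0
    for _ in range(n):
        count += 1
    return count


def run_serial(workers=4):
    for _ in range(workers):
        count_up(N)


def run_parallel(worker_cls, workers=4):
    # threading.Thread and multiprocessing.Process both accept target/args
    # and both provide start() and join().
    tasks = [worker_cls(target=count_up, args=(N,)) for _ in range(workers)]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()


def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial   : {timed(run_serial):.2f} s")
    print(f"threads  : {timed(run_parallel, threading.Thread):.2f} s")
    print(f"processes: {timed(run_parallel, Process):.2f} s")
```

With a workload this small, the cost of starting the processes can eat up much of the gain, so increase `N` if the difference is not visible.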

I/O Bound Task


Multiprocessing

import time
from multiprocessing import Process


def test(i):
    file_name = "file" + str(i) + ".txt"
    # The with block closes the file (flushing buffered writes) when it exits.
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


if __name__ == "__main__":
    starttime = time.time()
    processlist = []
    for i in range(0, 4):
        process = Process(target=test, args=(i,))
        processlist.append(process)
        process.start()

    for process in processlist:
        process.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 154.52078652381897 seconds

Serial Execution (Single Process, Single Thread)

import time


def test(i):
    file_name = "file" + str(i) + ".txt"
    # The with block closes the file (flushing buffered writes) when it exits.
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


starttime = time.time()
for i in range(0, 4):
    test(i)
endtime = time.time()
print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 498.11753392219543 seconds

Multithreading

import time
import threading


def test(i):
    file_name = "file" + str(i) + ".txt"
    # The with block closes the file (flushing buffered writes) when it exits.
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


if __name__ == "__main__":
    starttime = time.time()
    threads = []
    for i in range(0, 4):
        thread = threading.Thread(target=test, args=(i,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 112.06211709976196 seconds

In serial execution, when an I/O operation comes up, the thread blocks and the CPU remains idle, so the program does no work until that operation completes. In multithreading, however, a thread that blocks on I/O releases the GIL, and another thread acquires it and starts running. That is why multithreading performs better than serial execution here. Moreover, creating a process has more overhead than creating a thread. Therefore, multithreading also performs better than multiprocessing in this case.
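The effect of releasing the GIL while waiting can be sketched without touching the disk. In the example below, `time.sleep` stands in for a blocking I/O call (like blocking I/O, it releases the GIL while it waits), and the names `fake_io`, `run_serial`, `run_threaded`, and `timed` are made up for this illustration:

```python
import threading
import time


def fake_io(seconds):
    # Stand-in for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(seconds)


def run_serial(n, seconds):
    for _ in range(n):
        fake_io(seconds)


def run_threaded(n, seconds):
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial  : {timed(run_serial, 4, 0.5):.2f} s")
    print(f"threaded: {timed(run_threaded, 4, 0.5):.2f} s")
```

The serial version waits for the four calls one after another, while the threaded version waits for all of them at the same time, so it should finish in roughly the time of a single call.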

So, use multithreading when tasks mostly involve I/O operations or network requests, and use multiprocessing when you have CPU-intensive tasks.

The differences between multithreading and multiprocessing are summarized in the table below.

| Multithreading | Multiprocessing |
| --- | --- |
| It uses threads. | It uses processes. |
| Executes Python code on a single CPU at a time (because of the GIL). | Uses multiple CPUs or cores. |
| Thread creation is faster than process creation, i.e., less overhead. | Process creation is slower. |
| Concurrent execution. | Parallel execution. |
| Threads share the same memory space. | Processes have separate memory spaces. |
| Requires locks to handle shared data. | Does not require locks as threads do, because the memory space is separate, unless you explicitly use some shared resource. |
| One GIL shared by all threads. | Each process has a separate GIL. |
| No communication channel is required. | Processes require a communication channel for IPC. |
| Threads cannot be killed from outside; they have to stop on their own. | Child processes can be killed. |
| Suitable for I/O-bound and network-bound tasks. | Suitable for CPU-bound tasks. |
| Context switching between threads is faster than between processes. | Context switching is slower. |
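The row about killing deserves a short illustration. The threading module offers no way to kill a thread from outside, so a common pattern is to ask the thread to stop through a `threading.Event`, whereas a child process can be stopped with `Process.terminate()`. The following is a sketch, with helper names (`worker`, `stop_thread_cooperatively`, `sleep_forever`) chosen only for this example:

```python
import threading
import time
from multiprocessing import Process


def worker(stop_event):
    # The thread checks the event regularly and exits on its own once it is set.
    while not stop_event.is_set():
        stop_event.wait(0.01)


def stop_thread_cooperatively():
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,))
    t.start()
    stop.set()            # ask the thread to finish
    t.join(timeout=2)
    return t.is_alive()   # False once the thread has exited


def sleep_forever():
    while True:
        time.sleep(1)


if __name__ == "__main__":
    print("thread still alive:", stop_thread_cooperatively())

    p = Process(target=sleep_forever)
    p.start()
    p.terminate()         # stops the child from outside, no cooperation needed
    p.join()
    print("process still alive:", p.is_alive())
```

Note that `terminate()` stops the child abruptly, so it gets no chance to clean up, for example to flush files it was writing.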

For more detailed information on multiprocessing and multithreading, visit Introduction to Multiprocessing and Process in Python and Multithreading in Python.


