Multithreading VS Multiprocessing in Python

Multithreading and multiprocessing are two central concepts in concurrent programming, and they are easy to confuse. In this tutorial, we will learn the differences between them.

Parallel and Concurrent Execution

In multiprocessing, processes run in parallel, whereas multithreading lets the threads spawned by a process run concurrently. In parallel execution, tasks execute on multiple CPUs at the same time. In concurrent execution, only one task runs at any instant on a single CPU, but the tasks take turns, so all of them make progress over the same period.
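To make the idea of concurrent progress concrete, here is a minimal sketch using the threading module. The names worker and run_concurrently are made up for this illustration, and the short sleep is only an assumption used to give the other thread a chance to run.

```python
import threading
import time


def worker(name, steps, log):
    """Record each step, pausing briefly so the other thread can run."""
    for step in range(steps):
        log.append((name, step))  # list.append is thread-safe in CPython
        time.sleep(0.05)  # while this thread sleeps, another one can run


def run_concurrently(steps=3):
    """Start two threads that share one log and wait for both to finish."""
    log = []
    threads = [
        threading.Thread(target=worker, args=(name, steps, log))
        for name in ("A", "B")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    for name, step in run_concurrently():
        print(f"thread {name} recorded step {step}")
```

The entries of the two threads should appear interleaved in the log: neither thread waits for the other to finish, even though only one of them runs at any instant.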

Memory

Each process created by a parent process has its own separate memory. Threads spawned by a process, however, share that process's memory space. Each thread has its own thread ID, program counter, register set, and stack, but it shares the code section, data section, and other OS resources with the other threads of the same process. Hence, threads consume less memory than processes do.
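The difference in memory can be observed with a small experiment. The sketch below is only an illustration; shared_items, add_item, run_in_thread, and run_in_process are hypothetical names chosen for it.

```python
import threading
from multiprocessing import Process

shared_items = []  # module-level list that lives in this process's memory


def add_item():
    """Append to the list that is visible in the caller's memory space."""
    shared_items.append("added")


def run_in_thread():
    """Run add_item in a thread; the thread modifies our own list."""
    thread = threading.Thread(target=add_item)
    thread.start()
    thread.join()
    return len(shared_items)


def run_in_process():
    """Run add_item in a child process; the child modifies only its copy."""
    process = Process(target=add_item)
    process.start()
    process.join()
    return len(shared_items)


if __name__ == "__main__":
    print("items after thread: ", run_in_thread())
    print("items after process:", run_in_process())
```

The count grows after the thread runs, but it stays the same after the child process runs, because the child appended to its own copy of the list.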

Inter-Process Communication and Locks

Since processes have separate memory segments, a communication channel is required to pass data between them, which adds complexity. Threads, on the other hand, don't require that. However, because threads share resources, we need to use locks to avoid race conditions. A race condition occurs when shared data is accessed and manipulated concurrently, so the result depends on the order in which the threads happen to run. A lock makes sure that only one thread manipulates the data at a time. For a thread to manipulate the shared data, it has to acquire the lock. During that time, if some other thread tries to acquire the lock, it has to wait until the lock is released.
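As a minimal sketch of the locking pattern described above, the following code protects a shared counter with threading.Lock. The names counter, counter_lock, increment, and run are assumptions made for this example.

```python
import threading

counter = 0  # shared data that every thread modifies
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock each time."""
    global counter
    for _ in range(times):
        with counter_lock:  # acquire the lock; it is released on exit
            counter += 1  # the read-modify-write happens under the lock


def run(num_threads=4, times=10_000):
    """Reset the counter, run the threads, and return the final value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(f"Final counter value: {run()}")
```

Because every update happens while the lock is held, the final value equals num_threads * times. Without the lock, two threads could read the same old value and one of the updates could be lost.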

GIL

In CPython, the standard implementation of Python, only one thread executes Python code at a time because of the Global Interpreter Lock, or GIL. It is a lock that allows only one thread to hold control of the Python interpreter, and thus only one thread gets executed at a time. Therefore, a multithreaded Python program cannot spread its Python code across multiple CPUs. However, the multiprocessing module sidesteps this limitation by running separate processes instead of threads.

In multiprocessing, each process has its own instance of the Python interpreter and its own GIL. In multithreading, however, all the threads share one Python interpreter and therefore one GIL. As a result, only one thread can execute at a time.

Performance

Let's now go ahead and compare their performance. We will do it for two types of tasks, i.e., I/O bound and CPU bound. An I/O bound task spends most of its time performing I/O operations. A CPU bound task, on the other hand, spends most of its time computing on the CPU.

We will use the multiprocessing and threading modules.

CPU Bound Task

Multiprocessing

import time
from multiprocessing import Process


def test(x):
    count = 0
    for i in range(0, x ** x):
        count += 1


if __name__ == "__main__":
    starttime = time.time()
    processlist = []
    for i in range(0, 4):
        process = Process(target=test, args=(9,))
        processlist.append(process)
        process.start()

    for process in processlist:
        process.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 138.53867363929749 seconds

Single Processing and Threading

import time


def test(x):
    count = 0
    for i in range(0, x ** x):
        count += 1


starttime = time.time()
for i in range(0, 4):
    test(9)
endtime = time.time()
print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 237.88586354255676 seconds

Multithreading

import time
import threading


def test(x):
    count = 0
    for i in range(0, x ** x):
        count += 1


if __name__ == "__main__":
    starttime = time.time()
    threads = []
    for i in range(0, 4):
        thread = threading.Thread(target=test, args=(9,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

Time taken 246.3547739982605 seconds

As you can see in the above example, multiprocessing gives better results than serial execution, which in turn performs better than multithreading. Why is this so? In multiprocessing, processes run in parallel and utilize multiple CPUs. Since our code needs the CPU most of the time, multiprocessing gives better results. In multithreading, however, only one thread runs at a time. Because no thread ever blocks on an I/O operation, the CPU never sits idle, so there is no waiting time for another thread to fill. Hence, multithreading gives almost the same performance as serial execution does. However, remember that there is also an overhead of switching between the threads, which does not exist in serial execution. Thus, serial execution performs slightly better than multithreading.

I/O Bound Task

Multiprocessing

import time
from multiprocessing import Process


def test(i):
    file_name = "file" + str(i) + ".txt"
    # Use a with block so the file is closed (and flushed) when writing ends
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


if __name__ == "__main__":
    starttime = time.time()
    processlist = []
    for i in range(0, 4):
        process = Process(target=test, args=(i,))
        processlist.append(process)
        process.start()

    for process in processlist:
        process.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 154.52078652381897 seconds

Single Processing and Threading

import time


def test(i):
    file_name = "file" + str(i) + ".txt"
    # Use a with block so the file is closed (and flushed) when writing ends
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


starttime = time.time()
for i in range(0, 4):
    test(i)
endtime = time.time()
print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 498.11753392219543 seconds

Multithreading

import time
import threading


def test(i):
    file_name = "file" + str(i) + ".txt"
    # Use a with block so the file is closed (and flushed) when writing ends
    with open(file_name, "w") as f:
        print("writing to a file")
        for j in range(0, 8 ** 8):
            f.write("This is a sample text")


if __name__ == "__main__":
    starttime = time.time()
    threads = []
    for i in range(0, 4):
        thread = threading.Thread(target=test, args=(i,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    endtime = time.time()
    print(f"Time taken {endtime-starttime} seconds")

Output

writing to a file
writing to a file
writing to a file
writing to a file
Time taken 112.06211709976196 seconds

When an I/O operation comes up in serial execution, the thread blocks and the CPU remains idle. Hence, the process does nothing until that operation completes. In multithreading, however, a thread that blocks on I/O releases the GIL, and some other thread acquires it and starts running. That is why multithreading performs better than serial execution. Moreover, the overhead of creating a process is higher than that of creating a thread. Therefore, multithreading performs better than multiprocessing in this case.
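The effect of blocking can be reproduced in a few seconds without writing large files. The sketch below is a simplified stand-in, not the benchmark used above: the hypothetical fake_io function replaces real I/O with time.sleep, which also releases the GIL while it waits.

```python
import threading
import time


def fake_io(delay=0.2):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)


def run_serial(n=4):
    """Run n blocking calls one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io()
    return time.perf_counter() - start


def run_threaded(n=4):
    """Run n blocking calls in n threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io) for _ in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"Serial:   {run_serial():.2f} seconds")
    print(f"Threaded: {run_threaded():.2f} seconds")
```

In the serial version the waits add up, while in the threaded version they overlap, so the threaded run should take roughly as long as a single wait.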

So, use multithreading when tasks consist mostly of I/O operations or network requests, and use multiprocessing when you have CPU-intensive tasks.
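One way to apply this rule of thumb is the standard concurrent.futures module, which is not used elsewhere in this tutorial and is swapped in here only because it offers the same interface for thread pools and process pools. The functions io_task, cpu_task, run_io_tasks, and run_cpu_tasks are hypothetical names for this sketch.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(n):
    """Simulated I/O bound task: mostly waiting."""
    time.sleep(0.1)
    return n


def cpu_task(n):
    """CPU bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_io_tasks(items):
    """I/O bound work goes to a pool of threads."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(io_task, items))


def run_cpu_tasks(sizes):
    """CPU bound work goes to a pool of processes."""
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(cpu_task, sizes))


if __name__ == "__main__":
    print(run_io_tasks([1, 2, 3, 4]))
    print(run_cpu_tasks([10, 100, 1000]))
```

Switching a workload from threads to processes then only requires changing the executor class, which makes it easy to measure both options for a given task.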

The differences between multithreading and multiprocessing are summarized in the table below.

Multithreading	Multiprocessing
It uses threads.	It uses processes.
Only one thread runs Python code at a time because of the GIL.	Can use multiple CPUs or cores.
Thread creation is faster than process creation, i.e., less overhead.	Process creation is slower.
Concurrent Execution.	Parallel Execution.
Threads have the same memory space.	Processes have a separate memory space.
Requires locks to handle shared data.	Does not require locks unless you explicitly use a shared resource, because each process has its own memory space.
One GIL.	Each process has a separate GIL.
No such communication channel is required.	Processes require a communication channel for IPC.
The threading module offers no way to kill a thread.	Child processes can be killed, e.g., with Process.terminate().
Suitable for I/O bound and network bound tasks.	Suitable for CPU bound tasks.
Context switching between threads is faster than processes.	Context switching is slower.

For more detailed information on multiprocessing and multithreading, visit Introduction to Multiprocessing and Process in Python and Multithreading in Python.
