How do you achieve concurrency with Python threads?

Introduction

The process of threading allows the execution of multiple instructions of a program, at once. Only multi-threaded programming languages like Python support this technique. Several I/O operations running consecutively decelerates the program. So,  the process of threading helps to achieve concurrency.

In this article, we will explore:

  1. Types of concurrency in Python
  2. Global Interpreter Lock
  3. Need for Global Interpreter Lock
  4. Thread execution model in Python 2
  5. Global Interpreter Lock in Python 3

Types of Concurrency In Python

In general, concurrency is the parallel execution of different units of a program, which helps optimize and speed up the overall process. In Python, there are 3 methods to achieve concurrency:

  1. Multi-Threading
  2. Asyncio
  3. Multi-processing

We will be discussing the fundamentals of thread-execution model. Before going into the concepts directly, we shall first discuss Python’s Global Interpreter Lock (GIL).

Global Interpreter Lock

Python threads are real system threads (POSIX threads). The host operating system fully manages the POSIX threads, also known as p-threads.

In multi-core operating systems, the Global Interpreter Lock (GIL) prevents the parallel execution of p-threads of a multi-threaded Python process. Thus, ensuring that only one thread runs in the interpreter, at any given time.

Why do we need Global Interpreter Lock?

GIL helps to simplify, the implementation of the interpreter, and memory management. To understand how GIL does this, we need to understand reference counting.

For example: In the code below, ‘b’ is not a new list. It is just a reference to the previous list ‘a.’

>>> a = []
>>> b = a
>>> b.append(1)
>>> a
[1]
>>> a.append(2)
>>> b
[1,2]

 

Python uses reference counting variables to track the number of references that point to an object. The memory occupied by the object is released if the value of the reference counting variable is zero. If threads, of a process sharing the same memory, try to access this variable to increment and decrement simultaneously, it can cause leaked memory that is never released, or releasing the memory incorrectly. And this leads to crashes.

One solution is to have a lock for the reference counting variable object memory by using semaphores so that it is not modified simultaneously. However, adding locks to all objects increases the performance overhead in the acquisition and release of locks. Hence, Python has a GIL which gives access to all resources of a process to only one thread at one time. Apart from GIL there are other solutions, such as garbage collection used in JPython interpreter, for memory management.

So, the primary outcome of GIL is, instead of parallel computing you get pre-emptive (threading) and co-operative multitasking (asyncio).

Thread Execution Model in Python 2

 

#sek.py #par.py
import time
def countdown(n):
     while n > 0:
     n -= 1
count = 50000000
start = time.time()
countdown(count)
end = time.time()
print('Time taken in seconds -', end-start)

 

import time
from threading import Thread
COUNT = 50000000
def countdown(n):
     while n>0:
     n -= 1
t1 = Thread(target=countdown, args=(COUNT//2,))
t2 = Thread(target=countdown, args=(COUNT//2,))
​start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()
​print('Time taken in seconds -', end - start)

 

~/Pythonpractice/blog❯ python seq.py
(‘Time taken in seconds -‘, 1.2412900924682617)
~/Pythonpractice/blog❯ python par.py
(‘Time taken in seconds -‘, 1.8751227855682373)

Ideally, the par.py execution time should be half of the seq.py execution time. However, in the above example we can see that the par.py execution time is slightly higher than that of seq.py. To understand the reduction in performance, despite the sharing the work between two threads which run in parallel, we need to first discuss CPU-bound and I/O-bound threads.

CPU-bound threads are threads performing CPU intense operations such as matrix multiplication or nested loop operations. Here, the speed of program execution depends on CPU performance.

 

 

I/O-bound threads are threads performing I/O operations such as listening to a socket or waiting for a network connection. Here, the speed of program execution depends on factors including external file systems and network connections.

 

Scenarios in a Python multi-threaded program

When all threads are I/O-bound

If a thread is running, it holds the GIL. When the thread hits the I/O operation, it releases the GIL, and another thread acquires it to get executed. This alternate execution of threads is called multi-tasking.

 

 

Where one thread is CPU bound, and another thread is IO-bound:

A CPU bound thread is a unique case in thread execution. A CPU bound thread releases the GIL after every few ticks and tries to acquire again. A tick is a machine instruction. When releasing the GIL, it also signals the next thread in the execution queue (ready queue of the operating system) that the GIL has been released. Now, these CPU-bound and I/O-bound threads are in a race to acquire the GIL. The operating system decides which thread needs to be executed. This model of executing the next thread in the execution queue, before completing the previous thread, is called pre-emptive multitasking.

 

 

 

In most of the cases, operating system gives preference to the CPU bound thread and allows it to reacquire the GIL, leaving the I/O-bound thread starving. In the below diagram, the CPU-bound thread has released GIL and signaled thread 2. But even before thread 2 tries to acquire GIL, the CPU-bound thread has reacquired the GIL.  This issue has been resolved in Python 3 interpreter’s GIL.

 

 

In a single core operating system, if the CPU bound thread reacquires GIL, it pushes back the second thread to the ready queue, assigning it some priority. This is because Python doesn’t have control over the priorities assigned by the operating system.

In a multicore operating system, if the CPU bound thread reacquires the GIL then it does not push back the second thread, but continuously tries to acquire the GIL using another core. This characterizes thrashing. Thrashing reduces the performance if many threads try to acquire the GIL, using different cores of the operating system. Python 3 also addresses the issue of thrashing.

 

 

Global Interpreter Lock in Python 3

The Python 3 threat execution model has a new GIL. If there is only one thread running, it continues to run, until it hits an I/O operation or other thread requests, to drop the GIL. A global variable (gil_drop_request) helps to implement this.

If gil_drop_request = 0, running thread can continue until it hits I/O

If gil_drop_request = 1, running thread is forced to give up the GIL

Instead of CPU-bound thread check after every few ticks, the second thread is sending a GIL drop request by setting the variable gil_drop_request = 1 after reaching a timeout. The first thread will then immediately drop the GIL. Additionally, to suspend the first thread’s execution, the second thread sends a signal. This helps to avoid thrashing. This check is not available in Python 2.

 

 

Missing Bits in the New GIL

While the new GIL does address issue such as thrashing, it still has some areas of improvement:

Waiting for time out

Waiting for timeout can make the I/O response slow. Especially, when there are multiple, recurrent I/O operations. I/O-bound Python programs take considerable time to add a GIL, followed by a time out. This happens after every I/O operation, and before the next input/output is ready.

Unfair GIL acquiring

As seen below, the thread that makes the GIL drop request is not the one that gets the GIL. This type of situation can reduce the performance of I/O, where response time is critical.

 

 

Prioritizing threads

There is a need for the GIL to distinguish between CPU-bound and I/O-bound threads and then assign priorities. High priority threads must be able to immediately preempt low priority threads. This will help improve the response time considerably.  This issue has already been resolved in operating systems. Operating systems use timeout to automatically adjust task priorities. If a thread is preempted by a timeout, it is penalized with a low priority. Conversely, if a thread suspends early, it is rewarded with raised priority. Incorporating this in Python will help improve thread performance.

 

Written by :

He is a Backend Engineer for the Machine Learning team at CloudSEK. He is responsible for developing stable, scalable infrastructure for Machine learning models, and optimizing its performance.

Deepanjli is CloudSEK's Lead Technical Content Writer and Editor. She is a pen wielding pedant with an insatiable appetite for books, Sudoku, and epistemology. She works on any and all content at CloudSEK, which includes blogs, reports, product documentation, and everything in between.