Goroutines vs Python threads

Golang has gained tremendous popularity in the past few years. Go is designed to be simple and lightweight. In this article we will take a deep dive into the reasons that make Go a lightweight language.

A quick refresher on some basic topics. Skip to the main reading if you already know them.

Compiled vs Interpreted

A compiled language is translated ahead of time into instructions for a target machine. For example, a + b would be converted directly into an ADD instruction in machine code, so at runtime the program just has to execute the machine code. In an interpreted language, on the other hand, the program is executed statement by statement by an interpreter, which is itself a program compiled for the native machine. At runtime the interpreter first has to read and decode each statement, and only then does it carry out the corresponding machine instructions. An interpreted language therefore does more work at runtime than a compiled one, which generally makes it slower.

So now the question is: if we know interpreted languages are slow, why use them?

One reason is that interpreted languages are usually dynamically typed, while compiled languages are usually statically typed. Statically typed means the type of a variable is known at compile time. For example, in Java, C++ and Go, the data type of a variable (int, float, etc.) is fixed when it is declared. Hence if a variable is of type int, assigning a string to it gives an error at compile time.
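As a small sketch of static typing in Go (the variable names and the describe helper are made up for illustration), each variable's type is fixed when it is declared, and the commented-out assignment would be rejected by the compiler before the program ever runs:

```go
package main

import "fmt"

// describe is a hypothetical helper that returns the type name of its argument.
func describe(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var count int = 10 // declared as int; the type cannot change later
	name := "gopher"   // type string is inferred, but is just as fixed

	// count = "ten" // uncommenting this line causes a compile-time error,
	// because a string cannot be assigned to an int variable.

	fmt.Println(describe(count), describe(name))
}
```

The type mismatch is caught by the compiler rather than surfacing while the program is running.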

Whereas in dynamically typed languages like Python and JavaScript, the type of a variable is unknown until a value is assigned to it. For example, a variable test in Python can be assigned a string, an int or anything else, since the type checking is done at runtime. Interpreted languages are suitable for:

  • Productive development: fast web development (PHP, JavaScript) or prototyping.
  • Cross-platform support: for example, JavaScript is supported in every browser (including mobile browsers).

What is a Process?

By definition a process is a program in execution. In simple terms whenever a program is executed, the operating system allocates some portion of the main memory for the program and when this program is loaded in the allocated memory, it becomes a process.

So is a process essentially just the program executed in the main memory?

A process consists of four sections

  • Text (Instructions) – The compiled program code, i.e. the set of instructions that need to be executed
  • Data – Global and static variables (int calls in the above program)
  • Heap – Dynamic memory allocation and deletion during runtime.
  • Stack – Used to store local variables
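To make the four sections a little more concrete, here is a rough sketch in Go. All names are made up for illustration, and the mapping is approximate: the Go compiler decides between stack and heap through escape analysis, so the comments describe the typical case rather than a guarantee.

```go
package main

import "fmt"

// calls is a package-level (global) variable, so it belongs to the data section.
var calls int

// square is compiled into instructions in the text section.
// Its parameter x and local result normally live on the stack.
func square(x int) int {
	calls++
	result := x * x
	return result
}

// newBuffer returns a slice whose backing array has to outlive the call,
// so that array is typically allocated on the heap.
func newBuffer(n int) []byte {
	return make([]byte, n)
}

func main() {
	sq := square(4)
	buf := newBuffer(1024)
	fmt.Println(sq, calls, len(buf))
}
```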

Each process also has some memory reserved for the kernel. The kernel schedules the process onto the CPU, which executes its instructions one by one, and it services the system calls the process makes.

In the diagram above, notice the scope of access of the user process and the kernel process. A kernel process can make modifications to any memory within the process, whereas the user process has no access to kernel space.

Okay, now we have a good understanding of a process. Moving forward, let’s try to understand threads.

Can you think of a drawback/limitation of a process?

Let’s say I have a single-processor machine and I am using Google Chrome, MS Word, and TeamViewer simultaneously. Can we achieve parallelism with just a single processor? Yes, definitely, but not without the concept of a thread. As explained earlier, a process is just a program in execution, and the processor executes one instruction at a time. So if we want to achieve parallelism with a single processor, all three programs (Chrome, MS Word, and TeamViewer) have to share that processor, right? Let’s assume each program has 1000 instructions. If the processor executes all 1000 instructions of Chrome first and only then picks up the 1000 instructions of MS Word, that is sequential execution, not parallelism.

So how can we achieve parallelism here? We can take 100 instructions of each program at a time, and after completing them the processor can pick up 100 instructions of the next program. In a round-robin fashion the processor keeps executing the next 100 instructions of each program. Hooray, we achieved parallelism, or at least the appearance of it, since only one instruction is executing at any instant. But wait, are we missing anything here? Yes, surely we are.

After the first iteration of round-robin, each program has executed its first 100 instructions. In the next iteration the processor should resume each program from instruction 101. Hence we need a mechanism to store the last execution state before the context switch happens. Here comes the role of the thread. A thread stores the execution state and resumes execution from that state as soon as the processor is available. This whole story was to make you understand the purpose of a thread. It took me a lot of time to understand this, so if you got confused somewhere in the middle, just go back and read this story again.
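The round-robin story can be sketched as a toy simulation in Go. This is only an illustration under simplifying assumptions: the task type, the roundRobin function and the instruction counts are all made up, and a real scheduler saves registers and a stack pointer rather than a single counter.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a program: it has a fixed number of
// instructions and remembers how many of them have already been executed.
type task struct {
	name  string
	total int // total number of instructions in the program
	pc    int // saved execution state: how many instructions are done
}

// roundRobin gives every unfinished task up to quantum instructions per turn
// and returns the order in which the slices were executed.
func roundRobin(tasks []*task, quantum int) []string {
	if quantum <= 0 {
		return nil // nothing could ever make progress
	}
	var trace []string
	for {
		progressed := false
		for _, t := range tasks {
			if t.pc >= t.total {
				continue // this task has already finished
			}
			start := t.pc // resume from the saved state
			t.pc += quantum
			if t.pc > t.total {
				t.pc = t.total
			}
			trace = append(trace, fmt.Sprintf("%s:%d-%d", t.name, start+1, t.pc))
			progressed = true
		}
		if !progressed {
			return trace
		}
	}
}

func main() {
	tasks := []*task{
		{name: "chrome", total: 300},
		{name: "word", total: 200},
		{name: "teamviewer", total: 100},
	}
	for _, step := range roundRobin(tasks, 100) {
		fmt.Println(step)
	}
}
```

The pc field plays the role of the saved execution state: without it, every turn would start again from the first instruction.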

So what is a thread?

A thread is simply a separate stream of execution within a process, with its own program counter, stack, and set of registers. Threads are also known as lightweight processes.

As shown in the figure above, each program (Chrome, MS Word, and TeamViewer) will be loaded in a different thread within a process. The OS divides processor time into time slices; a thread cannot stay in the running state for longer than its time slice, after which another thread is loaded and executed, which creates an illusion of parallelism. Also, context switching between threads of the same process is faster than switching between processes, and threads can share memory directly instead of relying on IPC (inter-process communication).

Every process has a thread table that contains information on all the active threads within the process.

On a multiprocessor machine, confining all the threads of a process to one processor would be inefficient, since the other processors would remain idle. To make use of all the available processors, the OS has kernel-level threads. Instead of a thread table in each process, the kernel has a thread table that keeps track of all the threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. Since the kernel has full knowledge of all threads, the scheduler may decide to give more time to a process with a large number of threads than to a process with a small number of threads.

Kernel-level threads are good for applications that frequently block. Since the kernel has full control, when a thread blocks, the kernel gives the time slice to another thread that is ready to run, which maintains the parallelism and utilizes the processor to its maximum potential.

What are the disadvantages of kernel-level threads?

Kernel-level threads are slower because the kernel has to manage and schedule threads as well as processes, and it requires a full thread control block (TCB) for each thread to maintain information about the thread. As a result there is significant overhead and increased complexity.

Kernel-level threads do make concurrency cheaper compared to processes. However, for fine-grained concurrency, kernel threads still suffer from too much overhead. Thread operations require system calls and context switches, which makes them slow. Ideally, thread operations should be as fast as a procedure call. This is where user-level threads come to the rescue.

User-level threads

User-level threads are managed entirely by a user-level library at runtime. The kernel knows nothing about the user-level threads. To the kernel, every such process looks single-threaded.

So what’s the difference between goroutines and Python threads?

Goroutines are user-level threads managed entirely by the Go runtime, which multiplexes them onto a small number of kernel threads, whereas Python threads are actual kernel threads. Each Python thread typically reserves the operating system’s default thread stack, about 8 MB on many Linux systems, even if the code it runs needs just a few KB, which is a major drawback of using Python threads. Also, since Python threads are kernel threads, context switching is slow compared to user-level threads. Go creates goroutines with an initial stack of only a few KB (2 KB in recent Go releases; older releases used 4 KB or more), which grows dynamically as the program needs memory. This dynamic allocation of memory and the management of user-level threads by the Go runtime itself make a huge difference in performance.
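To get a feel for how cheap goroutines are, here is a minimal sketch that starts one goroutine per number and adds up the squares. The sumSquares function and the figure of 10,000 goroutines are arbitrary choices for illustration, not a benchmark.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares starts one goroutine for every number in 1..n and adds up
// the squares. It is a made-up example, not an efficient way to sum.
func sumSquares(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock() // protect total from concurrent updates
			total += v * v
			mu.Unlock()
		}(i)
	}
	wg.Wait() // block until every goroutine has finished
	return total
}

func main() {
	// 10,000 goroutines is an arbitrary number chosen for illustration.
	fmt.Println(sumSquares(10000))
}
```

Because each goroutine starts with a small stack, starting thousands of them this way is routine in Go, whereas creating the same number of kernel threads would reserve far more memory.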


User-level threads are lightweight, and since Go uses user-level threads, Go is a lightweight language. Also, Go is a statically typed, compiled language. The program is compiled ahead of time into native machine code, which improves performance at runtime. In contrast, Python is a dynamically typed, interpreted language and uses kernel-level threads.
