What is the difference between a process and a thread?
What is the technical difference between a process and a thread?
I get the feeling a word like 'process' is overused and there are also hardware and software threads. How about light-weight processes in languages like Erlang? Is there a definitive reason to use one term over the other?
Both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces.
I'm not sure what "hardware" vs "software" threads you might be referring to. Threads are an operating environment feature, rather than a CPU feature (though the CPU typically has operations that make threads efficient).
Erlang uses the term "process" because it does not expose a shared-memory multiprogramming model. Calling them "threads" would imply that they have shared memory.
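To make the shared-versus-separate memory point concrete, here is a minimal sketch in Python. It assumes a POSIX system (it relies on `os.fork`), and the helper names `bump_in_thread` and `bump_in_child_process` are made up for illustration. A thread's change to a list is visible to the caller, while a forked child only changes its own copy.

```python
import os
import threading


def bump_in_thread(items):
    """Append to `items` from another thread of the same process."""
    t = threading.Thread(target=items.append, args=("thread",))
    t.start()
    t.join()
    return items  # same object, same memory: the append is visible here


def bump_in_child_process(items):
    """Fork a child that appends to its own copy of `items` (POSIX only).

    Returns (what the parent sees, what the child saw).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on a private copy of the parent's memory.
        try:
            os.close(read_fd)
            items.append("child")
            os.write(write_fd, ",".join(items).encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_view = reader.read().split(",")
    os.waitpid(pid, 0)
    return items, child_view
```

The child has to send its result back through a pipe precisely because it cannot write into the parent's memory; the thread needs no such channel.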
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
A thread is an entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
This information was found on Microsoft Docs here: About Processes and Threads
Microsoft Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer.
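As a rough, portable illustration of the per-thread items mentioned above (a unique thread identifier and thread local storage, next to resources shared by the whole process), here is a sketch using Python's `threading` module rather than the Windows API. The names `worker` and `run_workers` are hypothetical.

```python
import threading

shared = {"hits": 0}        # one copy, visible to every thread in the process
lock = threading.Lock()
local = threading.local()   # attributes set here are private to each thread


def worker(name, barrier, results):
    local.name = name       # stored in this thread's own local storage
    barrier.wait()          # keep all workers alive at once so ids are not reused
    with lock:
        shared["hits"] += 1
        results[name] = (threading.get_ident(), local.name)


def run_workers(names):
    """Run one thread per name; return {name: (thread id, thread-local value)}."""
    barrier = threading.Barrier(len(names))
    results = {}
    threads = [
        threading.Thread(target=worker, args=(n, barrier, results)) for n in names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker reports a different identifier and sees only its own `local.name`, while all of them update the same `shared` dictionary. The main thread never sets `local.name`, so it has no such attribute.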
- An executing instance of a program is called a process.
- Some operating systems use the term ‘task’ to refer to a program that is being executed.
- A running process resides in main memory, also called primary memory or random access memory.
- Therefore, a process is termed an active entity. It disappears if the machine is rebooted.
- Several processes may be associated with the same program.
- On a multiprocessor system, multiple processes can be executed in parallel.
- On a uni-processor system, true parallelism is not achieved; instead, a process scheduling algorithm runs each process one at a time in turn, yielding an illusion of parallelism.
- Example: Executing multiple instances of the ‘Calculator’ program. Each of the instances is termed a process.
- A thread is a subset of the process.
- It is termed a ‘lightweight process’, since it is similar to a real process but executes within the context of a process and shares the resources allotted to that process by the kernel.
- Usually, a process has only one thread of control – one set of machine instructions executing at a time.
- A process may also be made up of multiple threads of execution that execute instructions concurrently.
- Multiple threads of control can exploit the true parallelism possible on multiprocessor systems.
- On a uni-processor system, a thread scheduling algorithm is applied and the processor is scheduled to run each thread one at a time.
- All the threads running within a process share the same address space, file descriptors and other process-related attributes. Each thread does, however, have its own stack and registers.
- Since the threads of a process share the same memory, synchronizing access to the shared data within the process becomes critically important.
I borrowed the above info from the Knowledge Quest! blog.
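The last point in the list, synchronizing access to shared data, can be sketched as follows. This is a minimal Python example, not taken from the blog; the function name `count_with_lock` is made up for illustration.

```python
import threading


def count_with_lock(num_threads, increments):
    """Have several threads increment one shared counter, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may perform the read-modify-write.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

With the lock, `count_with_lock(4, 10000)` returns 40000. Without it, `total += 1` is a read followed by a write, so two threads can read the same old value and one update may be lost.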
First, let's look at the theoretical aspect. You need to understand what a process is conceptually to understand the difference between a process and a thread and what's shared between them.
We have the following in section 2.2.2, "The Classical Thread Model", of Modern Operating Systems 3e by Tanenbaum:
The process model is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate them; this is where threads come in....
One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily. The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.
Further down he provides the following table:
Per process items           | Per thread items
----------------------------|-----------------
Address space               | Program counter
Global variables            | Registers
Open files                  | Stack
Child processes             | State
Pending alarms              |
Signals and signal handlers |
Accounting information      |
Let's deal with the hardware multithreading issue. Classically, a CPU would support a single thread of execution, maintaining the thread's state via a single program counter (PC) and a set of registers. But what happens when there's a cache miss? It takes a long time to fetch data from main memory, and while that's happening the CPU is just sitting there idle. So someone had the idea to basically have two sets of thread state (PC + registers) so that another thread (maybe in the same process, maybe in a different process) can get work done while the other thread is waiting on main memory. There are multiple names and implementations of this concept, such as Hyper-threading and simultaneous multithreading (SMT for short).
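From software, these hardware threads usually show up as logical CPUs. As a small sketch (the helper name `logical_cpu_count` is made up), Python's `os.cpu_count()` reports the number of logical CPUs, which on a machine with two hardware threads per core is typically twice the number of physical cores:

```python
import os


def logical_cpu_count():
    """Number of logical CPUs the OS reports, or 1 if it cannot be determined."""
    # os.cpu_count() may return None when the count is unknown.
    return os.cpu_count() or 1
```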
Now let's look at the software side. There are basically three ways that threads can be implemented on the software side.
- User space threads
- Kernel threads
- A combination of the two
All you need to implement threads is the ability to save the CPU state and maintain multiple stacks, which can in many cases be done in user space. The advantages of user space threads are very fast thread switching, since you don't have to trap into the kernel, and the ability to schedule your threads the way you like. The biggest drawback is the inability to do blocking I/O (which would block the entire process and all its user threads), which is one of the big reasons we use threads in the first place. Blocking I/O using threads greatly simplifies program design in many cases.
Kernel threads have the advantage of being able to use blocking I/O, in addition to leaving all the scheduling issues to the OS. But each thread switch requires trapping into the kernel, which can be relatively slow. However, if you're switching threads because of blocked I/O, this isn't really an issue since the I/O operation probably trapped you into the kernel already anyway.
Another approach is to combine the two, with multiple kernel threads each having multiple user threads.
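To give a feel for the user space approach, here is a toy sketch in Python that uses generators as stand-ins for user space threads. This is only an analogy under stated assumptions: each generator keeps its own saved execution state, the "scheduler" is an ordinary loop in the same process, and the names `task` and `run_round_robin` are made up for illustration.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Run tasks in turn until all have finished; no kernel involvement."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task where it left off
        except StopIteration:
            continue               # task finished; do not reschedule it
        ready.append(current)      # still has work: back of the queue
```

Running it with `log = []` and `run_round_robin([task("a", 2, log), task("b", 3, log)])` leaves `log` as `["a:0", "b:0", "a:1", "b:1", "b:2"]`. It also shows the drawback mentioned above: if one task made a blocking call instead of yielding, every other task would be stuck behind it.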
So getting back to your question of terminology, you can see that a process and a thread of execution are two different concepts and your choice of which term to use depends on what you're talking about. Regarding the term "light weight process", I don't personally see the point in it since it doesn't really convey what's going on as well as the term "thread of execution".
To explain more with respect to concurrent programming:
A process has a self-contained execution environment. A process generally has a complete, private set of basic run-time resources; in particular, each process has its own memory space.
Threads exist within a process — every process has at least one. Threads share the process's resources, including memory and open files. This makes for efficient, but potentially problematic, communication.
An example keeping the average person in mind:
On your computer, open Microsoft Word and a web browser. We call these two processes.
In Microsoft Word, you type something and it gets automatically saved. You will have noticed that editing and saving happen in parallel: editing on one thread and saving on another.
An application consists of one or more processes. A process, in the simplest terms, is an executing program. One or more threads run in the context of the process. A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread. A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them.
Stolen from here.