Killing Processes: Let me count the ways

When it comes to terminating processes in Linux, there are numerous ways to get the job done—about 20 different methods, in fact. This versatility is one of the strengths of Linux, giving you a high level of control over your system.

Obligatory Spiderman reference here.

Unlike some other operating systems, Linux doesn't provide many guardrails, meaning it's easy to make choices that can have significant consequences.

In this article, we'll dive into the kill command and explore the different signals you can send to terminate processes. We'll focus on a few key signals: SIGTERM (kill -15), SIGINT (kill -2), and the infamous SIGKILL (kill -9). While kill -9 might seem like the go-to solution for stubborn processes, it can actually do more harm than good if not used carefully. Let's break down why.

So many weapons, so many choices.

We're going to focus specifically on INT, KILL, and TERM. Now, the one you might know best is kill -9.
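If you want to double-check which number goes with which name, here is a small sketch using Python's standard signal module. The SIGNAL_NUMBERS name is just something made up for this example, and the numbers shown are what you would expect on Linux.

```python
import signal

# The three signals this article focuses on.
FOCUS_SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGKILL)

# Hypothetical lookup table for illustration: signal name -> number.
SIGNAL_NUMBERS = {sig.name: int(sig) for sig in FOCUS_SIGNALS}

for name, number in SIGNAL_NUMBERS.items():
    # On Linux this pairs SIGINT with 2, SIGTERM with 15 and SIGKILL with 9.
    print(f"kill -{number} sends {name}")
```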

You've probably reached for this when a process is stuck and will neither complete nor terminate. So you reach for fzf (see video below) and break out the sledgehammer.

Fzf and kill

What's the big deal?

So, to understand why kill -9 can be dangerous, we need to understand the implications of using the -9 signal.

Let's imagine that we're running a Rails server.

Rails server!

Behind the scenes, a pid (process ID) file is created. This file tells us which process is associated with this server and also lets us know that it's still running. Typically, if the file exists, the server is running.

For a Rails server, that file is located at tmp/pids/server.pid relative to the application.

Look at that pid

So, let's break out that sledgehammer (using the fzf strategy) and see what happens! Sometimes, the best way to learn is to break stuff after all.

kill 'em all

With a couple of presses of the enter key, it's gone.

Or is it?

Server has been killed

However, since we used our sledgehammer (the -9 signal), the server.pid wasn't removed! SIGKILL can't be caught or ignored, so the server never got a chance to run its cleanup code. Now, Rails is intelligent enough to know that the pid isn't actually running and will update the file with the new pid.
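Why didn't the server tidy up after itself? A process can register a handler for most signals, but the kernel won't let it do so for SIGKILL. Here is a rough sketch in Python (not how Rails does it); cleanup and can_install_handler are made-up names for this example.

```python
import signal


def cleanup(signum, frame):
    """Hypothetical handler; a real server would remove its pid file here."""


def can_install_handler(sig: signal.Signals) -> bool:
    """Try to register a handler for `sig` and report whether the OS allowed it."""
    try:
        previous = signal.signal(sig, cleanup)
    except OSError:
        # The kernel refuses handlers for signals such as SIGKILL.
        return False
    if previous is not None:
        # Put the original handler back so this check has no lasting effect.
        signal.signal(sig, previous)
    return True


print("SIGTERM can be handled:", can_install_handler(signal.SIGTERM))
print("SIGKILL can be handled:", can_install_handler(signal.SIGKILL))
```

Run from the main thread of a script, the first check should succeed and the second should not, which is exactly why nothing was around to delete server.pid.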

Not every server will do this. Some might argue that silently overlooking a server.pid tied to a non-running pid isn't a good thing anyway.
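If you're curious what "knowing that the pid isn't actually running" could look like, here is one possible sketch. It is not Rails' actual implementation; pid_is_running and pid_file_is_stale are hypothetical helpers. The trick is signal 0, which sends nothing and only asks the OS whether the process exists.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this pid currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pid_file_is_stale(path) -> bool:
    """Return True if the pid file exists but points at no running process."""
    path = Path(path)
    if not path.exists():
        return False
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        # Unreadable contents are treated as stale in this sketch.
        return True
    return not pid_is_running(pid)
```

Something like pid_file_is_stale("tmp/pids/server.pid") could then be checked before starting a server. Keep in mind that pids get reused, so a check like this can only ever be a best guess.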

So what else can we use?

SIGTERM (kill -15)

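If kill -9 is a sledgehammer, kill -15 is the engineering manager who says to a developer, "hey, stop all your work on that story, commit your code, and switch to this other one." The process is asked to stop, but it gets the chance to wrap things up first. SIGTERM is also what kill sends when you don't specify a signal at all.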

I'm sure that's never happened to any of you.
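To see the "commit your code first" part in action, here is a small experiment you can run. It's a sketch with a toy stand-in for a server, not Rails itself: CHILD_SERVER and pid_file_survives are made-up names. The toy server writes a pid file and removes it when it receives SIGTERM.

```python
import signal
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Toy "server": installs a SIGTERM handler, writes its pid file, then idles.
CHILD_SERVER = """
import os, signal, sys, time
pid_file = sys.argv[1]

def on_term(signum, frame):
    os.remove(pid_file)   # the cleanup that kill -9 never lets us run
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
with open(pid_file, "w") as f:
    f.write(str(os.getpid()))
while True:
    time.sleep(0.1)
"""


def pid_file_survives(sig: signal.Signals) -> bool:
    """Start the toy server, send it `sig`, and report if its pid file is left behind."""
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / "server.pid"
        child = subprocess.Popen([sys.executable, "-c", CHILD_SERVER, str(pid_file)])
        try:
            # Wait until the child has installed its handler and written the file.
            deadline = time.monotonic() + 10
            while not pid_file.exists():
                if time.monotonic() > deadline:
                    raise TimeoutError("child never wrote its pid file")
                time.sleep(0.05)
            child.send_signal(sig)
            child.wait(timeout=10)
            return pid_file.exists()
        finally:
            if child.poll() is None:
                child.kill()
                child.wait()
```

Calling pid_file_survives(signal.SIGTERM) should report that the file is gone, while pid_file_survives(signal.SIGKILL) should report that it was left behind.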

SIGINT (kill -2)

This one is a bit different from both -15 and -9. It's the same signal your terminal sends when you press Ctrl-C. Imagine you're the DevOps person performing a deployment. At the last minute, UAT comes rolling in saying that a bug was found late in the cycle.

Stop the deployment now! This is why we never deploy on a Friday!

In this situation, the current process needs to be stopped. The process still has the opportunity to handle this interruption in a clean fashion, but make no mistake: it needs to stop.
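Here is a sketch of what handling the interruption "in a clean fashion" might look like. The deployment is imaginary and stop_deployment_on_sigint is a made-up name; the process sends SIGINT to itself just so the example runs on its own.

```python
import os
import signal
import time


def stop_deployment_on_sigint() -> list:
    """Interrupt a pretend deployment with SIGINT and record what happens."""
    events = []

    def on_interrupt(signum, frame):
        # Raising lets the interrupted step unwind through except/finally.
        raise KeyboardInterrupt

    previous = signal.signal(signal.SIGINT, on_interrupt)
    try:
        events.append("deploying")
        os.kill(os.getpid(), signal.SIGINT)  # same signal as kill -2
        time.sleep(1)  # stands in for a long-running deployment step
        events.append("finished")
    except KeyboardInterrupt:
        events.append("interrupted")
    finally:
        # Cleanup still runs: the process stops, but on its own terms.
        events.append("rolled back")
        signal.signal(signal.SIGINT, previous)
    return events


print(stop_deployment_on_sigint())
```

The deployment never reaches "finished", but the rollback step still gets to run.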

What else can go wrong if I use kill -9?

Using kill -9 may seem like a quick fix to terminate a stubborn process, but it comes with significant risks. Here are some potential issues you may encounter:

  1. A child process can be left orphaned
    1. If a process spawns multiple child processes, it is responsible for cleaning them up when it stops. Using -9 doesn't allow for this.
  2. Files could be improperly closed
    1. This can result in corrupt files or the loss of important information. Think of a database that is left in an inconsistent state. Bye-bye ACID.
  3. System stability can suffer
    1. The overall system is no longer in a "known state": pid files point at processes that aren't actually running, child processes linger that aren't expected to be there, and system resources are drained unnecessarily.
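One way to avoid most of these problems is to ask nicely first and keep the sledgehammer as a last resort. Here is a minimal sketch of that idea; stop_process and grace_seconds are made-up names, and five seconds is an arbitrary default rather than a recommendation.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Send SIGTERM, wait a while, and fall back to SIGKILL only if needed.

    Returns the name of the signal that ended up stopping the process.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process ignored or outlasted the polite request.
        proc.send_signal(signal.SIGKILL)
        proc.wait()
        return "SIGKILL"


if __name__ == "__main__":
    # A throwaway child process that would otherwise sleep for a minute.
    sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("stopped with", stop_process(sleeper))
```

The process still gets killed if it refuses to cooperate, but only after it has had a fair chance to clean up its children, files, and pid file.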

Takeaway

Although a lot of Linux command names leave something to be desired (naming is hard!), the name kill should help you realize the gravity of using it. By recognizing the potential risks, like orphaned processes, data corruption, and the violation of crucial database properties like ACID, you can make more informed decisions that safeguard system integrity and data reliability.