Graceful server restart with Go

December 19, 2014 - 10 min read

Go was designed as a backend language and is mostly used as such. Servers are the most common type of software produced with it. The question I’m going to answer here is: how do you cleanly upgrade a running server?

Goals:

  • Do not close any of the existing connections: for instance, we don’t want to cut off a running deployment. However, we want to be able to upgrade our services whenever we like, without any constraint.
  • The socket should always be available to users: if the socket is unavailable at any moment, some users may get a ‘connection refused’ message, which is not acceptable.
  • The new version of the process should be started and should replace the old one.

Principle

In UNIX-based operating systems, the common way to interact with long-running processes is through signals.

  • SIGTERM: Request a process to stop gracefully
  • SIGHUP: Process restart/reload (example: nginx, sshd, apache)

Once a SIGHUP signal is received, there are several steps to restart the process gracefully:

  1. The server stops accepting new connections, but the socket is kept open.
  2. The new version of the process is started.
  3. The socket is ‘given’ to the new process, which starts accepting new connections.
  4. Once the old process has finished serving its clients, it has to stop.
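
The signal side of these steps can be sketched with the standard os/signal package. This is a minimal sketch, not the code of the complete example: the names action, restartSignal, stopSignal, actionFor and waitForSignal are hypothetical and only serve the illustration.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// action is a hypothetical type naming what the server should do next.
type action string

const (
	actionRestart action = "restart" // run the graceful restart steps
	actionStop    action = "stop"    // stop gracefully
	actionIgnore  action = "ignore"  // any other signal
)

// Signals used in this article: SIGHUP to restart, SIGTERM to stop.
const (
	restartSignal = syscall.SIGHUP
	stopSignal    = syscall.SIGTERM
)

// actionFor maps a received signal to the action to take.
func actionFor(sig os.Signal) action {
	switch sig {
	case restartSignal:
		return actionRestart
	case stopSignal:
		return actionStop
	default:
		return actionIgnore
	}
}

// waitForSignal blocks until SIGHUP or SIGTERM is delivered to the process
// and returns the matching action.
func waitForSignal() action {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, restartSignal, stopSignal)
	defer signal.Stop(sigs)
	return actionFor(<-sigs)
}
```

A server would typically call waitForSignal() in its main goroutine while the accept loop runs in another one, then start the restart steps when it returns actionRestart.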

Stop accepting connections

Servers have this in common: they contain an infinite loop accepting connections:

for {
conn, err := listener.Accept()
// Handle connection
}

The easiest way to break this loop is to set a deadline on the listener: when listener.SetDeadline(time.Now()) is called (a method of *net.TCPListener), listener.Accept() instantly returns a timeout error that you can catch and handle.

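for {
	conn, err := listener.Accept()
	if err != nil {
		// A timeout means the deadline has been reached: stop accepting
		if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
			fmt.Println("Stop accepting connections")
			return
		}
		// Handle other errors
		continue
	}
	// Handle connection
}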

It is important to understand that this operation is different from closing the listener. Here the process is still listening on its port, but incoming connections are queued by the network stack of the operating system, waiting for a process to accept them.
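
To see the deadline behaviour in isolation, here is a small self-contained sketch. It assumes a TCP listener on a free local port; stopAccepting, isTimeout and acceptAfterDeadline are hypothetical helper names, not functions from the complete example.

```go
package main

import (
	"errors"
	"net"
	"time"
)

// stopAccepting makes pending and future Accept calls on l return a timeout
// error right away. The socket itself stays open.
func stopAccepting(l *net.TCPListener) error {
	return l.SetDeadline(time.Now())
}

// isTimeout reports whether err is a network timeout error.
func isTimeout(err error) bool {
	var nerr net.Error
	return errors.As(err, &nerr) && nerr.Timeout()
}

// acceptAfterDeadline is a demonstration helper: it opens a listener on a
// free local port, sets the deadline, calls Accept once and reports whether
// Accept failed with a timeout.
func acceptAfterDeadline() (bool, error) {
	l, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return false, err
	}
	defer l.Close()

	if err := stopAccepting(l); err != nil {
		return false, err
	}
	_, err = l.Accept()
	return isTimeout(err), nil
}
```

Passing the zero time.Time to SetDeadline removes the deadline again, should the process need to resume accepting.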

Start the new version of the process

Go provides a ForkExec primitive to spawn a new process. (It does not let you fork without exec, by the way; see “Is it safe to fork() a Golang process?”) You can share some pieces of information with this new process, such as file descriptors or your environment.

execSpec := &syscall.ProcAttr{
	Env:   os.Environ(),
	Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()},
}
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)
[…]

You can see that the process starts a new version of itself with exactly the same arguments, os.Args.
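
For comparison, the same re-exec can be expressed with the higher-level os/exec package. This is an alternative to the syscall.ForkExec call used in this article, not what the complete example does, and selfCommand is a hypothetical helper name. The command is only prepared here, not started.

```go
package main

import (
	"os"
	"os/exec"
)

// selfCommand prepares, without starting it, a command that re-executes the
// current program with the same arguments and environment.
func selfCommand(extra ...*os.File) *exec.Cmd {
	cmd := exec.Command(os.Args[0], os.Args[1:]...)
	cmd.Env = os.Environ()
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	// Entry i of ExtraFiles becomes file descriptor 3+i in the child,
	// right after stdin (0), stdout (1) and stderr (2).
	cmd.ExtraFiles = extra
	return cmd
}
```

Calling Start() on the returned command launches the new process. Note that ExtraFiles is not supported on Windows.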

Send socket to child process and recover it

As you’ve just seen, you can pass file descriptors to your new process, and with a bit of UNIX magic (everything is a file), we can send the socket to the new process, which will be able to use it and to accept the waiting and future connections.

But the fork-execed process has to know that it must get its socket from a file instead of building a new one (the address would already be in use, by the way, as we haven’t closed the existing listener). You can do it any way you want; the most common ways are through the environment or with a command line flag.

listenerFile, err := listener.File()
if err != nil {
	log.Fatalln("Fail to get socket file descriptor:", err)
}
listenerFd := listenerFile.Fd()

// Set a flag telling the new process that this is a graceful restart
os.Setenv("_GRACEFUL_RESTART", "true")

execSpec := &syscall.ProcAttr{
	Env:   os.Environ(),
	Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), listenerFd},
}
// Fork exec the new version of your server
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)

Then at the beginning of the program:

var (
	listener *net.TCPListener
	err      error
)
if os.Getenv("_GRACEFUL_RESTART") == "true" {
	// The second argument should be the filename of the file descriptor;
	// a socket is not a named file, but we have to fit the interface
	// of the os.NewFile function.
	file := os.NewFile(3, "")
	var fileListener net.Listener
	fileListener, err = net.FileListener(file)
	if err != nil {
		// handle
	}
	var ok bool
	listener, ok = fileListener.(*net.TCPListener)
	if !ok {
		// handle
	}
} else {
	listener, err = newListenerWithPort(12345)
}

File descriptor 3 has not been chosen at random: in the slice of uintptr passed to the fork, the listener is at index 3. Be careful with shadow declaration mistakes: using := on listener inside the if block would declare a new local variable and leave the outer one nil.
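
The recovery step can be tried without forking, since listener.File() and net.FileListener() also work inside a single process. The sketch below makes that assumption; listenerFromFile and sameSocketAfterRoundTrip are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenerFromFile rebuilds a TCP listener from a file wrapping a socket.
// net.FileListener works on its own copy of the descriptor, so the caller
// remains responsible for closing f.
func listenerFromFile(f *os.File) (*net.TCPListener, error) {
	l, err := net.FileListener(f)
	if err != nil {
		return nil, err
	}
	tcp, ok := l.(*net.TCPListener)
	if !ok {
		l.Close()
		return nil, fmt.Errorf("unexpected listener type %T", l)
	}
	return tcp, nil
}

// sameSocketAfterRoundTrip is a demonstration helper: it turns a listener
// into a file and back, and reports whether both listeners share one address.
func sameSocketAfterRoundTrip() (bool, error) {
	orig, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return false, err
	}
	defer orig.Close()

	f, err := orig.File()
	if err != nil {
		return false, err
	}
	defer f.Close()

	recovered, err := listenerFromFile(f)
	if err != nil {
		return false, err
	}
	defer recovered.Close()

	return recovered.Addr().String() == orig.Addr().String(), nil
}
```

In the restarted process, the equivalent call would be listenerFromFile(os.NewFile(3, "listener")), assuming the listener was passed at index 3 as described above.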

Last step, wait for the old server connections to stop

At that point, that’s it: we have passed the buck to another process, which is now running correctly. The last operation for the old server is to wait until its connections are closed. There is a simple way to implement this with Go, thanks to the sync.WaitGroup structure provided in the standard library.

Each time a connection is accepted, 1 is added to the WaitGroup; the counter is then decremented once the connection has been handled:

for {
	conn, err := listener.Accept()
	if err != nil {
		// Handle error (for example the timeout used to stop accepting)
		return
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		handle(conn)
	}()
}

As a result, to wait for the end of the connections, you just have to call wg.Wait(). As no new connection is accepted, it returns once wg.Done() has been called for every running handler.
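
Here is a minimal end-to-end sketch of this bookkeeping, with an echo handler and a few local clients. The names serve and drainDemo are hypothetical. For simplicity the demonstration closes the listener to make Accept fail, whereas a graceful restart would keep it open and use the deadline described earlier.

```go
package main

import (
	"io"
	"net"
	"sync"
	"sync/atomic"
)

// serve accepts connections until Accept fails and tracks every running
// handler in wg. It returns the error that ended the loop.
func serve(l net.Listener, wg *sync.WaitGroup, handle func(net.Conn)) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			return err
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer conn.Close()
			handle(conn)
		}()
	}
}

// drainDemo is a demonstration helper: it serves n local clients, stops
// accepting, waits for all handlers and returns how many have completed.
func drainDemo(n int) (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()

	var wg sync.WaitGroup
	var handled int64
	served := make(chan error, 1)
	go func() {
		served <- serve(l, &wg, func(c net.Conn) {
			io.Copy(c, c) // echo until the client closes the connection
			atomic.AddInt64(&handled, 1)
		})
	}()

	buf := make([]byte, 1)
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", l.Addr().String())
		if err != nil {
			return 0, err
		}
		// Receiving the echo proves that this connection has been accepted.
		if _, err := c.Write([]byte("x")); err != nil {
			c.Close()
			return 0, err
		}
		if _, err := io.ReadFull(c, buf); err != nil {
			c.Close()
			return 0, err
		}
		c.Close()
	}

	l.Close() // Accept now fails, so serve returns
	<-served  // no wg.Add can happen after this point
	wg.Wait() // returns once every handler has called wg.Done
	return int(atomic.LoadInt64(&handled)), nil
}
```

Waiting for serve to return before calling wg.Wait() matters: it guarantees that no wg.Add(1) can race with the wait.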

Bonus: don’t wait indefinitely, only for a given amount of time

With a time.Timer, it’s really straightforward to implement this:

timeout := time.NewTimer(time.Minute)
defer timeout.Stop()
wait := make(chan struct{})
go func() {
	wg.Wait()
	// Closing never blocks, unlike a send with no receiver left after a timeout
	close(wait)
}()

select {
case <-timeout.C:
	return WaitTimeoutError
case <-wait:
	return nil
}

Complete example

Most of the code snippets in this article have been extracted from the complete example I’ve developed to illustrate this blog post: https://github.com/Scalingo/go-graceful-restart-example

Conclusion

Using ForkExec with socket passing is a really efficient way to upgrade a process without disturbing connections. At worst, new clients wait a few milliseconds, the time it takes the new server to boot and recover the socket, which is really short.

This article is part of our #FridayTechnical series. There won’t be an article next week; merry Christmas everybody.


— Léo Unbekandt CTO @ Scalingo

Léo is the founder and CTO of Scalingo. He studied in France as a cloud engineer (ENSIIE) and in England (Cranfield University). He is in charge of Scalingo's technical development and he manages Scalingo's tech team.
