Go was designed as a backend language and is mostly used as such. Servers are the most common type of software produced with it. The question I’m going to answer here is: how do you cleanly upgrade a running server?
Goals:
In UNIX-based operating systems, the common way to interact with long-running processes is through signals.
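As a minimal sketch of this, the program below subscribes to SIGHUP with the standard os/signal package. To stay self-contained it sends the signal to itself with syscall.Kill, which is purely an assumption of the demo; in practice an operator would run `kill -HUP <pid>`. The run helper and the two-second guard are illustrative names and values, not part of the original example.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// run registers a SIGHUP handler, sends SIGHUP to the current process
// (only so that the sketch is self-contained) and waits for delivery.
func run() error {
	// Buffered so that a signal arriving before we read is not dropped.
	hangup := make(chan os.Signal, 1)
	signal.Notify(hangup, syscall.SIGHUP)
	defer signal.Stop(hangup)

	// Demo only: a real operator would send the signal from a shell.
	if err := syscall.Kill(os.Getpid(), syscall.SIGHUP); err != nil {
		return err
	}

	select {
	case sig := <-hangup:
		fmt.Println("received", sig, "- a graceful restart would start here")
		return nil
	case <-time.After(2 * time.Second):
		return fmt.Errorf("no SIGHUP received")
	}
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

In a real server the receive on the channel would usually sit in its own goroutine, next to the accept loop.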
Once a SIGHUP signal is received, there are several steps to restart the process gracefully:
Servers have this in common: they contain an infinite loop accepting connections:
for {
    conn, err := listener.Accept()
    // Handle connection
}
To break this loop, the easiest way is to set a deadline on the listener: when listener.SetDeadline(time.Now()) is called, listener.Accept() instantly returns a timeout error that you can catch and handle.
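for {
    conn, err := listener.Accept()
    if err != nil {
        // A timeout means the deadline was reached: stop accepting.
        if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
            fmt.Println("Stop accepting connections")
            return
        }
        // Handle other errors
    }
    // Handle connection
}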
It is important to understand that this operation is different from closing the listener. Here, the process still listens on its port, but new connections are queued by the network stack of the operating system, waiting for a process to accept them.
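The following sketch puts both points together, under the assumption that the listener is a *net.TCPListener bound to a loopback port picked by the kernel. It stops the accept loop with SetDeadline, then dials the same address to show that the socket is still open and the connection is simply queued. The acceptLoop and run names are hypothetical helpers for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// acceptLoop accepts connections until Accept reports a timeout, which is
// how a deadline set on the listener shows up.
func acceptLoop(l *net.TCPListener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			var nerr net.Error
			if errors.As(err, &nerr) && nerr.Timeout() {
				return nil // deadline reached: stop accepting
			}
			return err
		}
		conn.Close() // a real server would handle the connection here
	}
}

func run() error {
	l, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	defer l.Close()

	done := make(chan error, 1)
	go func() { done <- acceptLoop(l) }()

	// A deadline that has already passed makes Accept return a timeout error.
	if err := l.SetDeadline(time.Now()); err != nil {
		return err
	}
	select {
	case err := <-done:
		if err != nil {
			return err
		}
	case <-time.After(2 * time.Second):
		return errors.New("accept loop did not stop")
	}

	// The socket is still listening: this connection is queued by the kernel
	// even though nobody is calling Accept any more.
	c, err := net.DialTimeout("tcp", l.Addr().String(), time.Second)
	if err != nil {
		return err
	}
	c.Close()
	fmt.Println("loop stopped, socket still listening on", l.Addr())
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```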
Go provides a ForkExec primitive (syscall.ForkExec) to spawn a new process. (It does not let you fork without exec, by the way; see “Is it safe to fork() a Golang process?”) You can share some pieces of information with this new process, such as file descriptors or your environment.
execSpec := &syscall.ProcAttr{
    Env:   os.Environ(),
    Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()},
}
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)
[…]
You can see that the process starts a new version of itself with exactly the same arguments, os.Args.
As you’ve just seen, you can pass file descriptors to your new process. With a bit of UNIX magic (everything is a file), we can send the socket to the new process, which will be able to use it and accept both the waiting and the future connections.
But the fork-execed process has to know that it must get its socket from a file instead of building a new one (whose address would already be in use, by the way, since we haven’t closed the existing listener). You can signal this any way you want; the most common ways are through the environment or with a command-line flag.
listenerFile, err := listener.File()
if err != nil {
    log.Fatalln("Fail to get socket file descriptor:", err)
}
listenerFd := listenerFile.Fd()
// Set a flag telling the new process to reuse the inherited socket
os.Setenv("_GRACEFUL_RESTART", "true")
execSpec := &syscall.ProcAttr{
    Env:   os.Environ(),
    Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), listenerFd},
}
// Fork exec the new version of your server
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)
Then at the beginning of the program:
var listener *net.TCPListener
var err error
if os.Getenv("_GRACEFUL_RESTART") == "true" {
    // The second argument should be the filename of the file descriptor;
    // a socket is not a named file, but we have to fit the interface
    // of the os.NewFile function.
    file := os.NewFile(3, "")
    // Use plain assignment (not :=) so that the outer err is not shadowed.
    var inherited net.Listener
    inherited, err = net.FileListener(file)
    if err != nil {
        // handle
    }
    var ok bool
    listener, ok = inherited.(*net.TCPListener)
    if !ok {
        // handle
    }
} else {
    listener, err = newListenerWithPort(12345)
}
File descriptor 3 has not been chosen at random: in the slice of uintptr sent to the forked process, the listener is at index 3. Be careful with variable shadowing here: using := inside the if block would declare a new listener variable that hides the outer one.
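To make the hand-over more concrete, here is a sketch that exercises the same calls inside a single process rather than across ForkExec, which is an assumption made only to keep the example runnable on its own. The listener is duplicated with File(), the original is closed, and a working *net.TCPListener is rebuilt with net.FileListener. In the forked server, the *os.File would instead come from os.NewFile(3, ""). The listenerFromFile helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenerFromFile rebuilds a TCP listener from a file wrapping a
// listening socket, as the new process would do with descriptor 3.
func listenerFromFile(f *os.File) (*net.TCPListener, error) {
	l, err := net.FileListener(f)
	if err != nil {
		return nil, err
	}
	tl, ok := l.(*net.TCPListener)
	if !ok {
		l.Close()
		return nil, fmt.Errorf("unexpected listener type %T", l)
	}
	return tl, nil
}

func run() error {
	orig, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	addr := orig.Addr().String()

	// File returns a duplicate of the descriptor; closing the original
	// listener afterwards does not close the underlying socket.
	f, err := orig.File()
	orig.Close()
	if err != nil {
		return err
	}

	rebuilt, err := listenerFromFile(f)
	f.Close() // FileListener works on its own copy of the descriptor
	if err != nil {
		return err
	}
	defer rebuilt.Close()

	// The rebuilt listener serves the same address as the original one.
	client, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer client.Close()
	conn, err := rebuilt.Accept()
	if err != nil {
		return err
	}
	conn.Close()
	fmt.Println("accepted a connection on the rebuilt listener at", rebuilt.Addr())
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```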
At that point, that’s it: we have handed the socket over to another process, which is now running correctly. The last operation for the old server is to wait until its connections are closed. There is a simple way to implement this in Go, thanks to the sync.WaitGroup structure provided in the standard library.
Each time a connection is accepted, 1 is added to the WaitGroup; the counter is then decreased when the handler is done:
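for {
    conn, err := listener.Accept()
    // Handle err (including the timeout that stops the loop)
    wg.Add(1) // count the connection before starting its handler
    go func() {
        defer wg.Done()
        handle(conn)
    }()
}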
As a result, to wait for the end of the connections you just have to call wg.Wait(): as no new connection is accepted, it returns once wg.Done() has been called for all the running handlers.
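A compact, runnable sketch of this draining step could look like the following. It assumes an echo handler and a loopback listener, both chosen only for the illustration; serve and run are hypothetical names. The loop is stopped with a listener deadline, then wg.Wait() returns once the single client has disconnected.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"sync"
	"time"
)

// serve accepts connections until the listener deadline fires and tracks
// every handler in wg so that the caller can wait for them.
func serve(l *net.TCPListener, wg *sync.WaitGroup) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			var nerr net.Error
			if errors.As(err, &nerr) && nerr.Timeout() {
				return nil
			}
			return err
		}
		wg.Add(1) // must happen before the goroutine starts
		go func() {
			defer wg.Done()
			defer conn.Close()
			io.Copy(conn, conn) // echo until the client closes
		}()
	}
}

func run() error {
	l, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return err
	}
	defer l.Close()

	var wg sync.WaitGroup
	stopped := make(chan error, 1)
	go func() { stopped <- serve(l, &wg) }()

	client, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return err
	}
	if _, err := client.Write([]byte("ping")); err != nil {
		return err
	}
	// Reading the echo proves the handler is running and counted in wg.
	if _, err := io.ReadFull(client, make([]byte, 4)); err != nil {
		return err
	}

	// Stop accepting; the open connection keeps being served.
	if err := l.SetDeadline(time.Now()); err != nil {
		return err
	}
	if err := <-stopped; err != nil {
		return err
	}

	client.Close() // the client is done, so its handler returns
	wg.Wait()      // returns once every handler has called Done
	fmt.Println("all connections drained")
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```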
A handler may never finish, so this wait should be bounded by a timeout. With a time.Timer, it’s really straightforward to implement this:
timeout := time.NewTimer(time.Minute)
defer timeout.Stop()
wait := make(chan struct{})
go func() {
    wg.Wait()
    // Closing never blocks, even if the timeout has already fired.
    close(wait)
}()
select {
case <-timeout.C:
    return WaitTimeoutError
case <-wait:
    return nil
}
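The same idea can be wrapped in a small reusable helper. The sketch below is one possible shape, not the code of the example repository: waitTimeout and errWaitTimeout are hypothetical names, and the durations are arbitrary. It exercises both outcomes, a WaitGroup that finishes in time and one that never does.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
	"time"
)

var errWaitTimeout = errors.New("timed out waiting for connections to finish")

// waitTimeout waits for wg, but gives up after d. On timeout, the helper
// goroutine stays blocked in wg.Wait until the counter reaches zero, which
// is acceptable when the process is about to exit anyway.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) error {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-done:
		return nil
	case <-timer.C:
		return errWaitTimeout
	}
}

func run() error {
	// Case 1: the handler finishes well before the timeout.
	var quick sync.WaitGroup
	quick.Add(1)
	go func() {
		time.Sleep(10 * time.Millisecond)
		quick.Done()
	}()
	if err := waitTimeout(&quick, time.Second); err != nil {
		return fmt.Errorf("quick group: %w", err)
	}

	// Case 2: the handler never finishes, so the timeout wins.
	var stuck sync.WaitGroup
	stuck.Add(1)
	if err := waitTimeout(&stuck, 50*time.Millisecond); !errors.Is(err, errWaitTimeout) {
		return fmt.Errorf("stuck group: expected a timeout, got %v", err)
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("both cases behaved as expected")
}
```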
Most of the code snippets in this article have been extracted from the complete example I’ve developed to illustrate this blog post: https://github.com/Scalingo/go-graceful-restart-example
Using ForkExec with socket passing is a really efficient way to upgrade a process without disturbing connections. At most, new clients will wait a few milliseconds, the time it takes for the new server to boot up and get the socket back, and this delay is really short.
This article was part of our #FridayTechnical series. There won’t be any article next week; merry Christmas, everybody.
— Léo Unbekandt CTO @ Scalingo