If you have a Golang HTTP service, chances are, you will need to restart it on occasion to upgrade the binary or change some configuration. And if you (like me) have been taking graceful restart for granted because the webserver took care of it, you may find this recipe very handy because with Golang you need to roll your own.
There are actually two problems that need to be solved here. The first is the UNIX side of the graceful restart, i.e. the mechanism by which a process can restart itself without closing the listening socket. The second is ensuring that all in-progress requests are properly completed or timed out.
Restarting without closing the socket
- Fork a new process which inherits the listening socket.
- The child performs initialization and starts accepting connections on the socket.
- Immediately after, the child sends a signal to the parent, causing the parent to stop accepting connections and terminate.
Forking a new process
There is more than one way to fork a process using the Golang lib, but for this particular case exec.Command is the way to go. This is because the Cmd struct this function returns has an ExtraFiles member, which specifies open files (in addition to stdin/err/out) to be inherited by the new process.
Here is what this looks like:
In the above code, netListener is a pointer to net.Listener listening for HTTP requests. The path variable should contain the path to the new executable if you’re upgrading (which may be the same as the currently running one).
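An important point in the above code is that netListener.File() returns a duplicate (dup(2)) of the file descriptor. The child must receive it without the FD_CLOEXEC flag in effect, since that flag would cause the file to be closed in the child (not what we want); passing the duplicate through ExtraFiles takes care of this.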
You may come across examples that pass the inherited file descriptor
number to the child via a command line argument, but the way
ExtraFiles is implemented makes it unnecessary. The documentation
states that “If non-nil, entry i becomes file descriptor 3+i.” This
means that in the above code snippet, the inherited file descriptor in
the child will always be 3, so there is no need to pass it explicitly.
The args array contains a -graceful option: your program will need some way of informing the child that this is a part of a graceful restart and the child should re-use the socket rather than try opening a new one. Another way to do this might be via an environment variable.
Here is part of the program startup sequence
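A sketch of what that startup logic might look like, assuming the -graceful flag described above and the inherited socket arriving as file descriptor 3. The helper name openListener and the listen address are made up for illustration.

```go
package main

import (
	"flag"
	"log"
	"net"
	"os"
)

// openListener (a hypothetical helper) either re-uses the socket inherited
// from the parent as file descriptor 3 or opens a new one on addr.
func openListener(gracefulChild bool, addr string) (net.Listener, error) {
	if gracefulChild {
		log.Print("main: listening on inherited file descriptor 3")
		f := os.NewFile(3, "listener")
		defer f.Close() // FileListener works on its own copy of the descriptor
		return net.FileListener(f)
	}
	log.Print("main: listening on a new socket")
	return net.Listen("tcp", addr)
}

func main() {
	gracefulChild := flag.Bool("graceful", false, "listen on inherited fd 3 (internal use only)")
	flag.Parse()

	// Port 0 lets the OS pick a free port; a real service would use its
	// configured address instead.
	l, err := openListener(*gracefulChild, "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()
	log.Printf("main: listening on %v", l.Addr())
}
```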
Signal parent to stop
At this point we’re ready to accept requests, but just before we do that, we need to tell our parent to stop accepting requests and exit, which could be something like this:
In-progress requests completion/timeout
For this we will need to keep track of open connections with a sync.WaitGroup. We will need to increment the wait group on every accepted connection and decrement it on every connection close.
At first glance, the Golang standard http package does not provide any hooks to take action on Accept() or Close(), but this is where the interface magic comes to the rescue. (Big thanks and credit to Jeff R. Allen for this post).
Here is an example of a listener which increments a wait group on every Accept(). First, we “subclass” net.Listener (you’ll see why we need the stop channel below):
Next we “override” the Accept method. (Never mind the net.Conn variant for now; it will be introduced later.)
We also need a “constructor”:
The reason the function above starts a goroutine is that this cannot be done in our Accept() above, since it will block on gl.Listener.Accept(). The goroutine will unblock it by closing the file descriptor.
Our Close() method simply sends a nil to the stop channel for the above goroutine to do the rest of the work.
Finally, this little convenience method extracts the file descriptor from the listener:
And, of course, we also need a variant of a net.Conn which decrements the wait group on Close():
To start using the above graceful version of the Listener, all we need is to change the server.Serve(l) line to:
And there is one more thing. You should avoid hanging connections that the client has no intention of closing (or not this week). It is better to create your server as follows: