Update (Apr 2015): Florian von Bock has turned what is described in this article into a nice Go package called endless.
If you have a Golang HTTP service, chances are, you will need to restart it on occasion to upgrade the binary or change some configuration. And if you (like me) have been taking graceful restart for granted because the webserver took care of it, you may find this recipe very handy because with Golang you need to roll your own.
There are actually two problems that need to be solved here. The first is the UNIX side of the graceful restart, i.e. the mechanism by which a process can restart itself without closing the listening socket. The second is ensuring that all in-progress requests are properly completed or timed out.
Restarting without closing the socket
- Fork a new process which inherits the listening socket.
- The child performs initialization and starts accepting connections on the socket.
- Immediately after, the child sends a signal to the parent, causing the parent to stop accepting connections and terminate.
Forking a new process
There is more than one way to fork a process using the Golang lib, but for this particular case exec.Command is the way to go. This is because the Cmd struct this function returns has an ExtraFiles member, which specifies open files (in addition to stdin/err/out) to be inherited by the new process.
Here is what this looks like:
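A minimal sketch of the fork step, assuming the listener is a *net.TCPListener. The helper names newChildCmd and startChild are hypothetical and exist only for illustration; netListener, path, args and the -graceful option are the ones discussed in this article.

```go
package main

import (
	"net"
	"os"
	"os/exec"
)

// newChildCmd prepares (but does not start) the child process. The
// listening socket is the first entry of ExtraFiles, so it becomes
// file descriptor 3 in the child.
func newChildCmd(path string, file *os.File) *exec.Cmd {
	args := []string{"-graceful"} // tells the child to re-use the socket

	cmd := exec.Command(path, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.ExtraFiles = []*os.File{file}
	return cmd
}

// startChild duplicates the listener's descriptor and launches the
// executable at path with it.
func startChild(path string, netListener *net.TCPListener) error {
	file, err := netListener.File() // a duplicate of the descriptor
	if err != nil {
		return err
	}
	// The parent's duplicate is not needed once the child has started.
	defer file.Close()

	return newChildCmd(path, file).Start()
}
```

Start returns as soon as the child is running, so the parent keeps serving requests until the child tells it to stop.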
In the above code, netListener is a pointer to the net.Listener listening for HTTP requests. The path variable should contain the path to the new executable if you’re upgrading (which may be the same as the currently running one).
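An important point in the above code is that netListener.File() returns a dup(2) of the file descriptor, not the listener's own descriptor. For the restart to work, the descriptor must survive the exec in the child: a descriptor with the FD_CLOEXEC flag set would be closed there (not what we want), and passing the file through ExtraFiles is what makes sure it is inherited.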
You may come across examples that pass the inherited file descriptor number to the child via a command line argument, but the way ExtraFiles is implemented makes it unnecessary. The documentation states that “If non-nil, entry i becomes file descriptor 3+i.” This means that in the above code snippet, the inherited file descriptor in the child will always be 3, thus no need to explicitly pass it.
Finally, the args array contains a -graceful option: your program will need some way of informing the child that this is a part of a graceful restart and the child should re-use the socket rather than try opening a new one. Another way to do this might be via an environment variable.
Child initialization
Here is part of the program startup sequence:
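A sketch of this part of startup, assuming a hypothetical listen helper whose graceful argument is taken from the -graceful option, and that descriptor 3 is the socket inherited from the parent:

```go
package main

import (
	"log"
	"net"
	"os"
)

// listen returns the listener the server should use. In a graceful
// restart it re-uses the inherited socket; otherwise it opens a new one.
func listen(graceful bool, addr string) (net.Listener, error) {
	if graceful {
		log.Print("main: Listening to existing file descriptor 3.")
		// Entry 0 of ExtraFiles in the parent is descriptor 3 here.
		f := os.NewFile(3, "")
		return net.FileListener(f)
	}
	log.Print("main: Listening on a new file descriptor.")
	return net.Listen("tcp", addr)
}
```

The returned listener is what later gets passed to server.Serve.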
Signal parent to stop
At this point we’re ready to accept requests, but just before we do that, we need to tell our parent to stop accepting requests and exit, which could be something like this:
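A minimal sketch, assuming a Unix system and a parent that treats SIGTERM as its cue to stop accepting and exit. The signalParent helper is hypothetical; its gracefulChild argument is expected to come from the -graceful option.

```go
package main

import (
	"log"
	"syscall"
)

// signalParent sends SIGTERM to the parent, but only when this process
// was started as part of a graceful restart. It reports whether a
// signal was actually sent.
func signalParent(gracefulChild bool) (bool, error) {
	if !gracefulChild {
		return false, nil // a normal start has no parent to stop
	}
	parent := syscall.Getppid()
	log.Printf("main: Killing parent pid: %v", parent)
	if err := syscall.Kill(parent, syscall.SIGTERM); err != nil {
		return false, err
	}
	return true, nil
}
```

It would be called immediately before server.Serve(l).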
In-progress requests completion/timeout
For this we will need to keep track of open connections with a sync.WaitGroup. We will need to increment the wait group on every accepted connection and decrement it on every connection close.
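One way to keep this counter, and to wait on it with a deadline, is sketched below. The httpWg name and the waitForConns helper are assumptions made for illustration; the timeout handling is one possible approach, not something prescribed here.

```go
package main

import (
	"sync"
	"time"
)

// httpWg counts open connections: Add(1) for every accepted
// connection, Done() for every closed one.
var httpWg sync.WaitGroup

// waitForConns blocks until every tracked connection has been closed
// or the timeout expires. It reports whether the wait group drained
// in time.
func waitForConns(timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		httpWg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}
```

The parent could call waitForConns after it has stopped accepting, and exit once it returns.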
At first glance, the Golang standard http package does not provide any hooks to take action on Accept() or Close(), but this is where the interface magic comes to the rescue. (Big thanks and credit to Jeff R. Allen for this post).
Here is an example of a listener which increments a wait group on every Accept(). First, we “subclass” net.Listener (you’ll see why we need stop and stopped below):
Next we “override” the Accept method. (Never mind gracefulConn for now, it will be introduced later).
We also need a “constructor”:
The function above starts a goroutine because the listener cannot be stopped from within our Accept(), which blocks on gl.Listener.Accept(). The goroutine unblocks it by closing the file descriptor.
Our Close() method simply sends a nil to the stop channel for the above goroutine to do the rest of the work.
Finally, this little convenience method extracts the file descriptor from the net.TCPListener.
And, of course we also need a variant of a net.Conn which decrements the wait group on Close():
To start using the above graceful version of the Listener, all we need is to change the server.Serve(l) line to:
And there is one more thing. You should avoid hanging connections that the client has no intention of closing (or not this week). It is better to create your server as follows:
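A sketch of such a server, assuming a hypothetical newServer helper. The 10-second timeouts and the header limit are illustrative values, not recommendations; pick ones that suit your clients.

```go
package main

import (
	"net/http"
	"time"
)

// newServer returns a server whose connections cannot hang forever:
// reads and writes are bounded by timeouts.
func newServer(addr string) *http.Server {
	return &http.Server{
		Addr:           addr,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   10 * time.Second,
		MaxHeaderBytes: 1 << 16,
	}
}
```

With these limits in place, an idle or stalled client can delay the parent's exit by at most the configured timeout.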