Socket readiness without \Device\Afd

Recently, I’ve been exploring socket readiness notifications on Windows using the \Device\Afd interface. My initial forays into this came from a Linux epoll direction, as the \Device\Afd API provides a similar interface to epoll and makes it possible to build something almost the same on Windows, which can help when writing cross-platform code.

I’d got to the point where I had a simple client and server working nicely and had taken a pause before the next step, which was to add support for multiple sockets. Due to the way that epoll works, and the way that all of the \Device\Afd code that I’ve looked at so far worked, I was planning on having a collection of socket objects that could be polled when needed for events. This would mean building and updating collections of pollable sockets each time a socket needed to start monitoring some events. I had a rough design sketched out in my head but hadn’t had the time to make it concrete.

Another way

Then Brad House left a comment about how he was using \Device\Afd in c-ares, an asynchronous DNS name resolution system. His comment was interesting as he said “It takes a slightly different approach, like not opening \Device\Afd at all.”

Looking at the c-ares code, it appears to allow us to avoid the whole ‘build a set of socket descriptors and events and poll them’ problem that had stalled me and, instead, allows mixing a standard Windows IOCP style design with the readiness polling of \Device\Afd.

The core difference is that, rather than polling a single \Device\Afd handle with a set of multiple sockets in a single AFD_POLL_INFO structure, you set up the poll using the socket itself.
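To make the difference in request shape a little more concrete, here is a minimal sketch of the poll structure layout. It assumes the layout that wepoll declares for AFD_POLL_INFO, uses portable stand-in types (PollHandleInfo and PollInfo are names made up for this sketch) so that it compiles anywhere, and PollBufferSize is a hypothetical helper rather than part of any API. As I understand it, a poll over a set of sockets needs a buffer with one entry per socket, whereas a poll issued on the socket itself only ever needs the single inline entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-ins that mirror the layout wepoll declares for the AFD poll
// structures. On Windows the real HANDLE, ULONG, NTSTATUS and LARGE_INTEGER
// types would be used instead; these names are assumptions for the sketch.
struct PollHandleInfo
{
   void *Handle;            // HANDLE of the socket being polled
   std::uint32_t Events;    // ULONG, a mask of AFD_POLL_* flags
   std::int32_t Status;     // NTSTATUS
};

struct PollInfo
{
   std::int64_t Timeout;            // LARGE_INTEGER
   std::uint32_t NumberOfHandles;   // ULONG
   std::uint32_t Exclusive;         // ULONG
   PollHandleInfo Handles[1];       // one entry inline, more follow for a set
};

// Hypothetical helper: bytes needed for a poll request that covers 'count'
// sockets. 'count' must be at least 1.
constexpr std::size_t PollBufferSize(std::size_t count)
{
   return sizeof(PollInfo) + (count - 1) * sizeof(PollHandleInfo);
}

// Polling the socket itself always uses exactly one entry, so the fixed-size
// structure is all that is required and nothing needs to be rebuilt when
// other sockets come and go.
static_assert(PollBufferSize(1) == sizeof(PollInfo), "single socket fits inline");
```

With this layout, a per-socket poll can use a plain AFD_POLL_INFO on the stack or inside a per-socket object, which is what the listing in the next section does with NumberOfHandles set to 1.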

Removing code

The good thing about discovering this approach now, rather than at the start, is that it’s very easy to play around with this new way simply by deleting code from my previous explorations. All of the code dealing with opening \Device\Afd can go. Instead we simply create a normal overlapped socket and associate it with an IOCP as normal. You then poll the socket itself rather than using an explicit handle to \Device\Afd.

You end up with code like this, which you can compare to the code in the original article which opens \Device\Afd explicitly.

int main()
{
   InitialiseWinsock();

   // Create an IOCP for notifications...

   HANDLE hIOCP = CreateIOCP();

   // Create a stream socket

   SOCKET s = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, nullptr, 0, WSA_FLAG_OVERLAPPED);

   if (s == INVALID_SOCKET)
   {
      ErrorExit("socket");
   }

   // Set it as non-blocking

   unsigned long one = 1;

   if (0 != ioctlsocket(s, (long) FIONBIO, &one))
   {
      ErrorExit("ioctlsocket");
   }


   // Associate the socket with the IOCP...

   if (nullptr == CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), hIOCP, 0, 0))
   {
      ErrorExit("CreateIoCompletionPort");
   }

   if (!SetFileCompletionNotificationModes(reinterpret_cast<HANDLE>(s), FILE_SKIP_SET_EVENT_ON_HANDLE))
   {
      ErrorExit("SetFileCompletionNotificationModes");
   }

   // These are the events that wepoll suggests that AFD exposes...

   constexpr ULONG events = 
      AFD_POLL_RECEIVE |                  // readable
      AFD_POLL_RECEIVE_EXPEDITED |        // out of band
      AFD_POLL_SEND |                     // writable
      AFD_POLL_DISCONNECT |               // client close
      AFD_POLL_ABORT |                    // closed
      AFD_POLL_LOCAL_CLOSE |              // ?
      AFD_POLL_ACCEPT |                   // connection accepted on listening
      AFD_POLL_CONNECT_FAIL;              // outbound connection failed


   // This is information about what we are interested in for the supplied socket.
   // We're polling for one socket, we are interested in the specified events
   // The other stuff is copied from wepoll - needs more investigation

   AFD_POLL_INFO pollInfoIn {};

   pollInfoIn.Exclusive = FALSE;
   pollInfoIn.NumberOfHandles = 1;
   pollInfoIn.Timeout.QuadPart = INT64_MAX;
   pollInfoIn.Handles[0].Handle = reinterpret_cast<HANDLE>(GetBaseSocket(s));
   pollInfoIn.Handles[0].Status = 0;
   pollInfoIn.Handles[0].Events = events;

   // To make it clear that the inbound and outbound poll structures can be different
   // we use a different one...

   // As we'll see below, the status block and the outbound poll info need to stay
   // valid until the event completes...

   AFD_POLL_INFO pollInfoOut {};

   IO_STATUS_BLOCK pollStatusBlock {};

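   // Kick off the poll. Note that the IOCTL is issued on the socket handle itself
   // rather than on a handle to \Device\Afd. The status block is passed twice:
   // once as the ApcContext, which is what the IOCP hands back with the completion,
   // and once as the IO_STATUS_BLOCK that receives the result.

   NTSTATUS status = NtDeviceIoControlFile(
      reinterpret_cast<HANDLE>(s),
      nullptr,                         // Event
      nullptr,                         // ApcRoutine
      &pollStatusBlock,                // ApcContext
      &pollStatusBlock,                // IoStatusBlock
      IOCTL_AFD_POLL,
      &pollInfoIn,
      sizeof(pollInfoIn),
      &pollInfoOut,
      sizeof(pollInfoOut));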

   if (status == 0)
   {
   

Reusing the tests

The initial exploration showed that this approach could work, so the next step was to see how many of the unit tests that I wrote during my “exploratory understanding” phase work with the new design. These tests are the same as the ones from the Test Driven Understanding article but use the new design. All of them pass, which shows that the same functionality is available with this approach and makes it a viable alternative to the “polling a group of sockets” approach.

Code

Full source can be found here on GitHub.

This article refers to the explore_without_device_afd and the socket_without_device_afd code.

This isn’t production code; error handling is simply “panic and run away”.

This code is licensed under the MIT license.

Wrapping up

Being able to poll each socket separately is likely to be quite a powerful design choice. Rather than using \Device\Afd to poll ‘sets’ of sockets, where the poll needs to be interrupted and restarted every time you need to add or remove a socket or event from the set, we can instead set up a poll for each socket separately. This likely results in the same number of system calls but requires less book-keeping and state-management code. I’ll investigate some more next time, when I present a version of the client and server code that uses this approach.
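As a rough idea of what the reduced book-keeping might look like, here is a minimal sketch of a per-socket state object. It is not the code from the repository. SocketPollState, StateFromCompletion and DescribeEvents are hypothetical names, the structures are portable stand-ins for IO_STATUS_BLOCK and AFD_POLL_INFO, and the event bit values are the ones that wepoll declares. The idea, which wepoll also uses, is that the status block passed as the ApcContext comes back with the completion, so its address is enough to find the socket's state without any lookup table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Portable stand-ins; on Windows these would be the real IO_STATUS_BLOCK and
// AFD_POLL_INFO types.
struct IoStatusBlock
{
   std::intptr_t Status;
   std::uintptr_t Information;
};

struct PollHandleInfo
{
   void *Handle;
   std::uint32_t Events;
   std::int32_t Status;
};

struct PollInfo
{
   std::int64_t Timeout;
   std::uint32_t NumberOfHandles;
   std::uint32_t Exclusive;
   PollHandleInfo Handles[1];
};

// Event bit values as declared in wepoll.
constexpr std::uint32_t AFD_POLL_RECEIVE = 0x0001;
constexpr std::uint32_t AFD_POLL_SEND = 0x0004;
constexpr std::uint32_t AFD_POLL_DISCONNECT = 0x0008;
constexpr std::uint32_t AFD_POLL_ABORT = 0x0010;

// Hypothetical per-socket state. It owns the status block and the outbound
// poll info, both of which must stay valid until the poll completes.
struct SocketPollState
{
   IoStatusBlock statusBlock;   // must stay the first member, see below
   PollInfo pollInfoOut;        // filled in when the poll completes
   std::uintptr_t socket;       // stand-in for SOCKET
};

static_assert(std::is_standard_layout<SocketPollState>::value, "needed for the cast");
static_assert(offsetof(SocketPollState, statusBlock) == 0, "status block must be first");

// The completion hands back the address of the status block. As that is the
// first member of a standard-layout struct, it is also the address of the state.
SocketPollState *StateFromCompletion(void *pOverlapped)
{
   return static_cast<SocketPollState *>(pOverlapped);
}

// Turn a subset of the returned event mask into something readable.
std::string DescribeEvents(std::uint32_t events)
{
   std::string result;

   const auto append = [&](std::uint32_t flag, const char *name)
   {
      if (events & flag)
      {
         if (!result.empty())
         {
            result += '|';
         }
         result += name;
      }
   };

   append(AFD_POLL_RECEIVE, "readable");
   append(AFD_POLL_SEND, "writable");
   append(AFD_POLL_DISCONNECT, "disconnected");
   append(AFD_POLL_ABORT, "aborted");

   return result;
}
```

On Windows, the address of statusBlock would be passed to NtDeviceIoControlFile as both the ApcContext and the IoStatusBlock, and the lpOverlapped value returned by GetQueuedCompletionStatus would be handed to StateFromCompletion. Adding or removing a socket then only touches that socket's own state object.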
