Adventures with \Device\Afd
I’ve been playing around with Rust recently and whilst investigating asynchronous programming in Rust I was looking at Tokio, an async runtime. From there I started looking at Mio, the cross-platform, low-level, I/O code that Tokio uses.
For Windows platforms Mio uses wepoll, which is an implementation of the Linux epoll API for
Windows and is based on the code that libuv uses for Node.js.
This is NOT your standard high-performance Windows networking code built around I/O completion ports;
instead it uses the ‘sparsely documented’ \Device\Afd interface that lies below the Winsock2 layer.
I’ve played with libuv before and it was never to my liking, but after some random
wandering I came across a very good and recent description of why
wepoll works in the way it does and why the \Device\Afd
interface
might be worth looking at for async work.
I’ve started to boil the \Device\Afd
code down to the bare essentials so that I can understand what’s going
on without having to continually ignore other people’s coding styles and the APIs that they have built upon it.
The code is here on GitHub and I expect that this will develop into a series of articles that explore how best to take advantage of this approach.
But first you really should go and read “notgull”’s piece on \Device\Afd.
Overview
There’s a ‘sparsely documented’ interface to the Windows networking stack that is accessed via the \Device\Afd
device driver’s ioctl interface and which can provide a “socket readiness” polling interface that promises to be
as efficient as the normal completion-based interface. Whilst it’s possible to build readiness polling using the
normal completion-based API, it’s harder and some things simply aren’t possible. When working on the cross-platform
code for The Server Framework I ended up with a hybrid approach that used the epoll API to drive a completion-based
system above it, which was compatible with the ‘standard’ IOCP system on Windows. Being able to do either on
Windows would be useful and, if nothing else, an interesting diversion.
Please bear in mind that this stuff is new to me, even though the approach has been around for years. I’m still finding my way, the docs are very sparse and I’m working from other people’s understanding of the API, so much of this is based on assumption and intuition. Please feel free to tell me when I get things wrong!
Cruft-free programming
I think many programmers are happiest working with their own code, or at least, code that conforms to
their own idea of ‘good’. Due to the variety of requirements, languages and platforms it’s a wonder any
code is ever reused. I often find
it far easier to understand code if I pull apart some reference material and rebuild it in “the one true way”;
of course, that’s unlikely to meet your definition of “the one true way”. With the code here I am looking to
understand how best to use the \Device\Afd approach to the Windows networking stack and so have taken a look
at wepoll and the Rust polling code, pulled out the stuff I’m interested in and rebuilt it without all of the
added complexity. For me, at least, this makes the resulting code that actually does the work easier to reason
about and understand and, from there, to build my own abstractions above it so that other people can complain
about my version too…
I had a quick look for more authoritative documentation or examples and didn’t find much that I felt like working from. I’m sure there are more authoritative sources out there but I’m not bothering to look for them just yet as the two sources that I am relying on are widely used and therefore reasonably trustworthy; also they’re both clearly Open Source and so free for me to explore and understand.
Opening the Afd device
Both of my sources use NtCreateFile() to open the Afd device. There’s some talk on Reddit
that suggests that we might be able to use CreateFile() but I haven’t yet explored that route.
To access NtCreateFile() we need to pull in winternl.h and link with ntdll.dll, and we need to
work with the UNICODE_STRING and OBJECT_ATTRIBUTES types.
These types are all slightly more complex than you might be used to when working with the Windows API but
they are easy to wrap up with something a little more user friendly if you want to. For now, and for simplicity
of the code, I’m using them directly and avoiding the macros that wepoll uses.
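As a rough illustration of what “something a little more user friendly” might look like, here is a minimal, portable sketch of a counted-string helper. `CountedWideString` and `MakeCountedString` are hypothetical names that are not part of wepoll or the Windows headers; the struct only mirrors the three fields of UNICODE_STRING so that the byte-length arithmetic can be seen in isolation. It assumes 16-bit wide characters, as on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in with the same three fields as UNICODE_STRING.
// char16_t is used because Windows wide characters are 16 bits wide.
struct CountedWideString
{
    std::uint16_t Length;        // bytes in use, terminator NOT included
    std::uint16_t MaximumLength; // bytes available in Buffer
    const char16_t *Buffer;
};

// Builds a counted string from a string literal; N includes the terminator.
template <std::size_t N>
constexpr CountedWideString MakeCountedString(const char16_t (&literal)[N])
{
    static_assert(N >= 1, "literal must include a terminator");
    static_assert(
        (N - 1) * sizeof(char16_t) <= (std::numeric_limits<std::uint16_t>::max)(),
        "string is too long for a 16-bit byte count");

    return CountedWideString{
        static_cast<std::uint16_t>((N - 1) * sizeof(char16_t)),
        static_cast<std::uint16_t>((N - 1) * sizeof(char16_t)),
        literal};
}

// Usage check: the device name is 19 characters at 2 bytes each.
inline void CheckDeviceName()
{
    const CountedWideString name = MakeCountedString(u"\\Device\\Afd\\explore");
    assert(name.Length == 38);
    assert(name.MaximumLength == name.Length);
}
```

The point to take away is that both lengths are byte counts rather than character counts, which is why the article's code multiplies the result of wcslen() by sizeof(wchar_t).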
Full source can be found here on GitHub.
This isn’t production code; error handling is simply “panic and run away”.
This code is licensed with the MIT license.
Opening the device is as simple as this:
// Arbitrary name in the Afd namespace
static LPCWSTR deviceName = L"\\Device\\Afd\\explore";
const USHORT lengthInBytes = static_cast<USHORT>(wcslen(deviceName) * sizeof(wchar_t));
static const UNICODE_STRING deviceNameUString {
lengthInBytes,
lengthInBytes,
const_cast<LPWSTR>(deviceName)
};
static OBJECT_ATTRIBUTES attributes = {
sizeof(OBJECT_ATTRIBUTES),
nullptr,
const_cast<UNICODE_STRING *>(&deviceNameUString),
0,
nullptr,
nullptr
};
HANDLE hAFD;
IO_STATUS_BLOCK statusBlock {};
NTSTATUS status = NtCreateFile(
&hAFD,
SYNCHRONIZE,
&attributes,
&statusBlock,
nullptr,
0,
FILE_SHARE_READ | FILE_SHARE_WRITE,
FILE_OPEN,
0,
nullptr,
0);
if (status == 0)
{
Associating with an IOCP
We then associate the Afd
handle with an I/O completion port that we can poll for results to our
polling requests.
// Create an IOCP for notifications...
HANDLE hIOCP = CreateIOCP();
// Associate the AFD handle with the IOCP...
if (nullptr == CreateIoCompletionPort(hAFD, hIOCP, 0, 0))
{
ErrorExit("CreateIoCompletionPort");
}
if (!SetFileCompletionNotificationModes(hAFD, FILE_SKIP_SET_EVENT_ON_HANDLE))
{
ErrorExit("SetFileCompletionNotificationModes");
}
I would expect that we can also use FILE_SKIP_COMPLETION_PORT_ON_SUCCESS
here, but I’ve not
tested it yet.
In the real world the code above would be done once to set the system up. The next pieces of code represent what happens when we have a new socket that we want to poll.
Polling for events on a socket
wepoll suggests that the following events are available to us.
constexpr ULONG events =
AFD_POLL_RECEIVE | // readable
AFD_POLL_RECEIVE_EXPEDITED | // out of band
AFD_POLL_SEND | // writable
AFD_POLL_DISCONNECT | // client close
AFD_POLL_ABORT | // closed
AFD_POLL_LOCAL_CLOSE | // ?
AFD_POLL_ACCEPT | // connection accepted on listening
AFD_POLL_CONNECT_FAIL; // outbound connection failed
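When experimenting it helps to be able to print an event mask in readable form. The following is a small, portable sketch of that. The numeric values are the ones I understand wepoll to define for these flags and should be treated as an assumption to check against its source, as they don't come from official documentation. `DescribeAfdEvents` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Flag values as assumed from wepoll's AFD definitions; verify before relying on them.
constexpr std::uint32_t AFD_POLL_RECEIVE           = 0x0001;
constexpr std::uint32_t AFD_POLL_RECEIVE_EXPEDITED = 0x0002;
constexpr std::uint32_t AFD_POLL_SEND              = 0x0004;
constexpr std::uint32_t AFD_POLL_DISCONNECT        = 0x0008;
constexpr std::uint32_t AFD_POLL_ABORT             = 0x0010;
constexpr std::uint32_t AFD_POLL_LOCAL_CLOSE       = 0x0020;
constexpr std::uint32_t AFD_POLL_ACCEPT            = 0x0080;
constexpr std::uint32_t AFD_POLL_CONNECT_FAIL      = 0x0100;

// Returns the names of all set flags joined with '|', or "NONE" for an empty mask.
// Any bits that are not in the table are reported once as "UNKNOWN".
inline std::string DescribeAfdEvents(std::uint32_t events)
{
    struct Named { std::uint32_t bit; const char *name; };
    static const Named known[] = {
        {AFD_POLL_RECEIVE, "RECEIVE"},
        {AFD_POLL_RECEIVE_EXPEDITED, "RECEIVE_EXPEDITED"},
        {AFD_POLL_SEND, "SEND"},
        {AFD_POLL_DISCONNECT, "DISCONNECT"},
        {AFD_POLL_ABORT, "ABORT"},
        {AFD_POLL_LOCAL_CLOSE, "LOCAL_CLOSE"},
        {AFD_POLL_ACCEPT, "ACCEPT"},
        {AFD_POLL_CONNECT_FAIL, "CONNECT_FAIL"},
    };

    std::string result;
    for (const Named &entry : known)
    {
        if (events & entry.bit)
        {
            if (!result.empty()) result += '|';
            result += entry.name;
            events &= ~entry.bit; // clear the bit so leftovers can be detected
        }
    }
    if (events != 0)
    {
        if (!result.empty()) result += '|';
        result += "UNKNOWN";
    }
    if (result.empty()) return "NONE";
    return result;
}

// Usage check for a single flag and for a combination of flags.
inline void CheckDescribe()
{
    assert(DescribeAfdEvents(AFD_POLL_CONNECT_FAIL) == "CONNECT_FAIL");
    assert(DescribeAfdEvents(AFD_POLL_RECEIVE | AFD_POLL_SEND) == "RECEIVE|SEND");
}
```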
We register an interest in an event on a socket like this:
// This is information about what we are interested in for the supplied socket.
// We're polling for one socket, we are interested in the specified events
// The other stuff is copied from wepoll - needs more investigation
AFD_POLL_INFO pollInfoIn {};
pollInfoIn.Exclusive = FALSE;
pollInfoIn.NumberOfHandles = 1;
pollInfoIn.Timeout.QuadPart = INT64_MAX;
pollInfoIn.Handles[0].Handle = reinterpret_cast<HANDLE>(GetBaseSocket(s));
pollInfoIn.Handles[0].Status = 0;
pollInfoIn.Handles[0].Events = events;
// To make it clear that the inbound and outbound poll structures can be different
// we use a different one...
// As we'll see below, the status block and the outbound poll info need to stay
// valid until the event completes...
AFD_POLL_INFO pollInfoOut {};
IO_STATUS_BLOCK pollStatusBlock {};
// kick off the poll
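status = NtDeviceIoControlFile(
hAFD,
nullptr, // no event
nullptr, // no APC routine
&pollStatusBlock, // APC context, handed back to us as the OVERLAPPED pointer on completion
&pollStatusBlock, // I/O status block for this operation
IOCTL_AFD_POLL,
&pollInfoIn,
sizeof(pollInfoIn),
&pollInfoOut,
sizeof(pollInfoOut));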
if (status == 0)
{
// It's unlikely to complete straight away here as we haven't done anything with
// the socket, but I expect that once the socket is connected we could get immediate
// completions and we could, possibly, set 'FILE_SKIP_COMPLETION_PORT_ON_SUCCESS' for the
// AFD association...
cout << "success" << endl;
}
else if (status == STATUS_PENDING)
{
For people who have some experience of IOCP things are starting to look a little familiar;
here we have an “operation” which specifies two pieces of “per operation data”. The first is
the status block, which is returned to us in our call to GetQueuedCompletionStatus()
when the poll completes, or, presumably, is cancelled. The second is the pollInfoOut
structure. This isn’t explicitly returned to us when the poll completes and so in real code
we will likely include both the status block and the info structure in a larger structure and
then navigate from the status block to our larger structure in much the same way that we’re used
to navigating from an OVERLAPPED
to an extended overlapped structure with normal IOCP designs.
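The “larger structure” idea can be sketched in portable C++ as follows. `StatusBlock` and `PollInfo` are hypothetical stand-ins for IO_STATUS_BLOCK and AFD_POLL_INFO, and `PollOperation` and `OperationFromStatusBlock` are illustrative names only. The sketch relies on the status block being the first member of a standard-layout struct, so that a pointer to it is also a pointer to the enclosing object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for IO_STATUS_BLOCK and AFD_POLL_INFO.
struct StatusBlock
{
    std::intptr_t Status;
    std::uintptr_t Information;
};

struct PollInfo
{
    std::uint32_t Events;
};

// Everything that must stay valid until the poll completes, kept together.
struct PollOperation
{
    StatusBlock statusBlock; // MUST be the first member, see the cast below
    PollInfo pollInfoOut;    // filled in when the poll completes
    int socketId;            // whatever per-socket context we want to get back to
};

static_assert(std::is_standard_layout<PollOperation>::value,
              "the first-member cast needs a standard-layout type");
static_assert(offsetof(PollOperation, statusBlock) == 0,
              "statusBlock must be at offset zero");

// Navigates from the status block pointer that a completion gives us
// back to the operation that contains it.
inline PollOperation *OperationFromStatusBlock(StatusBlock *pStatus)
{
    return reinterpret_cast<PollOperation *>(pStatus);
}

// Usage check: round-trip from the operation to its status block and back.
inline void CheckRoundTrip()
{
    PollOperation op{};
    op.socketId = 7;
    assert(OperationFromStatusBlock(&op.statusBlock) == &op);
    assert(OperationFromStatusBlock(&op.statusBlock)->socketId == 7);
}
```

If the status block cannot be the first member, the usual Windows alternative is the CONTAINING_RECORD macro, which subtracts the member's offset instead of assuming that it is zero.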
Receiving event notifications
The simplest event we can generate here is a connection failure event. We do that by trying to
connect our socket to a port that isn’t listening and, eventually, we will get a completion from
a call to GetQueuedCompletionStatus() on our IOCP.
int result = connect(s, (struct sockaddr*) &addr, sizeof addr);
if (result == SOCKET_ERROR)
{
const DWORD lastError = WSAGetLastError();
if (lastError == WSAEWOULDBLOCK)
{
cout << "connect would block" << endl;
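DWORD numberOfBytes = 0;
ULONG_PTR completionKey = 0;
OVERLAPPED *pOverlapped = nullptr;
// Blocks until the poll completes; pOverlapped will be the address of our status block
if (!::GetQueuedCompletionStatus(hIOCP, &numberOfBytes, &completionKey, &pOverlapped, INFINITE))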
{
ErrorExit("GetQueuedCompletionStatus");
}
cout << "got completion" << endl;
IO_STATUS_BLOCK *pStatus = reinterpret_cast<IO_STATUS_BLOCK *>(pOverlapped);
At this point we have the status block that we used when we started the poll and the
pollInfoOut structure has been updated to hold details of the poll results.
In this example the pollInfoOut.Handles[0].Events member now holds just AFD_POLL_CONNECT_FAIL.
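To get from a returned event mask to something like “socket readiness”, the flags need to be grouped. The sketch below shows one deliberately simplified grouping. It is an illustrative assumption rather than the mapping that wepoll or the Rust polling code actually use, and the flag values are again assumed from wepoll's definitions. `Readiness` and `ClassifyAfdEvents` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as assumed from wepoll's AFD definitions; verify before relying on them.
constexpr std::uint32_t AFD_POLL_RECEIVE      = 0x0001;
constexpr std::uint32_t AFD_POLL_SEND         = 0x0004;
constexpr std::uint32_t AFD_POLL_DISCONNECT   = 0x0008;
constexpr std::uint32_t AFD_POLL_ABORT        = 0x0010;
constexpr std::uint32_t AFD_POLL_ACCEPT       = 0x0080;
constexpr std::uint32_t AFD_POLL_CONNECT_FAIL = 0x0100;

// A simplified view of what a caller can usefully do next with the socket.
struct Readiness
{
    bool readable; // a read or accept should not block (includes peer disconnect)
    bool writable; // a send should not block
    bool failed;   // the connection was aborted or could not be established
};

// Groups the raw event bits into the three readiness categories above.
inline Readiness ClassifyAfdEvents(std::uint32_t events)
{
    Readiness r{};
    r.readable = (events & (AFD_POLL_RECEIVE | AFD_POLL_ACCEPT | AFD_POLL_DISCONNECT)) != 0;
    r.writable = (events & AFD_POLL_SEND) != 0;
    r.failed = (events & (AFD_POLL_ABORT | AFD_POLL_CONNECT_FAIL)) != 0;
    return r;
}

// Usage check: the result seen in this article is classified as a failure only.
inline void CheckConnectFail()
{
    const Readiness r = ClassifyAfdEvents(AFD_POLL_CONNECT_FAIL);
    assert(r.failed && !r.readable && !r.writable);
}
```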
Wrapping up
This simple example does just enough to demonstrate how accessing the Windows networking stack
via \Device\Afd works. In later articles we’ll build something a little
more useful and, eventually, we can start to measure performance and compare with other
methods of network I/O on Windows.
More on AFD
- Adventures with \Device\Afd - this post
- Test Driven Understanding
- Test Driven Design
- A simple client
- A simple server
- More \Device\Afd goodness
- Socket readiness without \Device\Afd
- A multi-connection AFD-based echo server