Internals of NDIS driver for VirtIO based network adapter

From KVM
Revision as of 10:02, 27 July 2009 by Yan (talk | contribs)

Internals of NDIS driver for VIRTIO based network adapter


This document contains implementation notes of Windows network adapter driver of VirtIO network device.

Abbreviations in the document

  • LSO, a.k.a. GSO, a.k.a. TSO – Large, Global, Transmit segment offload – the same thing in context of Windows driver.
  • OOB – out-of-band packet data; set of structures, associated with packet
  • SG – scatter-gather
  • MTU – maximal transfer unit
  • MSS – maximal segment size
  • CS – checksum
  • MDL – memory descriptor list
  • WL – waiting list
  • DUT – device under test


NDIS driver features

Basic networking operations

The NetKVM driver supports non-serialized data sending and receiving. Multiple send operations are supported by driver and not required to be serialized by NDIS (Logo requirements). For send operation scatter-gather mode is a default, it can be disabled by configuration to force copy mode to preallocated buffers. In receive operations the driver can indicate up to NDIS as many packets as VirtIO device can receive without waiting for NDIS to free receive buffers. The default MTU reported by driver to NDIS is 1514 bytes, i.e. NDIS sends without LSO packets up to MTU, the driver can not indicate reception of packets bigger than MTU. MTU can be changed through the device properties in device manager. For bidirectional data channel to/from device, the driver initializes 2 VirtIO queues (transmit and receive). VirtIO queues contain memory blocks for SG entry only (physical address + size), all the data buffers, where the physical address points to, are not related to VirtIO. Buffers, required for VirtIO headers and network data (when necessary) and OS-specific object, required for data indication, are allocated by the driver additionally during initialization.

Checksum offload

NDIS driver can declare following types of TX checksum offload

  • IP checksum. NDIS does not initialize IP header checksum, device/driver fill it.
  • TCP checksum. NDIS initializes TCP header checksum with “pseudo header checksum” value (pseudo header does not exist in the packet, checksum calculated on virtual structure), the device/driver finish it
  • Support IP options and TCP options when calculating checksum.
  • UDP checksum – exactly as TCP without options.
  • IP and TCP checksum for IPV6.

NDIS driver may implement also support of RX checksum offload (ability to verify IP/TCP checksum in incoming packets) – this feature is not supported in NetKVM. The NetKVM driver has implementation of all the CS offloads for IPV4, it works functionally. It is very helpful during development and diagnostic; some of these mechanisms used for LSO also. But the implementation does not pass corner cases of NDIS tests and disabled in the configuration by default.

Large segment offload

In LSO operation the NDIS provides packet that usually (not always) bigger than MTU; driver/device shall divide it into several smaller fragments before sending to other side. Each fragment, except of last one must contain MSS bytes of TCP data. During fragmentation, the device shall properly populate IP and TCP headers including IP and most of TCP options and fill all the necessary checksums. The NetKVM driver implements LSO offload for IPV4 with maximal packet size of almost 64K before offloading. Note that at least in current implementation the LSO requires scatter-gather for operation, as packets bigger than MTU can not be placed into pre-allocated buffers.

Priority and VLAN tagging

The NetKVM driver on its lower edge supports both Regular Ethernet header of 14 bytes and Ethernet type II header of 18 bytes. The second one includes Priority/VLAN tag (32 bits) with 16-bit field of Priority and VLAN. Ethernet type II header can not be indicated up to NDIS; NDIS also does not provide this tag in outgoing packets. Instead Priority and VLAN data reside in OOB block of NDIS packet(s) and populating this tag in outgoing packets is the responsibility of the driver. It is responsible also for removal of this tag from incoming packets (if present) and place Priority data into OOB block. The driver also checks the VLAN value and ignores incoming packets addressed to VLAN different than one configured for the specific instance of the driver.

Connect detection

Typically, the VirtIO network device driver always indicates connection to the network. During manual WHQL tests the test operator is required to remove the network cable from the socket. The VirtIO network device implementation in QEMU indicates up to the driver connection status, which allows the driver to indicate up to NDIS connect/disconnect events. When disconnected, the driver suspends its send operation and fails to send packets.

Filtering of incoming packets

Filtering is required feature of network adapter under Windows. The NetKVM driver maintains NDIS-managed filter mask, controlling which packets the driver shall indicate and which shall drop. This filter can allow one or more of:

  • Unicast packets (to device own MAC address)
  • Multicast packets (to one of multicast addresses configured from NDIS)
  • All multicast packets (to any multicast addresses)
  • Broadcast packets
  • All packets

For each incoming packet the driver analyses the destination MAC address and takes decision based on current filter mask. The buffer of the packet is available – it is one of buffers allocated by the driver during initialization.

Statistics

NDIS miniport required to support statistics for send and receive operations: number of successfully/unsuccessfully send/received bytes/packets per kind of packet (unicast, multicast, broadcast). Main consumer of this statistic information is NDIS test.

Sources files

Common files

Files under Common Functionality
ParaNdis-Common.c

ndis56common.h ethernetutils.h

All the common mechanisms and flow: system-independent initialization and cleanup, receive and transmit processing, interactions with VirtIO library
ParaNdis-Oid.c Implementation of system-independent part of OID requests
ParaNdis-Debug.c

kdebugprint.h

Debug printouts implementation
sw-offload.c All the procedures related to IP packets parsing and checksum calculation using software
osdep.h Common header for VirtIO
IONetDescriptor.h VirtIO header definition – must be aligned with one used by device (QEMU)
quverp.h Header used for resource generation (visible on file properties)

NDIS5-specific files

Files under WXP Functionality
ParaNdis5-Driver.c Implementation of NDIS5 driver part
ParaNdis5-Impl.c

ParaNdis5.h

NDIS5-specific implementation of procedure required by common part
ParaNdis5-Oid.c NDIS5-specific implementation of OID
netkvm.inf INF file for XP/2003

NDIS6-specific files

Files under WLH Functionality
ParaNdis6-Driver.c Implementation of NDIS5 driver part
ParaNdis6-Impl.c

ParaNdis6.h

NDIS6-specific implementation of procedure required by common part
ParaNdis6-Oid.c NDIS6-specific implementation of OID
netkvm.inf INF file for Vista/2008

Flows and operation

Initialization

Configurable parameters

Upon initialization the driver retrieves configurable parameters from the adapter registry setting. It checks for each one of parameters the validity of retrieved value (for that the driver contains for each parameter default value, minimal and maximal value) and initializes the adapter according to this configuration. The goal of validity check is preventing of unexpected behavior of the driver during NDIS tests, where the test procedure writes registry with random and invalid values and then starts the adapter (the driver can fail initialization, but can not cause crashes or stop responding). See “List of configurable parameters” below for complete list.

Initialization of the driver and device instance

Driver initialization always starts from system-dependent implementation of DriverEntry procedure. In it the driver registers set of callback procedures to implement per-adapter operation

  • initialization
  • packets sending
  • returning of received packets
  • OID support
  • pausing and resuming (for NDIS6)
  • restart
  • halting

This initialization happens only once. For each PCI device created by QEMU, the registered initialization procedure is called to allocate and prepare per-device context in the driver. There is no dependency between different devices, supported by the same driver; they do not have any common resources. Upon adapter initialization, the system-dependent procedure

  • Allocates storage for device context
  • Does minimal registration of the device instance (just to allow further access to configuration interface and hardware resources)
  • Retrieves configurable parameters
  • Uses allocated hardware resources (IO ports) to access the device and retrieve its features (host features) and MAC address, configured by QEMU command-line (can be overwritten by NDIS)
  • Initializes the device context according to retrieved configuration and device features
  • Allocates and initializes VirtIO library objects
  • Allocates and initializes pre-allocated descriptors, data blocks and binds necessary system-dependent objects to them (according to configurable parameters and device capabilities and features)
  • Enables necessary features in the device (guest features)
  • Enables the interrupt streaming from the device (using interface with device)

Some mechanisms required by driver during initialization have different implementation for NDIS5 and NDIS6:

  • Virtual memory allocation
  • Obtaining configuration handle to retrieve configurable parameters
  • Physical memory allocation and deallocation
  • Allocation and freeing of NDIS object to indicate received data
  • Configuring of interrupt handler and handler of DPC procedure
  • Registering of DMA capabilities (to be able receiving hardware addresses of transmitted buffers)

Initialization process combines system-independent and system-dependent code in order to minimize code duplication and use system-dependent code only when required. NDIS6 driver has advanced initialization process, including report of initial settings (under NDIS5 the NDIS needs to query many driver parameters via OID requests immediately after initialization).

Initialization and clean up of VIRTIO object

Upon initialization, the driver creates two queue objects in VirtIO library – one for RX, one for TX. The VirtIO library retrieves available number of blocks from the device and allocates storage for required number of blocks. These two objects used for all the functional operation after startup. TODO: VirtIO library in its internal code uses non-NDIS calls for allocation of physical memory and translation between physical and virtual addresses. In general, these calls are illegal, but currently the WHQL procedure that verifies calls made from NDIS driver, does not produce error on that. Upon device shutdown (disable or remove) operation, the driver deletes these objects using VirtIO library call. Special case is driver support for power management events (see in “Power management”).

Guest-Host negotiable parameters

Host features do not receive any acknowledge from guest when they do not affect driver-to-device interface. Support for mergeable buffer does, so it is activated only when the driver enables the same feature in guest features set (using PCI part of VirtIO library interface).

Feature Mask in HOST features Mask in GUEST features
Checksum calculation by device VIRTIO_NET_F_CSUM Not required
LSO support by device VIRTIO_NET_F_GUEST_TSO4 Not required
Mergeable RX buffers VIRTIO_NET_F_MRG_RXBUF VIRTIO_NET_F_MRG_RXBUF
Generation of connect / disconnect events VIRTIO_NET_F_STATUS No required