VhostNet

Revision as of 13:03, 9 February 2010 by Mst

vhost-net: a kernel-level virtio-net server


What is vhost-net

vhost is a kernel-level backend for virtio. Its main motivation is to reduce virtualization overhead for virtio by removing system calls from the data path, without requiring guest changes. For virtio-net, this removes up to 4 system calls per packet: VM exit for the kick, reentry for the kick, iothread wakeup for the packet, and interrupt injection for the packet.

vhost is as minimal as possible. It relies on userspace for all setup work.

Status

vhost is fully functional and already shows a performance improvement over userspace virtio (see Performance below).


How to use

Download, build and install the kernel and userspace from:

kernel:

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost

userspace:

git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost

Usage instructions:

vhost currently requires MSI-X support in guest virtio. This means the guest kernel version should be >= 2.6.31.
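The version requirement above can be checked from inside the guest before enabling vhost. The following is a minimal sketch, not part of vhost or qemu: the helper name kernel_at_least is hypothetical, and it compares only the first three numeric fields of the version string, so suffixes such as -rc2 are ignored.

```shell
#!/bin/sh
# kernel_at_least HAVE WANT
# Succeeds (exit status 0) if version HAVE is >= version WANT.
# Versions are split on any run of non-digits; only the first three
# numeric fields are compared, and missing fields count as 0.
kernel_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, /[^0-9]+/)
        split(want, w, /[^0-9]+/)
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Run inside the guest: uname -r reports the running kernel release.
if kernel_at_least "$(uname -r)" 2.6.31; then
    echo "guest kernel is new enough for vhost (>= 2.6.31)"
else
    echo "guest kernel is older than 2.6.31; vhost is not expected to work"
fi
```

A version check is only a proxy: what vhost actually needs is MSI-X support in the guest virtio driver, which a distribution kernel may have backported or disabled.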

To enable vhost, add "vhost=on" to the -netdev tap options. Example with tap backend:

qemu-system-x86_64 -m 1G disk-c.qcow2 \
-net nic,model=virtio,netdev=foo \
-netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on
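Before starting qemu with vhost=on, it can help to confirm that the host kernel exposes the vhost-net device. The sketch below assumes the device node is /dev/vhost-net and the module is named vhost_net, as in mainline Linux; neither name is stated on this page, and the development tree above may differ. The helper vhost_net_ready is a hypothetical name.

```shell
#!/bin/sh
# vhost_net_ready [DEVICE]
# Succeeds if DEVICE is a character device that the current user can
# read and write. DEVICE defaults to /dev/vhost-net (assumed path).
vhost_net_ready() {
    dev=${1:-/dev/vhost-net}
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

if vhost_net_ready; then
    echo "vhost-net device is available"
else
    # vhost_net is the assumed module name; loading it requires root.
    echo "vhost-net device missing or not accessible; try: modprobe vhost_net"
fi
```

The check only covers the host side; the guest still needs a virtio driver with MSI-X support.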


Older (demo) version usage: add the ",vhost" flag to the nic options. Example with tap backend:

qemu-system-x86_64 -m 1G disk-c.qcow2 \
-net tap,ifname=msttap0,script=/home/mst/ifup,downscript=no \
-net nic,model=virtio,vhost

Example with raw socket backend:

ifconfig eth3 promisc
qemu-system-x86_64 -m 1G disk-c.qcow2 \
-net raw,ifname=eth3 \
-net nic,model=virtio,vhost

Note: in raw socket mode, when binding to a physical Ethernet device, host-to-guest communication will only work if your device is connected to a bridge configured to mirror outgoing packets back to the originating link. If you do not know whether this is the case, it most likely is not. Use another box to access the guest, or use tap.


Limitations

  • vhost currently requires MSI-X support in guest virtio. This means the guest kernel version should be >= 2.6.31.
  • with raw sockets, host-to-guest and guest-to-guest communication on the same host does not always work. Use bridge+tap if you need that.
  • driver unloading in the guest and device hot-unplug are broken, because the relevant code in qemu is stubbed out. Both still need to be implemented.

Performance

Performance is still being tuned, especially guest to host.

External-to-system numbers with a 10GE vxge card: qemu with bridge+tap, run with -cpu host,-rdtscp,+x2apic; host and guest kernels 2.6.33-rc2; MTU 1500.

  • netperf TCP_STREAM, default setup, 100 secs run
 native: 81XX Mb/s
 without vhost-net: 72XX Mb/s
 with vhost-net: 78XX Mb/s
  • TCP_RR, 100 secs run
 native: 48 usec/Trans
 without vhost-net: 395 usec/Trans
 with vhost-net: 86 usec/Trans
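
As a rough reading of the TCP_RR figures above: assuming netperf keeps one transaction outstanding at a time (its default for TCP_RR), latency per transaction converts directly into a transaction rate. The helpers rr_rate and rr_ratio below are hypothetical names used only for this arithmetic.

```shell
#!/bin/sh
# Convert usec per transaction into transactions per second (rounded).
rr_rate() {
    awk -v usec="$1" 'BEGIN { printf "%.0f\n", 1000000 / usec }'
}

# Ratio of two latencies, printed with one decimal place.
rr_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "native:            $(rr_rate 48) trans/s"
echo "without vhost-net: $(rr_rate 395) trans/s"
echo "with vhost-net:    $(rr_rate 86) trans/s"
echo "vhost-net vs userspace virtio: $(rr_ratio 395 86)x lower latency"
echo "vhost-net vs native:           $(rr_ratio 86 48)x higher latency"
```

With the figures above this gives roughly 20833, 2532 and 11628 transactions per second: vhost-net cuts round-trip latency by about 4.6x relative to userspace virtio, while remaining about 1.8x above native.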


Here are some local numbers courtesy of Shirley Ma:

  • netperf TCP_STREAM, default setup, 60 secs run
 guest->host increases from 3XXXMb/s to 5XXXMb/s
 host->guest increases from 3XXXMb/s to 4XXXMb/s
  • TCP_RR, 60 secs run
 guest->host trans/s increases from 2XXX/s to 13XXX/s
 host->guest trans/s increases from 2XXX/s to 13XXX/s

TODOs

vhost-net driver projects

  • profiling would be very helpful; I have not done any yet.
  • merged buffers.
  • scalability tuning: figure out the best threading model to use.

qemu projects

  • migration support
  • level triggered interrupts
  • driver unloading/hotplug
  • general cleanup and upstreaming
  • upstream support for injecting interrupts from the kernel, from qemu-kvm.git to qemu.git (this is a vhost dependency: without it, vhost either cannot be upstreamed, or can be but without real benefit)

virtio projects

  • improve small packet/large buffer performance: support "reposting" buffers, pool for indirect buffers
  • guest kernel 2.6.31 seems to work well. Under certain workloads, virtio performance has regressed with guest kernels 2.6.32 and up (but is still better than userspace). A patch has been posted: http://www.spinics.net/lists/netdev/msg115292.html

projects involving other kernel components and/or networking stack

  • rx mac filtering in tun
  • extend raw sockets to support GSO/checksum offloading, and teach vhost to use that capability [one way to do this: virtio net header support]; will allow working with e.g. macvlan
  • improve locking: e.g. RX/TX poll should not need a lock
  • multicast IGMP snooping in bridge


long term projects

  • kvm eventfd support for injecting level interrupts
  • multiqueue (involves all of vhost, qemu, virtio, networking stack)
  • zero copy tx for tun/raw sockets

Other

  • More testing is always good

Short term plans for MST

  • get vhost-net merged in Linux kernel 2.6.33
  • address most vhost qemu TODOs
  • get vhost support merged in upstream qemu

Short term plans for IBM (Sridhar Samudrala, David Stevens, Shirley Ma)

  • Add GSO/checksum offload support to AF_PACKET (raw) sockets.
  • Mergeable RX buffers support in vhost-net.
  • Defer SKB allocation in virtio_net receive path.