| vhost-net: a kernel-level virtio-net server
| | |
| | |
| == What is vhost-net ==
| |
| | |
| vhost is a kernel-level backend for virtio.
| |
| The main motivation for vhost is to reduce virtualization
| |
| overhead for virtio by removing system calls on data path,
| |
| without guest changes. For virtio-net, this removes up to
| |
| 4 system calls per packet: vm exit for kick, reentry for kick,
| |
| iothread wakeup for packet, interrupt injection for packet.
| |
| | |
| vhost is as minimal as possible. It relies on userspace for
| |
| all setup work.
| |
| | |
| === Status ===
| |
| vhost is fully functional, and it already shows
| |
| improvement over userspace virtio.
| |
| | |
| | |
| == How to use ==
| |
| | |
| Download, build and install the kernel and userspace from:
| |
| | |
| kernel:
| |
| git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
| |
| userspace:
| |
| git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost
| |
| | |
| === Usage instructions: ===
| |
| | |
| vhost currently requires MSI-X support in guest virtio.
| |
| This means the guest kernel version should be >= 2.6.31.
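The version requirement above can be checked from inside the guest before enabling vhost. The following is a minimal sketch, not part of vhost or qemu: `version_at_least` is a hypothetical helper name, and it assumes the string reported by `uname -r` starts with a dotted numeric version.

```shell
# Hypothetical helper: succeed if kernel version $1 is at least $2.
# Only the leading numeric x.y.z part of $1 is compared, so a suffix
# such as "-rc2" is ignored.
version_at_least() {
    have=$(printf '%s\n' "$1" | sed 's/[^0-9.].*$//')
    want=$2
    # Sort both versions numerically field by field; the first line is the lower one.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$want" ]
}

# Run inside the guest.
if version_at_least "$(uname -r)" 2.6.31; then
    echo "guest kernel is new enough for virtio MSI-X"
else
    echo "guest kernel is older than 2.6.31; vhost is not expected to work"
fi
```

A passing check only confirms the kernel version; it does not prove that MSI-X is actually enabled for the virtio device.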
| |
| | |
| To enable vhost, add the "vhost=on" flag to the -netdev options.
| |
| Example with tap backend:
| |
| | |
| qemu-system-x86_64 -m 1G disk-c.qcow2 \
| |
| -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,'''vhost=on'''
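On a host where the vhost-net driver may not be loaded, it can help to decide the flag before starting qemu. The sketch below rebuilds the command above and only sets vhost=on when the device node is present. It assumes the driver exposes `/dev/vhost-net`, which is not stated on this page, and `tap_netdev_arg` is a hypothetical helper, not a qemu tool.

```shell
# Hypothetical helper (not part of qemu): format the -netdev value used above.
# $1 = netdev id, $2 = tap interface, $3 = ifup script, $4 = on|off
tap_netdev_arg() {
    printf 'tap,id=%s,ifname=%s,script=%s,downscript=no,vhost=%s' "$1" "$2" "$3" "$4"
}

# Assumption: the vhost-net driver exposes /dev/vhost-net on the host.
if [ -c /dev/vhost-net ]; then
    vhost_flag=on
else
    vhost_flag=off
fi

# Print the command instead of running it, so it can be reviewed first.
echo "qemu-system-x86_64 -m 1G disk-c.qcow2 -netdev $(tap_netdev_arg foo msttap0 /home/mst/ifup "$vhost_flag")"
```

The id, interface name and script path are the ones from the example above; substitute your own.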
| |
| | |
| | |
| Older (demo) version usage:
| |
| Example with tap backend:
| |
| | |
| qemu-system-x86_64 -m 1G disk-c.qcow2 \
| |
| -net tap,ifname=msttap0,script=/home/mst/ifup,downscript=no \
| |
| -net nic,model=virtio,'''vhost'''
| |
| | |
| Example with raw socket backend:
| |
| | |
| ifconfig eth3 promisc
| |
| qemu-system-x86_64 -m 1G disk-c.qcow2 \
| |
| -net raw,ifname=eth3 \
| |
| -net nic,model=virtio,'''vhost'''
| |
| | |
| Note: in raw socket mode, when binding to a physical
| |
| ethernet device, host to guest communication
| |
| will only work if your device is connected to a bridge
| |
| configured to mirror outgoing packets back at the originating link.
| |
| If you do not know whether this is the case, this most likely
| |
| means it isn't. Use another box to access the guest, or use tap.
| |
| | |
| | |
| == Limitations ==
| |
| * vhost currently requires MSI-X support in guest virtio. This means the guest kernel version should be >= 2.6.31.
| |
| * with raw sockets, host-to-guest and guest-to-guest communication on the same host does not always work. Use bridge+tap if you need that.
| |
| * driver unloading in the guest and device hot-unplug are broken, because the relevant code in qemu is stubbed out. Both still need to be implemented.
| |
| | |
| == Performance ==
| |
| | |
| Performance tuning is still in progress, especially for guest to host traffic.
| |
| | |
| External-to-system numbers with bridge+tap and a 10GE vxge card:
| |
| qemu with bridge+tap, run with: -cpu host,-rdtscp,+x2apic,
| |
| host+guest 2.6.33-rc2. mtu 1500.
| |
| | |
| * netperf TCP_STREAM, default setup, 100 secs run
| |
| native: 81XX Mb/s
| |
| without vhost-net: 72XX Mb/s
| |
| with vhost-net: 78XX Mb/s
| |
| | |
| * TCP_RR, 100 secs run
| |
| native: 48 usec/Trans
| |
| without vhost-net: 395 usec/Trans
| |
| with vhost-net: 86 usec/Trans
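To put the TCP_RR figures above in proportion, the ratios can be worked out directly. This is only arithmetic on the three latencies listed; `ratio` is a hypothetical helper name.

```shell
# usec per transaction, copied from the TCP_RR results above
native=48
without_vhost=395
with_vhost=86

# Hypothetical helper: print $1 / $2 rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "latency without vhost-net / with vhost-net: $(ratio "$without_vhost" "$with_vhost")"
echo "latency with vhost-net / native: $(ratio "$with_vhost" "$native")"
```

In this run vhost-net cuts round-trip latency by roughly 4.6 times compared with running without it, while staying about 1.8 times above native.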
| |
| | |
| | |
| Here are some local numbers courtesy of Shirley Ma:
| |
| | |
| * netperf TCP_STREAM, default setup, 60 secs run
| |
| guest->host increases from 3XXXMb/s to 5XXXMb/s
| |
| host->guest increases from 3XXXMb/s to 4XXXMb/s
| |
| | |
| * TCP_RR, 60 secs run
| |
| guest->host trans/s increases from 2XXX/s to 13XXX/s
| |
| host->guest trans/s increases from 2XXX/s to 13XXX/s
| |
| | |
| == TODOs ==
| |
| | |
| === vhost-net driver projects ===
| |
| * profiling would be very helpful; I have not done any yet.
| |
| * merged buffers.
| |
| * scalability tuning: figure out the best threading model to use.
| |
| | |
| === qemu projects ===
| |
| * migration support
| |
| * level triggered interrupts
| |
| * driver unloading/hotplug
| |
| * general cleanup and upstreaming
| |
| * upstream support for injecting interrupts from kernel, from qemu-kvm.git to qemu.git (this is a vhost dependency: without it, vhost either can't be upstreamed, or can be upstreamed but without real benefit)
| |
| | |
| === virtio projects ===
| |
| * improve small packet/large buffer performance: support "reposting" buffers, pool for indirect buffers
| |
| * guest kernel 2.6.31 seems to work well. Under certain workloads,
| |
| virtio performance has regressed with guest kernels 2.6.32 and up
| |
| (but still better than userspace). A patch has been posted:
| |
| http://www.spinics.net/lists/netdev/msg115292.html
| |
| | |
| === projects involving other kernel components and/or networking stack ===
| |
| * rx mac filtering in tun
| |
| * extend raw sockets to support GSO/checksum offloading, and teach vhost to use that capability [one way to do this: virtio net header support]; will allow working with e.g. macvlan
| |
| * improve locking: e.g. RX/TX poll should not need a lock
| |
| * multicast IGMP snooping in bridge
| |
| | |
| | |
| === long term projects ===
| |
| * kvm eventfd support for injecting level interrupts
| |
| * multiqueue (involves all of vhost, qemu, virtio, networking stack)
| |
| * zero copy tx for tun/raw sockets
| |
| | |
| === Other ===
| |
| * More testing is always good
| |
| | |
| == Short term plans for MST ==
| |
| | |
| * get vhost net merged in linux kernel 2.6.33
| |
| * address most vhost qemu TODOs
| |
| * get vhost support merged in upstream qemu
| |
| | |
| == Short term plans for IBM (Sridhar Samudrala, David Stevens, Shirley Ma) ==
| |
| * Add GSO/checksum offload support to AF_PACKET (raw) sockets.
| |
| * Mergeable RX buffers support in vhost-net.
| |
| * Defer SKB allocation in virtio_net receive path.
| |