<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://linux-kvm.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Mst</id>
	<title>KVM - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://linux-kvm.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Mst"/>
	<link rel="alternate" type="text/html" href="https://linux-kvm.org/page/Special:Contributions/Mst"/>
	<updated>2026-06-07T14:55:14Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.5</generator>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=158887</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=158887"/>
		<updated>2014-12-04T11:20:53Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Another idea is to make busy_read/busy_poll dynamic, like the dynamic PLE window.&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default?&lt;br /&gt;
       depends on: BQL&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a slow linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Julio Faracco &amp;lt;jcfaracco@gmail.com&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* improve netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host and busy poll only when no other process is runnable on the host cpu.&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* drop vhostforce&lt;br /&gt;
  it&#039;s an optimization, probably not worth it anymore&lt;br /&gt;
&lt;br /&gt;
* avoid userspace virtio-net when vhost is enabled.&lt;br /&gt;
  ATM we run in userspace until DRIVER_OK&lt;br /&gt;
  this doubles our security attack surface,&lt;br /&gt;
  so it&#039;s best avoided.&lt;br /&gt;
&lt;br /&gt;
* feature negotiation for dpdk/vhost user&lt;br /&gt;
  feature negotiation seems to be broken&lt;br /&gt;
&lt;br /&gt;
* switch dpdk to qemu vhost user&lt;br /&gt;
  this seems like a better interface than&lt;br /&gt;
   character device in userspace,&lt;br /&gt;
   designed for out of process networking&lt;br /&gt;
&lt;br /&gt;
* netmap - like approach to zero copy networking&lt;br /&gt;
   is anything like this feasible on linux?&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
    mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
    mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Contact: MST&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement selftest ethtool method for virtio-net for regression testing, e.g. for the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        http://mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Contact: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
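&lt;br /&gt;
  A minimal sketch of the proposed almost-full check, in Python for brevity rather than kernel C.&lt;br /&gt;
  The function name ring_almost_full and its parameters are hypothetical, not an existing virtio API;&lt;br /&gt;
  the 3/4 default mirrors the example above.&lt;br /&gt;
&lt;br /&gt;
```python
def ring_almost_full(in_use, ring_size, num=3, den=4):
    """Return True once in_use descriptors reach num/den of the ring.

    Hypothetical helper: a driver would kick the device when this turns
    True instead of waiting for the ring to fill completely.
    """
    # Integer arithmetic only, as kernel code would avoid floating point.
    return in_use * den >= ring_size * num
```
&lt;br /&gt;
  With a 256-entry ring, the check first turns true at 192 descriptors in use.&lt;br /&gt;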
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM to&lt;br /&gt;
  skip userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head of line blocking:&lt;br /&gt;
       - limit the number of pending DMAs&lt;br /&gt;
       - complete in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed too. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, a solution is to orphan the frags when enqueuing to a non work conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - Host has two interfaces, one 1G and the other 10G, so throttling 1G may cause traffic over 10G to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
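&lt;br /&gt;
       The effect of in-order completion can be illustrated with a small Python sketch.&lt;br /&gt;
       It assumes each DMA has a known finish time and that a completion may only be&lt;br /&gt;
       reported after all earlier ones; the function name is hypothetical and this is not vhost code.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import accumulate


def in_order_completion(finish_times):
    """Times at which each DMA can be reported complete, in submission order.

    With in-order completion, a buffer is only reported once every
    earlier buffer has finished, so the reported time is a running maximum.
    """
    return list(accumulate(finish_times, max))


# The second DMA is stuck behind a throttled tap device (finishes at t=50),
# so the DMAs submitted after it are held back as well.
reported = in_order_completion([1, 50, 2, 3])
```
&lt;br /&gt;
       Here every DMA submitted after the slow one is reported at t=50, although they finished at t=2 and t=3.&lt;br /&gt;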
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented &amp;quot;continuous leaky bucket&amp;quot; for throttling&lt;br /&gt;
  we can apply continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
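&lt;br /&gt;
  A rough Python sketch of a continuous leaky bucket, to show the shape of the idea.&lt;br /&gt;
  It is not the block layer code: the class, method names and parameters are assumptions&lt;br /&gt;
  made for illustration. One bucket would be kept per combination of IOPS/BPS and RX/TX/TOTAL.&lt;br /&gt;
&lt;br /&gt;
```python
class LeakyBucket:
    """Continuous leaky bucket: the level drains at `rate` units per second."""

    def __init__(self, rate, burst):
        self.rate = float(rate)    # sustained units per second (bytes or packets)
        self.burst = float(burst)  # level the bucket may reach without delay
        self.level = 0.0           # current fill level
        self.last = 0.0            # time of the last update, in seconds

    def _leak(self, now):
        # Drain the bucket for the time elapsed since the last update.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now

    def account(self, now, amount):
        """Add `amount` to the bucket and return how long the sender should wait."""
        self._leak(now)
        self.level += amount
        overflow = max(0.0, self.level - self.burst)
        return overflow / self.rate


# Hypothetical usage: 100 bytes per second sustained, 200 bytes of burst.
bucket = LeakyBucket(rate=100, burst=200)
first_wait = bucket.account(0.0, 150)
second_wait = bucket.account(0.0, 150)
```
&lt;br /&gt;
  The first 150 bytes fit within the burst; the second 150 bytes overflow it by 100 bytes, so the sender would wait one second.&lt;br /&gt;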
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  Current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  large values (like 8388607T) can be converted rightly by qapi from qemu commandline,&lt;br /&gt;
  If we want to support the large values, we should extend sndbuf limit from &#039;int&#039; to &#039;int64&#039;&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gbit/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=119867</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=119867"/>
		<updated>2014-11-13T15:57:01Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default?&lt;br /&gt;
       depends on: BQL&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a slow linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement selftest ethtool method for virtio-net for regression testing, e.g. for the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        http://mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* improve netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do the busy polling when there&#039;s no other runnable process on the host cpu.&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* drop vhostforce&lt;br /&gt;
  it&#039;s an optimization, probably not worth it anymore&lt;br /&gt;
&lt;br /&gt;
* feature negotiation for dpdk/vhost user&lt;br /&gt;
  feature negotiation seems to be broken&lt;br /&gt;
&lt;br /&gt;
* switch dpdk to qemu vhost user&lt;br /&gt;
  this seems like a better interface than&lt;br /&gt;
   character device in userspace,&lt;br /&gt;
   designed for out of process networking&lt;br /&gt;
&lt;br /&gt;
* netmap - like approach to zero copy networking&lt;br /&gt;
   is anything like this feasible on linux?&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
    mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
    mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Contact: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Contact: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
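&lt;br /&gt;
  A minimal sketch of the &quot;ring almost full&quot; check described above, in Python for illustration only.&lt;br /&gt;
  The helper name should_kick and its parameters are hypothetical, not an existing virtio API;&lt;br /&gt;
  the 3/4 default is taken from the example threshold in this item.&lt;br /&gt;
```python
def should_kick(in_flight, ring_size, num=3, den=4):
    """Return True once the ring is at least num/den full.

    Hypothetical helper: in_flight is the number of descriptors
    queued but not yet consumed by the other side.
    """
    if not ring_size > 0:
        raise ValueError("ring_size must be positive")
    # Integer arithmetic avoids rounding issues for odd ring sizes.
    return in_flight * den >= ring_size * num


# Example: check a 256-entry ring just below and at the 3/4 mark.
print(should_kick(191, 256), should_kick(192, 256))
```
  Kicking at a fraction of the ring size lets the other side start draining while the producer is still filling.&lt;br /&gt;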
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM,&lt;br /&gt;
  skipping userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still handle&lt;br /&gt;
  level interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause a head of line blocking problem:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes them in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed too. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       You can then see that not only is VM1 to VM2 throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, a solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and another 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
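&lt;br /&gt;
       The interaction of the two defects above can be sketched as a toy model, in Python for illustration only.&lt;br /&gt;
       This is not the vhost code; the names reportable and can_submit are hypothetical, and&lt;br /&gt;
       each list entry stands for one zerocopy buffer in submission order (True once its DMA is done).&lt;br /&gt;
```python
def reportable(done_flags):
    """Count buffers that can be completed when completion is in order."""
    count = 0
    for done in done_flags:
        if not done:
            break  # everything behind this buffer has to wait
        count += 1
    return count


def can_submit(done_flags, max_pending):
    """New zerocopy DMAs are allowed only while under the pending limit."""
    pending = len(done_flags) - reportable(done_flags)
    return max_pending > pending


# The first buffer went to the throttled tap2 and is still pending; the
# three behind it (towards eth0) are done but cannot be completed yet.
ring = [False, True, True, True]
print(reportable(ring), can_submit(ring, max_pending=4))
```
       In this model a single slow buffer at the head keeps finished buffers counted as pending, so the limit is hit and unrelated traffic stalls.&lt;br /&gt;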
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling&lt;br /&gt;
  we can apply a continuous leaky bucket to networking:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
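&lt;br /&gt;
  A minimal sketch of a continuous leaky bucket, in Python for illustration only.&lt;br /&gt;
  It is not the block layer implementation; the class name, method names and units are assumptions.&lt;br /&gt;
  One such bucket would be kept per limit, e.g. one for each IOPS/BPS and RX/TX/TOTAL combination.&lt;br /&gt;
```python
class LeakyBucket:
    """Continuous leaky bucket: the level drains at `rate` units per second."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # drain rate, e.g. bytes per second
        self.capacity = float(capacity)  # burst allowance
        self.level = 0.0                 # current fill level
        self.last = 0.0                  # timestamp of the last update

    def _leak(self, now):
        # Drain continuously for the time elapsed since the last update.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now

    def try_send(self, size, now):
        """Account for `size` units at time `now`; return True if allowed."""
        self._leak(now)
        if self.level + size > self.capacity:
            return False  # over the limit: caller should queue or delay
        self.level += size
        return True

    def delay_until_allowed(self, size, now):
        """Seconds to wait before `size` units would fit."""
        self._leak(now)
        excess = self.level + size - self.capacity
        return max(0.0, excess / self.rate)


bucket = LeakyBucket(rate=100, capacity=150)
print(bucket.try_send(100, now=0.0), bucket.try_send(100, now=0.0))
```
  The caller passes timestamps explicitly, which keeps the sketch deterministic and easy to test.&lt;br /&gt;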
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), saving a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  Current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  large values (like 8388607T) can be converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10G/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118505</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118505"/>
		<updated>2014-11-10T11:37:09Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was make pktgen work for virtio-net ( or partially orphan ))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations like TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default?&lt;br /&gt;
       depends on: BQL&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the ethtool selftest method for virtio-net, for regression testing, e.g. of the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        http://mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* improve netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do the busy polling when there&#039;s no other runnable process on the host cpu.&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* drop vhostforce&lt;br /&gt;
  it&#039;s an optimization, probably not worth it anymore&lt;br /&gt;
&lt;br /&gt;
* feature negotiation for dpdk/vhost user&lt;br /&gt;
  feature negotiation seems to be broken&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
    mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
    mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Contact: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Contact: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM,&lt;br /&gt;
  skipping userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still handle&lt;br /&gt;
  level interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several properties that cause head of line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes DMAs in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed as well. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2 and another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       You can then see that not only is VM1 to VM2 throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this case, one solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and the other 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
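       A minimal Python sketch of why in-order completion causes the blocking described above.&lt;br /&gt;
       The finish times are made-up illustration values, not measurements, and the model ignores the limit on pending DMAs.&lt;br /&gt;

```python
from itertools import accumulate

# Hypothetical times (ms) at which each DMA actually finishes.
# Index 1 stands for a packet stuck behind the rate-limited tap2 qdisc.
finish_ms = [1, 500, 2, 3, 4]

# With in-order completion, a buffer is only reported done once every
# earlier buffer is done, so the reported time is the running maximum.
reported_ms = list(accumulate(finish_ms, max))

print(reported_ms)
```

       Every buffer queued after the slow one inherits its delay, even if its own DMA finished almost immediately.&lt;br /&gt;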
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
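  A rough Python sketch of a continuous leaky bucket, to illustrate the idea only.&lt;br /&gt;
  This is not the block layer code: the class, field names and the rate/burst values are hypothetical.&lt;br /&gt;

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    rate: float         # units drained per second (bytes/s or packets/s)
    burst: float        # level the bucket may reach before traffic must wait
    level: float = 0.0  # current fill level, in the same units

    def leak(self, elapsed_s):
        """Drain the bucket continuously for elapsed_s seconds."""
        self.level = max(0.0, self.level - self.rate * elapsed_s)

    def account(self, amount):
        """Charge a transmitted or received amount to the bucket."""
        self.level += amount

    def wait_time(self):
        """Seconds to wait until the level drops back to the burst size."""
        return max(0.0, self.level - self.burst) / self.rate


# One bucket per (unit, direction) pair, mirroring IOPS/BPS * RX/TX/TOTAL.
buckets = {
    (unit, direction): LeakyBucket(rate=1000.0, burst=1000.0)
    for unit in ("iops", "bps")
    for direction in ("rx", "tx", "total")
}

tx_bps = buckets[("bps", "tx")]
tx_bps.account(3000.0)      # a 3000-byte burst arrives at once
delay = tx_bps.wait_time()  # time the sender has to be held back
tx_bps.leak(delay)          # after waiting, the level is back at burst
```

  Keeping one bucket per unit and direction lets each of the six limits be configured and enforced independently.&lt;br /&gt;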
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf().&lt;br /&gt;
  Large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
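  A small Python check of the arithmetic behind this item.&lt;br /&gt;
  It assumes that the T suffix is binary (2**40 bytes) and that a C int is 32 bits wide; neither is stated on this page.&lt;br /&gt;

```python
# Assumptions: "T" is a binary suffix (2**40) and C "int" is 32 bits wide.
TIB = 2 ** 40
sndbuf = 8388607 * TIB  # the example value from the text

# A signed N-bit integer holds non-negative values of at most N - 1 bits.
bits_needed = sndbuf.bit_length()
fits_int = bits_needed in range(32)    # 31 value bits or fewer
fits_int64 = bits_needed in range(64)  # 63 value bits or fewer

print(bits_needed, fits_int, fits_int64)
```

  Under these assumptions the example value needs 63 value bits, so it overflows int but still fits in int64.&lt;br /&gt;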
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a Linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10G/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118504</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118504"/>
		<updated>2014-11-10T11:33:03Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and autocorking. The idea is to enable the tx interrupt for TCP small packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default?&lt;br /&gt;
       depends on: BQL&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search will be slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the ethtool selftest method for virtio-net for regression testing, e.g. of the CVEs found for tun/macvtap, qemu and vhost.&lt;br /&gt;
        http://mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to co-operate with the host, and only do the busy polling when there&#039;s no other process runnable on the host cpu.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have a lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* drop vhostforce&lt;br /&gt;
  it&#039;s an optimization, probably not worth it anymore&lt;br /&gt;
&lt;br /&gt;
* feature negotiation for dpdk/vhost user&lt;br /&gt;
  feature negotiation seems to be broken&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
    mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
    mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Contact: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Contact: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use NAPI&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM to&lt;br /&gt;
  skip userspace. VFIO already enjoys the performance benefit;&lt;br /&gt;
  let&#039;s do the same for virtio-pci. Current virtio-pci devices still handle&lt;br /&gt;
  level-triggered interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several properties that cause head of line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes DMAs in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed as well. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2 and another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       You can then see that not only is VM1 to VM2 throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this case, one solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and the other 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf().&lt;br /&gt;
  Large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118503</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118503"/>
		<updated>2014-11-10T11:29:38Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default?&lt;br /&gt;
       depends on: BQL&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the selftest ethtool method for virtio-net for regression testing, e.g. of the CVEs found for tun/macvtap, qemu and vhost.&lt;br /&gt;
        http://mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do busy polling when there&#039;s no other process on the host cpu.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
    mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
    mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Contact: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Contact: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head-of-line blocking:&lt;br /&gt;
       - limit the number of pending DMAs&lt;br /&gt;
       - complete in order&lt;br /&gt;
       This means that if one or more of the DMAs are delayed, all the others are delayed as well. This can be reproduced with the following case:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only is VM1 to VM2 throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, one solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and the other 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
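       The effect of in-order completion can be illustrated with a toy model. This is a sketch only, not vhost code: the function name and the sample finish times are invented for illustration.&lt;br /&gt;

```python
from itertools import accumulate


def in_order_completion_times(finish_times):
    """Time at which each buffer can be reported as done.

    finish_times[i] is when DMA i actually finishes. With in-order
    completion, buffer i is only reported once every earlier buffer
    has finished too, so its reported time is the running maximum.
    """
    return list(accumulate(finish_times, max))


# Hypothetical finish times: the second DMA is stuck behind a throttled qdisc.
finish = [1, 50, 2, 3]
print(in_order_completion_times(finish))  # [1, 50, 50, 50]
```

       In this model a single slow DMA holds back every buffer queued behind it, even though those buffers finished early.&lt;br /&gt;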
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the continuous leaky bucket to networking:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       The kernel does not yet support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  although large values (like 8388607T) are parsed correctly by qapi from the qemu command line.&lt;br /&gt;
  To support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118502</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118502"/>
		<updated>2014-11-10T11:13:11Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/20140822073653.GA7372@gmail.com&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. This is because GSO tends to batch&lt;br /&gt;
       less when mq is enabled.&lt;br /&gt;
       Detect and enable/disable automatically so we can make it&lt;br /&gt;
       on by default. See Documentation/networking/scaling.txt&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the selftest ethtool method for virtio-net for regression testing, e.g. for the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
  mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
  mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do the busy polling when there&#039;s no other process runnable on the host cpu. &lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi &lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head of line blocking:&lt;br /&gt;
       - limit the number of pending DMAs&lt;br /&gt;
       - complete in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed as well. This can be reproduced with the following setup:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, a solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - Host has two interfaces, one 1G and another 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  block implemented &amp;quot;continuous leaky bucket&amp;quot; for throttling&lt;br /&gt;
  we can apply continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  Current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  large values (like 8388607T) can be converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support the large values, we should extend sndbuf limit from &#039;int&#039; to &#039;int64&#039;&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118501</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118501"/>
		<updated>2014-11-10T10:59:05Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt &lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and autocorking. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. This is because GSO tends to batch&lt;br /&gt;
       less when mq is enabled.&lt;br /&gt;
       Detect and enable/disable automatically so we can make it&lt;br /&gt;
       on by default. See Documentation/networking/scaling.txt&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the selftest ethtool method for virtio-net for regression testing, e.g. for the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
  mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
  mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do the busy polling when there&#039;s no other process runnable on the host cpu. &lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-user&lt;br /&gt;
  Support vhost-user in addition to vhost net cuse device&lt;br /&gt;
  Contact: Linhaifeng, MST&lt;br /&gt;
&lt;br /&gt;
* DPDK with vhost-net/user: fix offloads&lt;br /&gt;
  DPDK requires disabling offloads ATM,&lt;br /&gt;
  need to fix this.&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi &lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoyed the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several properties that cause head of line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes them in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed as well. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, one solution is to orphan the frags when enqueuing to a non work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and one 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the continuous leaky bucket to networking as well:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
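&lt;br /&gt;
  A rough sketch of a continuous leaky bucket, in Python for illustration only.&lt;br /&gt;
  This is not the block layer code: the class name, fields and the account()&lt;br /&gt;
  method are assumptions. One bucket would be kept per metric and direction&lt;br /&gt;
  (for example BPS for TX).&lt;br /&gt;
&lt;br /&gt;
```python
class LeakyBucket:
    """Continuous leaky bucket: the level drains at `rate` units per second.

    Hypothetical sketch; names are assumptions, not an existing API.
    """

    def __init__(self, rate, burst):
        self.rate = float(rate)    # units (bytes or packets) drained per second
        self.burst = float(burst)  # level above which traffic has to wait
        self.level = 0.0           # current fill level
        self.last = 0.0            # timestamp of the previous update, seconds

    def account(self, now, amount):
        """Drain for the elapsed time, add `amount`, return seconds to wait."""
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        self.level += amount
        # Anything above the burst level must leak out before sending more.
        excess = max(0.0, self.level - self.burst)
        return excess / self.rate


bucket = LeakyBucket(rate=1000, burst=1500)
print(bucket.account(0.0, 1500))  # 0.0, still within the burst
print(bucket.account(0.0, 1000))  # 1.0, wait one second for 1000 units to leak
```
&lt;br /&gt;
  The returned delay is what a timer would use to postpone the next packet;&lt;br /&gt;
  a TOTAL limit would simply charge both RX and TX traffic to the same bucket.&lt;br /&gt;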
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  although large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
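&lt;br /&gt;
  The arithmetic behind the 8388607T example can be checked with a short Python&lt;br /&gt;
  sketch. It assumes that the T suffix means 2**40 bytes; the helper name fits&lt;br /&gt;
  is hypothetical and only serves the illustration.&lt;br /&gt;
&lt;br /&gt;
```python
INT_MAX = 2**31 - 1      # limit of a 32-bit C int
INT64_MAX = 2**63 - 1    # limit of a signed 64-bit integer
TIB = 2**40              # assumption: the T suffix means 2**40 bytes


def fits(value, limit):
    """Return True when value can be stored without exceeding limit."""
    return limit >= value


requested = 8388607 * TIB  # the 8388607T example from the item above
print(fits(requested, INT_MAX))    # False
print(fits(requested, INT64_MAX))  # True
```
&lt;br /&gt;
  8388607 is 2**23 - 1, so under this assumption 8388607T is the largest whole&lt;br /&gt;
  number of T units that still fits in a signed 64-bit value.&lt;br /&gt;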
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118500</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118500"/>
		<updated>2014-11-10T10:48:55Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations such as TCP Small Queues and AutoCork. The idea is to enable the tx interrupt for TCP small packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Documentation/networking/scaling.txt&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       The current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the ethtool selftest method for virtio-net for regression testing, e.g. of the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
  mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
  mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host, and only do busy polling when there&#039;s no other runnable process on the host cpu.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* reduce per-device memory allocations&lt;br /&gt;
  vhost device is very large due to need to&lt;br /&gt;
  keep large arrays of iovecs around.&lt;br /&gt;
  we do need large arrays for correctness,&lt;br /&gt;
  but we could move them out of line,&lt;br /&gt;
  and add short inline arrays for typical use-cases.&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* batch tx completions in vhost&lt;br /&gt;
  vhost already batches up to 64 tx completions for zero copy&lt;br /&gt;
  batch non zero copy as well&lt;br /&gt;
  contact: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* better parallelize small queues&lt;br /&gt;
  don&#039;t wait for ring full to kick.&lt;br /&gt;
  add api to detect ring almost full (e.g. 3/4) and kick&lt;br /&gt;
  depends on: BQL&lt;br /&gt;
  contact: MST&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use NAPI&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoyed the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several properties that cause head of line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes them in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed as well. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, one solution is to orphan the frags when enqueuing to a non work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and one 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the continuous leaky bucket to networking as well:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  although large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gbit/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118499</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118499"/>
		<updated>2014-11-10T10:43:21Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* improve net polling for cpu overcommit&lt;br /&gt;
    exit busy loop when another process is runnable&lt;br /&gt;
    mid.gmane.org/1408608310-13579-2-git-send-email-jasowang@redhat.com&lt;br /&gt;
    Developer: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations like TCP Small Queues and autocorking. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default (see Documentation/networking/scaling.txt).&lt;br /&gt;
       The regression is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the selftest ethtool method for virtio-net for regression testing, e.g. of the CVEs found for tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao,Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
  mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
  mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to co-operate with the host, and only do busy polling when there&#039;s no other process runnable on the host cpu.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to user in the form of multiple&lt;br /&gt;
        sockets.  As each socket will perform dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have a lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use napi&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoyed the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head of line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - it completes DMAs in order&lt;br /&gt;
       This means that if one or some of the DMAs are delayed, all others are delayed as well. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2, another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       Then you can see that not only VM1 to VM2 traffic is throttled, but VM1 to external host traffic is throttled as well.&lt;br /&gt;
       For this issue, a solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and another 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
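&lt;br /&gt;
       A toy model of the in-order completion effect described in this item, not vhost code:&lt;br /&gt;
       in_order_completion() is a hypothetical helper, and the finish times are made-up numbers.&lt;br /&gt;

```python
from itertools import accumulate


def in_order_completion(finish_times):
    """Return the time at which each buffer is reported complete.

    With in-order completion a buffer is reported only after every earlier
    buffer has finished, so its visible completion time is the running
    maximum of the real finish times.
    """
    return list(accumulate(finish_times, max))
```

       For example, with finish times [1, 50, 2, 3] the single slow DMA holds back everything queued after it,&lt;br /&gt;
       and the reported completion times become [1, 50, 50, 50].&lt;br /&gt;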
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the continuous leaky bucket to networking as well:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
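&lt;br /&gt;
  A rough sketch of what a continuous leaky bucket could look like, as a simplified illustration only:&lt;br /&gt;
  the LeakyBucket class and its parameter names are assumptions, not the qemu block-layer implementation.&lt;br /&gt;

```python
class LeakyBucket:
    """Continuous leaky bucket: the level drains at `rate` units per second."""

    def __init__(self, rate, burst):
        self.rate = float(rate)    # drain rate, e.g. bytes or packets per second
        self.burst = float(burst)  # level that may accumulate before throttling
        self.level = 0.0           # current fill level

    def leak(self, elapsed):
        # Drain for `elapsed` seconds, never going below empty.
        self.level = max(0.0, self.level - self.rate * elapsed)

    def account(self, amount):
        # Charge `amount` units and return how many seconds the caller
        # should wait before sending more; 0.0 means no throttling.
        self.level += amount
        return max(0.0, self.level - self.burst) / self.rate
```

  One bucket per direction and unit (IOPS or BPS, for RX, TX or TOTAL) would give the combinations listed in this item.&lt;br /&gt;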
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free() and can save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       The kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(),&lt;br /&gt;
  although qapi correctly parses large values (like 8388607T) from the qemu command line.&lt;br /&gt;
  To support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
  Why is this useful?&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gbit/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118498</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118498"/>
		<updated>2014-11-10T10:37:04Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support for linux guests&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/1414081380-14623-1-git-send-email-mst@redhat.com&lt;br /&gt;
    Developer: MST,Cornelia Huck&lt;br /&gt;
* virtio 1.0 support in qemu&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    mid.gmane.org/20141024103839.7162b93f.cornelia.huck@de.ibm.com&lt;br /&gt;
    Developer: Cornelia Huck, MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net/tun/macvtap cross endian support&lt;br /&gt;
    mid.gmane.org/1414572130-17014-2-git-send-email-clg@fr.ibm.com&lt;br /&gt;
    Developer: Cédric Le Goater, MST&lt;br /&gt;
&lt;br /&gt;
* BQL/aggregation for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small packet TCP stream performance is not good. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small packet optimizations like TCP Small Queues and autocorking. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default (see Documentation/networking/scaling.txt).&lt;br /&gt;
       The regression is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       The current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* ethtool selftest support for virtio-net&lt;br /&gt;
        Implement the ethtool selftest method for virtio-net for regression testing, e.g. of the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc/allmulti mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Done for unicast, but not for multicast.&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-user: clean up protocol&lt;br /&gt;
  address multiple issues in vhost user protocol:&lt;br /&gt;
   missing VHOST_NET_SET_BACKEND&lt;br /&gt;
   make more messages synchronous (with a reply)&lt;br /&gt;
   VHOST_SET_MEM_TABLE, VHOST_SET_VRING_CALL&lt;br /&gt;
  mid.gmane.org/541956B8.1070203@huawei.com&lt;br /&gt;
  mid.gmane.org/54192136.2010409@huawei.com&lt;br /&gt;
   Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan?&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host and only busy-poll when there is no other process on the host CPU.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts.&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to the user in the form of multiple&lt;br /&gt;
        sockets.  As each socket performs dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* improve vhost-user unit test&lt;br /&gt;
  support running on machines without hugetlbfs&lt;br /&gt;
  support running with more vm memory layouts&lt;br /&gt;
  Developer: MST?&lt;br /&gt;
&lt;br /&gt;
* tun: fix RX livelock&lt;br /&gt;
        it&#039;s easy for a guest to starve out host networking&lt;br /&gt;
        one way to fix this is to use NAPI&lt;br /&gt;
        Contact: MST&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   contact: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM to&lt;br /&gt;
  skip userspace. VFIO already enjoys the performance benefit;&lt;br /&gt;
  let&#039;s do the same for virtio-pci. Current virtio-pci devices still handle&lt;br /&gt;
  level interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head-of-line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - DMAs complete in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed too. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2 and another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       You can then see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, one solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and the other 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling&lt;br /&gt;
  we can apply the continuous leaky bucket to networking:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf().&lt;br /&gt;
  Large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;.&lt;br /&gt;
&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a Linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118497</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=118497"/>
		<updated>2014-11-10T09:17:42Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio 1.0 support in virtio net&lt;br /&gt;
    required for maintainability&lt;br /&gt;
    Developer: MST&lt;br /&gt;
&lt;br /&gt;
* BQL for virtio net&lt;br /&gt;
   dependencies: orphan packets less aggressively, enable tx interrupt&lt;br /&gt;
   Developers: MST, Jason&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* enable tx interrupt (conditionally?)&lt;br /&gt;
  Small-packet TCP stream performance is poor. This is because virtio-net orphans the packet during ndo_start_xmit(), which disables TCP small-packet optimizations such as TCP Small Queues and autocorking. The idea is to enable the tx interrupt for small TCP packets.&lt;br /&gt;
  Jason&#039;s idea: switch between poll and tx interrupt mode based on recent statistics.&lt;br /&gt;
  MST&#039;s idea: use a per descriptor flag for virtio to force interrupt for a specific packet.&lt;br /&gt;
  Developer: Jason Wang, MST&lt;br /&gt;
  &lt;br /&gt;
&lt;br /&gt;
* vhost-net polling&lt;br /&gt;
      mid.gmane.org/20141029123831.A80F338002D@moren.haifa.ibm.com&lt;br /&gt;
      Developer: Razya Ladelsky&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Razya Ladelsky, Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues in tun&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from the net&lt;br /&gt;
     core; we need to teach it to allocate an array of&lt;br /&gt;
     pointers and not an array of queues.&lt;br /&gt;
     Jason has a draft patch to use a flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     http://mid.gmane.org/1408369040-1216-1-git-send-email-pagupta@redhat.com&lt;br /&gt;
     Developers: Pankaj Gupta, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regressions in some workloads&lt;br /&gt;
       (GSO tends to batch less when mq is enabled), so&lt;br /&gt;
       it is off by default. See Documentation/networking/scaling.txt.&lt;br /&gt;
       Detect such workloads and enable/disable mq&lt;br /&gt;
       automatically so we can turn it on by default.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       The current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write an ethtool selftest for virtio-net&lt;br /&gt;
        Implement the ethtool selftest method for virtio-net for regression testing, e.g. of the CVEs found in tun/macvtap, qemu and vhost.&lt;br /&gt;
        mid.gmane.org/1409881866-14780-1-git-send-email-hjxiaohust@gmail.com&lt;br /&gt;
        Developers: Hengjinxiao, Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Dev watchdog for virtio-net:&lt;br /&gt;
        Implement a watchdog for virtio-net. This will be useful for hunting host bugs early.&lt;br /&gt;
        Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable LRO with bridging&lt;br /&gt;
  Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman?&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: Marcel Apfelbaum&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - rx busy polling for virtio-net [DONE]&lt;br /&gt;
    see https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=91815639d8804d1eee7ce2e1f7f60b36771db2c9. 1 byte netperf TCP_RR shows 127% improvement.&lt;br /&gt;
    Future work is to cooperate with the host and only busy-poll when there is no other process on the host CPU.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
  Reduce the number of interrupts.&lt;br /&gt;
  Rx interrupt coalescing should be good for rx stream throughput.&lt;br /&gt;
  Tx interrupt coalescing will help the optimization of enabling tx interrupt conditionally.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Multi-queue macvtap with real multiple queues&lt;br /&gt;
        Macvtap only provides multiple queues to the user in the form of multiple&lt;br /&gt;
        sockets.  As each socket performs dev_queue_xmit() and we don&#039;t&lt;br /&gt;
        really have multiple real queues on the device, we now have lock&lt;br /&gt;
        contention.  This contention needs to be addressed.&lt;br /&gt;
        Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* better xmit queueing for tun&lt;br /&gt;
        when guest is slower than host, tun drops packets&lt;br /&gt;
        aggressively. This is because keeping packets on&lt;br /&gt;
        the internal queue does not work well.&lt;br /&gt;
        re-enable functionality to stop queue,&lt;br /&gt;
        probably with some watchdog to help with buggy guests.&lt;br /&gt;
        Developer: MST&lt;br /&gt;
 &lt;br /&gt;
=== projects in need of an owner ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Contact: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Contact: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  This project seems abandoned?&lt;br /&gt;
  Contact: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupts in KVM to&lt;br /&gt;
  skip userspace. VFIO already enjoys the performance benefit;&lt;br /&gt;
  let&#039;s do the same for virtio-pci. Current virtio-pci devices still handle&lt;br /&gt;
  level interrupts in userspace.&lt;br /&gt;
  see: kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Contact: Amos Kong, MST       &lt;br /&gt;
&lt;br /&gt;
* Head of line blocking issue with zerocopy&lt;br /&gt;
       zerocopy has several defects that cause head-of-line blocking:&lt;br /&gt;
       - it limits the number of pending DMAs&lt;br /&gt;
       - DMAs complete in order&lt;br /&gt;
       This means that if some of the DMAs are delayed, all the others are delayed too. This can be reproduced as follows:&lt;br /&gt;
       - boot two VMs, VM1 (tap1) and VM2 (tap2), on host1 (which has eth0)&lt;br /&gt;
       - set up tbf to limit the tap2 bandwidth to 10Mbit/s&lt;br /&gt;
       - start two netperf instances, one from VM1 to VM2 and another from VM1 to an external host whose traffic goes through eth0 on the host&lt;br /&gt;
       You can then see that not only VM1 to VM2 is throttled, but VM1 to the external host is throttled as well.&lt;br /&gt;
       For this issue, one solution is to orphan the frags when enqueuing to a non-work-conserving qdisc.&lt;br /&gt;
       But we have similar issues in other cases:&lt;br /&gt;
       - The card has its own priority queues&lt;br /&gt;
       - The host has two interfaces, one 1G and the other 10G, so throttling the 1G one may cause traffic over the 10G one to be throttled.&lt;br /&gt;
       The final solution is to remove receive buffering at tun, and convert it to use NAPI&lt;br /&gt;
       Contact: Jason Wang, MST&lt;br /&gt;
       Reference: https://lkml.org/lkml/2014/1/17/105&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling&lt;br /&gt;
  we can apply the continuous leaky bucket to networking:&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if the VCPU and networking run on the same CPU,&lt;br /&gt;
    they conflict, resulting in bad performance.&lt;br /&gt;
    To fix that, push the vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        Contact: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(), but&lt;br /&gt;
  large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;&lt;br /&gt;
&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
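&lt;br /&gt;
  The arithmetic behind the 8388607T example can be checked with a short Python sketch.&lt;br /&gt;
  It assumes a 32-bit C int and that the T suffix means 2**40 bytes; the helper name fits() is hypothetical.&lt;br /&gt;

```python
INT_MAX = 2**31 - 1     # assumed: 32-bit signed C int
INT64_MAX = 2**63 - 1   # signed 64-bit integer
TIB = 2**40             # assumed meaning of the "T" size suffix


def fits(value, limit):
    """True when value does not exceed limit."""
    return min(value, limit) == value


requested = 8388607 * TIB          # (2**23 - 1) * 2**40 = 2**63 - 2**40

print(fits(requested, INT_MAX))    # far too large for an int
print(fits(requested, INT64_MAX))  # still representable as int64
print(fits(8388608 * TIB, INT64_MAX))  # one more T is 2**63, which overflows
```

  Under these assumptions 8388607T is the largest whole number of T that still fits in an int64.&lt;br /&gt;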
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
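&lt;br /&gt;
        The idea can be sketched in Python as a counter that kicks once per batch.&lt;br /&gt;
        This is a toy model with hypothetical names, not virtio-net driver code, and the batch size is arbitrary.&lt;br /&gt;

```python
class CoalescingTx:
    """Queue packets and notify (kick) the device once per batch."""

    def __init__(self, batch):
        self.batch = batch   # packets to accumulate before a kick
        self.pending = 0     # packets queued since the last kick
        self.kicks = 0       # number of device notifications so far

    def send(self):
        self.pending += 1
        if self.pending == self.batch:
            self.flush()

    def flush(self):
        # Kick only if something is queued; a real driver would also
        # flush on a timer so a partial batch is not delayed forever.
        if self.pending != 0:
            self.kicks += 1
            self.pending = 0


tx = CoalescingTx(batch=4)
for _ in range(10):
    tx.send()
tx.flush()
print(tx.kicks)
```

        Ten packets with a batch of four need three kicks here instead of ten, at the cost of extra latency for the delayed packets.&lt;br /&gt;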
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=20593</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=20593"/>
		<updated>2014-06-09T14:32:08Z</updated>

		<summary type="html">&lt;p&gt;Mst: macvlan doc task&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever for the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V8 new RFC posted here (limit the changes to virtio-net only)&lt;br /&gt;
       https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Jason has a draft patch to enable low latency polling for virtio-net.&lt;br /&gt;
  May also consider it for tun/macvtap.&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level-triggered interrupts&lt;br /&gt;
  aim: enable vhost by default for level interrupts.&lt;br /&gt;
  The benefit is security: we want to avoid using userspace&lt;br /&gt;
  virtio net so that vhost-net is always used.&lt;br /&gt;
&lt;br /&gt;
  Alex emulated (post &amp;amp; re-enable) level-triggered interrupt in KVM for&lt;br /&gt;
  skipping userspace. VFIO already enjoys the performance benefit,&lt;br /&gt;
  let&#039;s do it for virtio-pci. Current virtio-pci devices still use&lt;br /&gt;
  level-interrupt in userspace.&lt;br /&gt;
&lt;br /&gt;
 kernel:&lt;br /&gt;
  7a84428af [PATCH] KVM: Add resampling irqfds for level triggered interrupts&lt;br /&gt;
 qemu:&lt;br /&gt;
  68919cac [PATCH] hw/vfio: set interrupts using pci irq wrappers&lt;br /&gt;
           (virtio-pci didn&#039;t use the wrappers)&lt;br /&gt;
  e1d1e586 [PATCH] vfio-pci: Add KVM INTx acceleration&lt;br /&gt;
&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free(), and save a memcpy() here.&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if the VCPU and networking run on the same CPU,&lt;br /&gt;
    they conflict, resulting in bad performance.&lt;br /&gt;
    To fix that, push the vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
    Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* add documentation for macvlan and macvtap&lt;br /&gt;
   recent docs here:&lt;br /&gt;
   http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/&lt;br /&gt;
   need to integrate in iproute and kernel docs.&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
* Extend sndbuf scope to int64&lt;br /&gt;
&lt;br /&gt;
  The current sndbuf limit is INT_MAX in tap_set_sndbuf(), but&lt;br /&gt;
  large values (like 8388607T) are converted correctly by qapi from the qemu command line.&lt;br /&gt;
  If we want to support such large values, we should extend the sndbuf limit from &#039;int&#039; to &#039;int64&#039;&lt;br /&gt;
&lt;br /&gt;
  Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04192.html&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5392</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5392"/>
		<updated>2014-03-20T10:15:42Z</updated>

		<summary type="html">&lt;p&gt;Mst: clarify partial orphaning&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* orphan packets less aggressively (was: make pktgen work for virtio-net (or partially orphan))&lt;br /&gt;
       virtio-net orphans all skbs during tx, this used to be optimal.&lt;br /&gt;
       Recent changes in guest networking stack and hardware advances&lt;br /&gt;
       such as APICv changed optimal behaviour for drivers.&lt;br /&gt;
       We need to revisit optimizations such as orphaning all packets early&lt;br /&gt;
       to have optimal behaviour.&lt;br /&gt;
&lt;br /&gt;
       this should also fix pktgen which is currently broken with virtio net:&lt;br /&gt;
       orphaning all skbs makes pktgen wait forever for the refcnt.&lt;br /&gt;
       Jason&#039;s idea: bring back tx interrupt (partially)&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developers: Jason Wang, MST&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for performance analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &quot;continuous leaky bucket&quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
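&lt;br /&gt;
  A minimal sketch of a continuous leaky bucket, to illustrate the idea only.&lt;br /&gt;
  It is not the block layer code: the names LeakyBucket, delay_for, account and&lt;br /&gt;
  make_buckets are hypothetical, and the assumption is one bucket per&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL combination, with timestamps passed in by the caller.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product

UNITS = ("iops", "bps")            # assumed: packets per second, bytes per second
DIRECTIONS = ("rx", "tx", "total")


class LeakyBucket:
    """Continuous leaky bucket: the level drains at `rate` units per second."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # drain rate in units per second
        self.capacity = float(capacity)  # burst allowance in the same unit
        self.level = 0.0                 # current fill level
        self.last = 0.0                  # time of the last update, in seconds

    def _leak(self, now):
        # Drain the bucket for the time elapsed since the last update.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = max(self.last, now)

    def delay_for(self, cost, now):
        """Return how many seconds to wait before sending `cost` units."""
        self._leak(now)
        overflow = self.level + cost - self.capacity
        return max(0.0, overflow / self.rate)

    def account(self, cost, now):
        """Record `cost` units as sent at time `now`."""
        self._leak(now)
        self.level += cost


def make_buckets(rate, capacity):
    # One bucket per (unit, direction) pair, six in total.
    return {key: LeakyBucket(rate, capacity) for key in product(UNITS, DIRECTIONS)}
```
&lt;br /&gt;
  A caller would ask delay_for() before each send, sleep or re-arm a timer for&lt;br /&gt;
  that long when the result is non-zero, and call account() once the data is sent.&lt;br /&gt;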
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free() and can save a memcpy() here.&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
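&lt;br /&gt;
  To make the trade-off concrete, here is a rough Python analogue (not QEMU code).&lt;br /&gt;
  EmbeddedTable and DynamicTable are hypothetical names; a bytearray stands in for&lt;br /&gt;
  the table storage, slice assignment plays the role of the copy, and rebinding the&lt;br /&gt;
  attribute plays the role of the pointer swap, with the garbage collector freeing&lt;br /&gt;
  the old table.&lt;br /&gt;
&lt;br /&gt;
```python
ETH_ALEN = 6  # bytes per MAC address


class EmbeddedTable:
    """Table storage lives inside the device object, so updates must copy."""

    def __init__(self, entries):
        self.macs = bytearray(entries * ETH_ALEN)

    def update(self, new_macs):
        # Analogue of the copy: every byte goes into the fixed buffer.
        self.macs[:len(new_macs)] = new_macs


class DynamicTable:
    """The device object only holds a reference, so updates can swap it."""

    def __init__(self):
        self.macs = bytearray()

    def update(self, new_macs):
        # Analogue of the pointer swap: rebind the reference, no bytes copied.
        # The old buffer is reclaimed once nothing refers to it.
        self.macs = new_macs
```
&lt;br /&gt;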
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
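&lt;br /&gt;
        A minimal sketch of the batching idea, under stated assumptions: TxCoalescer&lt;br /&gt;
        is a hypothetical name, kick is any callable standing in for the device&lt;br /&gt;
        notification, and the batch size is fixed. A real driver would also need a&lt;br /&gt;
        timer so that a partial batch is not delayed indefinitely.&lt;br /&gt;
&lt;br /&gt;
```python
class TxCoalescer:
    """Queue packets and notify the device once per batch, not once per packet."""

    def __init__(self, kick, batch_size):
        self.kick = kick              # callable invoked with the list of packets
        self.batch_size = batch_size  # assumed to be a positive integer
        self.pending = []

    def queue(self, packet):
        self.pending.append(packet)
        # Kick only when a full batch has accumulated.
        if len(self.pending) == self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is pending; called on a full batch or by a timeout.
        if self.pending:
            batch, self.pending = self.pending, []
            self.kick(batch)
```
&lt;br /&gt;
        With a batch size of 3, queueing seven packets produces two kicks, and a&lt;br /&gt;
        final flush() sends the remaining packet.&lt;br /&gt;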
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5032</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5032"/>
		<updated>2014-02-06T14:47:49Z</updated>

		<summary type="html">&lt;p&gt;Mst: and more unowned projects&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) at worst case, linear search will be bad&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for performance analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &quot;continuous leaky bucket&quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free() and can save a memcpy() here.&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* change tcp_tso_should_defer for kvm: batch more&lt;br /&gt;
  aggressively.&lt;br /&gt;
  in particular, see below&lt;br /&gt;
&lt;br /&gt;
* tcp: increase gso buffering for cubic,reno&lt;br /&gt;
    At the moment we push out an skb whenever the limit becomes&lt;br /&gt;
    large enough to send a full-sized TSO skb even if the skb,&lt;br /&gt;
    in fact, is not full-sized.&lt;br /&gt;
    The reason for this seems to be that some congestion avoidance&lt;br /&gt;
    protocols rely on the number of packets in flight to calculate&lt;br /&gt;
    CWND, so if we underuse the available CWND it shrinks&lt;br /&gt;
    which degrades performance:&lt;br /&gt;
    http://www.mail-archive.com/netdev@vger.kernel.org/msg08738.html&lt;br /&gt;
&lt;br /&gt;
    However, there seems to be no reason to do this for&lt;br /&gt;
    protocols such as reno and cubic which don&#039;t rely on packets in flight,&lt;br /&gt;
    and so will simply increase CWND a bit more to compensate for the&lt;br /&gt;
    underuse.&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5028</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=5028"/>
		<updated>2014-02-02T21:47:10Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) at worst case, linear search will be bad&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* network traffic throttling&lt;br /&gt;
  the block layer implemented a &amp;quot;continuous leaky bucket&amp;quot; for throttling;&lt;br /&gt;
  we can apply the same continuous leaky bucket to networking&lt;br /&gt;
  IOPS/BPS * RX/TX/TOTAL&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Allocate mac_table dynamically&lt;br /&gt;
&lt;br /&gt;
  In the future, maybe we can allocate the mac_table dynamically instead&lt;br /&gt;
  of embedding it in VirtIONet. Then we can just do a pointer swap and&lt;br /&gt;
  g_free() and save a memcpy() here.&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
  Status: patches applied&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       The kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
=== high level issues: not clear what the project is, yet ===&lt;br /&gt;
&lt;br /&gt;
* security: iptables&lt;br /&gt;
At the moment most people disable iptables to get&lt;br /&gt;
good performance on 10Gb/s networking.&lt;br /&gt;
Any way to improve experience?&lt;br /&gt;
&lt;br /&gt;
* performance&lt;br /&gt;
Going through scheduler and full networking stack twice&lt;br /&gt;
(host+guest) adds a lot of overhead&lt;br /&gt;
Any way to allow bypassing some layers?&lt;br /&gt;
&lt;br /&gt;
* manageability&lt;br /&gt;
Still hard to figure out VM networking,&lt;br /&gt;
VM networking is through libvirt, host networking through NM&lt;br /&gt;
Any way to integrate?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4980</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4980"/>
		<updated>2013-11-11T10:37:51Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* large-order allocations&lt;br /&gt;
   see 28d6427109d13b0f447cba5761f88d3548e83605&lt;br /&gt;
   Developer: MST&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a slow linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203 (applied by upstream)&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        http://git.qemu.org/?p=qemu.git;a=commit;h=b1be42803b31a913bab65bab563a8760ad2e7f7f&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
  Status: patches applied&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       The kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4873</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4873"/>
		<updated>2013-09-17T14:54:53Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, lookup degrades to a slow linear search&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
  Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
  Search for &amp;quot;Xin Xiaohui: Provide a zero-copy method on KVM virtio-net&amp;quot;&lt;br /&gt;
  for a very old prototype&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
* Migrate some of the performance regression autotest functionality into Netperf&lt;br /&gt;
  - Get the CPU-utilization of the Host and the other-party, and add them to the report. This is also true for other Host measures, such as vmexits, interrupts, ...&lt;br /&gt;
  - Run Netperf in demo-mode, and measure only the time when all the sessions are active (could be many seconds after the beginning of the tests)&lt;br /&gt;
  - Packaging of Netperf in Fedora / RHEL (exists in Fedora). Licensing could be an issue.&lt;br /&gt;
  - Make the scripts more visible&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4865</id>
		<title>PCITodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4865"/>
		<updated>2013-08-22T09:19:02Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all PCI related activity in KVM.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio device as PCI Express device&lt;br /&gt;
  Issue: Express spec requires device can work without IO,&lt;br /&gt;
  virtio requires IO at the moment.&lt;br /&gt;
  Plan: add support for memory BARs.&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for devices behind PCI bridges&lt;br /&gt;
  Issue: QEMU lacks support for device hotplug behind&lt;br /&gt;
  pci bridges.&lt;br /&gt;
&lt;br /&gt;
   Plan:&lt;br /&gt;
    - each bus gets assigned a number 0-255&lt;br /&gt;
    - generated ACPI code writes this number&lt;br /&gt;
      to a new BSEL register, then uses existing&lt;br /&gt;
      UP/DOWN registers to probe slot status;&lt;br /&gt;
      to eject, write number to BSEL register,&lt;br /&gt;
      then slot into existing EJ&lt;br /&gt;
    This is to address the ACPI spec requirement to&lt;br /&gt;
    avoid config cycle access to any bus except PCI roots.&lt;br /&gt;
&lt;br /&gt;
    Note: ACPI doesn&#039;t support adding or removing bridges by hotplug.&lt;br /&gt;
    We should prevent removal of bridges by hotplug,&lt;br /&gt;
    unless they were added by hotplug previously&lt;br /&gt;
    (and so, are not described by ACPI).&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for Q35&lt;br /&gt;
   Issue: QEMU does not support hotplug for Q35&lt;br /&gt;
   Plan: since we need to support hotplug of PCI devices,&lt;br /&gt;
   let&#039;s use ACPI hotplug for everything&lt;br /&gt;
   Use same interface as we do for PCI, this way&lt;br /&gt;
   same ACPI code can be reused.&lt;br /&gt;
&lt;br /&gt;
   Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Support for different PCI express link width/speed settings&lt;br /&gt;
      Issue: QEMU currently emulates all links at minimal&lt;br /&gt;
      width and speed. This means we don&#039;t need to emulate&lt;br /&gt;
      link negotiation, but might in theory confuse guests&lt;br /&gt;
      for assigned devices.&lt;br /&gt;
      The issue is complicated by the fact that real link speed&lt;br /&gt;
      might be limited by the slot where assigned device is put.&lt;br /&gt;
      Plan: add management interface to control the max link&lt;br /&gt;
      speed and width for the slot.&lt;br /&gt;
      Teach management to query this at slot level.&lt;br /&gt;
      For device, query it from device itself.&lt;br /&gt;
      Support link width/speed negotiation  as per spec.&lt;br /&gt;
   Developer: Alex Williamson&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* PCI interrupts should be active-low&lt;br /&gt;
      Issue: PCI INT#x interrupts are normally active-low.&lt;br /&gt;
      QEMU emulates them as active high. Works fine for&lt;br /&gt;
      windows and linux but not guaranteed for other guests.       &lt;br /&gt;
      See http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/&lt;br /&gt;
&lt;br /&gt;
      Plan: add support for active-low interrupts in KVM.&lt;br /&gt;
            Enable this for PCI interrupts.&lt;br /&gt;
            Change DSDT appropriately.&lt;br /&gt;
&lt;br /&gt;
      Developer: &lt;br /&gt;
      Testing: stress-test devices with INT#x interrupts&lt;br /&gt;
      with interrupt sharing in particular&lt;br /&gt;
&lt;br /&gt;
* PCI master-abort is not emulated correctly&lt;br /&gt;
      Issue: access to disabled PCI memory normally returns&lt;br /&gt;
      all-ones (on read) and sets master abort&lt;br /&gt;
      detected bit in bridge.&lt;br /&gt;
      For express, it can also trigger AER reporting&lt;br /&gt;
      when enabled.&lt;br /&gt;
      QEMU does not emulate any of this: reads return 0,&lt;br /&gt;
      writes.&lt;br /&gt;
      Plan: add catch-all memory region with low priority&lt;br /&gt;
      in bridge, and trigger the required actions.&lt;br /&gt;
&lt;br /&gt;
* Better modeling for PCI INT#x&lt;br /&gt;
      Issue: for a device deep down a bridge hierarchy,&lt;br /&gt;
      we scan the tree upwards on each interrupt,&lt;br /&gt;
      calling map_irq at each level, this is bad for performance.&lt;br /&gt;
      Behaviour is also open-coded at each level, this is ugly.&lt;br /&gt;
      Plan: something similar to MemoryRegion API:&lt;br /&gt;
      add objects that represent PCI INT#x pins&lt;br /&gt;
      (maybe pins in general) model their connection at&lt;br /&gt;
      each level. Each time there&#039;s a change, re-map&lt;br /&gt;
      them. On data path, use pre-computed irq# to&lt;br /&gt;
      send/clear the interrupt quickly.&lt;br /&gt;
&lt;br /&gt;
* Subtractive decoding support&lt;br /&gt;
      Support subtractive decoding in PCI bridges.&lt;br /&gt;
&lt;br /&gt;
* Support VGA behind a PCI bridge&lt;br /&gt;
      Support VGA devices behind PCI bridges.&lt;br /&gt;
      Good for things like multiple VGA cards.&lt;br /&gt;
      Requires subtractive decoding.&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
* Way to figure out proper PCI connectivity options.&lt;br /&gt;
      Issue: How do you know where you can connect a device?&lt;br /&gt;
      For PCI, this includes the legal bus addresses,&lt;br /&gt;
      hotplug support for bus,&lt;br /&gt;
      how the secondary bus is named,&lt;br /&gt;
      and whether bridges support required addressing modes.&lt;br /&gt;
      For PCI Express, there are additional options:&lt;br /&gt;
      root or downstream port,&lt;br /&gt;
      virtual bridge in root complex/upstream port.&lt;br /&gt;
      management tools end up hard-coding this information,&lt;br /&gt;
      based simply on device name, but that&#039;s ugly.&lt;br /&gt;
      Vague idea: add interfaces to figure out what can be&lt;br /&gt;
      connected to what and how, or at least the function of each device.&lt;br /&gt;
      People to contact: Laine Stump&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Fix AHCI for stability&lt;br /&gt;
      Not related to PCI directly but modern chipsets&lt;br /&gt;
      with PCI Express support all use AHCI.&lt;br /&gt;
      Issue1: AHCI is unstable with windows guests&lt;br /&gt;
       (win7 fails to boot sometimes)&lt;br /&gt;
      Issue2:  guests sometimes crash when doing ping pong migration&lt;br /&gt;
      People to contact: Alexander Graf&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4864</id>
		<title>PCITodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4864"/>
		<updated>2013-08-21T13:42:26Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all PCI related activity in KVM.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio device as PCI Express device&lt;br /&gt;
  Issue: Express spec requires device can work without IO,&lt;br /&gt;
  virtio requires IO at the moment.&lt;br /&gt;
  Plan: add support for memory BARs.&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for devices behind PCI bridges&lt;br /&gt;
  Issue: QEMU lacks support for device hotplug behind&lt;br /&gt;
  pci bridges.&lt;br /&gt;
&lt;br /&gt;
   Plan:&lt;br /&gt;
    - each bus gets assigned a number 0-255&lt;br /&gt;
    - generated ACPI code writes this number&lt;br /&gt;
      to a new BSEL register, then uses existing&lt;br /&gt;
      UP/DOWN registers to probe slot status;&lt;br /&gt;
      to eject, write number to BSEL register,&lt;br /&gt;
      then slot into existing EJ&lt;br /&gt;
    This is to address the ACPI spec requirement to&lt;br /&gt;
    avoid config cycle access to any bus except PCI roots.&lt;br /&gt;
&lt;br /&gt;
    Note: ACPI doesn&#039;t support adding or removing bridges by hotplug.&lt;br /&gt;
    We should prevent removal of bridges by hotplug,&lt;br /&gt;
    unless they were added by hotplug previously&lt;br /&gt;
    (and so, are not described by ACPI).&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for Q35&lt;br /&gt;
   Issue: QEMU does not support hotplug for Q35&lt;br /&gt;
   Plan: since we need to support hotplug of PCI devices,&lt;br /&gt;
   let&#039;s use ACPI hotplug for everything&lt;br /&gt;
   Use same interface as we do for PCI, this way&lt;br /&gt;
   same ACPI code can be reused.&lt;br /&gt;
&lt;br /&gt;
   Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Support for different PCI express link width/speed settings&lt;br /&gt;
      Issue: QEMU currently emulates all links at minimal&lt;br /&gt;
      width and speed. This means we don&#039;t need to emulate&lt;br /&gt;
      link negotiation, but might in theory confuse guests&lt;br /&gt;
      for assigned devices.&lt;br /&gt;
      The issue is complicated by the fact that real link speed&lt;br /&gt;
      might be limited by the slot where assigned device is put.&lt;br /&gt;
      Plan: add management interface to control the max link&lt;br /&gt;
      speed and width for the slot.&lt;br /&gt;
      Teach management to query this at slot level.&lt;br /&gt;
      For device, query it from device itself.&lt;br /&gt;
      Support link width/speed negotiation  as per spec.&lt;br /&gt;
   Developer: Alex Williamson&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* PCI interrupts should be active-low&lt;br /&gt;
      Issue: PCI INT#x interrupts are normally active-low.&lt;br /&gt;
      QEMU emulates them as active high. Works fine for&lt;br /&gt;
      windows and linux but not guaranteed for other guests.       &lt;br /&gt;
      See http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/&lt;br /&gt;
&lt;br /&gt;
      Plan: add support for active-low interrupts in KVM.&lt;br /&gt;
            Enable this for PCI interrupts.&lt;br /&gt;
            Change DSDT appropriately.&lt;br /&gt;
&lt;br /&gt;
      Developer: &lt;br /&gt;
      Testing: stress-test devices with INT#x interrupts&lt;br /&gt;
      with interrupt sharing in particular&lt;br /&gt;
&lt;br /&gt;
* PCI master-abort is not emulated correctly&lt;br /&gt;
      Issue: access to disabled PCI memory normally returns&lt;br /&gt;
      all-ones (on read) and sets master abort&lt;br /&gt;
      detected bit in bridge.&lt;br /&gt;
      For express, it can also trigger AER reporting&lt;br /&gt;
      when enabled.&lt;br /&gt;
      QEMU does not emulate any of this: reads return 0,&lt;br /&gt;
      writes.&lt;br /&gt;
      Plan: add catch-all memory region with low priority&lt;br /&gt;
      in bridge, and trigger the required actions.&lt;br /&gt;
&lt;br /&gt;
* Better modeling for PCI INT#x&lt;br /&gt;
      Issue: for a device deep down a bridge hierarchy,&lt;br /&gt;
      we scan the tree upwards on each interrupt,&lt;br /&gt;
      calling map_irq at each level, this is bad for performance.&lt;br /&gt;
      Behaviour is also open-coded at each level, this is ugly.&lt;br /&gt;
      Plan: something similar to MemoryRegion API:&lt;br /&gt;
      add objects that represent PCI INT#x pins&lt;br /&gt;
      (maybe pins in general) model their connection at&lt;br /&gt;
      each level. Each time there&#039;s a change, re-map&lt;br /&gt;
      them. On data path, use pre-computed irq# to&lt;br /&gt;
      send/clear the interrupt quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
* Way to figure out proper PCI connectivity options.&lt;br /&gt;
      Issue: How do you know where you can connect a device?&lt;br /&gt;
      For PCI, this includes the legal bus addresses,&lt;br /&gt;
      hotplug support for bus,&lt;br /&gt;
      how the secondary bus is named,&lt;br /&gt;
      and whether bridges support required addressing modes.&lt;br /&gt;
      For PCI Express, there are additional options:&lt;br /&gt;
      root or downstream port,&lt;br /&gt;
      virtual bridge in root complex/upstream port.&lt;br /&gt;
      management tools end up hard-coding this information,&lt;br /&gt;
      based simply on device name, but that&#039;s ugly.&lt;br /&gt;
      Vague idea: add interfaces to figure out what can be&lt;br /&gt;
      connected to what and how, or at least the function of each device.&lt;br /&gt;
      People to contact: Laine Stump&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Fix AHCI for stability&lt;br /&gt;
      Not related to PCI directly but modern chipsets&lt;br /&gt;
      with PCI Express support all use AHCI.&lt;br /&gt;
      Issue1: AHCI is unstable with windows guests&lt;br /&gt;
       (win7 fails to boot sometimes)&lt;br /&gt;
      Issue2:  guests sometimes crash when doing ping pong migration&lt;br /&gt;
      People to contact: Alexander Graf&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=TODO&amp;diff=4863</id>
		<title>TODO</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=TODO&amp;diff=4863"/>
		<updated>2013-08-21T10:43:19Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=ToDo=&lt;br /&gt;
&lt;br /&gt;
The following items need some love. Please post to the list if you are interested in helping out: &lt;br /&gt;
&lt;br /&gt;
* Emulate MSR_IA32_DEBUGCTL for guests which use it&lt;br /&gt;
* Bring up Windows 95 and Windows 98 guests&lt;br /&gt;
* Implement ACPI memory hotplug&lt;br /&gt;
* Improve ballooning to try to use 2MB pages when possible ( in progress - kern.devel@gmail.com )&lt;br /&gt;
&lt;br /&gt;
==== Networking TODO: ====&lt;br /&gt;
* Has its [[NetworkingTodo|own page]]&lt;br /&gt;
&lt;br /&gt;
==== PCI TODO: ====&lt;br /&gt;
* Has its [[PCITodo|own page]]&lt;br /&gt;
&lt;br /&gt;
==== MMU related: ====&lt;br /&gt;
* Improve mmu page eviction algorithm (currently FIFO, change to approximate LRU).&lt;br /&gt;
* Add a read-only memory type.&lt;br /&gt;
** possible using mprotect()?&lt;br /&gt;
* Implement AM20 for dos and the like.&lt;br /&gt;
* O(1) write protection by protecting the PML4Es, then on demand PDPTEs, PDEs, and PTEs&lt;br /&gt;
* Simpler variant: don&#039;t drop large ptes when write protecting; just write protect them. When taking a write fault, either drop the large pte, or convert it to small ptes and write protect those (like O(1) write protection).&lt;br /&gt;
* O(1) mmu invalidation using a generation number&lt;br /&gt;
&lt;br /&gt;
==== x86 emulator updates: ====&lt;br /&gt;
* On-demand register access, really, copying all registers all the time is gross.&lt;br /&gt;
** Can be done by adding &#039;available&#039; and &#039;dirty&#039; bitmasks&lt;br /&gt;
* Implement mmx and sse memory move instructions; useful for guests that use multimedia extensions for accessing vga (partially done)&lt;br /&gt;
* Implement an operation queue for the emulator.  The emulator often calls userspace to perform a read or a write, but due to inversion of control it actually restarts instead of continuing.  The queue would allow it to replay all previous operations until it reaches the point it last stopped.&lt;br /&gt;
** if this is done, we can retire -&amp;gt;read_std() in favour of -&amp;gt;read_emulated().&lt;br /&gt;
* convert more instructions to direct dispatch (function pointer in decode table)&lt;br /&gt;
* move init_emulate_ctxt() into x86_decode_insn() and other emulator entry points&lt;br /&gt;
&lt;br /&gt;
==== Interactivity improvements: ====&lt;br /&gt;
* If for several frames in a row a large proportion of the framebuffer pages are changing, then for the next few frames don&#039;t bother to get the dirty page log from kvm, but instead assume that all pages are dirty.  This will reduce page fault overhead on highly interactive workloads.&lt;br /&gt;
* When detecting keyboard/video/mouse activity, scale up the frame rate; when activity dies down, scale it back down (applicable to qemu as well).&lt;br /&gt;
&lt;br /&gt;
==== Pass-through/VT-d related: ====&lt;br /&gt;
* Enhance KVM QEMU to return error messages if user attempts to pass-through unsupported devices:&lt;br /&gt;
** Devices with shared host IOAPIC interrupt&lt;br /&gt;
** Conventional PCI devices&lt;br /&gt;
** Devices without FLR capability&lt;br /&gt;
* QEMU PCI pass-through patch needs to be enhanced to the same functionality as the corresponding file in Xen&lt;br /&gt;
** Remove direct HW access by QEMU for probing PCI BAR size&lt;br /&gt;
** PCI handling of various PCI configuration registers&lt;br /&gt;
** Other enhancements that were done in Xen&lt;br /&gt;
* Host shared interrupt support&lt;br /&gt;
* VT-d2 support (WIP in Linux Kernel)&lt;br /&gt;
** Queued invalidation&lt;br /&gt;
** Interrupt remapping&lt;br /&gt;
** ATS&lt;br /&gt;
* USB 2.0 (EHCI) support&lt;br /&gt;
&lt;br /&gt;
==== Bug fixes: ====&lt;br /&gt;
* Less sexy but ever important, fixing bugs is one of the most important contributions&lt;br /&gt;
&lt;br /&gt;
==== Random improvements ====&lt;br /&gt;
* Utilize the SVM interrupt queue to avoid extra exits when guest interrupts are disabled&lt;br /&gt;
&lt;br /&gt;
==== For the adventurous: ====&lt;br /&gt;
* Emulate the VMX instruction sets on qemu.  This would be very beneficial to debugging kvm ( working on this - kern.devel@gmail.com ).&lt;br /&gt;
* Add [http://lagarcavilla.org/vmgl/ vmgl] support to qemu.  Port to virtio.  Write a Windows driver.&lt;br /&gt;
* Keep this TODO up to date&lt;br /&gt;
&lt;br /&gt;
==== Nested VMX ====&lt;br /&gt;
* Implement performance features such as EPT and VPID&lt;br /&gt;
&lt;br /&gt;
== KVM Safe Mode ==&lt;br /&gt;
&lt;br /&gt;
An ioctl() from userspace that tells KVM to disable one or more of the following features:&lt;br /&gt;
&lt;br /&gt;
* shadow paging (force direct mapping)&lt;br /&gt;
* instruction emulation (require virtio or mmio hypercall)&lt;br /&gt;
* task switches&lt;br /&gt;
* mode switches (long mode / legacy mode / real mode)&lt;br /&gt;
* IDT/GDT/LDT changes&lt;br /&gt;
* IDT/GDT/LDT write protect&lt;br /&gt;
* write protect important MSRs (*STAR etc)&lt;br /&gt;
&lt;br /&gt;
The idea is both to protect the guest from attacks, and to protect the host from the guest.&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4862</id>
		<title>PCITodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=PCITodo&amp;diff=4862"/>
		<updated>2013-08-21T10:38:55Z</updated>

		<summary type="html">&lt;p&gt;Mst: add PCI TODO&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all PCI related activity in KVM.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* virtio device as PCI Express device&lt;br /&gt;
  Issue: Express spec requires device can work without IO,&lt;br /&gt;
  virtio requires IO at the moment.&lt;br /&gt;
  Plan: add support for memory BARs.&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for devices behind PCI bridges&lt;br /&gt;
  Issue: QEMU lacks support for device hotplug behind&lt;br /&gt;
  pci bridges.&lt;br /&gt;
&lt;br /&gt;
   Plan:&lt;br /&gt;
    - each bus gets assigned a number 0-255&lt;br /&gt;
    - generated ACPI code writes this number&lt;br /&gt;
      to a new BSEL register, then uses existing&lt;br /&gt;
      UP/DOWN registers to probe slot status;&lt;br /&gt;
      to eject, write number to BSEL register,&lt;br /&gt;
      then slot into existing EJ&lt;br /&gt;
    This is to address the ACPI spec requirement to&lt;br /&gt;
    avoid config cycle access to any bus except PCI roots.&lt;br /&gt;
&lt;br /&gt;
    Note: ACPI doesn&#039;t support adding or removing bridges by hotplug.&lt;br /&gt;
    We should prevent removal of bridges by hotplug,&lt;br /&gt;
    unless they were added by hotplug previously&lt;br /&gt;
    (and so, are not described by ACPI).&lt;br /&gt;
  Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Hotplug for Q35&lt;br /&gt;
   Issue: QEMU does not support hotplug for Q35&lt;br /&gt;
   Plan: since we need to support hotplug of PCI devices,&lt;br /&gt;
   let&#039;s use ACPI hotplug for everything&lt;br /&gt;
   Use same interface as we do for PCI, this way&lt;br /&gt;
   same ACPI code can be reused.&lt;br /&gt;
&lt;br /&gt;
   Developer: Michael S. Tsirkin&lt;br /&gt;
&lt;br /&gt;
* Support for different PCI express link width/speed settings&lt;br /&gt;
      Issue: QEMU currently emulates all links at minimal&lt;br /&gt;
      width and speed. This means we don&#039;t need to emulate&lt;br /&gt;
      link negotiation, but might in theory confuse guests&lt;br /&gt;
      for assigned devices.&lt;br /&gt;
      The issue is complicated by the fact that real link speed&lt;br /&gt;
      might be limited by the slot where assigned device is put.&lt;br /&gt;
      Plan: add management interface to control the max link&lt;br /&gt;
      speed and width for the slot.&lt;br /&gt;
      Teach management to query this at slot level.&lt;br /&gt;
      For device, query it from device itself.&lt;br /&gt;
      Support link width/speed negotiation  as per spec.&lt;br /&gt;
   Developer: Alex Williamson&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* PCI interrupts should be active-low&lt;br /&gt;
      Issue: PCI INT#x interrupts are normally active-low.&lt;br /&gt;
      QEMU emulates them as active high. Works fine for&lt;br /&gt;
      windows and linux but not guaranteed for other guests.       &lt;br /&gt;
      See http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/&lt;br /&gt;
&lt;br /&gt;
      Plan: add support for active-low interrupts in KVM.&lt;br /&gt;
            Enable this for PCI interrupts.&lt;br /&gt;
            Change DSDT appropriately.&lt;br /&gt;
&lt;br /&gt;
      Developer: &lt;br /&gt;
      Testing: stress-test devices with INT#x interrupts&lt;br /&gt;
      with interrupt sharing in particular&lt;br /&gt;
&lt;br /&gt;
* PCI master-abort is not emulated correctly&lt;br /&gt;
      Issue: access to disabled PCI memory normally returns&lt;br /&gt;
      all-ones (on read) and sets master abort&lt;br /&gt;
      detected bit in bridge.&lt;br /&gt;
      For express, it can also trigger AER reporting&lt;br /&gt;
      when enabled.&lt;br /&gt;
      QEMU does not emulate any of this: reads return 0,&lt;br /&gt;
      writes.&lt;br /&gt;
      Plan: add catch-all memory region with low priority&lt;br /&gt;
      in bridge, and trigger the required actions.&lt;br /&gt;
&lt;br /&gt;
* Better modeling for PCI INT#x&lt;br /&gt;
      Issue: for a device deep down a bridge hierarchy,&lt;br /&gt;
      we scan the tree upwards on each interrupt,&lt;br /&gt;
      calling map_irq at each level, this is bad for performance.&lt;br /&gt;
      Behaviour is also open-coded at each level, this is ugly.&lt;br /&gt;
      Plan: something similar to MemoryRegion API:&lt;br /&gt;
      add objects that represent PCI INT#x pins&lt;br /&gt;
      (maybe pins in general) model their connection at&lt;br /&gt;
      each level. Each time there&#039;s a change, re-map&lt;br /&gt;
      them. On data path, use pre-computed irq# to&lt;br /&gt;
      send/clear the interrupt quickly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
* Way to figure out proper PCI connectivity options.&lt;br /&gt;
      Issue: How do you know where you can connect a device?&lt;br /&gt;
      For PCI, this includes the legal bus addresses,&lt;br /&gt;
      hotplug support for bus,&lt;br /&gt;
      how the secondary bus is named,&lt;br /&gt;
      and whether bridges support required addressing modes.&lt;br /&gt;
      For PCI Express, there are additional options:&lt;br /&gt;
      root or downstream port,&lt;br /&gt;
      virtual bridge in root complex/upstream port.&lt;br /&gt;
      management tools end up hard-coding this information,&lt;br /&gt;
      based simply on device name, but that&#039;s ugly.&lt;br /&gt;
      Vague idea: add interfaces to figure out what can be&lt;br /&gt;
      connected to what and how, or at least the function of each device.&lt;br /&gt;
      People to contact: Laine Stump&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Fix AHCI for stability&lt;br /&gt;
      Not related to PCI directly but modern chipsets&lt;br /&gt;
      with PCI Express support all use AHCI.&lt;br /&gt;
      Issue1: AHCI is unstable with Windows guests&lt;br /&gt;
       (win7 fails to boot sometimes)&lt;br /&gt;
      Issue2:  guests sometimes crash when doing ping pong migration&lt;br /&gt;
      People to contact: Alexander Graf&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4847</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4847"/>
		<updated>2013-07-22T13:53:44Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search is slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: macaddr changed in guest is not updated in qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for Windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a Linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
* bridging without promisc mode with OVS&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4846</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4846"/>
		<updated>2013-07-22T13:51:55Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search is slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: macaddr changed in guest is not updated in qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for Windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do) (the existing NIC_RX_FILTER_CHANGED event contains vlan-tables)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* virtio: preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a Linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4844</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4844"/>
		<updated>2013-07-22T13:35:13Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search is slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
* non-virtio device support with vhost&lt;br /&gt;
  Use vhost interface for guests that don&#039;t use virtio-net&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        kernel part is done (Vlad Yasevich)&lt;br /&gt;
        teach qemu to notify libvirt to enable the filter (still to do)&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4835</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4835"/>
		<updated>2013-07-08T09:21:50Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
* sharing config interrupts&lt;br /&gt;
  Support more devices by sharing a single msi vector&lt;br /&gt;
  between multiple virtio devices.&lt;br /&gt;
  (Applies to virtio-blk too).&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
* vxlan&lt;br /&gt;
  What could we do here?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4833</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4833"/>
		<updated>2013-06-25T15:12:19Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net ( or partially orphan )&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel not support more type of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Write some unit tests for vhost-net/vhost-scsi&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4832</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4832"/>
		<updated>2013-06-24T13:58:46Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Bandan Das&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search will be slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net (or partially orphan)&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Amos Kong&lt;br /&gt;
        qemu: https://bugzilla.redhat.com/show_bug.cgi?id=848203&lt;br /&gt;
        libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=848199&lt;br /&gt;
        https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/patch/?id=1c0fa6b709d02fe4f98d4ce7b55a6cc3c925791c&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: macaddr changed in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
  https://bugzilla.redhat.com/show_bug.cgi?id=922589&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
  Developer: Dmitry Fleytman&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4802</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4802"/>
		<updated>2013-06-10T07:06:47Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Shirley Ma?, MST?&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search will be slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net (or partially orphan)&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: macaddr changed in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  So we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel does not support more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu to act as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
* bridging on top of macvlan &lt;br /&gt;
  add code to forward LRO status from macvlan (not macvtap)&lt;br /&gt;
  back to the lowerdev, so that setting up forwarding&lt;br /&gt;
  from macvlan disables LRO on the lowerdev&lt;br /&gt;
&lt;br /&gt;
* preserve packets exactly with LRO&lt;br /&gt;
  LRO is not normally compatible with forwarding.&lt;br /&gt;
  With virtio we are getting packets from a linux host,&lt;br /&gt;
  so we could conceivably preserve packets exactly&lt;br /&gt;
  even with LRO. I am guessing other hardware could be&lt;br /&gt;
  doing this as well.&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4801</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4801"/>
		<updated>2013-06-10T06:55:27Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
&lt;br /&gt;
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
&lt;br /&gt;
      Developer: Shirley Ma?, MST?&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, linear search will be slow&lt;br /&gt;
       2) does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net (or partially orphan)&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: macaddr changed in guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* Enable GRO for packets coming to bridge from a tap interface&lt;br /&gt;
&lt;br /&gt;
* Better support for windows LRO&lt;br /&gt;
  Extend virtio-header with statistics for GRO packets:&lt;br /&gt;
  number of packets coalesced and number of duplicate ACKs coalesced&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
* Measure the effect of each of the above-mentioned optimizations&lt;br /&gt;
  - Use autotest network performance regression testing (that runs netperf)&lt;br /&gt;
  - Also test any wild idea that works. Some may be useful.&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4787</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4787"/>
		<updated>2013-05-24T14:02:25Z</updated>

		<summary type="html">&lt;p&gt;Mst: another project with no owner&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
      Developer: Shirley Ma?, MST?&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net (or partially orphan)&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in the guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
* ring aliasing:&lt;br /&gt;
  using vhost-net as a networking backend with virtio-net in QEMU&lt;br /&gt;
  being what&#039;s guest facing.&lt;br /&gt;
  This gives you the best of both worlds: QEMU acts as a first&lt;br /&gt;
  line of defense against a malicious guest while still getting the&lt;br /&gt;
  performance advantages of vhost-net (zero-copy).&lt;br /&gt;
  In fact a bit of complexity in vhost was put there in the vague hope to&lt;br /&gt;
  support something like this: virtio rings are not translated through&lt;br /&gt;
  regular memory tables, instead, vhost gets a pointer to ring address.&lt;br /&gt;
  This allows qemu acting as a man in the middle,&lt;br /&gt;
  verifying the descriptors but not touching the packet data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4785</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4785"/>
		<updated>2013-05-24T11:06:27Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
      Developer: Shirley Ma?, MST?&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default.&lt;br /&gt;
       This is because GSO tends to batch less when mq is enabled.&lt;br /&gt;
       https://patchwork.kernel.org/patch/2235191/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* rework on flow caches&lt;br /&gt;
       Current hlist implementation of flow caches has several limitations:&lt;br /&gt;
       1) in the worst case, the linear search is slow&lt;br /&gt;
       2) it does not scale&lt;br /&gt;
       https://patchwork.kernel.org/patch/2025121/&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
       &lt;br /&gt;
* eliminate the extra copy in virtio-net driver&lt;br /&gt;
       We need to do an extra copy of 128 bytes for every packet.&lt;br /&gt;
       This could be eliminated for small packets by:&lt;br /&gt;
       1) use build_skb() and head frag&lt;br /&gt;
       2) bigger vnet header length ( &amp;gt;= NET_SKB_PAD + NET_IP_ALIGN )&lt;br /&gt;
       Or use a dedicated queue for small packet receiving ? (reordering)&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* make pktgen work for virtio-net (or partially orphan)&lt;br /&gt;
       virtio-net orphans the skb during tx,&lt;br /&gt;
       which makes pktgen wait forever on the refcnt.&lt;br /&gt;
       Jason&#039;s idea: introduce a flag to tell pktgen not to wait&lt;br /&gt;
       Discussion here: https://patchwork.kernel.org/patch/1800711/&lt;br /&gt;
       MST&#039;s idea: add a .ndo_tx_polling not only for pktgen&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Add HW_VLAN_TX support for tap&lt;br /&gt;
       Eliminate the extra data moving for tagged packets&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* Announce self by guest driver&lt;br /&gt;
       Send gARP by guest driver. Guest part is finished.&lt;br /&gt;
       Qemu is ongoing.&lt;br /&gt;
       V7 patches are here:&lt;br /&gt;
       http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in the guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  There are two kinds of netdev polling:&lt;br /&gt;
  - netpoll - used for debugging&lt;br /&gt;
  - proposed low latency net polling&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
* more GSO type support:&lt;br /&gt;
       Kernel now supports more types of GSO: FCOE, GRE, UDP_TUNNEL&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
     Jason has a draft patch to use flex array.&lt;br /&gt;
     Another thing is to move the flow caches out of tun_struct.&lt;br /&gt;
     Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* tx coalescing&lt;br /&gt;
        Delay several packets before kicking the device.&lt;br /&gt;
&lt;br /&gt;
* interrupt coalescing&lt;br /&gt;
        Reduce the number of interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is the highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4782</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4782"/>
		<updated>2013-05-23T21:52:46Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument&lt;br /&gt;
      Developer: Shirley Ma?, MST?&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in the guest is not propagated to qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4781</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4781"/>
		<updated>2013-05-23T21:44:27Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest does not update it in qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
  Old patch here: [PATCH RFC] tun: dma engine support&lt;br /&gt;
  It does not speed things up. Need to see why and&lt;br /&gt;
  what can be done.&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4780</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4780"/>
		<updated>2013-05-23T21:42:30Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
* Bug: e1000 &amp;amp; rtl8139: changing macaddr in guest does not update it in qemu (info network)&lt;br /&gt;
  Developer: Amos Kong&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues, but we really want&lt;br /&gt;
     1 queue per guest CPU. The limit comes from net&lt;br /&gt;
     core, need to teach it to allocate array of&lt;br /&gt;
     pointers and not array of queues.&lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4774</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4774"/>
		<updated>2013-05-23T10:43:24Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      www.mail-archive.com/kvm@vger.kernel.org/msg69868.html&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues &lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4773</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4773"/>
		<updated>2013-05-23T10:41:36Z</updated>

		<summary type="html">&lt;p&gt;Mst: add links and more info. link to low latency net patches&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  http://comments.gmane.org/gmane.linux.network/266546&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* reduce networking latency:&lt;br /&gt;
  allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Plan:&lt;br /&gt;
    We are going through the scheduler 3 times&lt;br /&gt;
    (could be up to 5 if softirqd is involved)&lt;br /&gt;
    Consider RX: host irq -&amp;gt; io thread -&amp;gt; VCPU thread -&amp;gt;&lt;br /&gt;
    guest irq -&amp;gt; guest thread.&lt;br /&gt;
    This adds a lot of latency.&lt;br /&gt;
    We can cut it by some 1.5x if we do a bit of work&lt;br /&gt;
    either in the VCPU or softirq context.&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  https://patchwork.kernel.org/patch/1540471/&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* netdev polling for virtio.&lt;br /&gt;
  See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues &lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4772</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4772"/>
		<updated>2013-05-23T08:48:37Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
TODO: add bugzilla entry links.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues &lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4771</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4771"/>
		<updated>2013-05-23T08:47:25Z</updated>

		<summary type="html">&lt;p&gt;Mst: add Narasimhan, Sriram&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
* Improve stats, make them more helpful for perf analysis&lt;br /&gt;
  Developer: Sriram Narasimhan&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues &lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4770</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=4770"/>
		<updated>2013-05-23T08:42:38Z</updated>

		<summary type="html">&lt;p&gt;Mst: rewrote the page. TODO: add BZs, detailed project descriptions.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
=== projects in progress. contributions are still very welcome! ===&lt;br /&gt;
&lt;br /&gt;
* vhost-net scalability tuning: threading for many VMs&lt;br /&gt;
      Plan: switch to workqueue shared by many VMs&lt;br /&gt;
      Developer: Shirley Ma?, MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
* multiqueue support in macvtap&lt;br /&gt;
       multiqueue is only supported for tun.&lt;br /&gt;
       Add support for macvtap.&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* enable multiqueue by default&lt;br /&gt;
       Multiqueue causes regression in some workloads, thus&lt;br /&gt;
       it is off by default. Detect and enable/disable&lt;br /&gt;
       automatically so we can make it on by default&lt;br /&gt;
       Developer: Jason Wang&lt;br /&gt;
&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
        Developer: Dragos Tatulea?, Amos Kong&lt;br /&gt;
        Status: [[GuestProgrammableMacVlanFiltering]]&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
  Helps performance and security on noisy LANs&lt;br /&gt;
  Developer: Vlad Yasevich&lt;br /&gt;
&lt;br /&gt;
* allow handling short packets from softirq or VCPU context&lt;br /&gt;
  Testing: netperf TCP RR - should be improved drastically&lt;br /&gt;
           netperf TCP STREAM guest to host - no regression&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* Flexible buffers: put virtio header inline with packet data&lt;br /&gt;
  Developer: MST&lt;br /&gt;
&lt;br /&gt;
* device failover to allow migration with assigned devices&lt;br /&gt;
  https://fedoraproject.org/wiki/Features/Virt_Device_Failover&lt;br /&gt;
  Developer: Gal Hammer, Cole Robinson, Laine Stump, MST&lt;br /&gt;
&lt;br /&gt;
* Reuse vringh code for better maintainability&lt;br /&gt;
  Developer: Rusty Russell&lt;br /&gt;
&lt;br /&gt;
=== projects that are not started yet - no owner ===&lt;br /&gt;
&lt;br /&gt;
* receive side zero copy&lt;br /&gt;
  The ideal is a NIC with accelerated RFS support,&lt;br /&gt;
  so we can feed the virtio rx buffers into the correct NIC queue.&lt;br /&gt;
  Depends on non promisc NIC support in bridge.&lt;br /&gt;
&lt;br /&gt;
* IPoIB infiniband bridging&lt;br /&gt;
  Plan: implement macvtap for ipoib and virtio-ipoib&lt;br /&gt;
&lt;br /&gt;
* RDMA bridging&lt;br /&gt;
&lt;br /&gt;
* use kvm eventfd support for injecting level interrupts,&lt;br /&gt;
  enable vhost by default for level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* virtio API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
&lt;br /&gt;
=== vague ideas: path to implementation not clear ===&lt;br /&gt;
&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* support more queues&lt;br /&gt;
     We limit TUN to 8 queues &lt;br /&gt;
&lt;br /&gt;
* irq/numa affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
* reduce conflict with VCPU thread&lt;br /&gt;
    if VCPU and networking run on same CPU,&lt;br /&gt;
    they conflict resulting in bad performance.&lt;br /&gt;
    Fix that, push vhost thread out to another CPU&lt;br /&gt;
    more aggressively.&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
Keeping networking stable is highest priority.&lt;br /&gt;
&lt;br /&gt;
* Run weekly test on upstream HEAD covering test matrix with autotest&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
&lt;br /&gt;
=== test matrix ===&lt;br /&gt;
&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=KVM_Forum_2011&amp;diff=3646</id>
		<title>KVM Forum 2011</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=KVM_Forum_2011&amp;diff=3646"/>
		<updated>2011-06-27T10:28:50Z</updated>

		<summary type="html">&lt;p&gt;Mst: Redirect to KVM_Forum_2011_WIP&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[KVM_Forum_2011_WIP]]&lt;br /&gt;
&lt;br /&gt;
= KVM Forum 2011: outdated page =&lt;br /&gt;
= Vancouver Canada, August 15-16, 2011 =&lt;br /&gt;
The KVM Forum 2011 will be held &lt;br /&gt;
at the Hyatt Regency Vancouver in Vancouver, Canada on August 15-16, 2011. We will be co-located with LinuxCon North America 2011.&lt;br /&gt;
&lt;br /&gt;
http://events.linuxfoundation.org/events/linuxcon&lt;br /&gt;
&lt;br /&gt;
== Scope ==&lt;br /&gt;
KVM is an industry leading open source hypervisor that provides an ideal&lt;br /&gt;
platform for datacenter virtualization, virtual desktop infrastructure,&lt;br /&gt;
and cloud computing.  Once again, it&#039;s time to bring together the&lt;br /&gt;
community of developers and users that define the KVM ecosystem for&lt;br /&gt;
our annual technical conference.  We will discuss the current state of&lt;br /&gt;
affairs and plan for the future of KVM, its surrounding infrastructure,&lt;br /&gt;
and management tools.  So mark your calendar and join us in advancing KVM.&lt;br /&gt;
&lt;br /&gt;
http://events.linuxfoundation.org/events/kvm-forum/&lt;br /&gt;
&lt;br /&gt;
== CFP ==&lt;br /&gt;
[[KVMForum2011CFP|KVM Forum 2011 CFP]] (now closed, see [[#Schedule|Schedule]])&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
&lt;br /&gt;
Please visit this page to register:&lt;br /&gt;
&lt;br /&gt;
http://events.linuxfoundation.org/events/kvm-forum/register&lt;br /&gt;
&lt;br /&gt;
== Hotel and Travel ==&lt;br /&gt;
The KVM Forum 2011 will be held in Vancouver BC at the Hyatt Regency Vancouver.&lt;br /&gt;
See the Linux Foundation&#039;s KVM Forum page for more details on hotels and travel.&lt;br /&gt;
&lt;br /&gt;
http://events.linuxfoundation.org/events/kvm-forum/travel&lt;br /&gt;
&lt;br /&gt;
== Schedule ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Monday, August 15th&#039;&#039;&#039;&lt;br /&gt;
{|&lt;br /&gt;
! Time !! Title !! Speaker &lt;br /&gt;
|-&lt;br /&gt;
|09:00 - 09:15 || colspan=&amp;quot;2&amp;quot; align=&amp;quot;center&amp;quot;| Welcome&lt;br /&gt;
|-&lt;br /&gt;
|09:15 - 09:30 || Keynote || &lt;br /&gt;
|-&lt;br /&gt;
|09:30 - 10:00 ||  || &lt;br /&gt;
|-&lt;br /&gt;
|10:00 - 10:30 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 10:30 - 10:45  || colspan=&amp;quot;2&amp;quot; align=&amp;quot;center&amp;quot;| Break&lt;br /&gt;
|-&lt;br /&gt;
| 10:45 - 11:15 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 11:15 - 11:45 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 11:45 - 12:15 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 12:15 - 13:30 || colspan=&amp;quot;2&amp;quot; align=&amp;quot;center&amp;quot;| Lunch&lt;br /&gt;
|}&lt;br /&gt;
{|&lt;br /&gt;
! !! colspan=&amp;quot;2&amp;quot;|Track 1 !! colspan=&amp;quot;2&amp;quot;|Track 2&lt;br /&gt;
|-&lt;br /&gt;
! Time !! Title !! Speaker !! Title !! Speaker&lt;br /&gt;
|-&lt;br /&gt;
| 13:30 - 14:00 || || ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 14:00 - 14:30 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 14:30 - 15:00 || ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 15:00 - 15:20 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|Break&lt;br /&gt;
|-&lt;br /&gt;
|15:20 - 15:50 ||  || ||  || &lt;br /&gt;
|-&lt;br /&gt;
|15:50 - 16:20 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
|16:20 - 16:50 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
|16:50 - 17:10 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|Break&lt;br /&gt;
|-&lt;br /&gt;
|17:10 - 19:00 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|BoFs&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tuesday, August 16th&#039;&#039;&#039;&lt;br /&gt;
{|&lt;br /&gt;
! Time !! Title !! Speaker&lt;br /&gt;
|-&lt;br /&gt;
| 9:00 - 9:15 || Keynote || &lt;br /&gt;
|-&lt;br /&gt;
| 9:15 - 9:45 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 9:45 - 10:15 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 10:15 - 10:45 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 10:45 - 11:00 || colspan=&amp;quot;2&amp;quot; align=&amp;quot;center&amp;quot; | Break&lt;br /&gt;
|-&lt;br /&gt;
| 11:00 - 11:30 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 11:30 - 12:00 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 12:00 - 12:30 ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 12:30 - 13:45 || colspan=&amp;quot;2&amp;quot; align=&amp;quot;center&amp;quot; | Lunch&lt;br /&gt;
|}&lt;br /&gt;
{|&lt;br /&gt;
! !! colspan=&amp;quot;2&amp;quot;|Track 1 !! colspan=&amp;quot;2&amp;quot;|Track 2&lt;br /&gt;
|-&lt;br /&gt;
! Time !! Title !! Speaker !! Title !! Speaker&lt;br /&gt;
|-&lt;br /&gt;
| 13:45 - 14:15 ||  ||  || ||&lt;br /&gt;
|-&lt;br /&gt;
| 14:15 - 14:45 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 14:45 - 15:15 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 15:15 - 15:30 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|Break&lt;br /&gt;
|-&lt;br /&gt;
| 15:30 - 16:00 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 16:00 - 16:30 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 16:30 - 17:00 ||  || ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 17:00 - 17:30 ||  ||  ||  || &lt;br /&gt;
|-&lt;br /&gt;
| 17:15 - 17:30 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|Closing&lt;br /&gt;
|-&lt;br /&gt;
| 17:30 - 19:00 || colspan=&amp;quot;4&amp;quot; align=&amp;quot;center&amp;quot;|BoFs&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=File:Apic-wiring-mess.odp&amp;diff=3645</id>
		<title>File:Apic-wiring-mess.odp</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=File:Apic-wiring-mess.odp&amp;diff=3645"/>
		<updated>2011-06-26T08:23:56Z</updated>

		<summary type="html">&lt;p&gt;Mst: test&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;test&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3454</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3454"/>
		<updated>2010-12-15T19:27:04Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
Generally for a given setup and traffic&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We generally might care about minimum, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latency directly; instead you measure peak throughput.&lt;br /&gt;
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
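&lt;br /&gt;
One possible reading of the paragraph above, as a Python sketch: count busy CPU time in seconds from tick counters (in the style of the first line of /proc/stat on Linux) and, optionally, scale it to a reference clock so that runs at different CPU speeds can be compared. The field names, the default of 100 ticks per second and the frequency scaling are all assumptions made for illustration, not a method this page prescribes.&lt;br /&gt;

```python
def busy_cpu_seconds(before, after, ticks_per_second=100):
    """Busy CPU time between two snapshots of tick counters, in seconds.

    before/after map a counter name (user, system, idle, ...) to ticks.
    idle and iowait are treated as not busy (an assumption of this sketch).
    """
    not_busy = ("idle", "iowait")
    busy_ticks = sum(after[name] - before[name]
                     for name in after if name not in not_busy)
    return busy_ticks / ticks_per_second


def scale_to_reference(busy_seconds, actual_mhz, reference_mhz):
    """Express busy time as seconds on a CPU running at reference_mhz."""
    return busy_seconds * actual_mhz / reference_mhz


# Hypothetical snapshots taken before and after a test run.
before = {"user": 100, "system": 50, "idle": 1000, "iowait": 10}
after = {"user": 300, "system": 150, "idle": 1500, "iowait": 20}

seconds = busy_cpu_seconds(before, after)
print(seconds)
print(scale_to_reference(seconds, actual_mhz=1600, reference_mhz=3200))
```

Reporting seconds instead of a percentage keeps two runs comparable even when the measurement intervals differ.&lt;br /&gt;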
&lt;br /&gt;
Some metrics derived from this are:&lt;br /&gt;
==== peak throughput ====&lt;br /&gt;
How high we can load the system&lt;br /&gt;
until latencies sharply become unreasonable&lt;br /&gt;
==== service demand ====&lt;br /&gt;
Load divided by CPU utilization&lt;br /&gt;
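&lt;br /&gt;
As a rough illustration of the two derived metrics above, the Python sketch below picks the peak throughput out of a load sweep and computes service demand as defined here (load divided by CPU utilization). The sample tuples, the latency limit and the function names are hypothetical, chosen only for the example; they are not output from any real tool.&lt;br /&gt;

```python
def peak_throughput(samples, latency_limit_ms):
    """Return the highest load reached before latency first exceeds the limit.

    samples: iterable of (load_msgs_per_s, latency_ms, cpu_percent) tuples.
    Returns None when even the lowest load is over the limit.
    """
    peak = None
    for load, latency_ms, _cpu in sorted(samples):
        if latency_ms > latency_limit_ms:
            break  # latencies became unreasonable; stop the sweep here
        peak = load
    return peak


def service_demand(load_msgs_per_s, cpu_percent):
    """Load divided by CPU utilization, as defined in this section."""
    if cpu_percent == 0:
        raise ValueError("CPU utilization must be non-zero")
    return load_msgs_per_s / cpu_percent


# Hypothetical sweep: latency climbs sharply between 30k and 40k msgs/s.
sweep = [
    (10000, 0.20, 12.0),
    (20000, 0.25, 24.0),
    (30000, 0.40, 40.0),
    (40000, 5.00, 55.0),
]
print(peak_throughput(sweep, latency_limit_ms=1.0))
print(service_demand(30000, 40.0))
```

What counts as an unreasonable latency is a judgment call, so the limit is left as a parameter rather than fixed in the code.&lt;br /&gt;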
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Available tools ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably too hard to set up&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
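&lt;br /&gt;
For the pinned scenario, a small Python sketch: on Linux, irq affinity is set by writing a hexadecimal CPU bitmask to /proc/irq/N/smp_affinity, and a process can be pinned with os.sched_setaffinity. The helper names and the CPU choices below are hypothetical; writing the mask needs root, so the sketch only computes it.&lt;br /&gt;

```python
import os


def cpu_mask_hex(cpus):
    """Hex bitmask with one bit set per CPU index, e.g. [0, 2] gives '5'.

    Meant for small CPU counts; large masks are written to smp_affinity
    as comma-separated 32-bit groups, which this sketch does not produce.
    """
    mask = sum(2 ** cpu for cpu in set(cpus))
    return format(mask, "x")


def pin_process(pid, cpus):
    """Pin a process (for example a qemu pid) to the given CPUs. Linux only."""
    os.sched_setaffinity(pid, set(cpus))


print(cpu_mask_hex([0, 2]))        # CPUs 0 and 2
print(cpu_mask_hex([4, 5, 6, 7]))  # CPUs 4 to 7
```

pin_process is defined but not called here, since the right pid and CPU set depend on the host being tested.&lt;br /&gt;
&lt;br /&gt;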
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you mean if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gradually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3453</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3453"/>
		<updated>2010-12-15T19:26:47Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
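&lt;br /&gt;
As a rough illustration of how such a matrix could be enumerated, here is a minimal Python sketch. The option values are assumptions lifted from the discussion log further down this page (connection options, pinning, protocols); the message sizes and the name build_matrix are hypothetical, and Guest setup is left out because no options for it are listed yet.&lt;br /&gt;

```python
from itertools import product

# Option values taken from the discussion log; the real lists are still open.
NETWORKING = ["host to guest", "guest to host", "ext to guest",
              "ext to host", "guest to guest on local",
              "guest to guest across the net"]
CPU = ["no pinning", "qemu pinned to cpus"]
PROTOCOL = ["TCP", "UDP", "ICMP"]
MESSAGE_SIZE = [64, 1024, 65536]  # bytes, hypothetical sample points


def build_matrix(networking, cpu, protocol, message_size):
    """Return every combination of the given options as a list of dicts."""
    keys = ("networking", "cpu", "protocol", "message_size")
    return [dict(zip(keys, combo))
            for combo in product(networking, cpu, protocol, message_size)]


matrix = build_matrix(NETWORKING, CPU, PROTOCOL, MESSAGE_SIZE)
print(len(matrix))  # 6 * 2 * 3 * 3 = 108
```

Even these short lists give 108 runs, which is one reason to test only a few key options for small changes.&lt;br /&gt;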
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
Generally for a given setup and traffic&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We generally might care about minimal, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latencies directly; instead you measure peak throughput.&lt;br /&gt;
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
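&lt;br /&gt;
One possible way to get host CPU load in seconds is to take two snapshots of /proc/stat, one before and one after the test, and convert the difference in busy ticks to seconds. The sketch below is only an assumption about how this could be done: the function names are hypothetical, iowait is counted as idle, and it does not correct for CPU frequency changes during the run.&lt;br /&gt;

```python
import os


def busy_jiffies(proc_stat_text):
    """Sum the non-idle time of the aggregate "cpu" line of /proc/stat."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order: user nice system idle iowait irq softirq steal.
            # Guest time is already included in user, so it is not added again.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            return sum(values) - idle
    raise ValueError("no aggregate cpu line found")


def cpu_seconds(before_text, after_text, ticks_per_second=None):
    """CPU time burned between two /proc/stat snapshots, in seconds."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    delta = busy_jiffies(after_text) - busy_jiffies(before_text)
    return delta / ticks_per_second
```

The snapshots would be read on the host while the test is running, for example by reading /proc/stat right before and right after the traffic generator runs.&lt;br /&gt;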
&lt;br /&gt;
Some metrics derived from this are:&lt;br /&gt;
==== peak throughput ====&lt;br /&gt;
How high we can load the system&lt;br /&gt;
until latencies sharply become unreasonable&lt;br /&gt;
==== service demand ====&lt;br /&gt;
Load divided by CPU utilization&lt;br /&gt;
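&lt;br /&gt;
The two derived metrics above could be computed from a series of measurements roughly as follows. This is a sketch under stated assumptions: the function names, the (load, latency) sample format and the fixed latency limit are all hypothetical, and what counts as an unreasonable latency remains a judgment call.&lt;br /&gt;

```python
import operator


def service_demand(load, cpu_utilization):
    """Load divided by CPU utilization, as defined above."""
    if cpu_utilization == 0:
        raise ValueError("CPU utilization must be non-zero")
    return load / cpu_utilization


def peak_throughput(samples, latency_limit):
    """Highest load reached before latency first exceeds latency_limit.

    samples is a list of (load, latency) pairs; returns None when even
    the lowest load is over the limit.
    """
    peak = None
    for load, latency in sorted(samples):
        if not operator.le(latency, latency_limit):
            break  # latencies became unreasonable, stop here
        peak = load
    return peak
```

For example, with loads in messages per second and latencies in milliseconds, the samples (1000, 0.2), (2000, 0.3), (4000, 0.5), (8000, 5.0) and a limit of 1.0 give a peak of 4000.&lt;br /&gt;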
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
=== Available tools ===&lt;br /&gt;
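The discussion log below names flood ping and netperf as candidate tools. As a hedged sketch, the helpers here only build the command lines and do not run anything; the helper names and the example host are hypothetical, and flood ping normally needs root privileges.&lt;br /&gt;

```python
def flood_ping_cmd(host, payload_bytes, count):
    """argv for a flood ping with a fixed payload size and packet count."""
    return ["ping", "-f", "-s", str(payload_bytes), "-c", str(count), host]


def netperf_cmd(host, test, seconds, message_bytes=None):
    """argv for a netperf run that also reports local and remote CPU."""
    argv = ["netperf", "-H", host, "-t", test, "-l", str(seconds), "-c", "-C"]
    if message_bytes is not None:
        if test.endswith("_RR"):
            # request and response sizes for the *_RR tests
            argv += ["--", "-r", "{0},{0}".format(message_bytes)]
        else:
            # send size for the *_STREAM tests
            argv += ["--", "-m", str(message_bytes)]
    return argv


print(" ".join(netperf_cmd("192.0.2.10", "TCP_RR", 60, 1)))
```

The resulting lists can be passed to subprocess.run; CPU load still has to be measured separately on the host while the command runs.&lt;br /&gt;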
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably too hard to set up&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you mean if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gradually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3452</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3452"/>
		<updated>2010-12-15T19:24:19Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
Generally for a given setup and traffic&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We generally might care about minimal, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latencies directly; instead you measure peak throughput.&lt;br /&gt;
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
&lt;br /&gt;
Some metrics derived from this are:&lt;br /&gt;
==== peak throughput ====&lt;br /&gt;
How high we can load the system&lt;br /&gt;
until latencies sharply become unreasonable&lt;br /&gt;
==== service demand ====&lt;br /&gt;
Load divided by CPU utilization&lt;br /&gt;
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably too hard to set up&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you mean if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gadually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3451</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3451"/>
		<updated>2010-12-15T19:23:29Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and the person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
For a given setup and traffic load&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We might care about the minimum, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
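&lt;br /&gt;
A minimal sketch of how such a latency summary could be computed from raw samples. The function name and the choice of milliseconds are assumptions made for illustration; they are not part of any tool mentioned on this page.&lt;br /&gt;

```python
import statistics


def summarize_latencies(samples_ms):
    """Return the minimum, mean, median and maximum of latency samples.

    samples_ms is a hypothetical list of per-message latencies in
    milliseconds, collected by whatever tool drives the test.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return {
        "min": min(samples_ms),
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }
```

Reporting the median next to the mean is useful because a few very slow responses can pull the mean far away from the typical case.&lt;br /&gt;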
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latency directly; instead you measure peak throughput.&lt;br /&gt;
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
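&lt;br /&gt;
One way to get host CPU time in seconds, sketched under assumptions: the chat log below mentions /proc/stat as a simple source, so this reads its aggregate cpu line before and after a run. The function names are hypothetical, and counting everything except idle and iowait as busy is a choice made here, not something this page specifies. Note that CPU seconds still do not account for frequency changes.&lt;br /&gt;

```python
import os


def busy_cpu_seconds(stat_text, ticks_per_second):
    """Busy CPU time in seconds from the aggregate "cpu" line of /proc/stat.

    Counts user, nice, system, irq, softirq and steal time. Idle and
    iowait are left out; guest time is already included in user time.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            ticks = [int(value) for value in fields[1:9]]
            # Older kernels report fewer columns; treat the rest as zero.
            ticks += [0] * (8 - len(ticks))
            user, nice, system, idle, iowait, irq, softirq, steal = ticks
            busy = user + nice + system + irq + softirq + steal
            return busy / ticks_per_second
    raise ValueError("no aggregate cpu line found")


def read_host_busy_cpu_seconds():
    """Sample the running host; call before and after a test and subtract."""
    with open("/proc/stat") as stat_file:
        return busy_cpu_seconds(stat_file.read(), os.sysconf("SC_CLK_TCK"))
```

Sampling once before and once after the test and subtracting gives the CPU seconds the whole host burned while the test was running, which includes unrelated background activity.&lt;br /&gt;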
&lt;br /&gt;
Some metrics derived from this are:&lt;br /&gt;
==== peak throughput ====&lt;br /&gt;
  how high we can load the system&lt;br /&gt;
  until latencies sharply become unreasonable&lt;br /&gt;
==== service demand ====&lt;br /&gt;
 load divided by CPU&lt;br /&gt;
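&lt;br /&gt;
A small sketch of the arithmetic, with hypothetical function names. The page defines the metric as load divided by CPU, while the chat log below quotes the netperf manual as reporting how many microseconds of CPU are spent per KB, which is the reciprocal view. Both are shown so the two can be compared.&lt;br /&gt;

```python
def load_per_cpu_second(units_transferred, cpu_seconds):
    """Load divided by CPU: units of work done per CPU second burned."""
    if cpu_seconds == 0:
        raise ValueError("cpu_seconds must be non-zero")
    return units_transferred / cpu_seconds


def cpu_microseconds_per_unit(units_transferred, cpu_seconds):
    """Reciprocal view: CPU microseconds spent per unit of work."""
    if units_transferred == 0:
        raise ValueError("units_transferred must be non-zero")
    return cpu_seconds * 1e6 / units_transferred
```

For example, 1000 KB moved while the host burned 2 CPU seconds gives 500 KB per CPU second, or 2000 microseconds of CPU per KB.&lt;br /&gt;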
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably to hard to setup&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you man if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gadually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3450</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3450"/>
		<updated>2010-12-15T19:22:26Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and the person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
For a given setup and traffic load&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We might care about the minimum, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
&lt;br /&gt;
Some metrics derived from this are:&lt;br /&gt;
==== peak throughput ====&lt;br /&gt;
  how high we can go&lt;br /&gt;
  until latencies sharply become unreasonable&lt;br /&gt;
==== service demand ====&lt;br /&gt;
 load divided by CPU&lt;br /&gt;
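&lt;br /&gt;
The peak throughput idea can be sketched as follows, under loud assumptions: the page does not say what counts as unreasonable, so this treats a latency above a chosen multiple of the latency at the lowest offered load as the breaking point. The function name, the sample format and the default factor of 10 are all hypothetical.&lt;br /&gt;

```python
def peak_throughput(samples, factor=10.0):
    """Highest offered load before latency becomes unreasonable.

    samples is an iterable of (messages_per_second, latency) pairs.
    Latency counts as unreasonable once it exceeds factor times the
    latency seen at the lowest offered load (an assumed threshold).
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    baseline = ordered[0][1]
    peak = None
    for rate, latency in ordered:
        if latency > factor * baseline:
            break  # the system has stopped keeping up at this load
        peak = rate
    return peak
```

Because the threshold is subjective, it is worth recording the factor used together with the result.&lt;br /&gt;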
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latency directly; instead you measure peak throughput.&lt;br /&gt;
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably to hard to setup&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you man if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gadually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3449</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3449"/>
		<updated>2010-12-15T19:20:59Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
Generally for a given setup and traffic&lt;br /&gt;
we want to know the [[#Latency|Latency]] and the [[#CPU load|CPU load]].&lt;br /&gt;
We might care about the minimum, average (or median) and maximum&lt;br /&gt;
latencies.&lt;br /&gt;
&lt;br /&gt;
Some metrics derived from these are:&lt;br /&gt;
- *peak throughput* i.e. how high we can go&lt;br /&gt;
  until latencies sharply become unreasonable&lt;br /&gt;
- *service demand*: load divided by CPU&lt;br /&gt;
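&lt;br /&gt;
A minimal Python sketch of the ratio above and of its inverse. The function names and the numbers are hypothetical, chosen only for illustration.&lt;br /&gt;
The IRC log further down quotes the netperf manual as describing service demand as microseconds of CPU per KB, so both directions of the ratio are shown.&lt;br /&gt;

```python
def load_per_cpu(kbytes_transferred, cpu_seconds):
    """Traffic handled per CPU-second ("load divided by CPU")."""
    return kbytes_transferred / cpu_seconds


def cpu_usec_per_kb(kbytes_transferred, cpu_seconds):
    """CPU microseconds spent per KB transferred (the inverse ratio)."""
    return cpu_seconds * 1e6 / kbytes_transferred


# Hypothetical run: 500000 KB moved while the host burned 2.5 CPU-seconds.
print(load_per_cpu(500000, 2.5))     # KB per CPU-second
print(cpu_usec_per_kb(500000, 2.5))  # CPU microseconds per KB
```

Whichever direction is reported, the unit should be stated next to the number so that results from different tools stay comparable.&lt;br /&gt;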
&lt;br /&gt;
=== Latency ===&lt;br /&gt;
Latency is generally the time until you get a response. For some workloads you don&#039;t measure latencies directly; instead you measure peak throughput.&lt;br /&gt;
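&lt;br /&gt;
As a rough illustration, the minimum, average, median and maximum latencies could be summarised from a list of per-message samples as in the Python sketch below.&lt;br /&gt;
The function name, the millisecond unit and the sample values are assumptions made for the example; they do not come from any particular tool.&lt;br /&gt;

```python
import statistics


def summarize_latencies(samples_ms):
    """Return min, mean, median and max of a list of latency samples."""
    return {
        "min": min(samples_ms),
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


# Hypothetical response times in milliseconds, with one slow outlier.
print(summarize_latencies([1.0, 2.0, 3.0, 10.0]))
```

Reporting the median next to the mean helps show when a few slow responses dominate the average.&lt;br /&gt;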
&lt;br /&gt;
=== CPU load ===&lt;br /&gt;
The only metric that makes sense is probably host system load,&lt;br /&gt;
of which the only somewhat quantifiable component seems to be the CPU load.&lt;br /&gt;
We need to take into account the fact that CPU speed might change&lt;br /&gt;
with time, so load should probably be in seconds&lt;br /&gt;
(%CPU/speed) rather than plain %CPU.&lt;br /&gt;
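&lt;br /&gt;
One possible way to get host CPU time in seconds is to sample the aggregate cpu line of /proc/stat on the host before and after a test run. The Python sketch below is an illustration only.&lt;br /&gt;
The helper names are hypothetical, counting iowait as idle time is an assumption, and the result is not corrected for changes in CPU speed.&lt;br /&gt;

```python
import os


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, the aggregate "cpu" line."""
    with open(path) as stat_file:
        return stat_file.readline()


def busy_ticks(cpu_line):
    """Return non-idle ticks from an aggregate "cpu" line."""
    fields = [int(value) for value in cpu_line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest ...
    # Guest time is already counted in user time, so only the first
    # eight fields are summed to avoid counting it twice.
    total = sum(fields[:8])
    idle = sum(fields[3:5])  # idle plus iowait (assumption: both are idle)
    return total - idle


def busy_cpu_seconds(before_line, after_line, ticks_per_second=None):
    """Convert the busy-tick delta between two samples into CPU-seconds."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    delta = busy_ticks(after_line) - busy_ticks(before_line)
    return delta / ticks_per_second


# Hypothetical samples taken before and after a test run.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  160 0 90 850 50 0 0 0 0 0"
print(busy_cpu_seconds(before, after, ticks_per_second=100))
```

On a real host the two samples would come from read_cpu_line(), taken while the traffic test runs, and the default tick rate would be read from the system.&lt;br /&gt;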
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably to hard to setup&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you man if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gadually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cou load is a bit hard to define&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual its how many us the cpu spend on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes TCP have flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easy to be understand and analysis than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detect reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said it&#039;s ICMP and was not the most of the cases&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3448</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3448"/>
		<updated>2010-12-15T17:29:37Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup|Networking setup]], [[#CPU setup|CPU setup]], [[#Guest setup|Guest setup]], [[#Traffic load|Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics|Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup|Hypervisor setup]] and/or [[#Guest setup|Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration|Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably to hard to setup&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh let you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh let us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you man if we need we can use a dedicated thread like upstream which is easy to be pinned?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, have saw some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; let thing is traffic to check&lt;br /&gt;
&amp;lt;mst&amp;gt; let-&amp;gt;last&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must to be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP is only working well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it let us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gradually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu load is a bit hard to define&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles vhost spends to send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual, it&#039;s how many us the cpu spends on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP has flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easier to understand and analyze than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detects reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said, it&#039;s ICMP, which is not the most common case&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3447</id>
		<title>NetworkingPerformanceTesting</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingPerformanceTesting&amp;diff=3447"/>
		<updated>2010-12-15T17:25:50Z</updated>

		<summary type="html">&lt;p&gt;Mst: headers filed in&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Networking Performance Testing ==&lt;br /&gt;
This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger the change, the more exhaustive we want the coverage to be.&lt;br /&gt;
&lt;br /&gt;
We can get different configurations by selecting different options in the following categories: [[#Networking setup]], [[#CPU setup]], [[#Guest setup]], [[#Traffic load]].&lt;br /&gt;
For each of these we are interested in a set of [[#Performance metrics]].&lt;br /&gt;
A test would need to be performed under a controlled Hardware configuration,&lt;br /&gt;
for each relevant [[#Hypervisor setup]] and/or [[#Guest setup]] (depending on which change is tested) on the same hardware.&lt;br /&gt;
Ideally we&#039;d note the [[#Hardware configuration]] and person performing the test to increase the chance it can be reproduced later.&lt;br /&gt;
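As a rough illustration of how such a matrix can be enumerated, the sketch below takes one option from each category and lists every combination. The option names and message sizes are placeholder assumptions for illustration only, not an agreed list.

```python
from itertools import product

# Placeholder option lists; the names and values are assumptions.
NETWORKING_SETUP = ["host-to-guest", "guest-to-host", "ext-to-guest",
                    "guest-to-guest-local", "guest-to-guest-remote"]
CPU_SETUP = ["default", "pinned"]
PROTOCOL = ["TCP", "UDP", "ICMP"]
MESSAGE_SIZE = [64, 1024, 16384]  # bytes, hypothetical values


def build_matrix(*categories):
    """Return every combination of one option from each category."""
    return list(product(*categories))


matrix = build_matrix(NETWORKING_SETUP, CPU_SETUP, PROTOCOL, MESSAGE_SIZE)
print(len(matrix), "configurations, first:", matrix[0])
```

The count grows multiplicatively with each category, which is one reason to reserve exhaustive coverage for the bigger changes.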
&lt;br /&gt;
== Performance metrics ==&lt;br /&gt;
&lt;br /&gt;
== Networking setup ==&lt;br /&gt;
&lt;br /&gt;
== CPU setup ==&lt;br /&gt;
&lt;br /&gt;
== Guest setup ==&lt;br /&gt;
&lt;br /&gt;
== Hypervisor setup ==&lt;br /&gt;
&lt;br /&gt;
== Traffic load ==&lt;br /&gt;
&lt;br /&gt;
== Hardware configuration ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;mst&amp;gt; yes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; can we let the perf team to do that?&lt;br /&gt;
&amp;lt;mst&amp;gt; they likely won&#039;t do it in time&lt;br /&gt;
&amp;lt;mst&amp;gt; I started making up a list of what we need to measure&lt;br /&gt;
&amp;lt;mst&amp;gt; have a bit of time to discuss?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you mean we need to do it ourself?&lt;br /&gt;
&amp;lt;mst&amp;gt; at least part of it&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; I&#039;m sorry, I need to attend the autotest meeting in 10 minutes&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; mst ok&lt;br /&gt;
&amp;lt;mst&amp;gt; will have time afterward?&lt;br /&gt;
&amp;lt;mst&amp;gt; I know it&#039;s late in your TZ&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; cool, then I&#039;ll stay connected on irc just ping me&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; thanks!&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; you are welcome&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; hi, just back from the meeting&lt;br /&gt;
&amp;lt;mst&amp;gt; hi&lt;br /&gt;
&amp;lt;mst&amp;gt; okay so let&#039;s see what we have&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; first we have the various connection options&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; we can do:&lt;br /&gt;
&amp;lt;mst&amp;gt; host to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to host&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to guest&lt;br /&gt;
&amp;lt;mst&amp;gt; ext to host&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest on local&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; guest to guest across the net&lt;br /&gt;
&amp;lt;mst&amp;gt; for comparison it&#039;s probably useful to do &amp;quot;baremetal&amp;quot;: loopback and external&amp;lt;-&amp;gt;host&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; a bit more advanced: bidirectional tests&lt;br /&gt;
&amp;lt;mst&amp;gt; many to many is probably too hard to set up&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, so we need only test some key options&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, for now let&#039;s focus on things that are easy to define&lt;br /&gt;
&amp;lt;mst&amp;gt; ok now what kind of traffic we care about&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; (ext)host to guest, guest to (ext)host ?&lt;br /&gt;
&amp;lt;mst&amp;gt; no I mean scheduler is heavily involved&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so guest to guest on local is also needed?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, think so&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think we need to try just defaults&lt;br /&gt;
&amp;lt;mst&amp;gt; (no pinning)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, that is usual case&lt;br /&gt;
&amp;lt;mst&amp;gt; as well as pinned scenario where qemu is pinned to cpus&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok&lt;br /&gt;
&amp;lt;mst&amp;gt; and for external pinning irqs as well&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; set irq affinity?&lt;br /&gt;
&amp;lt;mst&amp;gt; do you know whether virsh lets you pin the iothread?&lt;br /&gt;
&amp;lt;mst&amp;gt; yes, affinity&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; no, I don&#039;t use virsh&lt;br /&gt;
&amp;lt;mst&amp;gt; need to find out, only pin what virsh lets us pin&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; okay&lt;br /&gt;
&amp;lt;mst&amp;gt; note vhost-net thread is created on demand, so it is not very practical to pin it&lt;br /&gt;
&amp;lt;mst&amp;gt; if we do need this capability it will have to be added, I am hoping scheduler does the right thing&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s a workqueue in RHEL6.1&lt;br /&gt;
&amp;lt;mst&amp;gt; workqueue is just a list + thread, or we can change it if we like&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; do you mean that, if needed, we can use a dedicated thread like upstream, which is easier to pin?&lt;br /&gt;
&amp;lt;mst&amp;gt; upstream is not easier to be pinned&lt;br /&gt;
&amp;lt;mst&amp;gt; the issue is mostly that thread is only created on driver OK now&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so guest can destroy it and recreate and it loses what you set&lt;br /&gt;
&amp;lt;mst&amp;gt; in benchmark it works but not for real users&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; maybe cgroups can be used somehow since it inherits the cgroups of the owner&lt;br /&gt;
&amp;lt;mst&amp;gt; another option is to let qemu control the pinning&lt;br /&gt;
&amp;lt;mst&amp;gt; either let it specify the thread to do the work&lt;br /&gt;
&amp;lt;mst&amp;gt; or just add ioctl for pinning&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; looks possible&lt;br /&gt;
&amp;lt;mst&amp;gt; in mark wagner&#039;s tests it seemed to work well without&lt;br /&gt;
&amp;lt;mst&amp;gt; so need to see if it&#039;s needed, it&#039;s not hard to add this interface&lt;br /&gt;
&amp;lt;mst&amp;gt; but once we add it must maintain forever&lt;br /&gt;
&amp;lt;mst&amp;gt; so I think irq affinity and cpu pinning are two options to try tweaking&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I have seen some performance discussion of vhost upstream&lt;br /&gt;
&amp;lt;mst&amp;gt; need to make sure we try on a numa box&lt;br /&gt;
&amp;lt;mst&amp;gt; at the moment kernel structures are allocated on first use&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; I hope it all fits in cache so should not matter&lt;br /&gt;
&amp;lt;mst&amp;gt; but need to check, not yet sure what exactly&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, things would be more complicated when using numa&lt;br /&gt;
&amp;lt;mst&amp;gt; not sure what exactly are the configurations to check&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so we have the network setup and we have the cpu setup&lt;br /&gt;
&amp;lt;mst&amp;gt; last thing is traffic to check&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?&lt;br /&gt;
&amp;lt;mst&amp;gt; let&#039;s focus on the protocols first&lt;br /&gt;
&amp;lt;mst&amp;gt; so we can do TCP, this has a strange property of coalescing messages&lt;br /&gt;
&amp;lt;mst&amp;gt; but OTOH it&#039;s the most used protocol&lt;br /&gt;
&amp;lt;mst&amp;gt; and it has hard requirements e.g. on the ordering of packets&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP must be tested&lt;br /&gt;
&amp;lt;mst&amp;gt; UDP only works well up to mtu packet size&lt;br /&gt;
&amp;lt;mst&amp;gt; but otherwise it lets us do pretty low level stuff&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, agree&lt;br /&gt;
&amp;lt;mst&amp;gt; ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)&lt;br /&gt;
&amp;lt;mst&amp;gt; what kind of traffic we care about? ideally a range of message sizes, and a range of loads&lt;br /&gt;
&amp;lt;mst&amp;gt; (in terms of messages per second)&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; what do we want to measure?&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; bandwidth and latency&lt;br /&gt;
&amp;lt;mst&amp;gt; I think this is not really it&lt;br /&gt;
&amp;lt;mst&amp;gt; this is what tools like to give us&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and maybe also the  cpu usage&lt;br /&gt;
&amp;lt;mst&amp;gt; if you think about it in terms of an application, it is always latency that you care about in the end&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. I have this huge file what is the latency to send it over the network&lt;br /&gt;
&amp;lt;mst&amp;gt; and for us also what is the cpu load, you are right&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second&lt;br /&gt;
&amp;lt;mst&amp;gt; we want to know the latency and the cpu load&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable&lt;br /&gt;
&amp;lt;mst&amp;gt; this last is a bit subjective&lt;br /&gt;
&amp;lt;mst&amp;gt; but generally any system would gradually become less responsive with more load&lt;br /&gt;
&amp;lt;mst&amp;gt; then at some point it just breaks&lt;br /&gt;
&amp;lt;mst&amp;gt; cpu load is a bit hard to define&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes and it looks hard to do the measuring then&lt;br /&gt;
&amp;lt;mst&amp;gt; I think in the end, what we care about is how many cpu cycles the host burns&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, but how to measure that?&lt;br /&gt;
&amp;lt;mst&amp;gt; well we have simple things like /proc/stat&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; understood and maybe perf can also help&lt;br /&gt;
&amp;lt;mst&amp;gt; yes quite possibly&lt;br /&gt;
&amp;lt;mst&amp;gt; in other words we&#039;ll need to measure this in parallel while test is running&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report local/remote CPU&lt;br /&gt;
&amp;lt;mst&amp;gt; but I do not understand what it really means&lt;br /&gt;
&amp;lt;mst&amp;gt; especially for a guest&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, if we want to use netperf it&#039;s better to know how it does the calculation&lt;br /&gt;
&amp;lt;mst&amp;gt; well it just looks at /proc/stat AFAIK&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, I try to take a look at its source&lt;br /&gt;
&amp;lt;mst&amp;gt; this is the default but it has other heuristics&lt;br /&gt;
&amp;lt;mst&amp;gt; that can be configured at compile time&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; ok, understand&lt;br /&gt;
&amp;lt;mst&amp;gt; ok and I think load divided by CPU is a useful metric&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; so the ideal result is to get how many cpu cycles vhost spends to send or receive a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; netperf can report service demand&lt;br /&gt;
&amp;lt;mst&amp;gt; I do not understand what it is&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; From its manual, it&#039;s how many us the cpu spends on a KB&lt;br /&gt;
&amp;lt;mst&amp;gt; well the answer will be it depends :)&lt;br /&gt;
&amp;lt;mst&amp;gt; also, we have packet loss&lt;br /&gt;
&amp;lt;mst&amp;gt; I think at some level we only care about packets that were delivered&lt;br /&gt;
&amp;lt;mst&amp;gt; so e.g. with UDP we only care about received messages&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, the packet loss may have concerns with guest drivers&lt;br /&gt;
&amp;lt;mst&amp;gt; with TCP if you look at messages, there&#039;s no loss&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, TCP has flow control itself&lt;br /&gt;
&amp;lt;mst&amp;gt; ok so let&#039;s see what tools we have&lt;br /&gt;
&amp;lt;mst&amp;gt; the simplest is flood ping&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it&#039;s very simple and easy to use&lt;br /&gt;
&amp;lt;mst&amp;gt; it gives you control over message size, packets per second, gets you back latency&lt;br /&gt;
&amp;lt;mst&amp;gt; it is always bidirectional I think&lt;br /&gt;
&amp;lt;mst&amp;gt; and we need to measure CPU ourselves&lt;br /&gt;
&amp;lt;mst&amp;gt; that last seems to be true anyway&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, maybe easier to understand and analyze than netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; packet loss when it occurs complicates things&lt;br /&gt;
&amp;lt;mst&amp;gt; e.g. with 50% packet loss the real load is anywhere in between&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; that&#039;s the only problem: it&#039;s always bidirectional so tx/rx problems are hard to separate&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, vhost is currently half-duplex&lt;br /&gt;
&amp;lt;mst&amp;gt; I am also not sure it detects reordering&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, it has sequence no.&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; but for ping, as you&#039;ve said, it&#039;s ICMP, which is not the most common case&lt;br /&gt;
&amp;lt;mst&amp;gt; ok, next we have netperf&lt;br /&gt;
&amp;lt;mst&amp;gt; afaik it can do two things&lt;br /&gt;
&amp;lt;mst&amp;gt; it can try sending as many packets as it can&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; or it can send a single one back and forth&lt;br /&gt;
&amp;lt;mst&amp;gt; not a lot of data, but ok&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes&lt;br /&gt;
&amp;lt;mst&amp;gt; and similar with UDP&lt;br /&gt;
&amp;lt;mst&amp;gt; got to go have lunch&lt;br /&gt;
&amp;lt;mst&amp;gt; So I will try and write all this up&lt;br /&gt;
&amp;lt;mst&amp;gt; do you have any hardware for testing?&lt;br /&gt;
&amp;lt;mst&amp;gt; if yes we&#039;ll add it too, I&#039;ll put up a wiki&lt;br /&gt;
&amp;lt;mst&amp;gt; back in half an hour&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; yes, write all things up would help&lt;br /&gt;
&amp;lt;jasonwang&amp;gt; go home now, please send me mail&lt;br /&gt;
* jasonwang has quit (Quit: Leaving)&lt;br /&gt;
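The log above suggests taking host CPU usage from /proc/stat while the test runs. A minimal sketch of that idea follows, assuming the standard Linux field order on the aggregate "cpu" line. The function names, the sample snapshots and the per-KB metric are illustrative assumptions; this is not how netperf computes its own CPU figures.

```python
def parse_cpu_line(stat_text):
    """Return (busy, total) ticks from the aggregate "cpu" line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Order: user nice system idle iowait irq softirq steal guest ...
            # Guest time is already counted in user, so sum only the
            # first eight fields to avoid double counting.
            total = sum(values[:8])
            idle = sum(values[3:5])  # idle + iowait
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two snapshots."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed else 0.0


def busy_ticks_per_kb(before, after, bytes_delivered):
    """Busy ticks burned per KB of delivered payload (assumed metric)."""
    busy0, _ = parse_cpu_line(before)
    busy1, _ = parse_cpu_line(after)
    return (busy1 - busy0) / (bytes_delivered / 1024)


def read_proc_stat(path="/proc/stat"):
    """Take a live snapshot; call once before and once after the run."""
    with open(path) as f:
        return f.read()


# Made-up snapshots standing in for two reads of /proc/stat.
BEFORE = "cpu  100 0 50 850 0 0 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0 0 0\n"
AFTER = "cpu  150 0 100 950 0 0 0 0 0 0\ncpu0 150 0 100 950 0 0 0 0 0 0\n"

print("busy fraction:", busy_fraction(BEFORE, AFTER))
print("ticks per KB:", busy_ticks_per_kb(BEFORE, AFTER, 200 * 1024))
```

On a real host the two snapshots would come from read_proc_stat() taken just before and just after the traffic run, and only delivered bytes should be passed in, so that lost packets do not flatter the result.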
 &lt;br /&gt;
* Loaded log from Wed Dec 15 15:07:24 2010&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
	<entry>
		<id>https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=3272</id>
		<title>NetworkingTodo</title>
		<link rel="alternate" type="text/html" href="https://linux-kvm.org/index.php?title=NetworkingTodo&amp;diff=3272"/>
		<updated>2010-09-21T16:55:10Z</updated>

		<summary type="html">&lt;p&gt;Mst: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page should cover all networking related activity in KVM,&lt;br /&gt;
currently most info is related to virtio-net.&lt;br /&gt;
&lt;br /&gt;
Stabilization is highest priority currently.&lt;br /&gt;
DOA test matrix (all combinations should work):&lt;br /&gt;
        vhost: test both on and off, obviously&lt;br /&gt;
        test: hotplug/unplug, vlan/mac filtering, netperf,&lt;br /&gt;
             file copy both ways: scp, NFS, NTFS&lt;br /&gt;
        guests: linux: release and debug kernels, windows&lt;br /&gt;
        conditions: plain run, run while under migration,&lt;br /&gt;
                vhost on/off migration&lt;br /&gt;
        networking setup: simple, qos with cgroups&lt;br /&gt;
        host configuration: host-guest, external-guest&lt;br /&gt;
&lt;br /&gt;
=== vhost-net driver projects ===&lt;br /&gt;
* iovec length limitations&lt;br /&gt;
       Developer: Jason Wang &amp;lt;jasowang@redhat.com&amp;gt;&lt;br /&gt;
       Testing: guest to host file transfer on windows.&lt;br /&gt;
&lt;br /&gt;
* mergeable buffers: fix host-&amp;gt;guest BW regression&lt;br /&gt;
       Testing: netperf host to guest default flags&lt;br /&gt;
&lt;br /&gt;
* scalability tuning: threading for guest to guest&lt;br /&gt;
       Developer: MST&lt;br /&gt;
      Testing: netperf guest to guest&lt;br /&gt;
&lt;br /&gt;
=== qemu projects ===&lt;br /&gt;
* fix hotplug issues&lt;br /&gt;
      Developer: MST&lt;br /&gt;
      https://bugzilla.redhat.com/show_bug.cgi?id=623735&lt;br /&gt;
&lt;br /&gt;
* migration with multiple macs/vlans&lt;br /&gt;
        qemu only sends ping with the first mac/no vlan:&lt;br /&gt;
        need to send it for all macs/vlan&lt;br /&gt;
&lt;br /&gt;
* bugfix: crash with illegal fd= value on command line&lt;br /&gt;
       Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=581750&lt;br /&gt;
&lt;br /&gt;
=== virtio projects ===&lt;br /&gt;
* suspend/resume support&lt;br /&gt;
&lt;br /&gt;
* API extension: improve small packet/large buffer performance:&lt;br /&gt;
  support &amp;quot;reposting&amp;quot; buffers for mergeable buffers,&lt;br /&gt;
  support pool for indirect buffers&lt;br /&gt;
* ring redesign:&lt;br /&gt;
      find a way to test raw ring performance &lt;br /&gt;
      fix cacheline bounces &lt;br /&gt;
      reduce interrupts&lt;br /&gt;
      Developer: MST&lt;br /&gt;
      see patchset: virtio: put last seen used index into ring itself&lt;br /&gt;
&lt;br /&gt;
=== projects involving other kernel components and/or networking stack ===&lt;br /&gt;
* guest programmable mac/vlan filtering with macvtap&lt;br /&gt;
&lt;br /&gt;
* bridge without promisc mode in NIC&lt;br /&gt;
  given hardware support, teach bridge&lt;br /&gt;
  to program mac/vlan filtering in NIC&lt;br /&gt;
&lt;br /&gt;
* rx mac filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        we have a small table of addresses, need to make it larger&lt;br /&gt;
        if we only need filtering for unicast (multicast is handled by IGMP filtering)&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in tun&lt;br /&gt;
        the need for this is still not understood as we have filtering in bridge&lt;br /&gt;
        for small # of vlans we can use BPF&lt;br /&gt;
&lt;br /&gt;
* vlan filtering in bridge&lt;br /&gt;
        IGMP snooping in bridge should take vlans into account&lt;br /&gt;
&lt;br /&gt;
* zero copy tx/rx for macvtap&lt;br /&gt;
       Developers: tx zero copy Shirley Ma; rx zero copy Xin Xiaohui&lt;br /&gt;
&lt;br /&gt;
* multiqueue (involves all of vhost, qemu, virtio, networking stack)&lt;br /&gt;
       Developer: Krishna Kumar&lt;br /&gt;
       Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=632751&lt;br /&gt;
&lt;br /&gt;
* kvm MSI interrupt injection fast path&lt;br /&gt;
       Developer: MST&lt;br /&gt;
&lt;br /&gt;
* kvm eventfd support for injecting level interrupts&lt;br /&gt;
&lt;br /&gt;
* DMA engine (IOAT) use in tun&lt;br /&gt;
&lt;br /&gt;
* allow handling short packets from softirq context&lt;br /&gt;
  Testing: netperf TCP STREAM guest to host&lt;br /&gt;
           netperf TCP RR&lt;br /&gt;
&lt;br /&gt;
* irq affinity:&lt;br /&gt;
     networking goes much faster with irq pinning:&lt;br /&gt;
     both with and without numa.&lt;br /&gt;
     what can be done to make the non-pinned setup go faster?&lt;br /&gt;
&lt;br /&gt;
=== testing projects ===&lt;br /&gt;
* Cover test matrix with autotest&lt;br /&gt;
* Test with windows drivers, pass WHQL&lt;br /&gt;
&lt;br /&gt;
=== non-virtio-net devices ===&lt;br /&gt;
* e1000: stabilize&lt;br /&gt;
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=602205&lt;br /&gt;
&lt;br /&gt;
=== bugzilla entries for bugs fixed ===&lt;br /&gt;
* verify these are ok upstream&lt;br /&gt;
     https://bugzilla.redhat.com/show_bug.cgi?id=623552&lt;br /&gt;
     https://bugzilla.redhat.com/show_bug.cgi?id=632747&lt;br /&gt;
     https://bugzilla.redhat.com/show_bug.cgi?id=632745&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== abandoned projects: ===&lt;br /&gt;
* Add GSO/checksum offload support to AF_PACKET(raw) sockets.&lt;br /&gt;
      status: incomplete&lt;br /&gt;
* guest kernel 2.6.31 seems to work well. Under certain workloads,&lt;br /&gt;
      virtio performance has regressed with guest kernels 2.6.32 and up&lt;br /&gt;
      (but still better than userspace). A patch has been posted:&lt;br /&gt;
      http://www.spinics.net/lists/netdev/msg115292.html&lt;br /&gt;
      status: might be fixed, need to test&lt;/div&gt;</summary>
		<author><name>Mst</name></author>
	</entry>
</feed>