...

One possible solution is to provide the ability to have private rings that are only for receiving data from a specific guest.

  • connect hypercall

...

Scaling Improvements

The current notification mechanism does not carry any information from the hypervisor to the guest. It merely tells the guest that something happened; it is up to the guest to work out which elements changed.

One way to improve this would be to offer a list of events along with the notification. Instead of inventing a brand new mechanism, we could have a V4V ring that is only used by the hypervisor to write events. These events would allow the guest to identify which rings need processing and which destinations have space available. The ring could also carry other events: connection requests, etc.
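As an illustration of what such an event ring might carry, below is a minimal sketch of an event record. The v4v_event structure and the V4V_EVENT_* values are assumptions made up for this page, not part of any existing V4V ABI.

 /*
  * Hypothetical layout for the events the hypervisor would write into a
  * dedicated ring.  None of these names exist in the current V4V code;
  * they only illustrate the idea.
  */
 #include <stdint.h>

 #define V4V_EVENT_RING_READY    1  /* a local ring has data to process    */
 #define V4V_EVENT_SPACE_AVAIL   2  /* a remote destination has free space */
 #define V4V_EVENT_CONN_REQUEST  3  /* a peer has asked to connect         */

 struct v4v_event {
     uint32_t type;    /* one of the V4V_EVENT_* values above      */
     uint16_t domid;   /* peer domain the event refers to          */
     uint16_t pad;     /* reserved, must be zero                   */
     uint32_t port;    /* port of the ring / destination concerned */
 };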

V4V Access Control

Current Approach

The current approach to securing V4V is a firewall-like interface known as v4vtables (referred to as viptables in earlier versions). The mechanism is described briefly in the overview section of the V4V wiki page. It is a simple filtering mechanism which allows dom0 to specify ACCEPT / REJECT policy for packets being sent from one endpoint to another, where an 'endpoint' is a (domid, port) 2-tuple / ordered pair.

The rules could be represented as follows:

ACCEPT: (X, Y) -> (X', Y')

Such a rule would allow the source domain X to send data over V4V from source port Y to destination domain X' on destination port Y'. Similarly, a rule could specify that the REJECT action be taken for matching communications over V4V, in which case the data would be rejected and the sender notified through an error value returned from the hypercall.
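As a rough illustration of that rule format, the sketch below shows one possible in-hypervisor representation of a v4vtables rule together with the matching logic just described. The structure, the wildcard values, and rule_matches() are assumptions made for this page, not the actual implementation.

 #include <stdbool.h>
 #include <stdint.h>

 #define V4V_DOMID_ANY  0xffff      /* hypothetical wildcard domain */
 #define V4V_PORT_ANY   0xffffffff  /* hypothetical wildcard port   */

 struct v4vtables_rule {
     uint16_t src_domid;   /* X  */
     uint32_t src_port;    /* Y  */
     uint16_t dst_domid;   /* X' */
     uint32_t dst_port;    /* Y' */
     bool     accept;      /* true = ACCEPT, false = REJECT */
 };

 /* Return true if the rule matches a send from (sd, sp) to (dd, dp). */
 static bool rule_matches(const struct v4vtables_rule *r,
                          uint16_t sd, uint32_t sp,
                          uint16_t dd, uint32_t dp)
 {
     return (r->src_domid == V4V_DOMID_ANY || r->src_domid == sd) &&
            (r->src_port  == V4V_PORT_ANY  || r->src_port  == sp) &&
            (r->dst_domid == V4V_DOMID_ANY || r->dst_domid == dd) &&
            (r->dst_port  == V4V_PORT_ANY  || r->dst_port  == dp);
 }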

Issues

The approach v4vtables takes to securing communications over V4V between VMs is definitely "the right way to do it". There are, however, a few issues with it. This section deals with several issues raised in the xen-devel discussions around [V4v_Patchset_10], and also addresses some concerns raised internally with regard to XSM.

Patchset 10

General issues with v4vtables raised in Patchset 10 came from Tim D. These had less to do with security than with general implementation details, but since v4vtables is a security mechanism they are discussed here. Specifically, see issues like #126 and any others with the phrase "v4vtables misery".

  [Issue 126] More v4vtables mess

Tim also asked for a more explicit description of the calling convention for v4vtables rules. From this it's reasonable to conclude that v4vtables needs more love before it'll be ready for upstream.

Denial of Service

There is the possibility of a denial of service on the hypervisor caused by a guest. The scenario here would be that a guest creates a very large number of V4V rings/sendv vectors/notify requests and exhausts hypervisor resources. This seems to be a "very bad thing" so some limitations need to be considered. This is issue 6.

Another DoS situation exists currently. In this scenario a guest sending unwanted or unexpected data to another guest could saturate its V4V rings with garbage. This would effectively deny service to the guest owning the V4V ring. This scenario could be addressed by v4vtables if it were modified to allow guests to add rules, at any point, limiting which senders may write to their rings. This is part of issue 7.

It's been suggested on the list that we need a mechanism to disable V4V in situations where it's not being used. This was brought up and tagged by Ross as [issue].

XSM

Tim D. briefly mentioned XSM with regard to adding v4vtables rules in issue 132, but it has become clear that the issue is more fundamental. v4vtables supplies functionality that overlaps with what XSM is designed to do. Adding v4vtables to Xen effectively adds objects to the hypervisor that belong to a specific domain (message rings). Access to these objects for communication with the guest to which they belong is effectively an access control decision. We've invented v4vtables as an access control mechanism that governs access to this specific object type.

Xen, however, has already accepted XSM as a generic access control mechanism intended to solve similar problems. That's not to say that XSM is a perfect fit to replace v4vtables; in fact it can't replace v4vtables completely. Still, it's likely a good idea to use XSM where possible and use v4vtables to extend this functionality where necessary. This includes considering the use of XSM not only for access control on V4V message exchange but also on the manipulation of v4vtables objects.

Recommendations

This section documents some recommendations to keep V4V moving forward. This is all open for discussion and none of it is set in stone. Please edit this document with suggestions / objections / ideas.

DoS

V4V enable / disable

The requested flag to disable V4V system-wide is a pretty heavy-handed approach, but it's likely a good thing to have. This should be a Xen command line option. It may be best to have V4V disabled by default and provide the command line option to enable it. Semantics like those of the flask-enforcing flag may be right:

  • v4v=1 to enable
  • v4v=0 to disable
  • disable by default (when no cmdline option is given, opt-in semantics)

Note that this feature also addresses resource consumption concerns; if V4V is not being used there is no point allocating V4V objects and consuming event channels.
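A minimal sketch of what this might look like, assuming Xen's existing boolean_param() helper for command line flags; the opt_v4v variable and the guard shown for the hypercall path are illustrative, not an actual patch.

 /* Disabled unless "v4v=1" (or "v4v") is given on the Xen command line. */
 static bool_t __read_mostly opt_v4v;
 boolean_param("v4v", opt_v4v);

 /* At the top of the V4V hypercall dispatcher: */
 if ( !opt_v4v )
     return -ENOSYS;   /* V4V present in the build but administratively off */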

Per-guest limits

To address the concerns over a DoS from guests creating a large number of V4V resources, it is probably sufficient to introduce limits on a per-guest basis. This would mean adding per-VM config options like the following (a sketch of enforcing such a limit follows the list):

  • v4v-rings-max=N to allow the VM to create N V4V rings
  • v4v-rings-max=0 to disallow the VM from creating rings (note the VM could still send data)
  • v4v-sendv-max=N to allow the VM to pass N sendv vectors in a single sendv op
  • v4v-sendv-max=0 to disallow the VM from sending data (note the VM could still receive data)
  • v4v-send-max=N to allow the VM to send a maximum of N bytes in a sendv op
  • v4v-notify-max=N to allow the VM to pass at most N rings to check in a notify op
  • default to 0 when not specified (RJP: I am leaning towards defaulting to reasonable limits)
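Below is a sketch of how the ring limit could be enforced at ring registration time. The per-domain counters (nr_rings, rings_max) and the surrounding structure are hypothetical names for illustration; the real code would hang this accounting off the domain's V4V state.

 /* Hypothetical per-domain accounting, filled in from the VM config. */
 struct v4v_limits {
     unsigned int nr_rings;    /* rings currently registered        */
     unsigned int rings_max;   /* from v4v-rings-max, 0 = forbidden */
 };

 /* Called from the ring registration path before allocating anything. */
 static long v4v_check_ring_limit(const struct v4v_limits *l)
 {
     if ( l->nr_rings >= l->rings_max )
         return -EMFILE;       /* per-VM limit reached (or set to 0) */
     return 0;
 }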

XSM

To get XSM involved in controlling access to V4V objects we first need to enumerate the objects and the actions that are performed on them. The objects will likely be easy enough to enumerate. From a quick chat yesterday there's obviously the ring itself, but there will likely be others, including those belonging to v4vtables. Some work should be done to fill in the following data:

Objects & Actions

Actions for V4V Ring: Create, Destroy, Send
Actions for v4vtables entries: Create, Delete, Read

Further this data should be linked to the structures (source file and line) where the XSM label will live and where the access control hooks need to be placed. Similarly we'll need to work up a patch to the default XSM policy which adds the necessary object classes and access vectors.
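For illustration, a send-path hook could look roughly like the sketch below. The hook name, its placement, and the v4v object class are assumptions about how the integration could look; they are not existing Xen code.

 /* Dummy (default-allow) implementation of a hypothetical send hook. */
 static int xsm_v4v_send(struct domain *src, struct domain *dst)
 {
     /* A Flask module would perform an avc_has_perm()-style check here,
      * e.g. class v4v, permission send, using the two domains' labels. */
     return 0;
 }

 /* In the sendv path, before any data is copied into the ring: */
 rc = xsm_v4v_send(src_d, dst_d);
 if ( rc )
     return rc;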

Policy Examples

So for the sake of argument let's assume that we implement the XSM stuff above and we can write XSM policy like the following:

 allow domA_t self:v4v { create delete };
 allow domB_t domA_t:v4v send;

This would allow a VM with the label domB_t to send data to a V4V ring with the label domA_t (presumably belonging to the VM labeled domA_t). This gives us the same semantics as the SELinux extensions to DBus.

With this, we've achieved roughly 50% of the protections offered by v4vtables: we can restrict which VMs are able to communicate over V4V. What we're lacking is the notion of a 'port'. Unfortunately the ordered pair of (network address, port) doesn't map well to Flask policy. SELinux has a mechanism for labeling port numbers, but the language doesn't allow complex types, so the label of a node (an IP address) cannot be associated with the port. I'd suggest we don't extend XSM in the same way, with the same limitations, and use v4vtables for this instead.

Once XSM is extended to govern v4vtables rules, it makes sense to expose the v4vtables hypercalls to guests beyond dom0. This brings v4vtables much closer to a real firewall in that each guest is in control of its own policy. dom0 will still be able to create and delete policy as well, and with the XSM rules it's possible for dom0 to add rules that the guest cannot manipulate or even see:

 allow domA_t self:v4v_rule { create delete read };
 allow dom0_t domA_t:v4v_rule { delete read };
 allow dom0_t dom0_t:v4v_rule { create delete read };

This would allow both domA and dom0 to create and delete rules; dom0 would be able to delete rules that belong to domA, but domA would not be able to manipulate rules created by dom0. Obviously there would need to be (and likely already are) hard-coded checks to prevent a VM from creating v4vtables policy where it is anything other than the destination (ingress only). This also assumes that there are no transition rules for v4v_rule objects, as I believe they are all stored in a single list (there is no labeled parent object to base a transition on). Some additional thought on this last point may be useful.

v4vtables

There has been some discussion around exposing v4vtables to the guest, which would allow it to protect itself to some extent. In this case some default constraints need to be placed on rule manipulation, both for basic sanity and for systems that are unable or unwilling to use XSM (a sketch of the check follows the list):

  • dom0 or some privileged domain can manipulate all rules
  • guests can manipulate rules provided they are ingress rules (the guest creating the rule is the destination)
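A sketch of that sanity check, reusing the hypothetical v4vtables_rule structure from earlier; is_control_domain() is the existing Xen helper, the rest is made up for illustration.

 /* Returns 0 if 'caller' may install 'rule', an error otherwise. */
 static long v4vtables_add_check(const struct domain *caller,
                                 const struct v4vtables_rule *rule)
 {
     if ( is_control_domain(caller) )            /* dom0 / privileged domain */
         return 0;                               /* may manipulate any rule  */
     if ( rule->dst_domid != caller->domain_id )
         return -EPERM;                          /* guests: ingress only     */
     return 0;
 }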

The above default behavior will allow a guest to override ingress rules created by dom0. This may not be desirable as it would allow the guest to open itself up to potential attack. Environments that want this level of protection are free to use XSM as described above (once it's implemented) as a mitigation.

Next steps include a proposal for forward progress on V4V access control with XSM and v4vtables.

Musings

So here is some of the reasoning behind why we think v4v is a good solution for inter-domain communication (and why we think it is better than the current shared memory grant method that is used).

Reasons why the v4v method is quite good even though it does memory copies:

  • Memory transfer speeds through the FSB in modern chipsets are quite fast; speeds on the order of 10-12 Gb/s (over, say, 2 DRAM channels) can be realized.
  • Transfers on a single clock cycle using SSE(2)(3) instructions allow moving up to 128 bits at a time.
  • Locality of reference arguments with respect to processor caches imply even more speed-up due to likely cache hits (this may in fact make the most difference in mem copy speed).


Reasons why the v4v method is better than the shared memory grant method:

  • v4v provides much better domain isolation since one domain's memory is never seen by another and the hypervisor (the most trusted component) brokers all interactions. This also implies that the structure of the ring can be trusted.
  • Use of v4v obviates the event channel availability issue since it doesn't consume individual channel bits when using VIRQs. (This point is now obsolete, since the implementation was switched to normal event channel use.)
  • The projected overhead of VMEXITs (originally cited as a major limiting factor) did not manifest itself as an issue. In fact, in the worst case v4v does not cause many more VMEXITs than the shared memory grant method, and in general it is at parity with the existing method.
  • The implementation specifics of v4v make it very simple and natural to use from both Windows and Unix/Linux type OSes (ReadFile/WriteFile and sockets respectively; see the sketch after this list). In addition, v4v uses TCP/IP protocol semantics, which are widely understood, and does not introduce an entirely new protocol set that must be learned.
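To illustrate the "sockets" point above, here is a purely hypothetical guest-side snippet, assuming a Linux V4V driver that exposes a socket family. AF_V4V, struct sockaddr_v4v, and its fields are invented names for this sketch; the only claim is that the call sequence would mirror ordinary TCP sockets.

 #include <stdint.h>
 #include <string.h>
 #include <sys/socket.h>
 #include <unistd.h>

 #define AF_V4V 40                 /* hypothetical address family number */

 struct sockaddr_v4v {             /* hypothetical V4V socket address */
     sa_family_t sa_family;
     uint16_t    domid;            /* peer domain */
     uint32_t    port;             /* V4V port    */
 };

 int v4v_connect_example(uint16_t peer_domid, uint32_t port)
 {
     struct sockaddr_v4v addr;
     int fd = socket(AF_V4V, SOCK_STREAM, 0);

     if ( fd < 0 )
         return -1;

     memset(&addr, 0, sizeof(addr));
     addr.sa_family = AF_V4V;
     addr.domid     = peer_domid;
     addr.port      = port;

     if ( connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 )
     {
         close(fd);
         return -1;
     }

     return fd;   /* read()/write() then behave like a TCP stream */
 }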


Some of the downsides to using the shared memory grant method:

  • This method imposes an implicit ordering on domain destruction. When this ordering is not honored the grantor domain cannot shut down while the grantee still holds references. In the extreme case where the grantee domain hangs or crashes without releasing its granted pages, both domains can end up hung and unstoppable - the DEADBEEF issue. We discovered that this issue does not occur with libvchan.
  • You can't trust any ring structures, because the entire set of granted pages is available to be written by the other guest.
  • The PV connect/disconnect state machine is poorly implemented. There is no trivial mechanism to synchronize disconnecting/reconnecting, and in the process dom0 must also allow each of the two domains to see parts of xenstore belonging to the other.
  • Using the grant-ref model and having to map grant pages on each transfer causes updates to virtual-to-physical memory mappings and thus leads to TLB misses and flushes (TLB flushes are expensive operations).

Upstream Notes

V4V had to be changed quite a bit to be accepted upstream. The API and hypervisor ABI changed, which means we will need to build compat layers into the guest drivers. The VIRQ was replaced with a standard masked event channel; since the number of event channels has been increased in upstream Xen, this is not a big deal. These changes mean that some of the information on this page and on the API link below is (or will be) incorrect for the new implementation, but the details about functionality in the overview and the "thoughts and justifications" are still relevant.

TODO: We had once collected some metrics on V4V vs. libvchan. I think they were posted to xen-devel but I have not been successful in finding them. It would be nice to have these data.