V4V Overview

Overview:

The V4V technology is a new approach to inter-domain communications on a Xen virtualization platform. Most existing inter-domain communications frameworks use shared memory between domains and some form of event channel interrupt mechanism. V4V is a departure from this approach, instead relying on the hypervisor to broker all communications. Domains manage their own data rings and no actual memory is shared. Domains communicate with one or more other domains (or themselves in the trivial case) using source and destination addresses and ports. This forms a standard 4-tuple that unambiguously specifies where a given block of data came from and where it is destined. Note that V4V also defines several protocols (making a 5-tuple), but the protocol is not used by the core V4V framework; rather, it may be interpreted at a higher layer in the communications stack.

It should be noted that V4V provides TCP/IP-like network semantics, with the core components described in this document being roughly analogous to Layer 3 (the Network Layer) of the OSI model. The V4V core provides reliable delivery of "network packets" using the 4-tuple described above. The term "message" will be used to indicate a discrete block of data within a V4V ring, as opposed to "packet".

The term "domain" will be used to indicate guest VMs or domains on a Xen platform. Note that Domain 0 can also use V4V in the same fashion as other de-privileged guest domains. Since there is nothing inherently special in the way Domain 0 would use V4V, no differentiation will be made.

Details:

Addressing:

As noted above, V4V uses a 4-tuple addressing scheme where each end of the communication channel is defined by the following address structure.

struct v4v_addr
{
    uint32_t port;
    domid_t domain;
};

Domain IDs are unique on any given platform and serve as the end point address. The port value is analogous to a TCP/IP port that specifies some service at a particular address.

Rings:

The basic construct in V4V is the v4v_ring. A domain that wants to communicate with other domains must register a v4v_ring with the V4V management code in the hypervisor. Rings are identified by a v4v_ring_id, which is defined as follows:

struct v4v_ring_id
{
    struct v4v_addr addr;
    domid_t partner;
};

The ring ID defines the local address values and a partner domain. If a partner domain is specified, then only communication between the two domains is possible. An ANY value for partner allows a given ring to accept traffic from any other domain. The following defines the ring itself. The domain portion of the id field is always set to the local domain ID.

struct v4v_ring
{
    uint64_t magic;
    struct v4v_ring_id id;
    uint32_t len;
    V4V_VOLATILE uint32_t rx_ptr;
    V4V_VOLATILE uint32_t tx_ptr;
    uint64_t reserved[4];
    V4V_VOLATILE uint8_t ring[0];
};

The length of the ring is specified in len, and the actual ring data buffer starts at ring[0] in the structure. The rx_ptr is the receive pointer into the ring, locating the next message to be read by the domain. This pointer is only ever modified by the domain that owns the ring as it consumes messages in the ring. The tx_ptr is the transmit pointer into the ring, indicating where the next received message can be written; it also marks the end of the message data to be read by the ring-owning domain. This pointer is only ever modified by the hypervisor as it writes new messages into the domain's ring.

For clarity, it should be stated that a ring's data area starting at ring[0] only contains received messages passed to it by the V4V management code in the hypervisor. V4V rings are not shared memory rings carrying messages in both directions.

Register and Unregister Rings:

A key aspect of V4V is that each domain creates its own ring memory and registers it with the V4V management code. In most cases this involves creating a block of system memory then presenting V4V with the physical addresses of the pages backing the newly allocated buffer. The following structure is used to pass that information to V4V.

struct v4v_pfn_list
{
    uint64_t magic;
    uint32_t npage;
    uint32_t pad;
    uint64_t reserved[3];
    v4v_pfn_t pages[0];
};

This describes the number of pages in the ring and the Page Frame Number of each page.

A ring is registered using the V4VOP_register_ring hypercall, passing in the new v4v_ring descriptor and the v4v_pfn_list descriptor. On success the ring is active: the domain may start sending immediately and may be notified of received traffic. Diagram 1 shows the creation of a V4V ring.

A ring is unregistered using another hypercall, V4VOP_unregister_ring. The domain completely owns the ring and can unregister it at any point in time.

VIRQ:

VIRQs, or virtual interrupts, are interrupts delivered on the Xen platform device's IRQ; they are sourced from within the hypervisor. This is a generic Xen mechanism that V4V uses, and it will not be described in further detail here. V4V uses a dedicated VIRQ number to indicate a change of V4V state that is of interest to a domain. Such a domain must first register for these notifications using the appropriate hypercalls.

The reception of a VIRQ_V4V event indicates one of two possible changes of V4V state that a domain would be interested in:

  • One or more rings that a domain owns have received messages.
  • One or more destination rings that a domain attempted to send messages to, but could not, now have sufficient space to receive them.

A VIRQ_V4V event could mean either or both of the above has occurred.

Ring Receive:

The domain that owns a ring is free to read data from its ring at any point. The terminating condition, indicating there are no more messages to read, is rx_ptr == tx_ptr. Note that the ring is not actually circular, so a domain must handle the case where the message data wraps around (i.e. when tx_ptr < rx_ptr). Each message in the ring begins with the following header:

struct v4v_ring_message_header
{
    uint32_t len;
    struct v4v_addr source;
    uint16_t pad;
    uint32_t protocol;
    uint8_t data[0];
};

As stated, a ring can be read at any time (e.g. using a polling algorithm), but V4V also provides an interrupt mechanism to indicate message arrival: the domain owning a ring can receive virtual interrupts (VIRQ_V4V) indicating that messages have arrived (see above).

Ring Notify:

V4V provides a facility to notify the management code in the hypervisor that receive processing has been done and/or that there are pending sends. The V4VOP_notify hypercall should be made when either or both of these conditions exist.

To notify of receive activity, no additional information is supplied to the notify hypercall (the change is implicit in that the rx_ptr changed).

When a domain is ready to send messages to one or more destination rings, the notify hypercall is used to query the state of the destination rings to determine whether they can receive the data. The following structures specify what the notifying domain is interested in.

struct v4v_ring_data_ent
{
    struct v4v_addr ring;
    uint16_t flags;
    uint32_t space_required;
    uint32_t max_message_size;
};

struct v4v_ring_data
{
    uint64_t magic;
    uint32_t nent;
    uint32_t pad;
    uint64_t reserved[4];
    struct v4v_ring_data_ent data[0];
};

The caller supplies the above structures, with N v4v_ring_data_ent structures following the main descriptor. Within each v4v_ring_data_ent, the caller fills in the ring address and space_required for the destination ring to query. V4V fills in flags and max_message_size in the v4v_ring_data_ent structures as output.

The max_message_size indicates how much message data can be sent to that ring at the current time.

The flags can indicate:

  • V4V_RING_DATA_F_EMPTY - The ring is empty
  • V4V_RING_DATA_F_EXISTS - The ring exists
  • V4V_RING_DATA_F_PENDING - Pending interrupt exists - do not rely on this field - for profiling only
  • V4V_RING_DATA_F_SUFFICIENT - Sufficient space to queue space_required bytes exists

Sending:

There are two hypercalls for sending messages, V4VOP_send and V4VOP_sendv. They both take a source and destination v4v_addr and a protocol value. The send op takes a buffer and length, while the sendv op takes a list of buffers and a count of items in the list. If the message(s) cannot be sent, a return code indicating the caller should try again will be returned, and V4V will internally request that a VIRQ_V4V notification be raised when enough space becomes available.

V4V IPTables:

Built into the V4V management code is an IPTables like firewall. Three hypercalls allow rules to be added, deleted and listed. The implementation is much the same as Linux IPTables (public information could be referenced here).

Motivation:

The motivation for V4V is to provide a new approach to inter-domain communications on a Xen platform that is simpler, more secure, and less prone to failure. The existing approaches fall short on many of these criteria.

Security:

V4V provides a much higher level of isolation between domains because no memory is shared. Each domain completely owns its rings. Only the domain that owns a ring and the hypervisor can access it. The hypervisor (as a trusted component of the system) brokers all communications and ensures the integrity of the rings.

Fault Tolerance:

V4V is more fault tolerant than existing approaches. Since the hypervisor brokers all activity, it has complete control over V4V. An individual domain can manage the lifetime of its rings without any ill effect on other domains. A domain that corrupts or misuses its own ring cannot damage (or even see) rings owned by other domains. Domain shutdowns or crashes with open rings are trivially handled in the V4V management code.

Simplicity:

The interface and semantics for using V4V are quite simple. Its likeness to TCP/IP means it fits easily into existing protocol frameworks within operating systems. The internal workings of V4V are also far simpler.

Performance:

Though the reliance on copying data from one domain to another may seem a major performance issue, it turns out not to be. Data copies on modern systems are extremely fast due to memory/bus speeds and advanced instructions allowing larger copies per CPU cycle. In addition, due to locality of reference with respect to CPU caches when using V4V, most copies will occur in cache, drastically speeding them up. Finally, V4V does not introduce any more VMEXIT overhead than existing solutions.

V4V Design Improvements

Connection

At the moment, V4V doesn't keep track of established connections, which has an impact on the firewall's ability to track connections and makes every ring of a guest potentially able to receive messages from any other guest.

One possible solution is to provide the ability to have private rings that are only for receiving data from a specific guest.

  • connect hypercall

Scalability improvements

The notification mechanism currently present doesn't allow any detailed communication between the hypervisor and the guest. The mechanism just notifies the guest that something happened, and it is up to the guest to find out which elements changed.

One way to improve this would be to offer a list of events along with the notification. Instead of reinventing a brand new mechanism, we could have a V4V ring that is used only by the hypervisor to write events. These events would allow the guest to identify which rings need processing and which destinations now have space. It could also carry other events: connection requests, etc.

 
