Argo : Hypervisor-Mediated data eXchange : Development

Items to address in Argo development – several items listed are described in more detail further down this page.

LONG-RUNNING-DOMAIN-SHUTDOWN-WORK)
When a domain is shut down, the hypervisor runs a function to remove all hypervisor state associated with the domain. This function can be long-running, so it needs further development to ensure that it does not block the hypervisor from making progress on other work while it runs. Investigate batching the work with periodic yields as needed.
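As a minimal sketch of the batching idea, the Python generator below releases a bounded number of items per step and then yields control to its caller. The names `teardown_in_batches`, `release` and `batch_size` are hypothetical, and the generator only stands in for whatever preemption mechanism the hypervisor would actually use.

```python
from typing import Callable, Iterator, List


def teardown_in_batches(
    state_items: List[object],
    release: Callable[[object], None],
    batch_size: int = 64,
) -> Iterator[int]:
    """Release per-domain state in bounded batches.

    Yields the running count of released items after each batch, so the
    caller can do other work (or reschedule) between batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done = 0
    while state_items:
        # Bound the work done before the next yield point.
        for _ in range(min(batch_size, len(state_items))):
            release(state_items.pop())
            done += 1
        yield done
```

A caller would iterate the generator and interleave other work between steps; choosing the batch size trades teardown latency against responsiveness.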

SENDER-DOMAIN-CONTEXT)  : OXT-1503
Hypervisor to provide context about the domain sending a message to the message recipient. Context will include the sender's XSM sid when operating on XSM-enabled Xen systems.

- Value: guest software can trust and reason about security context of message source
- Value: supports implementation of strong access control

NAME-SERVICE)
- design input: a simple reference implementation exists in the uXen v4v driver
- Value: supports use/port of the v4v Linux device driver to Argo
- Value: supports communication between endpoints at different layers of nested virtualization
- Value: supports reconnection of device drivers across domain reboot
- Value: towards enabling PV drivers and other connections without XenStore
-> Value: encourages adoption of Argo (Xen Community members have expressed interest)

LINUX-DRIVER) : OXT-1473
A modern Linux device driver implementation suitable for submission to and inclusion in the upstream Linux kernel tree.

There is a separate wiki page for Linux Argo Driver development.

- Value: Higher-quality software implementation for OpenXT; easier to review for security properties and maintain.
- Value: Supports adoption of Argo beyond OpenXT. eg. can satisfy external code review.

REMOTE-CONTEXT)
Communication of sender process context between domains.
In the guest-to-guest protocol within the Linux Argo device driver, convey a process identifier and SELinux security context of the source process to the receiver domain to enable access control checks to be performed there. To be implemented in LINUX-DRIVER.
Proposed by Stephen Smalley at the OpenXT Summit 2016:
https://github.com/OpenXT/docs/raw/master/presentations/2016-06-07-openxt-summit/14%20-%20Smalley%20-%20Access%20Control.pdf

CONNECTION-STATE)
Enforce and track pairing of the rings used for bidirectional communication endpoints.
Bidirectional exchange of messages between domains is essential to most use cases for Argo. In the common scenario, a service is registered on a well-known port, while clients connect from dynamically-allocated (ie. temporarily assigned) ports. Tracking endpoint connection state within the hypervisor is necessary to enable access control rules to apply narrowly to bidirectional connections instead of relying on broad, non-specific inverse rules as seen with viptables.

- Value: A pre-requisite for strong Access Control (Argo firewall).

ACCESS-CONTROL)
Run-time configurable mandatory access control over messages. ie. A replacement for the OpenXT viptables/v4vtables "firewall".
Acts upon SENDER-DOMAIN-CONTEXT and REMOTE-CONTEXT data and depends upon CONNECTION-STATE.

- Implementation of a new, granular firewall with Connection State awareness.
- Requires hypervisor, kernel and userspace components.
- Value: Enables integration of firewall into upstream Xen.
- Value: Hypervisor-enforced granular control over accessibility of communication channels. A strong differentiator of Argo vs. other interdomain communication systems. Supports effort for Argo to attain security support in Xen, and for Argo being enabled by default on Xen systems.

WILDCARD-PROGRESS)
Wildcard (any-sender) rings: ensuring that all senders are able to make forward-progress with message transmission to the receiver.
ie. Preventing DoS of a sender's access to the ring by any other sender.

Senders register resources to gain the option of buffering on send and, potentially, of being throttled.
Firewall likely to be helpful in practice by enabling constraints on classes of authorized sender domains.
See Hyper-V's primitives for inter-VM communication: use of transmit slots.
- Value: Improves resiliency of Argo services (eg. in multi-tenant systems), which supports effort for Argo to attain security support in Xen, and for Argo being enabled by default on Xen systems.

SHUTTER-RINGS)
Improving confidentiality of transmitted data against host memory read attacks.
Option to replace the permanently-resident ring mappings with transient mappings created and destroyed as messages are sent on specific rings.
Policy controls for use to be determined.

- Value: improves confidentiality of data and resilience to speculative execution data read attacks.

WILDCARD-SPACE-WAKEUPS)
Investigate improvement to the efficiency of ring notifications for space availability in wildcard rings: avoid the stampeding herd.

- Value: potential for improvement in scalability, performance or performance isolation.

NOTIFY-RING)
Investigate improvement to the efficiency of notifications for message delivery: option to provide receiver with more than a single interrupt bit when a message is delivered: write the destination ring id into a dedicated notification ring registered by the receiver.

- Value: Performance improvement in the guest device driver; improved scalability, especially for server VMs with many rings.
- Challenge: Increases complexity of implementation in the hypervisor. Could potentially be addressed by making it an independent Kconfig option.
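The notification ring idea can be illustrated with a small fixed-size circular buffer of ring ids. This is only a sketch under assumptions: the class name `NotifyRing`, its methods, and the behaviour when the ring is full (report failure so the producer can fall back to a plain interrupt) are hypothetical, not part of any existing Argo interface.

```python
from typing import List


class NotifyRing:
    """Fixed-size circular buffer of destination ring ids (illustrative)."""

    def __init__(self, slots: int) -> None:
        if slots < 1:
            raise ValueError("slots must be at least 1")
        self._slots = [0] * slots
        self._prod = 0  # total entries written by the producer (hypervisor)
        self._cons = 0  # total entries consumed by the receiver (guest)

    def push(self, ring_id: int) -> bool:
        """Record that a message was delivered to ring_id.

        Returns False when the notification ring is full; the receiver would
        then have to scan its rings as it does with a single interrupt bit.
        """
        if self._prod - self._cons == len(self._slots):
            return False
        self._slots[self._prod % len(self._slots)] = ring_id
        self._prod += 1
        return True

    def drain(self) -> List[int]:
        """Return, in delivery order, the ring ids notified since last drain."""
        out = []
        while self._cons != self._prod:
            out.append(self._slots[self._cons % len(self._slots)])
            self._cons += 1
        return out
```

With such a structure the guest driver services only the rings named in the drained list instead of polling every registered ring.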

CROSS-NESTING-COMMS)
Investigate changes to the Argo guest interface necessary to facilitate efficient communication between guests at different levels of nested virtualization. Determine what cooperation, if any, is necessary or beneficial between the nested hypervisors.

- eg. Should enable communication between uXen and Xen guests on the same system.
- Will need a smart way to enable and control/restrict this.
- Value: granular Mandatory Access Control enforcement over multi-level communication paths.

ARGO-FOR-UXEN)

- Value: same interfaces on both hypervisors increases scale of deployment options for software and policies developed against the interfaces, so development of workloads is more attractive.
- Value: enables close integration of uXen and microVMs on a Xen-based system (eg. OpenXT)
- Value: potential for guest compatibility between uXen and Xen systems.
- Value: Argo is more widely deployable via uXen systems.
- Value: benefits of Argo made available to uXen systems.

Further items:

* OXT-1677: Documentation of Argo
- hypervisor developer guide
- guest software guide
- userspace admin guide
- Value: enables faster and wider adoption of Argo.

* OXT-1683: Test coverage of Argo
- Value: protects Argo against regressions as software develops.
- Value: enables reproduction of behaviour for investigation, inspection.

* OXT-1691: Use Linux sock_type values for communicating protocol between endpoints
- ie. datagram vs. stream
- Value: a more standard implementation, and requested by the OpenXT community.

* OXT-1660: Remove the compat ioctl from the Linux Argo device driver
- this is an unneeded remnant originally from v4v
- Value: increased confidence in implementation; simplifies sanity checking user-supplied values.

* Implement network and block PV drivers for Xen using Argo
- Value: enables dropping XenStore and the grant tables from the hypervisor and in-guest software
- Value: encourages wide deployment and so sustained testing of Argo
-> Value: wider adoption enables more resources for eg. performance profiling and tuning
- Value: may enable PV drivers in dom0less configurations

* Self-protection via access control
- investigate option for guests to request firewalling of themselves
- Value: could enable simple self-hardening

* Research: Argo support for time-sensitive message transport (eg. time sensitive networking)
- Scheduler integration likely essential
- Value: Time Sensitive Networking is increasing in importance -- eg. may soon be required by clients in order to access secure cloud systems -- yet hard to support on virtualized systems.

* Research: Integration with memory encryption technologies
- Value: May be necessary for compatibility when in use by guests / platforms.

* Research: Accelerated transport options leveraging available hardware
- Value: Potential for performance and scalability improvement; could broaden Argo's use cases.

* Research: Asynchronous message send primitive
- Value: Potential for performance and scalability improvement; could broaden Argo's use cases.

* Research: Can Argo assist with support for Hyper-V enlightenments?
- Value: Attractive for XenServer, widens Argo's use cases, improves guest experience.


HYPERVISOR-AGNOSTIC HYPERVISOR INTERFACE)

See description on the VirtIO-Argo page.

INTERRUPT DELIVERY WITHOUT EVENT CHANNELS)

See description on the VirtIO-Argo page.



Additional detail on some items follows



SENDER-DOMAIN-CONTEXT

The hypervisor will provide context data about the domain that sent an Argo message to the receiving domain.

The message context currently provided with each Argo message is:
* The sender's domain ID, provided by the hypervisor.
* For partner rings: the hypervisor will ensure that the sending domain is the same domain as when the receiver registered the ring.
(ie. a more recent domain with the same domain ID cannot spoof the original.)
* The sender's Argo port, provided by the sender.
* A "message type" value, provided by the sender.

The message header will be revised to add additional data:
* On XSM-enabled systems: the 32-bit XSM sid of the sender domain.

This change will require reducing the size of the currently-32-bit message_type field.
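To make the size trade-off concrete, here is a sketch of one hypothetical header layout in which a 16-bit message type and a 32-bit sid together occupy the space of a 32-bit message type plus 16 bits of padding. The field order, the length field and the padding are assumptions made for illustration; they are not taken from the Argo ABI.

```python
import struct

# Hypothetical current layout (little-endian, 16 bytes):
#   len (u32), source aport (u32), source domid (u16), pad (u16), message_type (u32)
CURRENT_HDR = struct.Struct("<IIHHI")

# Hypothetical revised layout (little-endian, still 16 bytes):
#   len (u32), source aport (u32), source domid (u16), message_type (u16), sender sid (u32)
REVISED_HDR = struct.Struct("<IIHHI")


def pack_revised(length: int, aport: int, domid: int,
                 message_type: int, sid: int) -> bytes:
    """Pack the hypothetical revised header; message_type must fit 16 bits."""
    if not 0 <= message_type <= 0xFFFF:
        raise ValueError("message_type no longer fits: field is 16 bits wide")
    return REVISED_HDR.pack(length, aport, domid, message_type, sid)


def unpack_revised(raw: bytes) -> tuple:
    """Return (len, aport, domid, message_type, sid) from a packed header."""
    return REVISED_HDR.unpack(raw)
```

Under these assumptions the header size is unchanged, but any sender that uses message type values above 0xFFFF would need to be updated.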



EXTENDED-SENDER-DOMAIN-CONTEXT

This extended data is for consideration - no firm decision made on this:

Adds additional data to the sender domain context:

* An 8-bit set of binary sender context flags, to include bits for:
+ Sender domain is privileged.
- this is usually dom0, but may differ on disaggregated systems
+ Sender domain is a device model stubdomain of the recipient domain.
- specifically: src_d has priv over dst_d
+ Recipient is a device model stubdomain of the sender domain.
- specifically: dst_d has priv over src_d
+ The remaining bits are reserved.
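The flag set described above could be modelled as follows. The bit positions and the names `SenderFlags`, `RESERVED_MASK` and `decode_flags` are hypothetical; the source only states which conditions need a bit and that the remaining bits are reserved.

```python
from enum import IntFlag


class SenderFlags(IntFlag):
    """Hypothetical bit assignments for the 8-bit sender context flags."""
    PRIVILEGED = 0x01                      # sender domain is privileged
    SENDER_IS_STUBDOM_OF_RECIPIENT = 0x02  # src_d has priv over dst_d
    RECIPIENT_IS_STUBDOM_OF_SENDER = 0x04  # dst_d has priv over src_d


# All bits not assigned above are reserved.
RESERVED_MASK = 0xFF & ~0x07


def decode_flags(byte: int) -> SenderFlags:
    """Decode a flags byte, rejecting values with reserved bits set."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("flags must fit in 8 bits")
    if byte & RESERVED_MASK:
        raise ValueError("reserved flag bits are set")
    return SenderFlags(byte)
```

Rejecting reserved bits on the receiving side is one possible policy; ignoring them would be an alternative if forward compatibility matters more than strictness.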



CONNECTION-STATE

Enforce and track the pair relationship between communicating endpoints, to provide foundation for support of fine-grained access control.

To consider: connection state tracking could be made dependent on some indicator the protocol 

Changes to ring registration:

  • The domain registering a ring must declare the ring, and hence the endpoint, to be one of these types:
    • Client ring: for receiving messages from an entity that has registered a specified remote ring: <dst_d, dst_aport>
      which must be indicated when the new client ring is registered and must already exist.

      • The declared destination state is recorded in the hypervisor's internal ring state.
      • Client rings cannot be registered as wildcard rings.
    • Server ring: for receiving messages from entities that may register client rings later.

=> to investigate: impact of wildcard rings and more-specific matching rings that are registered later. May need to block those being registered if a wildcard ring has already been registered.

Changes to the message send operation:

  • Enforce that one end of the transmission is a Client ring and the other a Server ring.
  • Add verification that the claimed source ring for a message exists before allowing the message to be sent.
    • A domain's own ring must therefore be registered before sendv will accept it as a source address of a message.
  • If the ring for the sending domain's message source address is a client ring, require that the message's destination matches the client ring's destination specified when the client ring was registered.
    • Refuse the message transmission if not.
  • If the ring for the receiving domain's message destination address is a client ring, require that the message's source matches the client ring's destination specified when the client ring was registered.
    • Refuse the message transmission if not.
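The registration and send checks listed above can be sketched as a small model. This is a simplification under assumptions: rings are keyed only by (domain id, Argo port), and the names `Ring`, `RingTable`, `register` and `may_send` are hypothetical rather than hypervisor interfaces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple

Addr = Tuple[int, int]  # (domain id, Argo port)


class RingType(Enum):
    CLIENT = "client"
    SERVER = "server"


@dataclass(frozen=True)
class Ring:
    owner: Addr
    ring_type: RingType
    peer: Optional[Addr] = None  # client rings only: declared <dst_d, dst_aport>
    wildcard: bool = False       # any-sender ring


class RingTable:
    def __init__(self) -> None:
        self._rings: Dict[Addr, Ring] = {}

    def register(self, ring: Ring) -> None:
        if ring.ring_type is RingType.CLIENT:
            if ring.wildcard:
                raise ValueError("client rings cannot be wildcard rings")
            if ring.peer is None or ring.peer not in self._rings:
                raise ValueError("client ring must name an existing remote ring")
        self._rings[ring.owner] = ring

    def may_send(self, src: Addr, dst: Addr) -> bool:
        src_ring = self._rings.get(src)
        dst_ring = self._rings.get(dst)
        # The claimed source ring (and the destination) must be registered.
        if src_ring is None or dst_ring is None:
            return False
        # One end must be a client ring and the other a server ring.
        if {src_ring.ring_type, dst_ring.ring_type} != {RingType.CLIENT, RingType.SERVER}:
            return False
        # A client ring may only talk to the destination it declared.
        if src_ring.ring_type is RingType.CLIENT and src_ring.peer != dst:
            return False
        if dst_ring.ring_type is RingType.CLIENT and dst_ring.peer != src:
            return False
        return True
```

The model ignores locking and the interaction with wildcard rings noted above, both of which would dominate a real implementation.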


Note that the above changes are significant: they require changes to the access patterns for hypervisor-internal ring state, and are consequently likely to affect the fine-grained locking discipline within the current implementation. Maintaining performance isolation between domains, to prevent DoS potential, will require attention.



ACCESS-CONTROL

Argo implements XSM controls over the hypercall operations. These are static, defined in the host XSM policy, and enable expression of coarse granularity rules over (src domain -> dst domain) connectivity. An additional mechanism is required that enforces finer-grained and dynamic access control rules over the exchange of Argo messages between guests.

  • Finer-grained: able to validate and act upon additional fields beyond just the domain's XSM sid.
  • Dynamic: rules are added and removed at runtime, typically by the toolstack, as domains transition through their lifecycles.

Support for both static and dynamic rules is desirable:

  • Static Rules: A set of fixed firewall rules that are always enforced.
  • Dynamic Rules: These can be added and removed at runtime and are able to both narrow and widen the set of allowed communication paths, within constraints set by the Static Rules.

Static Rules should be provided at host boot, measured, and enforced at all times throughout the host lifecycle. The DomB proposal may be appropriate for an implementation of rule programming during system launch, such that the control domain can be confined. A mechanism is required for ensuring that the programmed Static Rules are irrevocable until host shutdown.

To ensure that Static Rules cannot be circumvented, the programmed rules must be able to express whether they may be narrowed or widened by later rules. The intention is that Static Rules will be non-circumventable (eg. by the control domain) except by modifying the materials that are measured during boot.
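One possible reading of the narrowing and widening constraint is sketched below. Everything here is an assumption for illustration: rules are reduced to a (source domain, destination domain) pair, unmatched flows are denied by default, and the names `StaticRule`, `may_widen`, `may_narrow` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Flow = Tuple[int, int]  # (source domain id, destination domain id)


@dataclass(frozen=True)
class StaticRule:
    flow: Flow
    allow: bool
    may_widen: bool = False   # a dynamic rule may turn this deny into an allow
    may_narrow: bool = False  # a dynamic rule may turn this allow into a deny


def decide(flow: Flow, static_rules: List[StaticRule],
           dynamic_rules: Dict[Flow, bool]) -> bool:
    """Combine static and dynamic verdicts for one flow."""
    static = next((r for r in static_rules if r.flow == flow), None)
    dynamic = dynamic_rules.get(flow)
    if static is None:
        # No static constraint: the dynamic rule decides, default deny.
        return bool(dynamic)
    if dynamic is None or dynamic == static.allow:
        return static.allow
    if dynamic:
        # Dynamic rule tries to widen a static deny.
        return static.may_widen
    # Dynamic rule tries to narrow a static allow.
    return not static.may_narrow
```

In this reading a static rule with both flags cleared is final: no dynamic rule can change its verdict.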


* Self-protection

Domains should be able to supply their own rules to narrow communication allowed to them and so enable self-protection. These rules should support the option for being irrevocable by the domain itself for the lifetime of the domain.

* Policy controls over dynamic rule operations

The access control system has the following basic operations:

  • Add rule
    - Introduces a new access control rule at the specified position.
  • Delete rule
    - Removes a specified access control rule.
  • List the active rules
    - Retrieves the list of current access control rules.

XSM policy needs to be able to express which domains are authorized to view or perform modifications to the access control rules, with context about which domains are affected by the specific rule being modified or listed.
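The three operations, gated by a policy check that sees both the caller and the affected rule, might look like the sketch below. The callback `example_policy` merely stands in for an XSM decision, and all names and the rule shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    src_domid: int
    dst_domid: int
    allow: bool


# (caller domain id, operation name, affected rule) -> permitted?
PolicyCheck = Callable[[int, str, Rule], bool]


class RuleTable:
    def __init__(self, policy: PolicyCheck) -> None:
        self._policy = policy
        self._rules: List[Rule] = []

    def add(self, caller: int, position: int, rule: Rule) -> None:
        """Insert a rule at the given position, if the caller is authorized."""
        if not self._policy(caller, "add", rule):
            raise PermissionError("caller may not add this rule")
        if not 0 <= position <= len(self._rules):
            raise IndexError("position out of range")
        self._rules.insert(position, rule)

    def delete(self, caller: int, position: int) -> None:
        """Remove the rule at the given position, if the caller is authorized."""
        if not 0 <= position < len(self._rules):
            raise IndexError("position out of range")
        if not self._policy(caller, "delete", self._rules[position]):
            raise PermissionError("caller may not delete this rule")
        del self._rules[position]

    def list_rules(self, caller: int) -> List[Rule]:
        """Return only the rules the caller is authorized to view."""
        return [r for r in self._rules if self._policy(caller, "list", r)]


def example_policy(caller: int, op: str, rule: Rule) -> bool:
    # Assumption: domain 0 may do anything; any other domain may only list
    # rules whose destination is itself.
    return caller == 0 or (op == "list" and rule.dst_domid == caller)
```

Passing the affected rule to the policy check is what lets the policy distinguish a domain viewing rules about itself from a domain viewing rules about others.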

To consider/investigate:

+ What granularity is correct for the policy controls over the rules?
eg. viptables/v4vtables has the basic constraint: only the privileged domain can perform the rule operations. This prevents domains from being able to self-protect.

+ Is it important to be able to configure a domain so that it can list all the access control rules that are applied to it, but cannot list those applied to other domains?

* Filter terms

Support needs to be implemented for expressing rules that inspect and match on:
+ source domain id
+ destination domain id
+ source Argo port
+ destination Argo port
+ source domain XSM sid
+ source domain is control domain
+ source domain is stub domain of the other communication endpoint
+ destination domain is stub domain of the other communication endpoint
+ message type
+ bi-directional : rule applies to messages (typically replies) sent in the opposite direction too

Implementation of CONNECTION-STATE is required to enable rules for allowing bi-directional connections (the common case) and avoid requiring generalized, very permissive, "inverse rules".
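A sketch of matching on a subset of the filter terms follows. It makes several assumptions: unset terms act as wildcards, the first matching rule wins, unmatched messages are denied, and the sid, control domain and stub domain terms are left out for brevity. The names `Message`, `FilterRule` and `decide` are hypothetical. The sketch is stateless, so its bi-directional flag is only as narrow as the rule's address terms; narrowing it to established connections is what CONNECTION-STATE would add.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    src_domid: int
    dst_domid: int
    src_aport: int
    dst_aport: int
    message_type: int


@dataclass(frozen=True)
class FilterRule:
    allow: bool
    src_domid: Optional[int] = None  # None means "match any"
    dst_domid: Optional[int] = None
    src_aport: Optional[int] = None
    dst_aport: Optional[int] = None
    message_type: Optional[int] = None
    bidirectional: bool = False      # also match the reverse direction


_TERMS = ("src_domid", "dst_domid", "src_aport", "dst_aport", "message_type")


def _matches(rule: FilterRule, msg: Message) -> bool:
    for term in _TERMS:
        wanted = getattr(rule, term)
        if wanted is not None and wanted != getattr(msg, term):
            return False
    return True


def _reverse(msg: Message) -> Message:
    # Swap the two endpoints, leaving the message type unchanged.
    return replace(msg, src_domid=msg.dst_domid, dst_domid=msg.src_domid,
                   src_aport=msg.dst_aport, dst_aport=msg.src_aport)


def decide(rules: List[FilterRule], msg: Message) -> bool:
    """First matching rule wins; default deny (both are assumptions)."""
    for rule in rules:
        if _matches(rule, msg) or (rule.bidirectional and _matches(rule, _reverse(msg))):
            return rule.allow
    return False
```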

To investigate: support for rules that inspect and match on guest header fields:
+ process identifier
+ process SELinux security context

* Interface of the Argo Notify op

The notify op enables a domain to query the state of another domain's ring, and request notification when space becomes available.

To investigate: the query operation should specify which source port the query is on behalf of, and return an error if communication from that source to the destination being queried is disallowed. This is to assist correct behaviour within the querying domain.