VM Memory Layout

Copyright 2016 by Assured Information Security, Inc. Created by Ross Philipson <philipsonr@ainfosec.com>. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Memory Model

The rundown here will focus on HVMs, though PV guests are not all that different. The following is the memory layout from xenops/memory.ml:

Memory Layout
(* === Domain memory breakdown ============================================== *)

(*           ╤  ╔══════════╗                                     ╤            *)
(*           │  ║ shadow   ║                                     │            *)
(*           │  ╠══════════╣                                     │            *)
(*  overhead │  ║ extra    ║                                     │            *)
(*           │  ║ external ║                                     │            *)
(*           │  ╠══════════╣                          ╤          │            *)
(*           │  ║ extra    ║                          │          │            *)
(*           │  ║ internal ║                          │          │            *)
(*           ╪  ╠══════════╣                ╤         │          │ footprint  *)
(*           │  ║ video    ║                │         │          │            *)
(*           │  ╠══════════╣  ╤    ╤        │ actual  │ xen      │            *)
(*           │  ║          ║  │    │        │ /       │ maximum  │            *)
(*           │  ║          ║  │    │        │ target  │          │            *)
(*           │  ║ guest    ║  │    │ build  │ /       │          │            *)
(*           │  ║          ║  │    │ start  │ total   │          │            *)
(*    static │  ║          ║  │    │        │         │          │            *)
(*   maximum │  ╟──────────╢  │    ╧        ╧         ╧          ╧            *)
(*           │  ║          ║  │                                               *)
(*           │  ║          ║  │                                               *)
(*           │  ║ balloon  ║  │ build                                         *)
(*           │  ║          ║  │ maximum                                       *)
(*           │  ║          ║  │                                               *)
(*           ╧  ╚══════════╝  ╧                                               *)

The blocks marked build maximum and video are passed to xenvm via the input configuration. The balloon area is the memory available to be ballooned up, using the xenops balloon option, when Populate on Demand (PoD) is used. The extra internal block is an extra 1 MiB of total memory given to the guest.

 TODO: It is not totally clear what extra external and shadow are, but they are external to the guest's memory. Presumably they are for use by xenvm.
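
To make the brackets in the diagram concrete, the following sketch simply adds up the regions. The numbers are made up and the variable names are chosen here for illustration; only the relationships between the regions (overhead, static maximum, xen maximum, footprint) follow the diagram from xenops/memory.ml.

/* Illustrative sketch of the bracket arithmetic in the diagram above.
 * Numbers are arbitrary; only the region relationships come from the
 * diagram in xenops/memory.ml. All values are in MiB. */
#include <stdio.h>

int main(void)
{
    long long guest_mib          = 240; /* "build start": mapped at build time */
    long long balloon_mib        = 16;  /* PoD headroom, populated later       */
    long long video_mib          = 16;  /* emulated video adapter memory       */
    long long extra_internal_mib = 1;   /* D.extra_internal_mib                */
    long long extra_external_mib = 1;   /* D.extra_external_mib                */
    long long shadow_mib         = 9;   /* shadow paging allocation            */

    long long build_maximum  = guest_mib + balloon_mib;
    long long static_maximum = video_mib + build_maximum;
    long long target         = video_mib + guest_mib;   /* actual/target/total */
    long long xen_maximum    = target + extra_internal_mib;
    long long overhead       = shadow_mib + extra_external_mib + extra_internal_mib;
    long long footprint      = overhead + target;

    printf("static maximum = %lld MiB\n", static_maximum);
    printf("xen maximum    = %lld MiB\n", xen_maximum);
    printf("overhead       = %lld MiB\n", overhead);
    printf("footprint      = %lld MiB\n", footprint);
    return 0;
}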

The memory model is defined in xenops/memory.ml. The extra block sizes and the memory model modules themselves are defined here:

module HVM_memory_model_data : MEMORY_MODEL_DATA = struct
        let extra_internal_mib = 1L (* The extra 1Mb *)
        let extra_external_mib = 1L
end
 
...
 
module Memory_model (D : MEMORY_MODEL_DATA) = struct
       (* memory block definitions generated here *)
       ...
end

The VM building code mainly resides in xenops/domain_control.ml. Starting in build_hvm, which is called from xenvm/vmact.ml, all the memory values are computed:

  • static_max_mib comes from the static_max_kib value passed to the function; it is build_max_mib + video_mib.
  • xen_max_mib is calculated in the memory model as static_max_mib + D.extra_internal_mib.
  • build_max_mib and build_start_mib end up being the same for HVMs in OpenXT because static_max_mib and target_mib are the same. They are calculated in the memory model by subtracting video_mib from static_max_mib and target_mib respectively (see the sketch after this list).
  • build_max_mib and build_start_mib may differ when PoD mode is desired for a guest.
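
A minimal sketch of that derivation, assuming a guest configured with a 2048 MiB build maximum and 16 MiB of video memory and no PoD (the names mirror the OCaml values, but the program and its inputs are illustrative):

/* Sketch of the derivation described in the list above; inputs are a
 * 2048 MiB build maximum plus 16 MiB of video memory, target == static max. */
#include <stdio.h>

#define EXTRA_INTERNAL_MIB 1LL   /* HVM_memory_model_data.extra_internal_mib */

int main(void)
{
    long long static_max_kib = (2048 + 16) * 1024; /* build max + video, in KiB */
    long long target_kib     = static_max_kib;     /* same as static max here   */
    long long video_mib      = 16;

    long long static_max_mib = static_max_kib / 1024;               /* 2064 */
    long long target_mib     = target_kib / 1024;                   /* 2064 */

    /* Overall cap passed to Xc.domain_setmaxmem in build_pre. */
    long long xen_max_mib    = static_max_mib + EXTRA_INTERNAL_MIB; /* 2065 */

    /* Values passed to Xg.hvm_build; video memory is left out because the
     * domain builder does not physmap it (QEMU does, see below). */
    long long build_max_mib   = static_max_mib - video_mib;         /* 2048 */
    long long build_start_mib = target_mib - video_mib;             /* 2048 */

    printf("xen_max_mib=%lld build_max_mib=%lld build_start_mib=%lld\n",
           xen_max_mib, build_max_mib, build_start_mib);
    return 0;
}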

With the values above, the first step is to set the overall maximum memory for the guest. This is done in build_pre using xen_max_mib. After that step build_hvm calls Xg.hvm_build (which eventually calls libxc:xc_hvm_build). The build_start_mib and build_max_mib values are passed and are used to physmap the initial memory for the guest. Note this does not include video_mib or D.extra_internal_mib; more on those later. This is the relevant code, annotated:

let build_pre ~xc ~xs ~vcpus ~xen_max_mib ~shadow_mib ~required_host_free_mib domid =
        ...
        (* This is the call to set the overall maximum for the guest. Note that none of this memory
           is mapped yet. This is the maximum possible memory *)
        Xc.domain_setmaxmem xc domid (Memory.kib_of_mib xen_max_mib);
        ...
 
let build_hvm ~xc ~xs ~static_max_kib ~target_kib ~video_mib ~shadow_multiplier ~vcpus
              ~kernel ~timeoffset ~xci_cpuid_signature domid =
        ...
        (* Call build_pre first to set overall memory xen_max_mib *)
        let store_port, console_port = build_pre ~xc ~xs
                ~xen_max_mib ~shadow_mib ~required_host_free_mib ~vcpus domid in
 
        ...
        (* Call the Xg xenguest helper library to do the actual domain building via libxc. The
           build_start_mib and build_max_mib values are used to do the physmapping at this time. *)
        let store_mfn, console_mfn = Xg.hvm_build xgh domid (Int64.to_int build_max_mib) (Int64.to_int build_start_mib) kernel platformflags store_port console_port in
 

Build start and max memory

A note on the memory values used in Xg.hvm_build. In the actual call to libxc, the values are mapped as follows:

CAMLprim value stub_xc_hvm_build_native(value xc_handle, value domid,
    value mem_max_mib, value mem_start_mib, value image_name, value platformflags, value store_evtchn, value console_evtchn)
 
        ...
        args.mem_size = (uint64_t) Int_val(mem_max_mib) << 20;
        args.mem_target = (uint64_t) Int_val(mem_start_mib) << 20;
 
        ...
        /* In the libxc code, if mem_target < mem_size, Populate on Demand mode is set for the VM and
         * during the physmap process, less than mem_size will get mapped initially. */
        r = xc_hvm_build(xch, _D(domid), &args);
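
The shifts above convert MiB to bytes (1 MiB = 2^20 bytes). Below is a small stand-alone sketch of that conversion and of the PoD condition mentioned in the comment; the struct here is a stand-in for illustration, not the real libxc argument structure.

/* Stand-alone sketch of the MiB-to-bytes conversion and PoD condition. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct hvm_build_args_sketch {
    uint64_t mem_size;    /* bytes: everything the builder may populate        */
    uint64_t mem_target;  /* bytes: what actually gets populated at build time */
};

int main(void)
{
    int mem_max_mib   = 2048;  /* build_max_mib from the OCaml side        */
    int mem_start_mib = 1024;  /* build_start_mib; smaller than max => PoD */

    struct hvm_build_args_sketch args;
    args.mem_size   = (uint64_t)mem_max_mib << 20;   /* MiB -> bytes */
    args.mem_target = (uint64_t)mem_start_mib << 20; /* MiB -> bytes */

    /* Per the comment above: libxc turns on Populate on Demand when the
     * target is below the size, and only populates mem_target up front. */
    bool pod = args.mem_target < args.mem_size;

    printf("mem_size=%llu mem_target=%llu PoD=%s\n",
           (unsigned long long)args.mem_size,
           (unsigned long long)args.mem_target,
           pod ? "yes" : "no");
    return 0;
}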

Video and extra internal memory

As noted above, the overall maximum amount of memory a guest is allowed includes these values, but they are not physmapped by the domain builder code. They are in fact physmapped by QEMU. In OpenXT the video memory is 16 MiB; it is mapped in xen-all.c:xen_ram_alloc. Additional devices like xenmou can also physmap more memory in the extra internal region, which as noted is 1 MiB.
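
To make the accounting concrete, here is an illustrative example for a non-PoD HVM guest with a 2048 MiB build maximum. Only the 16 MiB video size and the 1 MiB extra internal region are OpenXT figures; the rest is the arithmetic from the sections above.

/* Illustrative accounting of who maps what; the 2048 MiB build maximum is
 * an arbitrary example. All values are in MiB. */
#include <stdio.h>

int main(void)
{
    long long build_max_mib      = 2048; /* physmapped by the domain builder    */
    long long video_mib          = 16;   /* physmapped later by QEMU            */
    long long extra_internal_mib = 1;    /* available to devices such as xenmou */

    long long static_max_mib = build_max_mib + video_mib;            /* 2064 */
    long long xen_max_mib    = static_max_mib + extra_internal_mib;  /* 2065 */

    long long mapped_by_builder = build_max_mib;                     /* 2048 */
    long long mapped_later      = xen_max_mib - mapped_by_builder;   /*   17 */

    printf("domain_setmaxmem cap     : %lld MiB\n", xen_max_mib);
    printf("mapped by the builder    : %lld MiB\n", mapped_by_builder);
    printf("left for QEMU and devices: %lld MiB\n", mapped_later);
    return 0;
}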

xenvm vs. libxl

The good news is that all of the above machinery is more or less the same for libxl:

/* Defined in libxl_internal.h, this is the extra internal memory in KiB */
#define LIBXL_MAXMEM_CONSTANT 1024
 
int libxl__build_pre(libxl__gc *gc, uint32_t domid,
              libxl_domain_config *d_config, libxl__domain_build_state *state)
    ...
    /* Target memory, like max memory in the info struct, includes the video memory. The
     * extra internal memory is added during the call to set the overall max memory for the
     * guest. Note the values are in KiB in this case. */
    xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
    ...
 
int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info,
              libxl__domain_build_state *state)
    ...
    /* As before, the video memory is removed from the mem_size and mem_target before
     * being passed to the libxc domain builder to get physmapped. The shift by 10 converts
     * KiB to bytes; see the comment in the libxl code. */
    args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
    args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
    ...
    ret = xc_hvm_build(ctx->xch, domid, &args);

TODO: xenvm uses the max memory value in the xc_domain_setmaxmem call, whereas libxl uses the target memory, which can be smaller when using PoD. This needs more investigation, but it is most likely handled during the ballooning process. That probably means xenvm is really doing it wrong.
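
For reference, a hedged sketch of the difference described in this TODO, using illustrative numbers for a PoD guest whose target is below its static maximum:

/* Side-by-side of the two maxmem caps discussed above, with illustrative
 * values for a PoD guest whose target is below its static maximum. */
#include <stdio.h>

#define LIBXL_MAXMEM_CONSTANT 1024            /* KiB, i.e. 1 MiB            */
#define EXTRA_INTERNAL_MIB    1               /* xenvm's extra internal MiB */

int main(void)
{
    long long static_max_mib = 4096;          /* configured maximum       */
    long long target_mib     = 2048;          /* PoD target, < static max */

    /* xenvm: build_pre caps at static max + extra internal (in KiB). */
    long long xenvm_cap_kib = (static_max_mib + EXTRA_INTERNAL_MIB) * 1024;

    /* libxl: libxl__build_pre caps at the (smaller) target + 1 MiB. */
    long long libxl_cap_kib = target_mib * 1024 + LIBXL_MAXMEM_CONSTANT;

    printf("xenvm cap: %lld KiB\n", xenvm_cap_kib);  /* 4195328 */
    printf("libxl cap: %lld KiB\n", libxl_cap_kib);  /* 2098176 */
    return 0;
}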