I have investigated a series of issues which appear to be related to a poorly performing dbd.  Systems under heavy load (such as synchronizing a large number of VMs, loading the UI with many VMs on a system, etc.) tend to become unstable and unresponsive.

Looking into the current dbd implementation, AFAICT, there are a few limitations that are likely to be the cause:-

  • dbd is single threaded.

  • whenever there is a write to any part of the system's db tree, all databases are flushed to disk, as opposed to just the owning/relevant dirty db.

  • these writes are synchronous, to ensure db consistency.

  • because dbd is single threaded, dbus servicing is suspended while the database is flushing.

  • there tend to be n*2+1 db files that comprise the db tree, where 'n' is the number of VMs configured on the system (vm config + vm domstore db).


This combination is problematic on systems with slower disks, as the db flush may take a significant amount of time if the system is under heavy load and/or has a large number of db files in the tree (causing db read/write accesses to either fail by timing out, or just take a while).
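
To put a number on that, the n*2+1 estimate above can be evaluated for the VM counts used in the tests further down. This is only a sketch of the arithmetic; `db_file_count` is a made-up helper and is not part of dbd or db-tools.

```sh
# Estimated number of db files in the tree for n configured VMs,
# using the n*2+1 figure quoted above.
db_file_count() {
    echo $(( $1 * 2 + 1 ))
}

db_file_count 2    # 5
db_file_count 42   # 85
```

By that estimate, a single db write on a 42 VM system makes the current dbd flush roughly 85 files synchronously.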

I have prototyped a dbd replacement (QTDBD) in C++/Qt which addresses these issues.  Notably, it was designed with two key features in mind:-

  • multithreaded operation: handle dbus requests and maintain the json tree separately from the thread flushing databases to disk.

  • minimize disk writes: only write out db files that are "dirty", instead of flushing out all db files on any db write.  The timing dbd currently uses to combine flushes to disk is maintained, but is configurable via command line options (default 3000ms wait period before flushing).
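
The "only write what is dirty" idea can be illustrated with a small sketch. It is not taken from the qtdbd sources: the `DIRTY` list, `mark_dirty`, `flush_dirty`, and the `<name>.db` file naming are all hypothetical, and the real implementation is C++/Qt with a separate flush thread and a wait period that this sketch does not model.

```sh
# Hypothetical sketch of dirty tracking; names and layout are illustrative only.
DIRTY=""          # space-separated names of dbs with unflushed changes
FLUSHED_COUNT=0   # number of files written by the most recent flush

mark_dirty() {
    # Record a db as dirty once, however many writes hit it before the flush.
    case " $DIRTY " in
        *" $1 "*) ;;
        *) DIRTY="$DIRTY $1" ;;
    esac
}

flush_dirty() {
    # $1: directory holding the db files. Only dirty dbs are rewritten;
    # the printf stands in for serializing the real db contents.
    FLUSHED_COUNT=0
    for db in $DIRTY; do
        printf '%s\n' "$db" > "$1/$db.db"
        FLUSHED_COUNT=$((FLUSHED_COUNT + 1))
    done
    DIRTY=""
}
```

With this shape, three writes that touch two dbs result in two file writes at the next flush, and a flush with nothing dirty writes nothing, regardless of how many db files exist in the tree.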


QTDBD has demonstrated improved system performance under heavy load tests, including continuous synchronization on a system with 30+ VMs with reasonably sized domstores.

If you'd like to see this performance hit, build OpenXT with dbd-perftest included.  Run dbd-perftest, create VMs to compare against, and run it again:

# write to db every 1000ms while reading quickly, up to 50k iterations
dbd-perftest -w 1000 -r 0.00001 -i 50000
 
# create 40 vms and populate all domstores with a single key/value
for x in $(seq 1 40); do xec create-vm-with-template new-vm-sync; done
for x in $(xec list-vms); do xec-vm -o $x set-domstore-key foo bar; done
 
# retest
dbd-perftest -w 1000 -r 0.00001 -i 50000

 

With qtdbd, the results should remain consistent between the two runs.  However, if you run the old dbd (copy it from another OpenXT build, killall dbd, and then run /tmp/dbd.old), you will see a dramatic slowdown, particularly on systems w/o SSDs.

Sample results for two systems (one with SSD and another with typical HDD):

[Current DBD - HDD System]
2 VMs: 2532 reads/sec
42 VMs: <1 read/sec

[Current DBD - SSD System]
2 VMs: 3216 reads/sec
42 VMs: 2286 reads/sec

[QTDBD - HDD System]
2 VMs: 2593 reads/sec
42 VMs: 2504 reads/sec

[QTDBD - SSD System]
2 VMs: 3384 reads/sec
42 VMs: 3336 reads/sec
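
To compare these figures at a glance, the share of read throughput retained when going from 2 to 42 VMs can be computed from the numbers above. `retained_pct` is a hypothetical helper written for this page, not something shipped with dbd-perftest; the HDD result for the current dbd is left out because it is only reported as "<1".

```sh
# Percentage of the 2 VM read rate that is still achieved with 42 VMs.
# $1: reads/sec with 2 VMs, $2: reads/sec with 42 VMs
retained_pct() {
    awk -v base="$1" -v loaded="$2" 'BEGIN { printf "%.1f\n", 100 * loaded / base }'
}

retained_pct 3216 2286   # current dbd, SSD: 71.1
retained_pct 2593 2504   # QTDBD, HDD:       96.6
retained_pct 3384 3336   # QTDBD, SSD:       98.6
```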

OpenXT destabilizes when db writes happen on a system with a large number of VMs.  If you have a system (without SSD) handy, feel free to try it for yourself:

time db-read foo
for x in $(seq 1 40); do xec create-vm-with-template new-vm-sync; done
for x in $(xec list-vms); do xec-vm -o $x set-domstore-key foo bar; done
db-write foo bar
time db-read foo
 

What is the db update/flush timer?

Currently, dbd defines its update interval as 3 seconds (see https://github.com/OpenXT/manager/blob/master/dbd/dbd.ml#L29).  QTDBD lets you set this interval to whatever you want via a command line option in /etc/init.d/dbd.

What changes are required, and where?


openxt.git changes

URL: https://github.com/cjp256/openxt/tree/openxt-qt5
Summary of changes:-

  • add meta-qt5 layer to bblayers

  • backport an openembedded-core fix to enable meta-qt5 to build OK (this can be removed after moving to a recent OE version such as jethro)

meta-qt5.git changes

URL: https://github.com/cjp256/meta-qt5/tree/master
Summary of changes:-

  • adds meta-qt5 layer from upstream

  • add two commits with fixes required to build on OpenXT's current OE version (this local repo could be dropped after moving to newer OE, and instead point to upstream)

qtdbd.git (new)

URL: https://github.com/cjp256/qtdbd/tree/master
Summary:-

  • provides compatible equivalent to dbd (no additional APIs were added)

  • provides db-tools with compatible equivalents:

      • db-cat (which is actually a dud)
      • db-exists
      • db-ls
      • db-nodes
      • db-read
      • db-rm
      • db-write

  • provides a couple new command line utilities to expose existing APIs:

      • db-dump
      • db-inject

  • uses upstream qt5 (qtbase) and qmjson libraries

  • does not replace upgrade-db (existing one is retained as-is)

  • adds a test utility to perform automatic regression tests

  • (optional) integrated with travis CI to perform automatic build & regression tests on opened pull requests and pushes.

xenclient-oe.git changes

URL: https://github.com/cjp256/xenclient-oe/tree/qtdbd
Summary of changes:-

  • replace dom0's dbd with qtdbd's equivalent

  • replace dom0, syncvm, and ndvm db-tools with those provided by qtdbd

  • add dump() to rpc proxy firewall rules for ndvm, because it uses db-ls. qtdbd's provided db-ls uses dump() instead of the old style for efficiency purposes.  Note that this does not grant any additional access for the ndvm.

  • add recipe for qmjson

  • add configuration bbappend for qtbase

  • shift v4v wrappers to qtdbd recipe

How To Test

Functional/unit tests:

QTDBD ships with a functional testing binary, which runs through a number of tests to exercise the database.  There is no such unit test / application for the current dbd that could be ported over.

To run these locally, clone the qtdbd repo:

...