Winery Object Storage Benchmarks
An object storage suitable for storing over 100 billion small immutable objects was designed in March 2021. An implementation was completed in January 2022 and includes tools to benchmark it. The benchmarks will be run against the hardware to be delivered in the first half of 2022.
A single process drives the benchmark on a dedicated machine. It runs multiple write workers, each in a separate process (to avoid any lock contention), which report in a CSV file how many objects and bytes they wrote. It also runs multiple read workers in parallel, in separate processes as well because they block on reads from RBD images. The read workers report in a CSV file how many objects they read and how many bytes in total. The objects per second, bytes per second and other relevant measures are stored in files every 60 seconds, to be analyzed when the run completes.
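As an illustration of how the per-worker CSV reports could be turned into rates, here is a minimal sketch. The column names (`worker`, `object_count`, `bytes_count`) and the `throughput` helper are hypothetical and are not taken from the benchmark code; the sketch only shows the arithmetic of summing the counters and dividing by the length of the reporting interval.

```python
import csv
import io


def throughput(csv_text, duration_seconds):
    """Sum per-worker counters and convert them to per-second rates."""
    objects = 0
    total_bytes = 0
    # Hypothetical layout: one row per worker with its cumulated counters.
    for row in csv.DictReader(io.StringIO(csv_text)):
        objects += int(row["object_count"])
        total_bytes += int(row["bytes_count"])
    return objects / duration_seconds, total_bytes / duration_seconds


# Made-up sample covering one 60 second interval and two workers.
sample = (
    "worker,object_count,bytes_count\n"
    "w0,600,12000000\n"
    "w1,700,14900000\n"
)
objects_per_s, bytes_per_s = throughput(sample, 60)
print(f"{objects_per_s:.1f} objects/s, {bytes_per_s:.0f} bytes/s")
```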
A write worker starts by writing objects in the Write Storage, using a payload obtained from a random source to avoid unintended gain due to easily compressible content. When the Shard is full in the Write Storage, a packer process starts and the benchmark will wait until it completes.
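The reason for using a random payload can be checked with a short sketch. It assumes `os.urandom` as the random source and `zlib` as a stand-in for whatever compression may happen in the storage stack; neither is confirmed to be what the benchmark uses. Random bytes are essentially incompressible, whereas a payload of zeros would shrink to almost nothing and inflate the measured throughput.

```python
import os
import zlib


def random_payload(size):
    """Return `size` bytes from the operating system random source."""
    return os.urandom(size)


size = 64 * 1024  # arbitrary payload size chosen for the illustration

# Compressed size divided by original size: close to 1 means incompressible.
random_ratio = len(zlib.compress(random_payload(size))) / size
zero_ratio = len(zlib.compress(bytes(size))) / size

print(f"random payload ratio: {random_ratio:.3f}")
print(f"zero payload ratio:   {zero_ratio:.3f}")
```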
A read worker reads a given number of Object IDs from the Global Index and gets the associated content from the object storage. It has no way to know whether the content will be retrieved from the Read Storage or from the Write Storage. Since the Write Storage has a fixed size, the probability that an object is served from the Read Storage increases as the Read Storage grows.
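The effect of the growing Read Storage can be sketched with a simple model. It assumes that Object IDs are drawn uniformly among all stored objects and that the Write Storage holds a constant number of objects; both are simplifications, and the function name is hypothetical.

```python
def read_storage_probability(read_storage_objects, write_storage_objects):
    """Probability that a uniformly chosen object lives in the Read Storage."""
    total = read_storage_objects + write_storage_objects
    if total == 0:
        raise ValueError("no object is stored yet")
    return read_storage_objects / total


# The Write Storage is capped (1,000 objects here, an arbitrary figure)
# while the Read Storage keeps growing as Shards are packed.
for packed in (0, 1_000, 10_000, 100_000):
    probability = read_storage_probability(packed, 1_000)
    print(f"{packed:>7} objects packed: {probability:.3f}")
```

Under these assumptions the probability tends towards 1 as more Shards are packed, so a long run mostly measures the Read Storage.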
The benchmark completes after a given number of seconds and the CSV files are copied from the cluster to the host from which the benchmark is run.
- Operating System: Debian GNU/Linux bullseye
- Read Storage
  - 4+2 erasure coded pool
  - Ceph pacific https://docs.ceph.com/en/pacific/rados/operations/erasure-code/
- Write Storage
  - A PostgreSQL master server
  - A PostgreSQL (cold) standby server replicating the master server
  - PostgreSQL 13 https://www.postgresql.org/docs/13/
There are --winery-bench-rw-workers write workers, each in charge of creating a single Shard. When a worker completes, another one is created immediately.
The size of each Shard is at least --winery-shard-max-size bytes and the exact number of objects it contains depends on the random distribution of the object size.
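The following sketch illustrates why a Shard ends up slightly larger than the configured maximum and why its object count varies. The uniform size distribution and the names `fill_shard` and `draw_size` are assumptions made for the example, not the distribution or the code used by the benchmark.

```python
import random


def fill_shard(shard_max_size, draw_object_size, rng):
    """Add objects until the Shard holds at least shard_max_size bytes."""
    object_count = 0
    shard_bytes = 0
    while shard_bytes < shard_max_size:
        shard_bytes += draw_object_size(rng)
        object_count += 1
    return object_count, shard_bytes


def draw_size(rng):
    # Hypothetical distribution: uniform between 1 byte and 40 KiB.
    return rng.randint(1, 40 * 1024)


shard_max_size = 10 * 1024 * 1024  # stands in for --winery-shard-max-size
count, size = fill_shard(shard_max_size, draw_size, random.Random(0))
print(f"{count} objects, {size - shard_max_size} bytes over the limit")
```

The overshoot is always smaller than the largest possible object, because the loop stops as soon as the limit is reached.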
There are --winery-bench-ro-workers read worker processes. Each process picks --winery-bench-ro-worker-max-request Object IDs from the Global Index at random and, for each of them, reads the content, verifies it is not None and discards it.
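The read loop described above can be sketched as follows. The `read_worker` function, the in-memory dictionary standing in for the object storage and the list standing in for the Global Index are all hypothetical; the real worker reads from PostgreSQL and RBD images.

```python
import random


def read_worker(object_ids, get_content, max_request, rng):
    """Read max_request randomly chosen objects and count what was read."""
    read_count = 0
    read_bytes = 0
    for _ in range(max_request):
        object_id = rng.choice(object_ids)
        content = get_content(object_id)
        if content is None:
            raise LookupError(f"no content found for {object_id!r}")
        read_count += 1
        read_bytes += len(content)
        # The content is discarded here: only the counters are kept.
    return read_count, read_bytes


# Stand-in storage: ten objects whose sizes range from 1 to 10 bytes.
storage = {f"id-{i}": bytes(i + 1) for i in range(10)}
stats = read_worker(list(storage), storage.get, 5, random.Random(0))
print(stats)
```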
Developing the benchmark software and report
The benchmark is part of the test suite so that its code can be conveniently maintained along with it.
It runs in minimal test mode with tox -e winery when the test environment is installed locally with libvirt.
It is also run at scale against grid5000 to detect issues that do not show on a small cluster. As of January 2022 the results are of the same order of magnitude as what is expected, but the hardware configuration is very different and it cannot be assumed that they accurately reflect the performance of the target installation.
- Bytes written: 26.9 MB/s
- Objects written: 1.3 Kobject/s
- Bytes read: 78.5 MB/s
- Objects read: 3.8 Kobject/s
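As a consistency check on these figures, the average object size implied by each pair of rates can be computed. The sketch assumes that MB means 10^6 bytes and that Kobject means 1,000 objects, neither of which the report states explicitly. Since the rates are rounded, the result is only indicative.

```python
MB = 1_000_000  # assumption: decimal megabytes
KOBJECT = 1_000  # assumption: 1 Kobject is 1,000 objects

# Average object size = bytes per second / objects per second.
write_avg = (26.9 * MB) / (1.3 * KOBJECT)
read_avg = (78.5 * MB) / (3.8 * KOBJECT)

print(f"average object size when writing: {write_avg / 1000:.1f} kB")
print(f"average object size when reading: {read_avg / 1000:.1f} kB")
```

Both directions give an average of roughly 20 kB per object, which suggests the read and write measurements cover the same population of objects.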
The hardware provided by grid5000 consists of:
- A Dell S5296F-ON 10 Gbps switch
- 32 Dell PowerEdge C6420
  - System: 240 GB SSD SATA Samsung MZ7KM240HMHQ0D3
  - Storage: 4.0 TB HDD SATA Seagate ST4000NM0265-2DC
  - Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16 cores/CPU)
  - 192 GiB RAM
  - One 10 Gbps link to the switch
- 2 Dell PowerEdge R940
  - System: 480 GB SSD SATA Intel SSDSC2KG480G7R
  - Storage: 2 x 1.6 TB SSD NVMe Dell Express Flash NVMe PM1725 1.6TB AIC
  - Intel Xeon Gold 6130 (Skylake, 2.10GHz, 4 CPUs/node, 16 cores/CPU)
  - 768 GiB RAM
  - One 10 Gbps link to the switch
Note: The machines have additional resources (network, disk) but only those used in the context of the benchmark are listed.