First of all, StorPerf is not going to give you pointers on how to tune your Cinder implementation, as there are far too many backends (Ceph, NFS, LVM, etc), each with their own methods of tuning. StorPerf is here to assist in getting a reliable performance measurement by encoding the test specification from SNIA, and helping present the results in a way that makes sense.
Having said that, there are some general guidelines that we can present to assist with planning a performance test.
This is an important item to address as there are many parameters to how data is accessed. Databases typically use a fixed block size and tend to manage their data so that sequential access is more likely. GPS image tiles can be around 20-60kb and will be accessed by reading the file in full, with no easy way to predict what tiles will be needed next. Some programs are able to submit I/O asynchronously where others need to have different threads and may be synchronous. There is no one size fits all here, so knowing what type of I/O pattern you need to model is critical to getting realistic measurements.
The unfortunate part is that StorPerf does not have any knowledge about the underlying OpenStack itself – we can only see what is available through OpenStack APIs, and none of them provide details about the underlying storage implementation. As the test executor, you will need to know information such as: the number of disks or storage nodes; the amount of RAM available for caching; the type of connection to the storage and bandwidth available.
As part of the test data size, you need to ensure that you are preventing caching from interfering in your measurements. The total size of the data set in the test must exceed the total size of all the disk cache memory available by a certain amount in order to ensure we are forcing non-cached I/O. There is no exact science here, but if we balance test duration against cache hit ratio, it can be argued that 20% cache hit is good enough and increasing file size would result in diminishing returns. Let’s break this number down a bit. Given a cache size of 10GB, we could write, then read the following dataset sizes:
This means that for the first test, 100% of the results are unreliable due to cache. At 50GB, the true performance without cache has only a 20% margin of error. Given the fact that the 100GB would take twice as long, and that we are only reducing the margin of error by 10%, we recommend this as the best tradeoff.
How much cache do we actually have? This depends on the storage device being used. For hardware NAS or other arrays, it should be fairly easy to get the number from the manufacturer, but for software defined storage, it can be harder to determine. Let’s take Ceph as an example. Ceph runs as software on the bare metal server and therefore has access to all the RAM available on the server to use as its cache. Well, not exactly all the memory. We have to take into account the memory consumed by the operating system, by the Ceph processes, as well as any other processes running on the same system. In the case of hyper-converged Ceph, where workload VMs and Ceph run on the systems, it can become quite difficult to predict. Ultimately, the amount of memory that is left over is the cache for that single Ceph instance. We now need to add the memory available from all the other Ceph storage nodes in the environment. Time for another example: given 3 Ceph storage nodes with 256GB RAM each. Let’s take 20% off to pin to the OS and other processes, leaving approximately 240GB per node This gives us 3 x 240 or 720GB total RAM available for cache. The total amount of data we want to write in order to initialize our Cinder volumes would then be 5 x 720, or 3,600 GB. The following illustrates some ways to allocate the data:
Now that we know there is 3.6 TB of data to be written, we need to go back to the workload model to determine how we are going to write it. Factors to consider:
Once we have the information gathered, we can now start executing some tests. Let’s take some of the points discussed above and describe our system:
The first thing we know is that we want to keep our cache hit ratio around 20%, so we will be moving 3,600 GB of data. We also know this will take a significant amount of time, so here is where StorPerf helps.
First, we use the configurations API to launch our 10 virtual machines each with a 360 GB volume. Next comes the most time consuming part, we call the initializations API to fill each one of these volumes with random data. By preloading the data, we ensure a number of things:
This last part is important as we can now use StorPerf’s implementation of SNIA’s steady state algorithm to ensure our follow up tests execute as quickly as possible. Given the fact that 80% of the data in any given test results in a cache miss, we can run multiple tests in a row without having to re-initialize or invalidate the cache again in between test runs. We can also mix and match the types of workloads to be run in a single performance job submission.
Now we can submit a job to the jobs API to execute a 70%/30% mix of read/write, with a block size of 4k and an I/O queue depth of 6. This job will run until either the maximum time has expired, or until StorPerf detects steady state has been reached, at which point it will immediately complete and report the results of the measurements.
As a closing note, StorPerf does use FIO as its workload engine, so whatever workload parameters we would like to use with FIO can be passed directly through via StorPerf’s jobs API.