Rename Performance and tuning

This now uses title case.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
Richard Laager
2020-05-25 01:49:20 -05:00
parent 81cc030b32
commit 999405d73c
6 changed files with 3 additions and 3 deletions


@@ -0,0 +1,36 @@
Async Writes
============
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.

::

             |              o---------| <-- zfs_vdev_async_write_max_active
        ^    |             /^         |
        |    |            / |         |
      active |           /  |         |
       I/O   |          /   |         |
      count  |         /    |         |
             |        /     |         |
             |-------o      |         | <-- zfs_vdev_async_write_min_active
            0|_______^______|_________|
             0%      |      |       100% of zfs_dirty_data_max
                     |      |
                     |      `-- zfs_vdev_async_write_active_max_dirty_percent
                     `--------- zfs_vdev_async_write_active_min_dirty_percent

Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum
at the specified maximum percentage of the dirty data allowed in the
pool.
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between
zfs_vdev_async_write_active_min_dirty_percent and
zfs_vdev_async_write_active_max_dirty_percent. If it exceeds the maximum
percentage, this indicates that the rate of incoming data is greater
than the rate that the backend storage can handle. In this case, we must
further throttle incoming writes, as described in the next section.
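
For illustration, here is a minimal sketch of the piece-wise linear
mapping described above. The tunable values are placeholder assumptions
chosen only to make the shape visible, not defaults or recommendations,
and the code is a simplification rather than the actual OpenZFS
implementation.

::

   # Sketch of the async write scaling described above (illustrative only).
   def async_write_max_active(dirty_percent,
                              min_active=2, max_active=10,
                              min_dirty_percent=30, max_dirty_percent=60):
       """Concurrent async write limit for a given percentage of
       zfs_dirty_data_max that is currently dirty."""
       if dirty_percent <= min_dirty_percent:
           return min_active                 # flat region at the minimum
       if dirty_percent >= max_dirty_percent:
           return max_active                 # flat region at the maximum
       # Linear interpolation between the two breakpoints.
       span = max_dirty_percent - min_dirty_percent
       frac = (dirty_percent - min_dirty_percent) / span
       return round(min_active + frac * (max_active - min_active))

   print(async_write_max_active(45))         # midpoint of the slope -> 6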


@@ -0,0 +1,105 @@
ZFS Transaction Delay
=====================
ZFS write operations are delayed when the backend storage isn't able to
accommodate the rate of incoming writes. This delay process is known as
the ZFS write throttle.
If there is already a write transaction waiting, the delay is relative
to when that transaction will finish waiting. Thus the calculated delay
time is independent of the number of threads concurrently executing
transactions.
If there is only one waiter, the delay is relative to when the
transaction started, rather than the current time. This credits the
transaction for "time already served." For example, if a write
transaction requires reading indirect blocks first, then the delay is
counted at the start of the transaction, just prior to the indirect
block reads.
The minimum time for a transaction to take is calculated as:

::

   min_time = zfs_delay_scale * (dirty - min) / (max - dirty)

   min_time is then capped at 100 milliseconds

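As a worked sketch of that formula (illustrative only; zfs_delay_scale
defaults to 500,000 nanoseconds in current OpenZFS, and the remaining
values here are placeholder assumptions rather than the in-kernel
implementation):

::

   # Sketch of the write throttle delay formula above (illustrative only).
   def tx_delay_ns(dirty_bytes, dirty_data_max,
                   delay_min_dirty_percent=60, delay_scale_ns=500_000):
       """Minimum transaction time, in nanoseconds."""
       delay_min_bytes = dirty_data_max * delay_min_dirty_percent // 100
       if dirty_bytes <= delay_min_bytes:
           return 0                          # below the throttle threshold
       if dirty_bytes >= dirty_data_max:
           return 100_000_000                # at or past the limit: 100 ms cap
       delay = (delay_scale_ns * (dirty_bytes - delay_min_bytes)
                // (dirty_data_max - dirty_bytes))
       return min(delay, 100_000_000)        # capped at 100 milliseconds

   # Halfway between the threshold (60%) and the limit, the delay equals
   # zfs_delay_scale: 500,000 ns = 500 microseconds.
   print(tx_delay_ns(dirty_bytes=80, dirty_data_max=100))   # -> 500000
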
The delay has two degrees of freedom that can be adjusted via tunables:

1. The percentage of dirty data at which we start to delay is defined
   by zfs_delay_min_dirty_percent. This is typically at or above
   zfs_vdev_async_write_active_max_dirty_percent, so delays occur after
   writing at full speed has failed to keep up with the incoming write
   rate.

2. The scale of the curve is defined by zfs_delay_scale. Roughly
   speaking, this variable determines the amount of delay at the
   midpoint of the curve.

::

     delay
      10ms +-------------------------------------------------------------*+
           |                                                             *|
       9ms +                                                            * +
           |                                                            *|
       8ms +                                                           *+
           |                                                          * |
       7ms +                                                         *  +
           |                                                        *   |
       6ms +                                                       *    +
           |                                                      *     |
       5ms +                                                     *      +
           |                                                    *       |
       4ms +                                                   *        +
           |                                                  *         |
       3ms +                                                 *          +
           |                                                *           |
       2ms +                                    (midpoint) *            +
           |                                         |    **            |
       1ms +                                         v ***              +
           |             zfs_delay_scale ---------->  ********          |
         0 +-------------------------------------*********--------------+
           0%                    <- zfs_dirty_data_max ->             100%

Note that since the delay is added to the outstanding time remaining on
the most recent transaction, the delay is effectively the inverse of
IOPS. Here the midpoint of 500 microseconds translates to 2000 IOPS. The
shape of the curve was chosen such that small changes in the amount of
accumulated dirty data in the first 3/4 of the curve yield relatively
small differences in the amount of delay.
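
As a quick arithmetic check of that midpoint figure:

::

   1 / 500 us = 1 / 0.0005 s = 2000 operations per second
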
The effects can be easier to understand when the amount of delay is
represented on a log scale:

::

     delay
     100ms +-------------------------------------------------------------++
           +                                                              +
           |                                                              |
           +                                                             *+
      10ms +                                                             *+
           +                                                            ** +
           |                                              (midpoint)  **  |
           +                                                  |     **    +
       1ms +                                                  v ****      +
           +             zfs_delay_scale ---------->      *****           +
           |                                            ****              |
           +                                         ****                 +
     100us +                                       **                     +
           +                                      *                       +
           |                                     *                        |
           +                                    *                         +
      10us +                                   *                          +
           +                                                              +
           |                                                              |
           +                                                              +
           +--------------------------------------------------------------+
           0%                    <- zfs_dirty_data_max ->               100%

Note here that only as the amount of dirty data approaches its limit
does the delay start to increase rapidly. The goal of a properly tuned
system should be to keep the amount of dirty data out of that range by
first ensuring that the appropriate limits are set for the I/O scheduler
to reach optimal throughput on the backend storage, and then by changing
the value of zfs_delay_scale to increase the steepness of the curve.

File diff suppressed because it is too large.


@@ -0,0 +1,98 @@
ZFS I/O (ZIO) Scheduler
=======================
ZFS issues I/O operations to leaf vdevs (usually devices) to satisfy and
complete I/Os. The ZIO scheduler determines when and in what order those
operations are issued. Operations are divided into five I/O classes,
prioritized in the following order:

+----------+-------------+-------------------------------------------+
| Priority | I/O Class   | Description                               |
+==========+=============+===========================================+
| highest  | sync read   | most reads                                |
+----------+-------------+-------------------------------------------+
|          | sync write  | as defined by application or via 'zfs'    |
|          |             | 'sync' property                           |
+----------+-------------+-------------------------------------------+
|          | async read  | prefetch reads                            |
+----------+-------------+-------------------------------------------+
|          | async write | most writes                               |
+----------+-------------+-------------------------------------------+
| lowest   | scrub read  | scan read: includes both scrub and        |
|          |             | resilver                                  |
+----------+-------------+-------------------------------------------+

Each queue defines the minimum and maximum number of concurrent
operations issued to the device. In addition, the device has an
aggregate maximum, zfs_vdev_max_active. Note that the sum of the
per-queue minimums must not exceed the aggregate maximum. If the sum of
the per-queue maximums exceeds the aggregate maximum, then the number of
active I/Os may reach zfs_vdev_max_active, in which case no further I/Os
are issued regardless of whether all per-queue minimums have been met.

+-------------+---------------------------------+---------------------------------+
| I/O Class   | Min Active Parameter            | Max Active Parameter            |
+=============+=================================+=================================+
| sync read   | zfs_vdev_sync_read_min_active   | zfs_vdev_sync_read_max_active   |
+-------------+---------------------------------+---------------------------------+
| sync write  | zfs_vdev_sync_write_min_active  | zfs_vdev_sync_write_max_active  |
+-------------+---------------------------------+---------------------------------+
| async read  | zfs_vdev_async_read_min_active  | zfs_vdev_async_read_max_active  |
+-------------+---------------------------------+---------------------------------+
| async write | zfs_vdev_async_write_min_active | zfs_vdev_async_write_max_active |
+-------------+---------------------------------+---------------------------------+
| scrub read  | zfs_vdev_scrub_min_active       | zfs_vdev_scrub_max_active       |
+-------------+---------------------------------+---------------------------------+

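As a sketch of the constraint mentioned above (the sum of the per-queue
minimums must not exceed the aggregate maximum), using placeholder
values rather than defaults or recommendations:

::

   # Illustrative sanity check: sum of per-queue minimums must not exceed
   # zfs_vdev_max_active.  All values below are placeholders.
   min_active = {
       "sync read":   10,
       "sync write":  10,
       "async read":   1,
       "async write":  2,
       "scrub read":   1,
   }
   zfs_vdev_max_active = 1000

   total_min = sum(min_active.values())
   assert total_min <= zfs_vdev_max_active, (
       f"sum of per-queue minimums ({total_min}) exceeds "
       f"zfs_vdev_max_active ({zfs_vdev_max_active})")
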
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have
no effect on throughput or can actually cause performance to
decrease.
The ZIO scheduler selects the next operation to issue by first looking
for an I/O class whose minimum has not been satisfied. Once all are
satisfied and the aggregate maximum has not been hit, the scheduler
looks for classes whose maximum has not been satisfied. Iteration
through the I/O classes is done in the order specified above. No further
operations are issued if the aggregate maximum number of concurrent
operations has been hit or if there are no operations queued for an I/O
class that has not hit its maximum. Every time an I/O is queued or an
operation completes, the I/O scheduler looks for new operations to
issue.
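
The following is a minimal sketch of that two-pass selection. The names,
data structures, and limits are simplified placeholders; the real logic
lives in the OpenZFS vdev queue code and handles details omitted here.

::

   # Classes listed from highest to lowest priority.
   CLASSES = ["sync read", "sync write", "async read",
              "async write", "scrub read"]

   def next_class_to_issue(queued, active, limits, max_active_total):
       """Pick the I/O class to issue from, or None if nothing may issue.

       queued -- dict: class -> number of operations waiting
       active -- dict: class -> number of operations already issued
       limits -- dict: class -> (min_active, max_active)
       """
       if sum(active.values()) >= max_active_total:
           return None                   # aggregate zfs_vdev_max_active hit

       # First pass: bring every class up to its per-queue minimum.
       for c in CLASSES:
           if queued[c] > 0 and active[c] < limits[c][0]:
               return c

       # Second pass: fill classes up to their per-queue maximums.
       for c in CLASSES:
           if queued[c] > 0 and active[c] < limits[c][1]:
               return c

       return None                       # nothing eligible to issue

In this sketch, the selection would be re-run every time an operation is
queued or completes, matching the behavior described above.
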
In general, smaller values of max_active will lead to lower latency of
synchronous operations. Larger values of max_active may lead to higher
overall throughput, depending on the underlying storage and the I/O mix.

The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. For example, when there
is contention, increasing zfs_vdev_scrub_max_active will cause the scrub
or resilver to complete more quickly, but reads and writes will have
higher latency and lower throughput.
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups (txgs). Transaction groups enter the syncing state
periodically, so the number of queued async writes quickly bursts up and
then reduces back down to zero. The zfs_txg_timeout tunable (default=5
seconds) sets the target interval for txg sync. Thus a burst of async
writes every 5 seconds is a normal ZFS I/O pattern.
Rather than servicing I/Os as quickly as possible, the ZIO scheduler
changes the maximum number of active async write I/Os according to the
amount of dirty data in the pool. Since both throughput and latency
typically increase with the number of concurrent operations issued to
physical devices, reducing the burstiness in the number of concurrent
operations also stabilizes the response time of operations from other
queues. This is particularly important for the sync read and write queues,
where the periodic async write bursts of the txg sync can lead to
device-level contention. In broad strokes, the ZIO scheduler issues more
concurrent operations from the async write queue as there's more dirty
data in the pool.


@@ -0,0 +1,9 @@
Performance and Tuning
======================
.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :glob:

   *