Rename Performance and tuning

This now uses title case.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
Richard Laager
2020-05-25 01:49:20 -05:00
parent 81cc030b32
commit 999405d73c
6 changed files with 3 additions and 3 deletions


@@ -0,0 +1,36 @@
Async Writes
============
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points.

::

             |              o---------| <-- zfs_vdev_async_write_max_active
        ^    |             /^         |
        |    |            / |         |
      active |           /  |         |
       I/O   |          /   |         |
      count  |         /    |         |
             |        /     |         |
             |-------o      |         | <-- zfs_vdev_async_write_min_active
            0|_______^______|_________|
             0%      |      |       100% of zfs_dirty_data_max
                     |      |
                     |      `-- zfs_vdev_async_write_active_max_dirty_percent
                     `--------- zfs_vdev_async_write_active_min_dirty_percent

Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum
at the specified maximum percentage of the dirty data allowed in the
pool.
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between
zfs_vdev_async_write_active_min_dirty_percent and
zfs_vdev_async_write_active_max_dirty_percent. If it exceeds the maximum
percentage, this indicates that the rate of incoming data is greater
than the rate that the backend storage can handle. In this case, we must
further throttle incoming writes, as described in the next section.
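
For illustration, here is a minimal sketch of the piece-wise linear
mapping described above. The tunable values are placeholder assumptions
chosen only to make the shape visible, not defaults or recommendations,
and the code is a simplification rather than the actual OpenZFS
implementation.

::

   # Sketch of the async write scaling described above (illustrative only).
   def async_write_max_active(dirty_percent,
                              min_active=2, max_active=10,
                              min_dirty_percent=30, max_dirty_percent=60):
       """Concurrent async write limit for a given percentage of
       zfs_dirty_data_max that is currently dirty."""
       if dirty_percent <= min_dirty_percent:
           return min_active                 # flat region at the minimum
       if dirty_percent >= max_dirty_percent:
           return max_active                 # flat region at the maximum
       # Linear interpolation between the two breakpoints.
       span = max_dirty_percent - min_dirty_percent
       frac = (dirty_percent - min_dirty_percent) / span
       return round(min_active + frac * (max_active - min_active))

   print(async_write_max_active(45))         # midpoint of the slope -> 6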


@@ -0,0 +1,105 @@
ZFS Transaction Delay
=====================
ZFS write operations are delayed when the backend storage isn't able to
accommodate the rate of incoming writes. This delay process is known as
the ZFS write throttle.
If there is already a write transaction waiting, the delay is relative
to when that transaction will finish waiting. Thus the calculated delay
time is independent of the number of threads concurrently executing
transactions.
If there is only one waiter, the delay is relative to when the
transaction started, rather than the current time. This credits the
transaction for "time already served." For example, if a write
transaction requires reading indirect blocks first, then the delay is
counted at the start of the transaction, just prior to the indirect
block reads.
The minimum time for a transaction to take is calculated as:

::

   min_time = zfs_delay_scale * (dirty - min) / (max - dirty)

   min_time is then capped at 100 milliseconds

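As a worked sketch of that formula (illustrative only; zfs_delay_scale
defaults to 500,000 nanoseconds in current OpenZFS, and the remaining
values here are placeholder assumptions rather than the in-kernel
implementation):

::

   # Sketch of the write throttle delay formula above (illustrative only).
   def tx_delay_ns(dirty_bytes, dirty_data_max,
                   delay_min_dirty_percent=60, delay_scale_ns=500_000):
       """Minimum transaction time, in nanoseconds."""
       delay_min_bytes = dirty_data_max * delay_min_dirty_percent // 100
       if dirty_bytes <= delay_min_bytes:
           return 0                          # below the throttle threshold
       if dirty_bytes >= dirty_data_max:
           return 100_000_000                # at or past the limit: 100 ms cap
       delay = (delay_scale_ns * (dirty_bytes - delay_min_bytes)
                // (dirty_data_max - dirty_bytes))
       return min(delay, 100_000_000)        # capped at 100 milliseconds

   # Halfway between the threshold (60%) and the limit, the delay equals
   # zfs_delay_scale: 500,000 ns = 500 microseconds.
   print(tx_delay_ns(dirty_bytes=80, dirty_data_max=100))   # -> 500000
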
The delay has two degrees of freedom that can be adjusted via tunables:

1. The percentage of dirty data at which we start to delay is defined
   by zfs_delay_min_dirty_percent. This is typically at or above
   zfs_vdev_async_write_active_max_dirty_percent, so delays occur after
   writing at full speed has failed to keep up with the incoming write
   rate.

2. The scale of the curve is defined by zfs_delay_scale. Roughly
   speaking, this variable determines the amount of delay at the
   midpoint of the curve.

::

     delay
      10ms +-------------------------------------------------------------*+
           |                                                             *|
       9ms +                                                            * +
           |                                                            *|
       8ms +                                                           *+
           |                                                          * |
       7ms +                                                         *  +
           |                                                        *   |
       6ms +                                                       *    +
           |                                                      *     |
       5ms +                                                     *      +
           |                                                    *       |
       4ms +                                                   *        +
           |                                                  *         |
       3ms +                                                 *          +
           |                                                *           |
       2ms +                                    (midpoint) *            +
           |                                         |    **            |
       1ms +                                         v ***              +
           |             zfs_delay_scale ---------->  ********          |
         0 +-------------------------------------*********--------------+
           0%                    <- zfs_dirty_data_max ->             100%

Note that since the delay is added to the outstanding time remaining on
the most recent transaction, the delay is effectively the inverse of
IOPS. Here the midpoint of 500 microseconds translates to 2000 IOPS. The
shape of the curve was chosen such that small changes in the amount of
accumulated dirty data in the first 3/4 of the curve yield relatively
small differences in the amount of delay.
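
As a quick arithmetic check of that midpoint figure:

::

   1 / 500 us = 1 / 0.0005 s = 2000 operations per second
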
The effects can be easier to understand when the amount of delay is
represented on a log scale:

::

     delay
     100ms +-------------------------------------------------------------++
           +                                                              +
           |                                                              |
           +                                                             *+
      10ms +                                                             *+
           +                                                            ** +
           |                                              (midpoint)  **  |
           +                                                  |     **    +
       1ms +                                                  v ****      +
           +             zfs_delay_scale ---------->      *****           +
           |                                            ****              |
           +                                         ****                 +
     100us +                                       **                     +
           +                                      *                       +
           |                                     *                        |
           +                                    *                         +
      10us +                                   *                          +
           +                                                              +
           |                                                              |
           +                                                              +
           +--------------------------------------------------------------+
           0%                    <- zfs_dirty_data_max ->               100%

Note here that only as the amount of dirty data approaches its limit
does the delay start to increase rapidly. The goal of a properly tuned
system should be to keep the amount of dirty data out of that range by
first ensuring that the appropriate limits are set for the I/O scheduler
to reach optimal throughput on the backend storage, and then by changing
the value of zfs_delay_scale to increase the steepness of the curve.

File diff suppressed because it is too large.


@@ -0,0 +1,98 @@
ZFS I/O (ZIO) Scheduler
=======================
ZFS issues I/O operations to leaf vdevs (usually devices) to satisfy and
complete I/Os. The ZIO scheduler determines when and in what order those
operations are issued. Operations are divided into five I/O classes,
prioritized in the following order:

+----------+-------------+-------------------------------------------+
| Priority | I/O Class   | Description                               |
+==========+=============+===========================================+
| highest  | sync read   | most reads                                |
+----------+-------------+-------------------------------------------+
|          | sync write  | as defined by application or via 'zfs'    |
|          |             | 'sync' property                           |
+----------+-------------+-------------------------------------------+
|          | async read  | prefetch reads                            |
+----------+-------------+-------------------------------------------+
|          | async write | most writes                               |
+----------+-------------+-------------------------------------------+
| lowest   | scrub read  | scan read: includes both scrub and        |
|          |             | resilver                                  |
+----------+-------------+-------------------------------------------+

Each queue defines the minimum and maximum number of concurrent
operations issued to the device. In addition, the device has an
aggregate maximum, zfs_vdev_max_active. Note that the sum of the
per-queue minimums must not exceed the aggregate maximum. If the sum of
the per-queue maximums exceeds the aggregate maximum, then the number of
active I/Os may reach zfs_vdev_max_active, in which case no further I/Os
are issued regardless of whether all per-queue minimums have been met.

+-------------+---------------------------------+---------------------------------+
| I/O Class   | Min Active Parameter            | Max Active Parameter            |
+=============+=================================+=================================+
| sync read   | zfs_vdev_sync_read_min_active   | zfs_vdev_sync_read_max_active   |
+-------------+---------------------------------+---------------------------------+
| sync write  | zfs_vdev_sync_write_min_active  | zfs_vdev_sync_write_max_active  |
+-------------+---------------------------------+---------------------------------+
| async read  | zfs_vdev_async_read_min_active  | zfs_vdev_async_read_max_active  |
+-------------+---------------------------------+---------------------------------+
| async write | zfs_vdev_async_write_min_active | zfs_vdev_async_write_max_active |
+-------------+---------------------------------+---------------------------------+
| scrub read  | zfs_vdev_scrub_min_active       | zfs_vdev_scrub_max_active       |
+-------------+---------------------------------+---------------------------------+

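As a sketch of the constraint mentioned above (the sum of the per-queue
minimums must not exceed the aggregate maximum), using placeholder
values rather than defaults or recommendations:

::

   # Illustrative sanity check: sum of per-queue minimums must not exceed
   # zfs_vdev_max_active.  All values below are placeholders.
   min_active = {
       "sync read":   10,
       "sync write":  10,
       "async read":   1,
       "async write":  2,
       "scrub read":   1,
   }
   zfs_vdev_max_active = 1000

   total_min = sum(min_active.values())
   assert total_min <= zfs_vdev_max_active, (
       f"sum of per-queue minimums ({total_min}) exceeds "
       f"zfs_vdev_max_active ({zfs_vdev_max_active})")
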
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have
no effect on throughput or can actually cause performance to
decrease.
The ZIO scheduler selects the next operation to issue by first looking
for an I/O class whose minimum has not been satisfied. Once all are
satisfied and the aggregate maximum has not been hit, the scheduler
looks for classes whose maximum has not been satisfied. Iteration
through the I/O classes is done in the order specified above. No further
operations are issued if the aggregate maximum number of concurrent
operations has been hit or if there are no operations queued for an I/O
class that has not hit its maximum. Every time an I/O is queued or an
operation completes, the I/O scheduler looks for new operations to
issue.
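
The following is a minimal sketch of that two-pass selection. The names,
data structures, and limits are simplified placeholders; the real logic
lives in the OpenZFS vdev queue code and handles details omitted here.

::

   # Classes listed from highest to lowest priority.
   CLASSES = ["sync read", "sync write", "async read",
              "async write", "scrub read"]

   def next_class_to_issue(queued, active, limits, max_active_total):
       """Pick the I/O class to issue from, or None if nothing may issue.

       queued -- dict: class -> number of operations waiting
       active -- dict: class -> number of operations already issued
       limits -- dict: class -> (min_active, max_active)
       """
       if sum(active.values()) >= max_active_total:
           return None                   # aggregate zfs_vdev_max_active hit

       # First pass: bring every class up to its per-queue minimum.
       for c in CLASSES:
           if queued[c] > 0 and active[c] < limits[c][0]:
               return c

       # Second pass: fill classes up to their per-queue maximums.
       for c in CLASSES:
           if queued[c] > 0 and active[c] < limits[c][1]:
               return c

       return None                       # nothing eligible to issue

In this sketch, the selection would be re-run every time an operation is
queued or completes, matching the behavior described above.
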
In general, smaller values of max_active will lead to lower latency of
synchronous operations. Larger values of max_active may lead to higher
overall throughput, depending on the underlying storage and the I/O mix.

The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. For example, when there
is contention, increasing zfs_vdev_scrub_max_active will cause the scrub
or resilver to complete more quickly, but reads and writes will have
higher latency and lower throughput.
All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups (txgs). Transaction groups enter the syncing state
periodically, so the number of queued async writes quickly bursts up and
then reduces back down to zero. The zfs_txg_timeout tunable (default=5
seconds) sets the target interval for txg sync. Thus a burst of async
writes every 5 seconds is a normal ZFS I/O pattern.
Rather than servicing I/Os as quickly as possible, the ZIO scheduler
changes the maximum number of active async write I/Os according to the
amount of dirty data in the pool. Since both throughput and latency
typically increase with the number of concurrent operations issued to
physical devices, reducing the burstiness in the number of concurrent
operations also stabilizes the response time of operations from other
queues. This is particularly important for the sync read and write queues,
where the periodic async write bursts of the txg sync can lead to
device-level contention. In broad strokes, the ZIO scheduler issues more
concurrent operations from the async write queue as there's more dirty
data in the pool.


@@ -0,0 +1,9 @@
Performance and Tuning
======================
.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :glob:

   *