Rename Performance and tuning
This now uses title case.

Signed-off-by: Richard Laager <rlaager@wiktel.com>

36  docs/Performance and Tuning/Async Write.rst  Normal file
@@ -0,0 +1,36 @@

Async Writes
============

The number of concurrent operations issued for the async write I/O class
follows a piecewise linear function defined by a few adjustable points.

::

           |              o---------| <-- zfs_vdev_async_write_max_active
      ^    |             /^         |
      |    |            / |         |
    active |           /  |         |
     I/O   |          /   |         |
    count  |         /    |         |
           |        /     |         |
           |-------o      |         | <-- zfs_vdev_async_write_min_active
          0|_______^______|_________|
           0%      |      |           100% of zfs_dirty_data_max
                   |      |
                   |      `-- zfs_vdev_async_write_active_max_dirty_percent
                   `--------- zfs_vdev_async_write_active_min_dirty_percent

Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum. As that threshold is crossed, the
number of concurrent operations issued increases linearly to the maximum
at the specified maximum percentage of the dirty data allowed in the
pool.
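
To make the ramp concrete, here is a minimal Python sketch of the
piecewise linear function described above (an illustration, not the
kernel implementation; the default values shown are examples only):

::

   def async_write_max_active(dirty, dirty_data_max,
                              min_active=2,      # zfs_vdev_async_write_min_active
                              max_active=10,     # zfs_vdev_async_write_max_active
                              min_dirty_pct=30,  # ..._active_min_dirty_percent
                              max_dirty_pct=60): # ..._active_max_dirty_percent
       """Concurrent async write operations for a given dirty-data level."""
       pct = 100.0 * dirty / dirty_data_max
       if pct <= min_dirty_pct:
           return min_active  # flat segment at the minimum
       if pct >= max_dirty_pct:
           return max_active  # flat segment at the maximum
       # Linear ramp between the two adjustable points.
       frac = (pct - min_dirty_pct) / (max_dirty_pct - min_dirty_pct)
       return round(min_active + frac * (max_active - min_active))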

Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between
zfs_vdev_async_write_active_min_dirty_percent and
zfs_vdev_async_write_active_max_dirty_percent. If it exceeds the maximum
percentage, this indicates that the rate of incoming data is greater
than the rate that the backend storage can handle. In this case, we must
further throttle incoming writes, as described in the next section.

105  docs/Performance and Tuning/ZFS Transaction Delay.rst  Normal file
@@ -0,0 +1,105 @@

ZFS Transaction Delay
=====================

ZFS write operations are delayed when the backend storage isn't able to
accommodate the rate of incoming writes. This delay process is known as
the ZFS write throttle.

If there is already a write transaction waiting, the delay is relative
to when that transaction will finish waiting. Thus the calculated delay
time is independent of the number of threads concurrently executing
transactions.

If there is only one waiter, the delay is relative to when the
transaction started, rather than the current time. This credits the
transaction for "time already served." For example, if a write
transaction requires reading indirect blocks first, then the delay is
counted at the start of the transaction, just prior to the indirect
block reads.
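
The two cases reduce to a single rule, sketched here in Python (the
tx_start and last_wakeup timestamps are hypothetical names used for
illustration, not actual ZFS identifiers):

::

   def compute_wakeup(tx_start, last_wakeup, min_time):
       """When a newly delayed transaction may proceed.

       Taking the max covers both cases: if another transaction is
       still waiting, last_wakeup dominates and the new delay queues
       behind it; otherwise tx_start dominates, crediting the lone
       waiter for time already served.
       """
       return max(tx_start + min_time, last_wakeup + min_time)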

The minimum time for a transaction to take is calculated as:

::

   min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
   min_time is then capped at 100 milliseconds
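
That formula transcribes directly into a short Python sketch (an
illustration only, not the kernel implementation; the zfs_delay_scale
default of 500,000 nanoseconds shown here matches the 500 microsecond
midpoint discussed below):

::

   def tx_delay_ns(dirty, dirty_min, dirty_max, zfs_delay_scale=500_000):
       """Minimum transaction time, in nanoseconds.

       dirty     -- current dirty data in the pool (bytes)
       dirty_min -- dirty level at which delays begin (bytes)
       dirty_max -- zfs_dirty_data_max (bytes)
       """
       if dirty <= dirty_min:
           return 0  # below the threshold: no write throttle
       if dirty >= dirty_max:
           return 100_000_000  # at the limit: maximum delay
       min_time = zfs_delay_scale * (dirty - dirty_min) / (dirty_max - dirty)
       return min(min_time, 100_000_000)  # capped at 100 milliseconds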

The delay has two degrees of freedom that can be adjusted via tunables:

1. The percentage of dirty data at which we start to delay is defined by
   zfs_delay_min_dirty_percent. This is typically at or above
   zfs_vdev_async_write_active_max_dirty_percent, so that delays occur
   only after writing at full speed has failed to keep up with the
   incoming write rate.
2. The scale of the curve is defined by zfs_delay_scale. Roughly
   speaking, this variable determines the amount of delay at the
   midpoint of the curve.

::

   delay
    10ms +-------------------------------------------------------------*+
         |                                                             *|
     9ms +                                                             *+
         |                                                             *|
     8ms +                                                             *+
         |                                                            * |
     7ms +                                                            * +
         |                                                            * |
     6ms +                                                            * +
         |                                                            * |
     5ms +                                                            * +
         |                                                            * |
     4ms +                                                            * +
         |                                                            * |
     3ms +                                                            * +
         |                                                            * |
     2ms +                                                (midpoint)  * +
         |                                                    |    **   |
     1ms +                                                    v ***     +
         |             zfs_delay_scale ---------->     ********         |
       0 +-------------------------------------*********----------------+
         0%                    <- zfs_dirty_data_max ->               100%

Note that since the delay is added to the outstanding time remaining on
the most recent transaction, the delay is effectively the inverse of
IOPS. Here the midpoint of 500 microseconds translates to 2,000 IOPS.
The shape of the curve was chosen such that small changes in the amount
of accumulated dirty data in the first 3/4 of the curve yield
relatively small differences in the amount of delay.
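
The conversion is simple arithmetic: if each delayed transaction waits
500 microseconds behind its predecessor, then at most

::

   1 / 500 us = 1 / 0.0005 s = 2,000 transactions per second

can pass through the throttle.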

The effects can be easier to understand when the amount of delay is
represented on a log scale:

::

   delay
   100ms +-------------------------------------------------------------++
         +                                                              +
         |                                                              |
         +                                                             *+
    10ms +                                                             *+
         +                                                           ** +
         |                                              (midpoint)  **  |
         +                                                  |     **    +
     1ms +                                                  v ****      +
         +             zfs_delay_scale ---------->     *****            +
         |                                         ****                 |
         +                                      ****                    +
   100us +                                    **                        +
         +                                  *                           +
         |                                *                             |
         +                              *                               +
    10us +                            *                                 +
         +                                                              +
         |                                                              |
         +                                                              +
         +--------------------------------------------------------------+
         0%                    <- zfs_dirty_data_max ->               100%

Note here that only as the amount of dirty data approaches its limit
does the delay start to increase rapidly. The goal of a properly tuned
system should be to keep the amount of dirty data out of that range by
first ensuring that the appropriate limits are set for the I/O scheduler
to reach optimal throughput on the backend storage, and then by changing
the value of zfs_delay_scale to increase the steepness of the curve.

9346  docs/Performance and Tuning/ZFS on Linux Module Parameters.rst  Normal file
File diff suppressed because it is too large


98  docs/Performance and Tuning/ZIO Scheduler.rst  Normal file
@@ -0,0 +1,98 @@

ZFS I/O (ZIO) Scheduler
=======================

ZFS issues I/O operations to leaf vdevs (usually devices) to satisfy
and complete I/Os. The ZIO scheduler determines when and in what order
those operations are issued. Operations are divided into five I/O
classes, prioritized in the following order:

+----------+-------------+-------------------------------------------+
| Priority | I/O Class   | Description                               |
+==========+=============+===========================================+
| highest  | sync read   | most reads                                |
+----------+-------------+-------------------------------------------+
|          | sync write  | as defined by application or via 'zfs'    |
|          |             | 'sync' property                           |
+----------+-------------+-------------------------------------------+
|          | async read  | prefetch reads                            |
+----------+-------------+-------------------------------------------+
|          | async write | most writes                               |
+----------+-------------+-------------------------------------------+
| lowest   | scrub read  | scan read: includes both scrub and        |
|          |             | resilver                                  |
+----------+-------------+-------------------------------------------+

Each queue defines the minimum and maximum number of concurrent
operations issued to the device. In addition, the device has an
aggregate maximum, zfs_vdev_max_active. Note that the sum of the
per-queue minimums must not exceed the aggregate maximum. If the sum of
the per-queue maximums exceeds the aggregate maximum, then the number of
active I/Os may reach zfs_vdev_max_active, in which case no further I/Os
are issued regardless of whether all per-queue minimums have been met.
+-------------+---------------------------------+---------------------------------+
| I/O Class   | Min Active Parameter            | Max Active Parameter            |
+=============+=================================+=================================+
| sync read   | zfs_vdev_sync_read_min_active   | zfs_vdev_sync_read_max_active   |
+-------------+---------------------------------+---------------------------------+
| sync write  | zfs_vdev_sync_write_min_active  | zfs_vdev_sync_write_max_active  |
+-------------+---------------------------------+---------------------------------+
| async read  | zfs_vdev_async_read_min_active  | zfs_vdev_async_read_max_active  |
+-------------+---------------------------------+---------------------------------+
| async write | zfs_vdev_async_write_min_active | zfs_vdev_async_write_max_active |
+-------------+---------------------------------+---------------------------------+
| scrub read  | zfs_vdev_scrub_min_active       | zfs_vdev_scrub_max_active       |
+-------------+---------------------------------+---------------------------------+
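
On Linux, the current values of these tunables can be read from the
standard module parameter interface under /sys/module/zfs/parameters.
For example, a small Python helper (assuming the zfs module is loaded):

::

   from pathlib import Path

   PARAMS = Path("/sys/module/zfs/parameters")

   def tunable(name):
       """Read one ZFS module parameter as a string."""
       return (PARAMS / name).read_text().strip()

   for cls in ("sync_read", "sync_write", "async_read",
               "async_write", "scrub"):
       print(cls,
             tunable(f"zfs_vdev_{cls}_min_active"),
             tunable(f"zfs_vdev_{cls}_max_active"))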

For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit at which more concurrent operations have
no effect on throughput or can actually cause performance to decrease.

The ZIO scheduler selects the next operation to issue by first looking
for an I/O class whose minimum has not been satisfied. Once all are
satisfied and the aggregate maximum has not been hit, the scheduler
looks for classes whose maximum has not been satisfied. Iteration
through the I/O classes is done in the order specified above. No further
operations are issued if the aggregate maximum number of concurrent
operations has been hit or if there are no operations queued for an I/O
class that has not hit its maximum. Every time an I/O is queued or an
operation completes, the I/O scheduler looks for new operations to
issue.
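
That selection logic can be sketched in Python (the queue objects, with
their pending/active counts, are hypothetical stand-ins for the
kernel's per-class queues):

::

   def pick_next_queue(queues, total_active, vdev_max_active):
       """Choose the I/O class to issue from next, or None.

       `queues` is ordered by priority, highest first.
       """
       if total_active >= vdev_max_active:
           return None  # aggregate maximum reached
       # First pass: bring every class up to its per-queue minimum.
       for q in queues:
           if q.pending > 0 and q.active < q.min_active:
               return q
       # Second pass: fill classes up to their per-queue maximums.
       for q in queues:
           if q.pending > 0 and q.active < q.max_active:
               return q
       return None  # nothing eligible to issue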

In general, smaller values of max_active will lead to lower latency for
synchronous operations, while larger values of max_active may lead to
higher overall throughput, depending on the underlying storage and the
I/O mix.

The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. For example, when there
is contention, increasing zfs_vdev_scrub_max_active will cause the scrub
or resilver to complete more quickly, but will cause reads and writes to
have higher latency and lower throughput.

All I/O classes have a fixed maximum number of outstanding operations,
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups (txgs). Transaction groups enter the syncing state
periodically, so the number of queued async writes quickly bursts up and
then reduces back down to zero. The zfs_txg_timeout tunable (default=5
seconds) sets the target interval for txg sync. Thus a burst of async
writes every 5 seconds is a normal ZFS I/O pattern.
Rather than servicing I/Os as quickly as possible, the ZIO scheduler
changes the maximum number of active async write I/Os according to the
amount of dirty data in the pool. Since both throughput and latency
typically increase with the number of concurrent operations issued to
physical devices, reducing the burstiness in the number of concurrent
operations also stabilizes the response time of operations from other
queues. This is particularly important for the sync read and write
queues, where the periodic async write bursts of the txg sync can lead
to device-level contention. In broad strokes, the ZIO scheduler issues
more concurrent operations from the async write queue as there's more
dirty data in the pool.

9  docs/Performance and Tuning/index.rst  Normal file
@@ -0,0 +1,9 @@

Performance and Tuning
======================

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :glob:

   *