Initial wiki md to rst auto convertation
98
docs/ZIO-Scheduler.rst
Normal file
@@ -0,0 +1,98 @@
ZFS I/O (ZIO) Scheduler
=======================

ZFS issues I/O operations to leaf vdevs (usually devices) to satisfy and
complete I/Os. The ZIO scheduler determines when and in what order those
operations are issued. Operations are divided into five I/O classes,
prioritized in the following order:

+----------+-------------+-------------------------------------------+
| Priority | I/O Class   | Description                               |
+==========+=============+===========================================+
| highest  | sync read   | most reads                                |
+----------+-------------+-------------------------------------------+
|          | sync write  | as defined by application or via 'zfs'    |
|          |             | 'sync' property                           |
+----------+-------------+-------------------------------------------+
|          | async read  | prefetch reads                            |
+----------+-------------+-------------------------------------------+
|          | async write | most writes                               |
+----------+-------------+-------------------------------------------+
| lowest   | scrub read  | scan read: includes both scrub and        |
|          |             | resilver                                  |
+----------+-------------+-------------------------------------------+

Each queue defines the minimum and maximum number of concurrent
operations issued to the device. In addition, the device has an
aggregate maximum, zfs_vdev_max_active. Note that the sum of the
per-queue minimums must not exceed the aggregate maximum. If the sum of
the per-queue maximums exceeds the aggregate maximum, then the number of
active I/Os may reach zfs_vdev_max_active, in which case no further I/Os
are issued regardless of whether all per-queue minimums have been met.

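The two constraints above can be checked mechanically. The following is a
minimal sketch, not part of ZFS: the function name, the dictionary layout,
and the numbers in the usage note are all made up for illustration and are
not ZFS defaults.

```python
def check_vdev_limits(per_queue, max_active):
    """Check per-queue limits against the aggregate maximum.

    per_queue maps an I/O class name to a (min_active, max_active) pair.
    Returns (problems, can_saturate): a list of rule violations, and a flag
    telling whether the per-queue maximums can add up to more than the
    aggregate maximum (allowed, but the aggregate limit then caps issuing).
    """
    problems = []
    for name, (lo, hi) in per_queue.items():
        if lo > hi:
            problems.append(f"{name}: min {lo} exceeds max {hi}")

    sum_min = sum(lo for lo, _ in per_queue.values())
    sum_max = sum(hi for _, hi in per_queue.values())

    # Rule from the text: the sum of the minimums must fit in the aggregate.
    if sum_min > max_active:
        problems.append(
            f"sum of minimums {sum_min} exceeds aggregate maximum {max_active}"
        )

    # Not an error: it only means the aggregate maximum can become the cap.
    can_saturate = sum_max > max_active
    return problems, can_saturate
```

For example, with the illustrative limits
`{"sync read": (2, 8), "scrub read": (1, 4)}` and an aggregate maximum of
10, the call reports no problems and flags that the maximums (12 in total)
can saturate the aggregate limit.
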
+-------------+---------------------------------+---------------------------------+
| I/O Class   | Min Active Parameter            | Max Active Parameter            |
+=============+=================================+=================================+
| sync read   | zfs_vdev_sync_read_min_active   | zfs_vdev_sync_read_max_active   |
+-------------+---------------------------------+---------------------------------+
| sync write  | zfs_vdev_sync_write_min_active  | zfs_vdev_sync_write_max_active  |
+-------------+---------------------------------+---------------------------------+
| async read  | zfs_vdev_async_read_min_active  | zfs_vdev_async_read_max_active  |
+-------------+---------------------------------+---------------------------------+
| async write | zfs_vdev_async_write_min_active | zfs_vdev_async_write_max_active |
+-------------+---------------------------------+---------------------------------+
| scrub read  | zfs_vdev_scrub_min_active       | zfs_vdev_scrub_max_active       |
+-------------+---------------------------------+---------------------------------+

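The naming pattern in the table is regular enough to generate. The sketch
below builds the parameter names for a class and, as an assumption, reads
their current values from `/sys/module/zfs/parameters`, where ZFS on Linux
exposes module parameters. The helper names are hypothetical, and the
reader returns `None` when a parameter is not available (for example on a
system without the ZFS module loaded).

```python
from pathlib import Path

# Assumption: Linux with the ZFS kernel module loaded.
PARAM_DIR = Path("/sys/module/zfs/parameters")

# Prefix per I/O class, taken from the table above. Note that the scrub
# read class uses "scrub", not "scrub_read", in its parameter names.
QUEUE_PREFIX = {
    "sync read": "zfs_vdev_sync_read",
    "sync write": "zfs_vdev_sync_write",
    "async read": "zfs_vdev_async_read",
    "async write": "zfs_vdev_async_write",
    "scrub read": "zfs_vdev_scrub",
}


def param_names(io_class):
    """Return the (min_active, max_active) parameter names for a class."""
    prefix = QUEUE_PREFIX[io_class]
    return f"{prefix}_min_active", f"{prefix}_max_active"


def read_param(name, param_dir=PARAM_DIR):
    """Return the integer value of a parameter, or None if unreadable."""
    try:
        return int((param_dir / name).read_text().strip())
    except (OSError, ValueError):
        return None
```

A caller could loop over `QUEUE_PREFIX`, pass each name from
`param_names()` to `read_param()`, and compare the results against
`read_param("zfs_vdev_max_active")`.
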
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers. Further, physical
devices typically have a limit beyond which more concurrent operations
have no effect on throughput or can actually cause performance to
decrease.

The ZIO scheduler selects the next operation to issue by first looking
for an I/O class whose minimum has not been satisfied. Once all minimums
are satisfied, and provided the aggregate maximum has not been hit, the
scheduler looks for classes whose maximum has not been reached. Iteration
through the I/O classes is done in the order specified above. No further
operations are issued if the aggregate maximum number of concurrent
operations has been hit or if there are no operations queued for an I/O
class that has not hit its maximum. Every time an I/O is queued or an
operation completes, the I/O scheduler looks for new operations to
issue.

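The selection rule described above can be sketched as a two-pass search
over the classes in priority order. This is an illustrative model only,
not the ZFS implementation: the function name and the dictionary-based
bookkeeping are assumptions, and the sketch skips classes with nothing
queued in both passes.

```python
# Classes in the priority order given by the table at the top.
PRIORITY_ORDER = [
    "sync read",
    "sync write",
    "async read",
    "async write",
    "scrub read",
]


def next_class(queued, active, limits, max_active):
    """Pick the I/O class to issue from next, or None if nothing may issue.

    queued: class -> number of operations waiting
    active: class -> number of operations currently issued to the device
    limits: class -> (min_active, max_active) for that class
    max_active: aggregate maximum for the device
    """
    # Aggregate maximum reached: issue nothing, even if minimums are unmet.
    if sum(active.values()) >= max_active:
        return None

    # Pass 1: a class with queued work whose minimum is not yet satisfied.
    for cls in PRIORITY_ORDER:
        if queued.get(cls, 0) > 0 and active.get(cls, 0) < limits[cls][0]:
            return cls

    # Pass 2: a class with queued work that is still below its maximum.
    for cls in PRIORITY_ORDER:
        if queued.get(cls, 0) > 0 and active.get(cls, 0) < limits[cls][1]:
            return cls

    return None
```

In this model the function would be called each time an I/O is queued or
an operation completes, repeating until it returns `None`. A
lower-priority class below its minimum is served before a higher-priority
class that has already met its minimum, which is what keeps scrubs from
being starved entirely.
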
In general, smaller max_active values lead to lower latency for
synchronous operations. Larger max_active values may lead to higher
overall throughput, depending on the underlying storage and the I/O mix.

The ratio of the queues' max_active values determines the balance of
performance between reads, writes, and scrubs. For example, when there
is contention, increasing zfs_vdev_scrub_max_active will cause the scrub
or resilver to complete more quickly, but will also cause reads and
writes to have higher latency and lower throughput.

All I/O classes have a fixed maximum number of outstanding operations
except for the async write class. Asynchronous writes represent the data
that is committed to stable storage during the syncing stage for
transaction groups (txgs). Transaction groups enter the syncing state
periodically, so the number of queued async writes quickly bursts up and
then drops back to zero. The zfs_txg_timeout tunable (default=5
seconds) sets the target interval for txg sync. Thus a burst of async
writes every 5 seconds is a normal ZFS I/O pattern.

Rather than servicing I/Os as quickly as possible, the ZIO scheduler
changes the maximum number of active async write I/Os according to the
amount of dirty data in the pool. Since both throughput and latency
typically increase as the number of concurrent operations issued to
physical devices increases, reducing the burstiness in the number of
concurrent operations also stabilizes the response time of operations
from other queues. This is particularly important for the sync read and
write queues, where the periodic async write bursts of the txg sync can
lead to device-level contention. In broad strokes, the ZIO scheduler
issues more concurrent operations from the async write queue as there's
more dirty data in the pool.

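One simple way to picture "more dirty data, more concurrent async writes"
is a linear ramp between two dirty-data thresholds. The sketch below is an
assumption made for illustration: the text above does not specify the
shape of the curve, and the function name, the threshold arguments, and
the numbers in the usage note are hypothetical.

```python
def scaled_async_write_limit(dirty_pct, low_pct, high_pct,
                             min_active, max_active):
    """Return an async write concurrency limit for a given dirty level.

    dirty_pct: dirty data as an integer percentage of the allowed maximum
    low_pct / high_pct: hypothetical thresholds where the ramp starts/ends
    min_active / max_active: the async write queue's limits
    """
    if high_pct <= low_pct:
        raise ValueError("high_pct must be greater than low_pct")
    if dirty_pct <= low_pct:
        return min_active          # little dirty data: stay at the minimum
    if dirty_pct >= high_pct:
        return max_active          # lots of dirty data: allow the maximum
    # In between: interpolate linearly, using integer arithmetic.
    span = (dirty_pct - low_pct) * (max_active - min_active)
    return min_active + span // (high_pct - low_pct)
```

With illustrative thresholds of 30 and 60 percent and queue limits of 2
and 10, a pool at 45 percent dirty data would get a limit of 6, halfway
between the two. The limit never decreases as the dirty level rises, so
the burst at txg sync is spread out rather than issued all at once.
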