Workload Tuning
===============

Below are tips for various workloads.

.. _basic_concepts:

Basic concepts
--------------

Descriptions of ZFS internals that have an effect on application performance follow.

.. _adaptive_replacement_cache:

Adaptive Replacement Cache
~~~~~~~~~~~~~~~~~~~~~~~~~~

For decades, operating systems have used RAM as a cache to avoid the necessity of waiting on disk IO, which is extremely slow. This concept is called page replacement. Until ZFS, virtually all filesystems used the Least Recently Used (LRU) page replacement algorithm, in which the least recently used pages are the first to be replaced. Unfortunately, the LRU algorithm is vulnerable to cache flushes, where an occasional brief change in workload removes all frequently used data from the cache. The Adaptive Replacement Cache (ARC) algorithm was implemented in ZFS to replace LRU. It solves this problem by maintaining four lists:

#. A list for recently cached entries.
#. A list for recently cached entries that have been accessed more than once.
#. A list for entries evicted from #1.
#. A list for entries evicted from #2.

Data is evicted from the first list while an effort is made to keep data in the second list. In this way, ARC is able to outperform LRU by providing a superior hit rate.

In addition, a dedicated cache device (typically an SSD) can be added to the pool with ``zpool add`` *poolname* ``cache`` *devicename*. The cache device is managed by the L2ARC, which scans entries that are next to be evicted and writes them to the cache device. The data stored in ARC and L2ARC can be controlled via the ``primarycache`` and ``secondarycache`` zfs properties respectively, which can be set on both zvols and datasets. Possible settings are ``all``, ``none`` and ``metadata``. It is possible to improve performance when a zvol or dataset hosts an application that does its own caching by caching only metadata. One example is PostgreSQL. Another would be a virtual machine using ZFS.

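
As an illustration, the following minimal sketch attaches a cache device and restricts a PostgreSQL dataset to metadata caching. The pool name ``tank``, the dataset ``tank/postgres`` and the device path are hypothetical::

   # Attach an SSD as an L2ARC cache device (device path is a placeholder).
   zpool add tank cache /dev/disk/by-id/ata-EXAMPLE-SSD

   # Cache only metadata for a dataset whose application does its own data caching.
   zfs set primarycache=metadata tank/postgres
   zfs set secondarycache=metadata tank/postgres
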

.. _alignment_shift_ashift:

Alignment Shift (ashift)
~~~~~~~~~~~~~~~~~~~~~~~~

Top-level vdevs contain an internal property called ashift, which stands for alignment shift. It is set at vdev creation and is immutable. It can be read using the ``zdb`` command. It is calculated as the maximum base 2 logarithm of the physical sector size of any child vdev and it alters the disk format such that writes are always done according to it. This makes 2^ashift the smallest possible IO on a vdev. Configuring ashift correctly is important because partial sector writes incur a penalty where the sector must be read into a buffer before it can be written. ZFS makes the implicit assumption that the sector size reported by drives is correct and calculates ashift based on that.

In an ideal world, physical sector size is always reported correctly and therefore, this requires no attention. Unfortunately, this is not the case. The sector size on all storage devices was 512 bytes prior to the creation of flash-based solid state drives. Some operating systems, such as Windows XP, were written under this assumption and will not function when drives report a different sector size.

Flash-based solid state drives came to market around 2007. These devices report 512-byte sectors, but the actual flash pages, which roughly correspond to sectors, are never 512 bytes. The early models used 4096-byte pages while the newer models have moved to an 8192-byte page. In addition, "Advanced Format" hard drives have been created which also use a 4096-byte sector size. Partial page writes suffer from similar performance degradation as partial sector writes. In some cases, the design of NAND flash makes the performance degradation even worse, but that is beyond the scope of this description.

Reporting the correct sector size is the responsibility of the block device layer. This unfortunately means that the proper handling of devices that misreport their sector size differs across platforms. The respective methods are as follows:

- `sd.conf `__ on illumos
- `gnop(8) `__ on FreeBSD; see for example `FreeBSD on 4K sector drives `__ (2011-01-01)
- `ashift= `__ on ZFS on Linux
- ``-o ashift=`` also works with both MacZFS (pool version 8) and ZFS-OSX (pool version 5000).

``-o ashift=`` is convenient, but it is flawed in that the creation of pools containing top-level vdevs that have multiple optimal sector sizes requires the use of multiple commands. `A newer syntax `__ that will rely on the actual sector sizes has been discussed as a cross-platform replacement and will likely be implemented in the future.

In addition, `Richard Yao `__ has contributed a `database of drives known to misreport sector sizes `__ to the ZFS on Linux project. It is used to automatically adjust ashift without the assistance of the system administrator. This approach is unable to fully compensate for misreported sector sizes whenever drive identifiers are used ambiguously (e.g. virtual machines, iSCSI LUNs, some rare SSDs), but it does a great amount of good. The format is roughly compatible with illumos' sd.conf and it is expected that other implementations will integrate the database in future releases. Strictly speaking, this database does not belong in ZFS, but the difficulty of patching the Linux kernel (especially older ones) necessitated that this be implemented in ZFS itself for Linux. The same is true for MacZFS. However, FreeBSD and illumos are both able to implement this in the correct layer.

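
For example, when a drive is known to use 4K physical sectors but reports 512-byte sectors, ashift can be forced at pool creation on ZFS on Linux. A hedged sketch; the pool name and device paths are placeholders::

   # Force 4K alignment (2^12 = 4096 bytes) when creating the pool.
   zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-DRIVE-A /dev/disk/by-id/ata-DRIVE-B

   # One way to verify the ashift actually used by the top-level vdevs.
   zdb -C tank | grep ashift
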

Compression
~~~~~~~~~~~

Internally, ZFS allocates data using multiples of the device's sector size, typically either 512 bytes or 4KB (see above). When compression is enabled, a smaller number of sectors can be allocated for each block. The uncompressed block size is set by the ``recordsize`` (defaults to 128KB) or ``volblocksize`` (defaults to 8KB) property (for filesystems vs volumes).

The following compression algorithms are available:

- LZ4

  - A new algorithm added after feature flags were created. It is significantly superior to LZJB in all metrics tested. It is the new default compression algorithm (compression=on) in OpenZFS\ `1 `__. It is available on all platforms as of 2020.

- LZJB

  - The original default compression algorithm (compression=on) for ZFS. It was created to satisfy the desire for a compression algorithm suitable for use in filesystems. Specifically, that it provides fair compression, has a high compression speed, has a high decompression speed and detects incompressible data quickly.

- GZIP (1 through 9)

  - Classic Lempel-Ziv implementation. It provides high compression, but it often makes IO CPU-bound.

- ZLE (Zero Length Encoding)

  - A very simple algorithm that only compresses zeroes.

If you want to use compression and are uncertain which to use, use LZ4. It averages a 2.1:1 compression ratio while gzip-1 averages 2.7:1, but gzip is much slower. Both figures are obtained from `testing by the LZ4 project `__ on the Silesia corpus. The greater compression ratio of gzip is usually only worthwhile for rarely accessed data.

.. _raid_z_stripe_width:

RAID-Z stripe width
~~~~~~~~~~~~~~~~~~~

Choose a RAID-Z stripe width based on your IOPS needs and the amount of space you are willing to devote to parity information. If you need more IOPS, use fewer disks per stripe. If you need more usable space, use more disks per stripe. Trying to optimize your RAID-Z stripe width based on exact numbers is rarely worthwhile. See this `blog post `__ for more details.

.. _dataset_recordsize:

Dataset recordsize
~~~~~~~~~~~~~~~~~~

ZFS datasets use an internal recordsize of 128KB by default. The dataset recordsize is the basic unit of data used for internal copy-on-write on files. Partial record writes require that data be read from either ARC (cheap) or disk (expensive). recordsize can be set to any power of 2 from 512 bytes to 128 kilobytes. Software that writes in fixed record sizes (e.g. databases) will benefit from the use of a matching recordsize.

Changing the recordsize on a dataset will only take effect for new files. If you change the recordsize because your application should perform better with a different one, you will need to recreate its files. A cp followed by a mv on each file is sufficient. Alternatively, send/recv should recreate the files with the correct recordsize when a full receive is done.

.. _larger_record_sizes:

Larger record sizes
^^^^^^^^^^^^^^^^^^^

Record sizes of up to 16M are supported with the large_blocks pool feature, which is enabled by default on new pools on systems that support it. However, record sizes larger than 1M are disabled by default unless the zfs_max_recordsize kernel module parameter is set to allow sizes higher than 1M. Record sizes larger than 1M are not as well tested as 1M, although they should work. ``zfs send`` operations must specify -L to ensure that blocks larger than 128KB are sent, and the receiving pools must support the large_blocks feature.

.. _zvol_volblocksize:

zvol volblocksize
~~~~~~~~~~~~~~~~~

Zvols have a volblocksize property that is analogous to recordsize. The default size is 8KB, which is the size of a page on the SPARC architecture. Workloads that use smaller sized IOs (such as swap on x86, which uses 4096-byte pages) will benefit from a smaller volblocksize.

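
As an illustration of matching these properties to the workload, a dataset for an application that writes in 16KB units and a zvol intended for swap on x86 might be created as follows. This is a sketch only; the pool and dataset names and the 4GB volume size are hypothetical::

   # Match the dataset recordsize to the application's fixed write size.
   zfs create -o recordsize=16K tank/db

   # Match the zvol volblocksize to the 4096-byte page size used by swap on x86.
   zfs create -V 4G -o volblocksize=4K tank/swap
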

Deduplication
~~~~~~~~~~~~~

Deduplication uses an on-disk hash table, using `extensible hashing `__ as implemented in the ZAP (ZFS Attribute Processor). Each cached entry uses slightly more than 320 bytes of memory. The DDT code relies on ARC for caching the DDT entries, such that there is no double caching or internal fragmentation from the kernel memory allocator. Each pool has a global deduplication table shared across all datasets and zvols on which deduplication is enabled. Each entry in the hash table is a record of a unique block in the pool. (The block size is set by the ``recordsize`` or ``volblocksize`` properties.)

The hash table (also known as the DDT or DeDup Table) must be accessed for every dedup-able block that is written or freed (regardless of whether it has multiple references). If there is insufficient memory for the DDT to be cached in memory, each cache miss will require reading a random block from disk, resulting in poor performance. For example, if operating on a single 7200RPM drive that can do 100 io/s, uncached DDT reads would limit overall write throughput to 100 blocks per second, or 400KB/s with 4KB blocks.

The consequence is that sufficient memory to store deduplication data is required for good performance. The deduplication data is considered metadata and therefore can be cached if the ``primarycache`` or ``secondarycache`` properties are set to ``metadata``. In addition, the deduplication table will compete with other metadata for metadata storage, which can have a negative effect on performance. Simulation of the number of deduplication table entries needed for a given pool can be done using the -D option to zdb. Then a simple multiplication by 320 bytes can be done to get the approximate memory requirements. Alternatively, you can estimate an upper bound on the number of unique blocks by dividing the amount of storage you plan to use on each dataset (taking into account that partial records each count as a full recordsize for the purposes of deduplication) by the recordsize and each zvol by the volblocksize, summing and then multiplying by 320 bytes.

.. _metaslab_allocator:

Metaslab Allocator
~~~~~~~~~~~~~~~~~~

ZFS top-level vdevs are divided into metaslabs from which blocks can be independently allocated, allowing concurrent IOs to perform allocations without blocking one another. At present, there is a regression\ `2 `__ on the Linux and Mac OS X ports that causes serialization to occur.

By default, the selection of a metaslab is biased toward lower LBAs to improve performance of spinning disks, but this does not make sense on solid state media. This behavior can be adjusted globally by setting the ZFS module's global metaslab_lba_weighting_enabled tunable to 0. This tunable is only advisable on systems that only use solid state media for pools.

The metaslab allocator will allocate blocks on a first-fit basis when a metaslab has more than or equal to 4 percent free space and on a best-fit basis when a metaslab has less than 4 percent free space. The former is much faster than the latter, but it is not possible to tell when this behavior occurs by looking at the pool's overall free space. However, the command ``zdb -mmm $POOLNAME`` will provide this information.

.. _pool_geometry:

Pool Geometry
~~~~~~~~~~~~~

If small random IOPS are of primary importance, mirrored vdevs will outperform raidz vdevs. Read IOPS on mirrors will scale with the number of drives in each mirror while raidz vdevs will each be limited to the IOPS of the slowest drive.

If sequential writes are of primary importance, raidz will outperform mirrored vdevs. Sequential write throughput increases linearly with the number of data disks in raidz while writes are limited to the slowest drive in mirrored vdevs. Sequential read performance should be roughly the same on each.

Both IOPS and throughput will increase by the respective sums of the IOPS and throughput of each top-level vdev, regardless of whether they are raidz or mirrors.

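
To make the trade-off concrete, the two alternative layouts below use the same six disks: the first favours random IOPS, the second favours sequential throughput and usable space. A sketch only; the short device names are placeholders (``/dev/disk/by-id`` paths are preferable in practice)::

   # Three 2-way mirrors: random read IOPS scale with all six drives.
   zpool create tank mirror sda sdb mirror sdc sdd mirror sde sdf

   # One 6-disk raidz2: better sequential write throughput and more usable space,
   # but random IOPS roughly that of the slowest drive in the vdev.
   zpool create tank raidz2 sda sdb sdc sdd sde sdf
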

.. _whole_disks_versus_partitions:

Whole Disks versus Partitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ZFS will behave differently on different platforms when given a whole disk.

On illumos, ZFS attempts to enable the write cache on a whole disk. The illumos UFS driver cannot ensure integrity with the write cache enabled, so by default Sun/Solaris systems using the UFS file system for boot were shipped with the drive write cache disabled (long ago, when Sun was still an independent company). For safety on illumos, if ZFS is not given the whole disk, it could be shared with UFS and thus it is not appropriate for ZFS to enable the write cache. In this case, the write cache setting is not changed and will remain as-is. Today, most vendors ship drives with the write cache enabled by default.

On Linux, the Linux IO elevator is largely redundant given that ZFS has its own IO elevator, so ZFS will set the IO elevator to noop to avoid unnecessary CPU overhead.

ZFS will also create a GPT partition table with its own partitions when given a whole disk under illumos on x86/amd64 and on Linux. This is mainly to make booting through UEFI possible because UEFI requires a small FAT partition to be able to boot the system. The ZFS driver is able to tell whether the pool was given the entire disk via the whole_disk field in the label.

This is not done on FreeBSD. Pools created by FreeBSD will always have the whole_disk field set to true, such that a pool created on FreeBSD and imported on another platform will always be treated as if the whole disks were given to ZFS.

.. _general_recommendations:

General recommendations
-----------------------

.. _alignment_shift:

Alignment shift
~~~~~~~~~~~~~~~

Make sure that you create your pools such that the vdevs have the correct alignment shift for your storage device's sector size. If dealing with flash media, this is going to be either 12 (4K sectors) or 13 (8K sectors). For SSD ephemeral storage on Amazon EC2, the proper setting is 12.

.. _atime_updates:

Atime Updates
~~~~~~~~~~~~~

Set either relatime=on or atime=off to minimize IOs used to update access time stamps. For backward compatibility with the small percentage of software that relies on access times, relatime is preferred when available and should be set on your entire pool. atime=off should be used more selectively.

.. _free_space:

Free Space
~~~~~~~~~~

Keep pool free space above 10% to avoid having many metaslabs reach the 4% free space threshold at which the allocator switches from the first-fit to the best-fit allocation strategy. When the threshold is hit, the `metaslab allocator `__ becomes very CPU intensive in an attempt to protect itself from fragmentation. This reduces IOPS, especially as more metaslabs reach the 4% threshold.

The recommendation is 10% rather than 5% because metaslab selection considers both location and free space unless the global metaslab_lba_weighting_enabled tunable is set to 0. When that tunable is 0, ZFS will consider only free space, so the expense of the best-fit allocator can be avoided by keeping free space above 5%. That setting should only be used on systems with pools that consist of solid state drives because it will reduce sequential IO performance on mechanical disks.

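
Capacity and fragmentation can be watched per pool so that action is taken before many metaslabs cross the threshold. A small sketch, assuming a pool named ``tank``::

   # CAP should stay below roughly 90%; a rising FRAG value is an early warning sign.
   zpool list -o name,size,allocated,free,capacity,fragmentation tank
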

.. _lz4_compression:

LZ4 compression
~~~~~~~~~~~~~~~

Set compression=lz4 on your pools' root datasets so that all datasets inherit it, unless you have a reason not to enable it. Userland tests of LZ4 compression of incompressible data in a single thread have shown that it can process 10GB/sec, so it is unlikely to be a bottleneck even on incompressible data. Furthermore, incompressible data will be stored without compression, so reads of incompressible data with compression enabled will not be subject to decompression. Writes are so fast that incompressible data is unlikely to see a performance penalty from the use of LZ4 compression. The reduction in IO from LZ4 will typically be a performance win.

Note that larger record sizes will increase compression ratios on compressible data by allowing compression algorithms to process more data at a time.

.. _nvme_low_level_formatting:

NVMe low level formatting
~~~~~~~~~~~~~~~~~~~~~~~~~

See `Hardware#NVMe_low_level_formatting `__.

.. _pool_geometry_1:

Pool Geometry
~~~~~~~~~~~~~

Do not put more than ~16 disks in raidz. The rebuild times on mechanical disks will be excessive when the pool is full.

.. _synchronous_io:

Synchronous I/O
~~~~~~~~~~~~~~~

If your workload involves fsync or O_SYNC and your pool is backed by mechanical storage, consider adding one or more SLOG devices. Pools that have multiple SLOG devices will distribute ZIL operations across them. The best choice for SLOG device(s) is likely Optane / 3D XPoint SSDs. See `Hardware#Optane_.2F_3D_XPoint_SSDs `__ for a description of them. If an Optane / 3D XPoint SSD is an option, the rest of this section on synchronous I/O need not be read. If Optane / 3D XPoint SSDs are not an option, see `Hardware#NAND_Flash_SSDs `__ for suggestions for NAND flash SSDs and also read the information below.

To ensure maximum ZIL performance on NAND flash SSD-based SLOG devices, you should also overprovision spare area to increase IOPS\ `3 `__. Only about 4GB is needed, so the rest can be left as overprovisioned storage. The choice of 4GB is somewhat arbitrary. Most systems do not write anything close to 4GB to the ZIL between transaction group commits, so overprovisioning all storage beyond the 4GB partition should be alright. If a workload needs more, then make it no more than the maximum ARC size. Even under extreme workloads, ZFS will not benefit from more SLOG storage than the maximum ARC size. That is half of system memory on Linux and 3/4 of system memory on illumos.

.. _overprovisioning_by_secure_erase_and_partition_table_trick:

Overprovisioning by secure erase and partition table trick
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do this with a mix of a secure erase and a partition table trick, such as the following:

#. Run a secure erase on the NAND-flash SSD.
#. Create a partition table on the NAND-flash SSD.
#. Create a 4GB partition.
#. Give the partition to ZFS to use as a log device.

A sketch of these steps on Linux is shown at the end of this subsection. If using the secure erase and partition table trick, do *not* use the unpartitioned space for other things, even temporarily. That will reduce or eliminate the overprovisioning by marking pages as dirty.

Alternatively, some devices allow you to change the sizes that they report. This would also work, although a secure erase should be done prior to changing the reported size to ensure that the SSD recognizes the additional spare area. Changing the reported size can be done on drives that support it with ``hdparm -N`` on systems that have laptop-mode-tools.

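
A minimal sketch of the partition-table approach on Linux, assuming the freshly erased SSD is ``/dev/sdb`` and the pool is named ``tank`` (both hypothetical); the secure erase itself is vendor-specific and only indicated here::

   # Step 1: secure erase first (vendor-specific; hdparm's ATA security erase is one option).

   # Steps 2 and 3: create a GPT partition table with a single 4GB partition,
   # leaving the rest of the drive unpartitioned as spare area.
   parted -s /dev/sdb mklabel gpt
   parted -s /dev/sdb mkpart primary 1MiB 4GiB

   # Step 4: give the partition to ZFS as a log (SLOG) device.
   zpool add tank log /dev/sdb1
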

.. _nvme_overprovisioning:

NVMe overprovisioning
^^^^^^^^^^^^^^^^^^^^^

On NVMe, you can use namespaces to achieve overprovisioning:

#. Do a sanitize command as a precaution to ensure the device is completely clean.
#. Delete the default namespace.
#. Create a new namespace of size 4GB.
#. Give the namespace to ZFS to use as a log device, e.g. ``zpool add tank log /dev/nvme1n1``.

.. _whole_disks:

Whole disks
~~~~~~~~~~~

Whole disks should be given to ZFS rather than partitions. If you must use a partition, make certain that the partition is properly aligned to avoid read-modify-write overhead. See the section on `Alignment Shift `__ for a description of proper alignment. Also, see the section on `Whole Disks versus Partitions `__ for a description of changes in ZFS behavior when operating on a partition.

Single disk RAID 0 arrays from RAID controllers are not equivalent to whole disks. The `Hardware `__ page explains in detail.

.. _bit_torrent:

Bit Torrent
-----------

Bit torrent performs 16KB random reads/writes. The 16KB writes cause read-modify-write overhead. The read-modify-write overhead can reduce performance by a factor of 16 with 128KB record sizes when the amount of data written exceeds system memory. This can be avoided by using a dedicated dataset for bit torrent downloads with recordsize=16K.

When the files are read sequentially through an HTTP server, the random order in which the pieces were written creates fragmentation that has been observed to reduce sequential read performance by a factor of two on 7200RPM hard disks. If performance is a problem, fragmentation can be eliminated by rewriting the files sequentially in either of two ways:

The first method is to configure your client to download the files to a temporary directory and then copy them into their final location when the downloads are finished, provided that your client supports this.

The second method is to use send/recv to recreate a dataset sequentially.

In practice, defragmenting files obtained through bit torrent should only improve performance when the files are stored on magnetic storage and are subject to significant sequential read workloads after creation.

.. _database_workloads:

Database workloads
------------------

Setting redundant_metadata=mostly can increase IOPS by at least a few percentage points by eliminating redundant metadata at the lowest level of the indirect block tree. This comes with the caveat that data loss will occur if a metadata block pointing to data blocks is corrupted and there are no duplicate copies, but this is generally not a problem in production on mirrored or raidz vdevs.

MySQL
~~~~~

InnoDB
^^^^^^

Make separate datasets for InnoDB's data files and log files. Set recordsize=16K on InnoDB's data files to avoid expensive partial record writes and leave recordsize=128K on the log files. Set primarycache=metadata on both to prefer InnoDB's caching.\ `4 `__ Set logbias=throughput on the data to stop ZIL from writing twice.

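
For example, the datasets described above might be created as follows; the pool and dataset names, and where MySQL is pointed at them, are assumptions::

   # Data files: 16K records to match InnoDB pages, metadata-only ARC caching,
   # and logbias=throughput so the ZIL does not write the data twice.
   zfs create -o recordsize=16K -o primarycache=metadata -o logbias=throughput tank/mysql-data

   # Log files: keep the default 128K recordsize, but still prefer InnoDB's own caching.
   zfs create -o primarycache=metadata tank/mysql-log
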

Set skip-innodb_doublewrite in my.cnf to prevent InnoDB from writing twice. The double writes are a data integrity feature meant to protect against corruption from partially-written records, but those are not possible on ZFS. It should be noted that Percona's blog\ `5 `__ had advocated using an ext4 configuration where double writes were turned off for a performance gain, but later recanted it because it caused data corruption. Following a well-timed power failure, an in-place filesystem such as ext4 can have half of an 8KB record be old while the other half is new. This is the corruption that caused Percona to recant its advice. However, ZFS' copy-on-write design would cause it to return the old, correct data following a power failure (no matter what the timing is). That prevents the corruption that the double write feature is intended to prevent from ever happening. The double write feature is therefore unnecessary on ZFS and can be safely turned off for better performance.

On Linux, the driver's AIO implementation is a compatibility shim that just barely passes the POSIX standard. InnoDB performance suffers when using its default AIO codepath. Set innodb_use_native_aio=0 and innodb_use_atomic_writes=0 in my.cnf to disable AIO. Both of these settings must be disabled to disable AIO.

PostgreSQL
~~~~~~~~~~

Make separate datasets for PostgreSQL's data and WAL. Set recordsize=8K on both to avoid expensive partial record writes. Set logbias=throughput on PostgreSQL's data to avoid writing twice.

SQLite
~~~~~~

Make a separate dataset for the database. Set the recordsize to 64K. Set the SQLite page size to 65536 bytes\ `6 `__.

Note that SQLite databases typically are not exercised enough to merit special tuning, but this will provide it. Note the side effect on cache size mentioned at SQLite.org\ `7 `__.

.. _file_servers:

File servers
------------

Create a dedicated dataset for files being served.

See `Performance_tuning#Sequential_workloads `__ for configuration recommendations.

.. _sequential_workloads:

Sequential workloads
--------------------

Set recordsize=1M on datasets that are subject to sequential workloads. Read `Performance_tuning#Larger_record_sizes `__ for documentation on things that should be known before setting 1M record sizes.

Set compression=lz4 as per the general recommendation for `LZ4 compression `__.

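
A one-line sketch combining both recommendations, assuming a pool named ``tank`` and a share called ``media`` (both hypothetical)::

   # Large records for sequential reads and writes, LZ4 for cheap compression.
   zfs create -o recordsize=1M -o compression=lz4 tank/media
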

.. _video_games_directories:

Video games directories
-----------------------

Create a dedicated dataset, use chown to make it user accessible (or create a directory under it and use chown on that) and then configure the game download application to place games there. Specific information on how to configure various ones is below.

See `Performance_tuning#Sequential_workloads `__ for configuration recommendations before installing games.

Note that the performance gains from this tuning are likely to be small and limited to load times. However, the combination of 1M records and LZ4 will allow more games to be stored, which is why this tuning is documented despite the performance gains being limited. A Steam library of 300 games (mostly from Humble Bundle) that had these tweaks applied to it saw 20% space savings. Both faster load times and significant space savings are possible on compressible games when this tuning has been done. Games whose assets are already compressed will see little to no benefit.

Lutris
~~~~~~

Open the context menu by left-clicking on the triple bar icon in the upper right. Go to "Preferences" and then the "System options" tab. Change the default installation directory and click save.

Steam
~~~~~

Go to "Settings" -> "Downloads" -> "Steam Library Folders" and use "Add Library Folder" to set the directory for Steam to use to store games. Make sure to set it to the default by right-clicking on it and clicking "Make Default Folder" before closing the dialogue.

.. _virtual_machines:

Virtual machines
----------------

Virtual machine images on ZFS should be stored using either zvols or raw files to avoid unnecessary overhead. The recordsize/volblocksize and the guest filesystem's block size should be configured to match to avoid overhead from partial record modification. This would typically be 4K. If raw files are used, a separate dataset should be used to make it easy to configure recordsize independently of other things stored on ZFS.

.. _qemu_kvm_xen:

QEMU / KVM / Xen
~~~~~~~~~~~~~~~~

AIO should be used to maximize IOPS when using files for guest storage.

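
A hedged example of a QEMU invocation using native AIO with a raw image stored on a dedicated dataset; the image path and memory size are placeholders, and ``aio=native`` generally requires ``cache=none`` so that O_DIRECT is used::

   qemu-system-x86_64 -m 4G -enable-kvm \
       -drive file=/tank/vm/guest.raw,format=raw,if=virtio,cache=none,aio=native

With libvirt, the corresponding settings are typically ``io='native'`` and ``cache='none'`` on the disk's ``<driver>`` element.
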