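The advice to not cache data in ARC is simply wrong. PG does not work that way: shared_buffers is a more special-purpose cache, and PG *expects* the majority of caching to be done by the OS file cache and is designed around that assumption. On ZFS the ARC is that file cache, so keeping data out of it (e.g. primarycache=metadata) leaves PG with little more than shared_buffers.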
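LZ4 compression changes the calculus for record size, even on fast NVMe. With an 8 KB recordsize, each compressed PG page is still rounded up to whole sectors, so compression saves little. Using a recordsize large enough to hold several compressed pages increases performance by reducing the total data written, despite the cost of partial-record writes (changing one page rewrites the whole record). (I have benchmarked this extensively.)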
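A rough model of the two competing effects, as a sketch only. It assumes 4 KiB sectors (ashift=12), PG's default 8 KiB pages, and a uniform, hypothetical compression ratio of 1.8. It ignores metadata, ZFS's own rules for when a compressed block is kept, and the read side of read-modify-write. The helper names are made up for illustration.

```python
import math

PG_PAGE = 8 * 1024  # PostgreSQL's default block size


def allocated_bytes(logical: int, ratio: float, sector: int) -> int:
    """Bytes allocated for one record in this simplified model.

    The compressed size is rounded up to whole sectors; if that saves
    nothing, the record is stored at its logical size instead.
    """
    compressed = math.ceil(logical / ratio)
    rounded = math.ceil(compressed / sector) * sector
    return min(rounded, logical)


def per_page_cost(recordsize: int, ratio: float, sector: int = 4096) -> float:
    """Average bytes per 8 KiB page when whole records are written."""
    pages = recordsize // PG_PAGE
    return allocated_bytes(recordsize, ratio, sector) / pages


def single_page_update_cost(recordsize: int, ratio: float, sector: int = 4096) -> int:
    """Bytes rewritten when only one page in a record changes.

    ZFS is copy-on-write at record granularity, so the whole record
    is recompressed and written again.
    """
    return allocated_bytes(recordsize, ratio, sector)


if __name__ == "__main__":
    RATIO = 1.8  # hypothetical, not a measured figure
    for rs in (8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024):
        print(f"{rs // 1024:>2} KiB record: "
              f"{per_page_cost(rs, RATIO):.0f} B/page, "
              f"{single_page_update_cost(rs, RATIO)} B per single-page update")
```

Under these assumptions an 8 KiB record gains nothing (8192 B per page), while a 64 KiB record averages 4608 B per page but rewrites 36864 B when a single page changes. Which effect dominates depends on how many pages per record a workload dirties between writes, which is what benchmarking on real data settles.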