Whisper Seminar
BetrFS: Write-Optimization in a Kernel File System
Friday, December 15, 2017
Don Porter (University of North Carolina at Chapel Hill)
Applications exhibit a mixture of I/O patterns, ranging from large, sequential reads to small, random writes, yet general-purpose file system designs trade good performance on some I/O patterns for poor performance on others. For instance, ext4 is designed to update data in place. It can issue sequential reads and writes at disk bandwidth, but realizes only a small fraction of disk bandwidth for random writes, a pattern common in applications such as SQLite and IMAP email servers.
This talk describes BetrFS, an in-kernel file system for Linux that offers good performance on all operations. First, BetrFS uses a data structure called a B^ε-tree to index on-disk data. A B^ε-tree eliminates the trade-off between small, random writes and large, sequential scans. BetrFS also introduces techniques at the OS and data structure level to smoothly navigate other tensions, such as balancing large directory rename performance against maintaining on-disk locality for efficient directory searches.
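To give a rough intuition for how write-optimization can absorb small, random writes, here is a toy sketch in Python. It is not BetrFS code and not a real B^ε-tree: it assumes a single in-memory buffer in front of four fixed leaves, whereas an actual B^ε-tree keeps a buffer in every internal node and moves messages down level by level. The class name ToyBufferedTree, its parameters, and the leaf_writes counter are all hypothetical, chosen only for illustration.

```python
import bisect


class ToyBufferedTree:
    """Two-level toy: one root buffer in front of fixed-range leaves."""

    def __init__(self, pivots, buffer_capacity):
        self.pivots = sorted(pivots)       # keys that separate the leaves
        self.leaves = [dict() for _ in range(len(self.pivots) + 1)]
        self.buffer = {}                   # pending key -> value messages
        self.buffer_capacity = buffer_capacity
        self.leaf_writes = 0               # simulated leaf I/Os (assumption)

    def _leaf_index(self, key):
        # Index of the leaf whose key range contains `key`.
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key, value):
        # An insert only touches the buffer; leaves are written in batches.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        # Apply all pending messages, writing each affected leaf once.
        touched = set()
        for key, value in self.buffer.items():
            i = self._leaf_index(key)
            self.leaves[i][key] = value
            touched.add(i)
        self.leaf_writes += len(touched)
        self.buffer.clear()

    def get(self, key, default=None):
        # Pending messages are newer than leaf contents, so check them first.
        if key in self.buffer:
            return self.buffer[key]
        return self.leaves[self._leaf_index(key)].get(key, default)


def demo():
    tree = ToyBufferedTree(pivots=[100, 200, 300], buffer_capacity=64)
    keys = [(i * 37) % 400 for i in range(256)]   # scattered, distinct keys
    for k in keys:
        tree.insert(k, k * 2)
    tree.flush()
    return tree.leaf_writes, len(keys)
```

In this toy, 256 scattered inserts are applied in four batches, so each leaf is written at most once per batch instead of once per insert. An update-in-place design would, in the same simplified model, touch a leaf for every insert.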
Compared to commodity file systems, BetrFS can improve workload performance by up to two orders of magnitude, and generally matches other file systems in the worst cases. For example, BetrFS improves performance of the Dovecot IMAP server by up to 41% over update-in-place file systems, such as ext4 or btrfs, and can improve rsync performance by up to 31.5x.
More information about BetrFS, including source code, is available at betrfs.org.
Gilles.Muller (at) lip6.fr