Architecture
As illustrated below, JuiceFS consists of three major components: the JuiceFS client, the metadata engine, and the data storage. In the following sections, we will explore each of them in detail.
JuiceFS client
JuiceFS adopts a "rich client" design, making the JuiceFS client an important component of the system: all file I/O happens in the JuiceFS client, including background jobs like data compaction and trash file expiration. Thus, the JuiceFS client communicates with both the metadata engine and the data storage.
A variety of implementations and access methods are supported:
- FUSE: A JuiceFS file system can be mounted on a host in a POSIX-compatible manner, allowing massive cloud storage to be used as if it were local storage. For details, see this document.
- Python SDK: In scenarios where FUSE mounting is not feasible or where direct file system access from within a Python process is required, the Python SDK can read and write the file system directly. Furthermore, the Python SDK natively implements fsspec for easy integration with AI/ML frameworks like Ray. For details, see Python SDK.
- Windows Client: A native Windows client is available. See the Windows (Beta) Usage Guide for more details.
- Hadoop Java SDK: JuiceFS can replace HDFS, providing Hadoop with cost-effective and abundant storage capacity. For details, see Using JuiceFS in Hadoop.
- Kubernetes CSI Driver: JuiceFS provides shared storage for containers in Kubernetes through its CSI Driver. For details, see Use JuiceFS in Kubernetes.
- S3 Gateway: Applications that use S3 as the storage layer can access the JuiceFS file system directly. Tools such as the AWS CLI, s3cmd, and the MinIO client can access the same JuiceFS file system simultaneously. For details, see JuiceFS S3 Gateway.
- WebDAV Server: Files in JuiceFS can be operated directly using the HTTP protocol. For details, see Deploy WebDAV server.
Metadata engine
As shown in the architectural diagram above, JuiceFS Enterprise Edition and Cloud Service utilize a high-performance metadata engine, developed in-house by Juicedata. The metadata engine stores file metadata, which contains:
- Common file system metadata: file names, sizes, permission information, creation and modification times, directory structure, file attributes, symbolic links, file locks, etc.
- JuiceFS-specific metadata: file inodes, chunk and slice mappings, client sessions, etc.
When you use the JuiceFS Cloud Service, the metadata service is already deployed in most public cloud regions, so you can use it right out of the box. As a Cloud Service user, you access the metadata service over the public internet (within the same region for optimal performance). If you are using JuiceFS at a larger scale and require even lower access latency, contact the Juicedata team for private network support via VPC peering.
The metadata engine uses the Raft algorithm to achieve consensus, where all metadata operations are appended as Raft logs. A Raft group typically consists of three nodes: one leader and two followers. All nodes maintain consistency through the Raft algorithm, ensuring strong consistency and high availability.
A Raft group forms a JuiceFS metadata zone. Currently, a single zone can handle around 200 million inodes. For larger data scales, use our multi-zone solution (available only via on-premises deployment of JuiceFS Enterprise Edition). Multi-zone clusters scale horizontally simply by adding more zones.
A JuiceFS file system backed by a multi-zone metadata service can handle hundreds of billions of files; every zone assumes the same architecture as a single Raft group. Multiple zones can be deployed on a single node, and the number of zones can be adjusted dynamically, either manually or through automatic balancing, which effectively avoids performance issues caused by data hot spots. All relevant functionality is integrated into the JuiceFS Web Console for easy management and monitoring. To the JuiceFS client, accessing a multi-zone metadata service is the same as the single-zone scenario: the client can write to different zones at the same time, and when the cluster topology changes, it is notified and adapts automatically.
Data storage
Traditional file systems store both file data and metadata on local disks. JuiceFS, on the other hand, stores file data in object storage and the corresponding metadata in the metadata engine. As mentioned earlier, the JuiceFS client communicates with both the metadata engine and the data storage, while the metadata engine is completely decoupled from object storage. All actual file data resides in the object storage of your choice.
You can use object storage provided by public cloud services or self-hosted solutions. JuiceFS supports virtually all types of object storage, including public cloud options like Amazon S3, Google Cloud Storage (GCS), and Azure Blob, as well as self-hosted ones like OpenStack Swift, Ceph, and MinIO. For more details, see Set up Object Storage.
How JuiceFS stores files
When JuiceFS processes actual file data, three key concepts come into play: chunks, slices, and blocks.
Each file is composed of one or more "chunks." Each chunk has a maximum size of 64 MiB. Regardless of the file's size, every read or write is mapped, based on its offset (the position in the file where the operation occurs), to the corresponding chunk. This design enables JuiceFS to achieve excellent performance even with large files. As long as the total length of the file remains unchanged, the division into chunks stays fixed, no matter how many modifications or writes the file undergoes.
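The chunk mapping described above is plain offset arithmetic. A minimal sketch (the 64 MiB constant comes from the text; the function name is illustrative, not part of JuiceFS):

```python
CHUNK_SIZE = 64 << 20  # 64 MiB, the fixed maximum chunk size

def locate(offset: int) -> tuple[int, int]:
    """Map a file offset to (chunk index, offset within that chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# A read at offset 100 MiB lands in the second chunk (index 1):
print(locate(100 << 20))  # -> (1, 37748736)
```

Because the mapping depends only on the offset, a 160 MiB sequential write necessarily spans three chunks (indices 0, 1, and 2), matching the scenario described below.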
Chunks exist to optimize lookup and positioning, while the actual writing is performed on "slices." In JuiceFS, each slice represents a single continuous write; it belongs to a specific chunk and cannot span adjacent chunks, so a slice's length never exceeds 64 MiB.
For example, if a file is generated through a continuous sequential write, each chunk contains only one slice. The figure above illustrates this scenario: a 160 MiB file is sequentially written, resulting in three chunks, each containing only one slice.
File writing produces slices, and invoking flush persists them. A flush can be explicitly called by the user, and even if it is not, the JuiceFS client automatically flushes at the appropriate time to prevent buffer overflow (see Read/write buffer). When persisted to the object storage, slices are further split into individual "blocks" (with a default maximum size of 4 MiB) to enable multi-threaded concurrent uploads, thereby enhancing write performance. Chunks and slices are logical data structures, while blocks are the final physical storage form and the smallest storage unit for the object storage and the disk cache.
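The slice-to-block split can be sketched the same way, assuming the default 4 MiB maximum block size mentioned above (the function name is illustrative):

```python
BLOCK_SIZE = 4 << 20  # default maximum block size: 4 MiB

def split_into_blocks(slice_len: int) -> list[int]:
    """Return the sizes of the blocks a slice of slice_len bytes splits into."""
    sizes = []
    remaining = slice_len
    while remaining > 0:
        sizes.append(min(BLOCK_SIZE, remaining))  # last block may be smaller
        remaining -= sizes[-1]
    return sizes

# A full 64 MiB slice becomes 16 blocks of 4 MiB each,
# while a 10 MiB slice becomes blocks of 4, 4, and 2 MiB:
print(len(split_into_blocks(64 << 20)))  # -> 16
print(split_into_blocks(10 << 20))       # -> [4194304, 4194304, 2097152]
```

Splitting into fixed-size blocks is what allows the client to upload the pieces of one slice concurrently.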
After writing a file to JuiceFS, you cannot find the original file directly in the object storage. Instead, the storage bucket contains a chunks folder and a series of numbered directories and files. These numerically named object storage files are the blocks split and stored by JuiceFS. The mapping between these blocks, chunks, slices, and other metadata information (such as file names and sizes) is stored in the metadata engine. This decoupled design makes JuiceFS a high-performance file system.
As for the logical data structures: if a file is generated not through continuous sequential writes but through multiple append writes, each append write triggers a flush to initiate an upload, resulting in multiple slices. If the data written by each append is smaller than 4 MiB, the blocks eventually stored in the object storage are also smaller than 4 MiB.
Depending on the writing pattern, the arrangement of slices can be diverse:
- If a file is repeatedly modified in the same part, it results in multiple overlapping slices.
- If writes occur in non-overlapping parts, there will be gaps between slices.
However complex the arrangement of slices may be, when a file is read, the most recently written slice is used for each file position. The figure below illustrates this concept: while slices may overlap, the file is always read "from top to bottom," so you always see the latest state of the file.
Because slices can overlap, JuiceFS marks the valid data offset range for each slice (see the Community Edition documentation for the internal implementation). JuiceFS Community Edition and Enterprise Edition adopt the same design for the reference relationship between chunks and slices. This is how the file system knows which data in each slice is valid.
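The "latest write wins" rule can be modeled in a few lines. This is a toy model only, not the actual implementation: JuiceFS resolves reads from per-slice valid offset ranges in metadata rather than replaying bytes, but the observable result is the same. Here, later entries in the list represent newer writes:

```python
def read(slices: list[tuple[int, bytes]], offset: int, size: int) -> bytes:
    """Resolve a read against overlapping slices: for each position,
    the most recently written slice wins (later list entries are newer)."""
    out = bytearray(size)        # positions never written read back as zeros
    for start, data in slices:   # apply writes oldest-first...
        for i, byte in enumerate(data):
            pos = start + i
            if offset <= pos < offset + size:
                out[pos - offset] = byte  # ...so newer writes overwrite older
    return bytes(out)

# Two overlapping writes: "AAAA" at offset 0, then "BB" at offset 2.
# Reading the whole range sees the newer "BB" on top of the older "AAAA":
print(read([(0, b"AAAA"), (2, b"BB")], 0, 4))  # -> b'AABB'
```

The per-position "top to bottom" lookup in this model is exactly the work that grows expensive as slices accumulate, which motivates the compaction discussed next.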
As you can imagine, looking up the "most recently written slice within the current read range" during file reads, especially with a large number of overlapping slices as shown in the figure, can significantly hurt read performance; this is what we call "file fragmentation." File fragmentation not only degrades read performance but also increases space usage at various levels (object storage and metadata). Hence, whenever a write occurs, the metadata service evaluates the file's fragmentation and schedules compaction as a background job, merging all slices within the same chunk into one.
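Compaction can be sketched with a similar toy model: the overlapping slices of a chunk (oldest-first write order) are merged into a single slice holding the resolved latest data, so future reads only consult one slice. This is illustrative only; real compaction rewrites object-storage blocks and updates metadata:

```python
def compact(slices: list[tuple[int, bytes]], chunk_len: int) -> list[tuple[int, bytes]]:
    """Merge overlapping slices into one slice covering the chunk,
    keeping the most recently written byte at every position."""
    data = bytearray(chunk_len)
    for start, payload in slices:              # newer writes applied last win
        data[start:start + len(payload)] = payload
    return [(0, bytes(data))]                  # a single merged slice remains

# Three overlapping writes collapse into one slice with the latest data:
fragmented = [(0, b"XXXX"), (1, b"YY"), (3, b"Z")]
print(compact(fragmented, 4))  # -> [(0, b'XYYZ')]
```

After compaction, the answer to every read is unchanged, but the lookup no longer has to scan a stack of overlapping slices.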
Some other technical aspects of JuiceFS storage design:
- To avoid read amplification, files are not merged for storage, with one exception in intensive small-file scenarios: when many small files are created, they are merged and stored in batch blocks to increase write performance. They are split again during later compaction operations.
- JuiceFS guarantees strong consistency but can be tuned for performance in different scenarios. For instance, you can deliberately adjust metadata cache policies to trade consistency for performance. For more details, see Metadata Cache.
- JuiceFS provides trash functionality, which is enabled by default. Deleted files are kept for a specified amount of time to help you avoid data loss caused by accidental deletion. For more details, see Trash.

