<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>JuiceFS Blog</title><link>https://www.juicefs.com/en/blog/</link><description>Latest news from JuiceFS</description><atom:link href="http://juicefs.com/en/blog/latest/feed/" rel="self"/><language>en</language><lastBuildDate>Wed, 15 Apr 2026 07:22:00 +0000</lastBuildDate><item><title>JuiceFS Performance Optimization for AI Scenarios</title><link>https://www.juicefs.com/en/blog/engineering/juicefs-ai-workload-performance-optimization</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;The scale of computing power for &lt;a href="https://en.wikipedia.org/wiki/Large_language_model"&gt;large language model&lt;/a&gt; (LLM) training continues to expand. While GPU performance keeps improving, data access bottlenecks are becoming increasingly prominent in overall system performance. Local storage offers excellent performance but has limited scalability. Object storage excels in cost and scalability but suffers from insufficient throughput in massive small‑file and high‑concurrency scenarios. Teams often struggle to choose between them.  &lt;/p&gt;
&lt;p&gt;Therefore, &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems"&gt;distributed file systems&lt;/a&gt; have become a key solution to balance high performance and scalability. &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt; has been widely deployed in AI scenarios across multiple industries. Its distributed architecture delivers high performance, strong scalability, and low cost simultaneously for large‑scale data access.  &lt;/p&gt;
&lt;p&gt;In this article, we’ll introduce JuiceFS’ architecture from a performance perspective and analyze core performance bottlenecks and optimization methods under different access patterns. We’ll also link to in-depth references on key points, helping you understand JuiceFS’ performance mechanisms and master common tuning strategies.&lt;/p&gt;
&lt;h3&gt;Performance foundations from the JuiceFS architecture&lt;/h3&gt;
&lt;p&gt;JuiceFS comes in &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;Community Edition&lt;/a&gt; and &lt;a href="https://juicefs.com/docs/cloud/"&gt;Enterprise Edition&lt;/a&gt;. Both share the same architecture: metadata and data are separated. The client adopts a rich‑client design, handling core logic including some metadata operations, and provides both metadata and data caching. These modules work together for efficient data location and access. The underlying data is stored in object storage, with local caches further improving access performance. For external interfaces, JuiceFS supports multiple access methods – FUSE is the most common, and it also provides various SDKs and an S3 gateway.  &lt;/p&gt;
&lt;p&gt;JuiceFS Community Edition is designed as a general‑purpose file system. Users can choose different metadata engines based on their needs. For small‑scale deployments, Redis delivers lightweight, low‑latency metadata management. For large‑scale file scenarios, &lt;a href="https://tikv.org/"&gt;TiKV&lt;/a&gt; provides good horizontal scalability.  &lt;/p&gt;
&lt;p&gt;JuiceFS Enterprise Edition targets complex, high‑performance scenarios. It differs from Community Edition in two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It uses a self‑developed multi‑zone metadata engine built on Raft that runs as an in‑memory cluster, offering low latency and strong horizontal scalability. It supports up to 500 billion files. Operations that require multiple key-value requests in the Community Edition often need only one or two in the Enterprise Edition, and complex logic can be processed inside the metadata cluster.  &lt;/li&gt;
&lt;li&gt;The Enterprise Edition supports distributed cache sharing: clients in the same group can access each other’s local caches via consistent hashing. This improves cache hit rates and access efficiency. In multi‑node, high‑concurrency scenarios, the cache space scales horizontally, and most required data can be warmed up before job execution. This accelerates AI training and inference while boosting performance and stability. See &lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-enterprise-5-3-rdma-support"&gt;JuiceFS Enterprise 5.3: 500B+ Files per File System &amp;amp; RDMA Support&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition and Enterprise Edition architectures&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition and Enterprise Edition architectures&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h4&gt;Data chunking&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/internals/io_processing"&gt;JuiceFS splits data into chunks&lt;/a&gt; and stores them in object storage. This design is key to its performance, affecting data read efficiency, cache hit rate, and throughput under high concurrency.  &lt;/p&gt;
&lt;p&gt;JuiceFS breaks a file into multiple chunks. Inside each chunk, the system maintains a management structure called a slice to track writes and updates. When data is written, new data does not overwrite existing slices; instead, a new slice is appended on top of the chunk.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;chunk&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;How JuiceFS stores data&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Ideally, each chunk ends up containing only one slice. Each slice consists of several 4 MB blocks, which are the smallest unit stored in object storage. By default, the caching system also manages data at the block level.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;block&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;How JuiceFS stores data&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;As shown in the diagram on the upper right, file updates use an append‑only write pattern: existing slices are shown in red, and new data is appended as a new slice. During reads, the system combines the slices to form the current view. When fragmentation becomes excessive, a compaction process merges slices to optimize access performance. For more details on data chunking, refer to &lt;a href="https://juicefs.com/en/blog/engineering/design-metadata-data-storage"&gt;Code-Level Analysis: Design Principles of JuiceFS Metadata and Data Storage&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Caching&lt;/h4&gt;
&lt;p&gt;Compared to direct object storage access, JuiceFS performance improvements largely benefit from its caching mechanism. The JuiceFS client comes with a high‑performance local cache module. Key configuration options include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cache-dir&lt;/code&gt;: specifies the cache directory.  &lt;/li&gt;
&lt;li&gt;&lt;code&gt;cache-size&lt;/code&gt;: sets the maximum cache space.  &lt;/li&gt;
&lt;li&gt;Prefetch: a parameter in the cache module that controls prefetching. When a request hits a block, a background thread fetches the entire block.  &lt;/li&gt;
&lt;li&gt;Write‑back related settings: improves write IOPS by writing data blocks that need to be uploaded to object storage into the local cache first, then asynchronously uploading them to object storage.&lt;/li&gt;
&lt;/ul&gt;
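&lt;p&gt;Since the cache manages data at block granularity, the &lt;code&gt;cache-size&lt;/code&gt; limit is enforced by evicting whole blocks. A minimal LRU sketch of this idea (illustrative only, not JuiceFS’ implementation):&lt;/p&gt;

```python
# Illustrative block-level LRU cache: evicts whole block-sized entries
# when total size exceeds the cache-size limit. Not JuiceFS' actual code.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = OrderedDict()  # key -> size, ordered by recency

    def get(self, key) -> bool:
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as recently used
            return True
        return False

    def put(self, key, size):
        if key in self.blocks:
            self.blocks.move_to_end(key)
            return
        self.blocks[key] = size
        self.used += size
        while self.used > self.capacity:  # evict least recently used
            _, evicted = self.blocks.popitem(last=False)
            self.used -= evicted

MB = 1024 * 1024
cache = BlockCache(8 * MB)
for i in range(3):
    cache.put(f"block-{i}", 4 * MB)   # third insert evicts block-0
print(cache.get("block-0"), cache.get("block-2"))  # False True
```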
&lt;p&gt;JuiceFS Enterprise Edition also provides advanced configurations. For example, a &lt;a href="https://juicefs.com/docs/cloud/guide/cache/"&gt;cache group&lt;/a&gt; can be used to designate a set of clients whose local caches form a distributed cache group, enabling cache sharing. In addition, the &lt;code&gt;no-sharing&lt;/code&gt; option allows a client to read data from a specified cache group without serving its own cache to others. This creates a two‑level cache:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first level is the local cache.  &lt;/li&gt;
&lt;li&gt;The second level is the cache on other nodes in the group.&lt;/li&gt;
&lt;/ul&gt;
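&lt;p&gt;The peer lookup in a cache group can be sketched with a toy consistent-hash ring. This is only an illustration of the idea: JuiceFS’ real ring (hash function, virtual nodes) differs.&lt;/p&gt;

```python
# Toy consistent hashing: map a block key to the peer in the cache group
# that should hold it. Illustrative only; not JuiceFS' actual hashing.
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(peers):
    return sorted((h(p), p) for p in peers)

def owner(ring, block_key: str) -> str:
    keys = [k for k, _ in ring]
    idx = bisect_right(keys, h(block_key)) % len(ring)
    return ring[idx][1]

ring = build_ring(["node-a", "node-b", "node-c"])
# A client first checks its local cache; on a miss it asks the owning
# peer; only if both miss does it fall back to object storage.
print(owner(ring, "chunk42/block3"))
```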
&lt;p&gt;Another performance‑boosting mechanism is the memory buffer (read buffer), which provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I/O request merging: multiple consecutive I/O requests can be merged in memory. For example, three I/O requests issued by the system may be reduced to just one after being processed by the memory buffer.  &lt;/li&gt;
&lt;li&gt;Adaptive read‑ahead: in large‑file sequential read scenarios, adaptive read‑ahead increases request concurrency by prefetching data. This fully utilizes cache and object storage resources and improves overall I/O performance.&lt;/li&gt;
&lt;/ul&gt;
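&lt;p&gt;I/O request merging can be illustrated by coalescing contiguous ranges before they reach object storage. A sketch of the idea, not the actual buffer logic:&lt;/p&gt;

```python
# Illustrative coalescing of consecutive read requests, as a read buffer
# might merge them before issuing backend I/O. Not JuiceFS' actual code.
def merge_ranges(ranges):
    """Merge (offset, length) requests that are contiguous or overlapping."""
    merged = []
    for off, length in sorted(ranges):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged

# Three 128 KiB sequential reads collapse into one 384 KiB request.
KB = 1024
reqs = [(0, 128 * KB), (128 * KB, 128 * KB), (256 * KB, 128 * KB)]
print(merge_ranges(reqs))  # [(0, 393216)]
```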
&lt;p&gt;The Enterprise Edition also offers advanced read‑ahead settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;max read ahead&lt;/code&gt;: sets the maximum read‑ahead range.  &lt;/li&gt;
&lt;li&gt;&lt;code&gt;initial read ahead&lt;/code&gt;: sets the initial read‑ahead window size (default unit is 4 MB blocks).  &lt;/li&gt;
&lt;li&gt;&lt;code&gt;read ahead ratio&lt;/code&gt;: a configuration added last year that controls the read‑ahead ratio for large‑file random reads, reducing bandwidth waste caused by read amplification. Overly aggressive read‑ahead can negatively impact random read performance; read ahead ratio helps mitigate this. In AI scenarios, when large‑file sequential or random reads cause bandwidth or IOPS bottlenecks, adjusting these parameters can optimize overall performance.&lt;/li&gt;
&lt;/ul&gt;
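&lt;p&gt;The interplay of these three settings can be sketched as a window that doubles on sequential access up to the maximum and is scaled down by the ratio. The function below is a simplification for illustration; parameter names and growth policy are assumptions, not JuiceFS’ actual implementation:&lt;/p&gt;

```python
# Illustrative read-ahead window: doubles on sequential hits up to a
# maximum, scaled down by a ratio for random-heavy workloads. The
# doubling policy here is an assumption for illustration only.
def next_window(current: int, max_readahead: int, ratio: float) -> int:
    """Window sizes are measured in 4 MB blocks."""
    grown = min(current * 2, max_readahead)
    return max(1, int(grown * ratio))

w = 1  # initial read-ahead window: one block
for _ in range(4):
    w = next_window(w, max_readahead=32, ratio=1.0)
print(w)  # 16: full doubling for sequential reads

# With ratio 0.5 the same 1-block request prefetches only half as far,
# reducing read amplification for large-file random reads.
print(next_window(1, max_readahead=32, ratio=0.5))  # 1
```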
&lt;h2&gt;JuiceFS benchmark I/O tests and bottleneck analysis&lt;/h2&gt;
&lt;p&gt;Before diving into performance tuning for common &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence"&gt;AI&lt;/a&gt; scenarios, let’s first examine JuiceFS’ I/O behavior under ideal conditions through sequential and random read benchmarks. This helps us understand throughput and latency under different access patterns, providing a reference for the read/write patterns of subsequent AI/ML workloads.&lt;/p&gt;
&lt;h3&gt;Sequential read performance&lt;/h3&gt;
&lt;p&gt;In JuiceFS, sequential read performance is typically bandwidth‑bound. In cold read scenarios, performance is mainly limited by object storage bandwidth; in distributed cache scenarios, network bandwidth can become the bottleneck. For example, a node with a 40 Gbps NIC has at most about 5 GB/s of usable bandwidth. In addition, the user‑kernel transition overhead in the FUSE layer limits single‑thread throughput: tests showed single‑thread sequential read bandwidth of about 3.5 GB/s. To break this limit, multi‑threaded or higher‑concurrency strategies are needed to fully utilize storage and network resources.  &lt;/p&gt;
&lt;p&gt;The table below shows test results of JuiceFS sequential read performance:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;Threads&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Bandwidth (GB/s)&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Bandwidth per thread (GB/s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;1&lt;/td&gt;
&lt;td style="text-align: left;"&gt;3.5&lt;/td&gt;
&lt;td style="text-align: left;"&gt;3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;2&lt;/td&gt;
&lt;td style="text-align: left;"&gt;6.3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;3.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;9.5&lt;/td&gt;
&lt;td style="text-align: left;"&gt;3.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;4&lt;/td&gt;
&lt;td style="text-align: left;"&gt;9.7&lt;/td&gt;
&lt;td style="text-align: left;"&gt;2.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;6&lt;/td&gt;
&lt;td style="text-align: left;"&gt;14.0&lt;/td&gt;
&lt;td style="text-align: left;"&gt;2.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;8&lt;/td&gt;
&lt;td style="text-align: left;"&gt;17.0&lt;/td&gt;
&lt;td style="text-align: left;"&gt;2.13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;10&lt;/td&gt;
&lt;td style="text-align: left;"&gt;18.6&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;15&lt;/td&gt;
&lt;td style="text-align: left;"&gt;21&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the performance test, single‑thread sequential read bandwidth was about 3.5 GB/s. As the number of threads increased, total throughput gradually approached the network bandwidth limit. To help users evaluate the performance ceiling of their own environment, JuiceFS provides the &lt;code&gt;objbench&lt;/code&gt; subcommand for testing object storage bandwidth.  &lt;/p&gt;
&lt;p&gt;In real workloads, caching is more common than direct object storage access. In such cases, increasing the buffer size raises the number of background prefetch requests, thereby improving concurrency and overall throughput. For example, after increasing the buffer size to 400 MB (corresponding to 100 background prefetch requests of 4 MB each), concurrency improved significantly and overall throughput increased.&lt;/p&gt;
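&lt;p&gt;The buffer-size arithmetic above works out as follows. This is a back-of-the-envelope estimate; the latency figure is a hypothetical example, not a measurement:&lt;/p&gt;

```python
# Back-of-the-envelope: how buffer size bounds prefetch concurrency and
# throughput. The latency value is an assumed example, not a measurement.
buffer_mb = 400
block_mb = 4
concurrency = buffer_mb // block_mb          # in-flight 4 MB prefetches
print(concurrency)                           # 100

latency_s = 0.1                              # assumed per-request latency
throughput = concurrency * block_mb / latency_s
print(throughput)                            # 4000.0 MB/s upper bound
```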
&lt;h3&gt;Random read performance&lt;/h3&gt;
&lt;h4&gt;Low‑concurrency random reads&lt;/h4&gt;
&lt;p&gt;In low‑concurrency, non‑asynchronous access scenarios, each request must wait for the previous one to complete before being issued. As a result, latency has a significant impact on overall performance. I/O latency can come from many sources, including metadata query latency, object storage access latency, and local or distributed cache read latency. When analyzing random read performance, we must closely examine these latency factors.  &lt;/p&gt;
&lt;p&gt;In a 4 KB cold random read scenario, if the IOPS is only 8 and object storage latency is about 125 ms, the concurrency level is roughly 1 (8 IOPS × 125 ms ≈ 1,000 ms).  &lt;/p&gt;
&lt;p&gt;This indicates a near‑single‑concurrent, serial‑blocked state. In such cases, the optimization focus should be on shortening the access path and reducing per‑request latency rather than increasing concurrency – for example, by warming up data into the local cache. After data warm-up, the random read path switches from object storage to local cache, and IOPS can increase to about 12,000, approaching the I/O level of a local disk.&lt;/p&gt;&lt;/div&gt;
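&lt;p&gt;The concurrency estimate above is an application of Little’s law (average concurrency = IOPS × latency). A quick check with the numbers from the text; the post-warm-up cache latency is an assumed figure:&lt;/p&gt;

```python
# Little's law: average concurrency = IOPS * latency.
# Cold-read numbers (8 IOPS, 125 ms) are from the text above.
def concurrency(iops: float, latency_s: float) -> float:
    return iops * latency_s

print(concurrency(8, 0.125))  # 1.0 -> effectively serial, latency-bound

# After warm-up, a local-cache latency of roughly 0.08 ms (assumed here)
# lets the same near-serial caller reach about 12,000 IOPS:
print(concurrency(12000, 0.000083))
```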
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Viewing performance with the juicefs stats command&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Using the juicefs stats command to view performance&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Performance after data warm-up&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Performance after data warm-up&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h4&gt;High‑concurrency random reads&lt;/h4&gt;
&lt;p&gt;High‑concurrency random reads typically occur in scenarios with high thread counts or asynchronous I/O. The main performance bottleneck is often IOPS limits – including metadata IOPS, object storage IOPS, and cache IOPS. JuiceFS allows you to observe these metrics and pinpoint the bottleneck. Client machine resources (CPU, memory) can also affect performance, but such bottlenecks are easy to monitor.  &lt;/p&gt;
&lt;p&gt;In a cold read scenario using &lt;a href="https://github.com/anlongfei/libaio"&gt;Libaio&lt;/a&gt; for random reads, the object‑side IOPS ceiling is around 7,000/s. When caching is enabled and data is warmed up, the access path shifts from object storage to the cache layer, and IOPS can further increase to over 20,000. This shows that the bottleneck for high‑concurrency random reads shifts as the access path changes.&lt;/p&gt;&lt;/div&gt;
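&lt;p&gt;As a rough, portable stand-in for a libaio benchmark, the sketch below drives many concurrent 4 KB random reads with a thread pool (real tests would use fio with the libaio engine against a JuiceFS mount; the temporary file here is only for illustration):&lt;/p&gt;

```python
# Sketch: issue many 4 KB random reads concurrently with a thread pool,
# a portable stand-in for libaio-style asynchronous I/O. The file path
# and sizes are made up; on JuiceFS this would be a mounted path.
import os, random, tempfile
from concurrent.futures import ThreadPoolExecutor

def random_read(path: str, size: int, read_size: int = 4096) -> int:
    off = random.randrange(0, size - read_size)
    with open(path, "rb") as f:
        f.seek(off)
        return len(f.read(read_size))

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1024 * 1024))
    path = tmp.name

with ThreadPoolExecutor(max_workers=32) as pool:  # 32 in-flight reads
    done = list(pool.map(lambda _: random_read(path, 1024 * 1024), range(256)))
print(sum(done))  # 256 reads x 4096 bytes = 1048576
os.unlink(path)
```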
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Before data warm-up&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Before data warm-up&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;After data warm-up&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;After data warm-up&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;For a deeper dive into JuiceFS’ complete data access path, refer to &lt;a href="https://juicefs.com/en/blog/engineering/optimize-read-performance"&gt;Optimizing JuiceFS Read Performance: Readahead, Prefetch, and Cache&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;I/O characteristics and performance tuning for common AI scenarios&lt;/h2&gt;
&lt;h3&gt;Large‑file sequential reads&lt;/h3&gt;
&lt;p&gt;A typical large‑file sequential read scenario is model loading, such as loading PyTorch .pt files saved via pickle serialization. In this process, performance is limited by two factors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/library/pickle.html"&gt;Pickle&lt;/a&gt; deserialization efficiency determines data processing speed.  &lt;/li&gt;
&lt;li&gt;Data reading is usually single‑threaded and limited by FUSE bandwidth and CPU performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To increase throughput, you can raise concurrency through multi‑threaded or sharded loading, fully utilizing I/O capacity. For large‑file sequential reads, the best performance is achieved when the entire dataset can be cached locally. If only on‑demand reading is required, the implementation is simple.&lt;br&gt;
For more details on optimizing large‑file sequential reads, see &lt;a href="https://juicefs.com/en/blog/solutions/idle-resources-elastic-high-throughput-storage-cache-pool"&gt;How JuiceFS Transformed Idle Resources into a 70 GB/s Cache Pool&lt;/a&gt;.&lt;/p&gt;
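&lt;p&gt;Sharded loading of a single large file can be sketched as follows, so that total throughput is no longer capped by one FUSE reader thread (a minimal illustration; production loaders would integrate this with the framework’s deserialization step):&lt;/p&gt;

```python
# Sketch: read one large file in shards on a thread pool so throughput
# is not capped by a single reader thread. Sizes are for illustration.
import os, tempfile
from concurrent.futures import ThreadPoolExecutor

def read_shard(path: str, offset: int, length: int) -> bytes:
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path: str, shard_size: int, workers: int = 8) -> bytes:
    size = os.path.getsize(path)
    offsets = range(0, size, shard_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shards = pool.map(lambda off: read_shard(path, off, shard_size), offsets)
        return b"".join(shards)  # map() preserves shard order

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (10 * 1024 * 1024))  # stand-in for a large .pt file
    path = tmp.name
data = parallel_read(path, shard_size=1024 * 1024)
print(len(data))  # 10485760
os.unlink(path)
```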
&lt;h3&gt;Massive small files&lt;/h3&gt;
&lt;p&gt;In computer vision and multimodal tasks, training datasets often consist of many individual files, for example, single images, video frames, or text annotations. Such massive small‑file scenarios place heavy pressure on metadata services.  &lt;/p&gt;
&lt;p&gt;In massive small-file scenarios, metadata performance is critical. On one hand, each file carries only a small amount of data; on the other hand, directory metadata access efficiency is low when a directory holds a huge number of small files.&lt;br&gt;
For read‑only workloads, enabling client metadata caching and extending the cache lifetime can improve performance. &lt;/p&gt;
&lt;p&gt;Moreover, the data read layer experiences higher IOPS pressure because small files cannot take advantage of read‑ahead. This makes requests more fragmented. Common optimizations include increasing local cache capacity; for the Enterprise Edition, you can also scale out the distributed cache cluster horizontally. Because small files derive little benefit from read‑ahead, their latency tends to be higher.  &lt;/p&gt;
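&lt;p&gt;A quick back-of-the-envelope calculation shows why small files stress IOPS rather than bandwidth. The latency figure below is a hypothetical example, not a measurement:&lt;/p&gt;

```python
# Back-of-the-envelope: per-file latency, not bandwidth, dominates
# massive small-file reads. Latency figures are assumed examples.
KB = 1024
file_size = 150 * KB                 # a typical small training sample
per_file_latency_s = 0.005           # assumed metadata lookup + one GET

files_per_s = 1 / per_file_latency_s
bandwidth = files_per_s * file_size / (1024 * 1024)
print(files_per_s, round(bandwidth, 1))  # 200.0 files/s, ~29.3 MB/s

# The same 5 ms spent on one sequential 4 MB block read would deliver
# ~800 MB/s, which is why read-ahead cannot help small files.
```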
&lt;p&gt;For performance tuning in this scenario, see &lt;a href="https://juicefs.com/en/blog/user-stories/multi-cloud-store-massive-small-files"&gt;How D-Robotics Manages Massive Small Files in a Multi-Cloud Environment with JuiceFS&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Large‑file random reads&lt;/h3&gt;
&lt;p&gt;This scenario is common in AI training, for example, when randomly accessing datasets in TFRecord, HDF5, or LMDB format by sample. Take model loading: if the dataset is accessed randomly and each read size equals the sample size (for example, 1 MB to 4 MB images or short videos), read‑ahead can waste bandwidth. Such scenarios can often break through IOPS bottlenecks by increasing concurrency.  &lt;/p&gt;
&lt;p&gt;Recommended measures include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increase the number of data‑loading &lt;code&gt;reader&lt;/code&gt; threads.  &lt;/li&gt;
&lt;li&gt;Use asynchronous I/O to raise concurrency and saturate IOPS.  &lt;/li&gt;
&lt;li&gt;Improve the caching system, for example, pre‑map data into cache to boost underlying IOPS.  &lt;/li&gt;
&lt;li&gt;Adjust the &lt;code&gt;read ahead ratio&lt;/code&gt; parameter (for example, set it to &lt;code&gt;0.5&lt;/code&gt;) to reduce bandwidth waste from read‑ahead. For instance, a 4 MB sequential read would previously prefetch 4 MB; after adjustment, only 2 MB is prefetched.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this article, we’ve analyzed JuiceFS’ architecture from a performance perspective, covered benchmark I/O tests, and discussed tuning methods for typical AI scenarios. This provides an introductory reference for system performance. JuiceFS has been deployed in many production environments, and its distributed architecture offers a feasible balance between performance and cost.  &lt;/p&gt;
&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="http://go.juicefs.com/discord"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 15 Apr 2026 07:22:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/engineering/juicefs-ai-workload-performance-optimization</guid></item><item><title>Optimizing JuiceFS on the Arm Architecture: MLPerf-Based Performance Tuning</title><link>https://www.juicefs.com/en/blog/engineering/arm-juicefs-performance-optimization-mlperf-tuning</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;Recently, building high-performance storage infrastructure on Arm platforms has become a technical focal point. &lt;a href="https://www.linaro.org/"&gt;Linaro&lt;/a&gt; is an international technology organization focused on the Arm ecosystem and open-source software. We collaborate with upstream and downstream industry players to address common issues and assist enterprise customers in productizing their solutions on an open-source foundation. Our team conducted systematic stress testing on &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS Community Edition&lt;/a&gt; (using Redis for metadata storage) during MLPerf Storage benchmarks, covering a variety of typical machine learning training workloads.  &lt;/p&gt;
&lt;p&gt;Our test results show that system performance is largely influenced by memory bandwidth and metadata access efficiency. JuiceFS’ throughput directly determines GPU utilization and training efficiency. Through testing workloads such as 3D U-Net, ResNet-50, and CosmoFlow, the analysis revealed: in single-node scenarios, GPU utilization is primarily limited by memory copy latency; in two-node or multi-node scenarios, metadata access and inter-node synchronization become the main bottlenecks. In the article, we also provide tuning strategies and practical results to address these bottlenecks.  &lt;/p&gt;
&lt;p&gt;In summary, large-scale AI training performance tuning is a systematic engineering effort that requires coordinated optimization across storage systems, memory bandwidth, CPU scheduling, caching strategies, and more to achieve efficient deep learning data supply on Arm platforms.&lt;/p&gt;
&lt;h2&gt;Arm64 vs. x86_64 architecture differences and concurrency characteristics&lt;/h2&gt;
&lt;p&gt;Compared to x86, Arm’s application scope continues to expand, extending from mobile devices to IoT, wearables, PCs, automotive, and servers. Its high performance per watt is a key reason for its widespread adoption.&lt;br&gt;
From an architectural design perspective, Arm is a &lt;a href="https://en.wikipedia.org/wiki/Reduced_instruction_set_computer"&gt;reduced instruction set computer&lt;/a&gt; (RISC), while x86 is a &lt;a href="https://en.wikipedia.org/wiki/Complex_instruction_set_computer"&gt;complex instruction set computer&lt;/a&gt; (CISC). This design difference also affects how processors execute instructions. Arm64 instructions have a fixed length of 4 bytes, whereas x86 instructions have variable lengths ranging from 1 to 15 bytes. Consequently, x86 often requires more complex decoders. In contrast, Arm’s instructions are simpler and rely more heavily on effective instruction organization during compilation and code generation, thus requiring longer compilation times.  &lt;/p&gt;
&lt;p&gt;From an engineer’s perspective, there are other architectural differences that directly impact program behavior. &lt;strong&gt;Code that seems intuitive on x86 may not behave the same way on Arm. Several of the common pitfalls discussed later are fundamentally related to these underlying differences.&lt;/strong&gt;  &lt;/p&gt;
&lt;p&gt;One typical issue is the alignment requirement for atomic operations. Whether using Load-Link/Store-Conditional (LL/SC) or Large System Extensions (LSE), read-modify-write operations like atomic increments typically require aligned memory addresses. Newer LSE2 relaxes this restriction, supporting unaligned accesses within a 16-byte window. Data alignment is not mandatory for x86, but maintaining good alignment helps improve performance. See &lt;a href="https://developer.arm.com/documentation/ddi0487/maa/-Part-B-The-AArch64-Application-Level-Architecture/-Chapter-B2-The-AArch64-Application-Level-Memory-Model/-B2-8-Alignment-support/-B2-8-2-Alignment-of-data-accesses?lang=en#chdffegj"&gt;Arm Architecture Reference Manual for A-profile architecture&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;Another key feature to note is that Arm employs a weakly ordered / relaxed memory model. The difference lies in the strength of constraints on memory access ordering. In multi-threaded scenarios, the same read/write operations are more likely to appear in program order on x86, whereas Arm permits more reordering. Thus, the order observed by other threads may differ from the source code order. When debugging issues on Arm, memory ordering effects must be carefully considered. For more details, see the Arm white paper: &lt;a href="https://developer.arm.com/documentation/107630/1-0/?lang=en"&gt;Synchronization Overview and Case Study on Arm Architecture&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Overview of JuiceFS and MLPerf&lt;/h2&gt;
&lt;p&gt;JuiceFS is an open-source, high-performance &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems"&gt;distributed file system&lt;/a&gt; built on object storage. It leverages the cost advantages of object storage while delivering a user experience close to traditional file systems. It supports POSIX, HDFS SDK, Python SDK, and S3-compatible interfaces, adapting to various applications and data processing frameworks. It also supports cloud-native extensions, data security, and compression, making it widely applicable to AI training, inference, big data processing, and more.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS architecture diagram&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;To evaluate JuiceFS’ data supply capability under high-load scenarios like AI training, we can use the MLPerf Storage benchmark. Developed by MLCommons, this benchmark focuses on measuring a storage system’s ability to consistently and efficiently supply data to compute nodes.  &lt;/p&gt;
&lt;p&gt;Version 2.0 divides tests into training workloads and checkpoint workloads. The training workloads include 3D U-Net, ResNet-50, and CosmoFlow. They differ significantly in sample size and access patterns. Minimum GPU utilization requirements are set: 90% for 3D U-Net and ResNet-50, and 70% for CosmoFlow.  &lt;/p&gt;
&lt;p&gt;The table below shows MLPerf Storage 2.0 training workloads:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;Task&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Reference network&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Data loader&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Sample size&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Batch size&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Accelerator utilization&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Time per batch run (s)&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Evaluate storage capability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Image segmentation (medical)&lt;/td&gt;
&lt;td style="text-align: left;"&gt;3D U-Net&lt;/td&gt;
&lt;td style="text-align: left;"&gt;PyTorch&lt;/td&gt;
&lt;td style="text-align: left;"&gt;146 MiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;7 x 146 = 1,022 MiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;90%&lt;/td&gt;
&lt;td style="text-align: left;"&gt;0.323 / 0.9 = 0.359 Data load time: 0.359-0.323 = 0.036&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Bandwidth, concurrent large block sequential reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Image classification&lt;/td&gt;
&lt;td style="text-align: left;"&gt;ResNet50&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Tensorflow&lt;/td&gt;
&lt;td style="text-align: left;"&gt;150 KiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;400 x 150 = 58.5 MiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;90%&lt;/td&gt;
&lt;td style="text-align: left;"&gt;0.224 / 0.9 = 0.249 Data load time: 0.249 - 0.224 = 0.025&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Bandwidth, IOPS, high concurrency medium block sequential reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Scientific (cosmology)&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Parameter prediction&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Tensorflow&lt;/td&gt;
&lt;td style="text-align: left;"&gt;2.7 MiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1 x 2.7 = 2.7 MiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;70%&lt;/td&gt;
&lt;td style="text-align: left;"&gt;0.0035 / 0.7 = 0.005 Data load time: 0.005 - 0.0035 = 0.0015&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Bandwidth, IOPS, metadata latency, high concurrency sequential reads of many small files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;LLM checkpointing (new)&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Llama3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;PyTorch&lt;/td&gt;
&lt;td style="text-align: left;"&gt;105GiB to 18TiB&lt;/td&gt;
&lt;td style="text-align: left;"&gt;—&lt;/td&gt;
&lt;td style="text-align: left;"&gt;—&lt;/td&gt;
&lt;td style="text-align: left;"&gt;—&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Bandwidth, concurrent sequential writes of extremely large files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the test flow, data is first read from the storage system into host memory before entering the compute phase. Training time is simulated to replicate the data flow of real training scenarios, eliminating the need for actual GPU deployment, lowering experimental barriers, and improving operational convenience.&lt;/p&gt;&lt;/div&gt;
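&lt;p&gt;The utilization figures in the table follow from a simple relation: accelerator utilization is compute time divided by total (compute + data load) time per batch. Checking the 3D U-Net row:&lt;/p&gt;

```python
# How the table's numbers relate: accelerator utilization is compute
# time divided by total (compute + data load) time per batch.
def utilization(compute_s: float, load_s: float) -> float:
    return compute_s / (compute_s + load_s)

# 3D U-Net row: 0.323 s compute at a 90% utilization target gives a
# 0.359 s total budget, leaving 0.036 s to load a 1,022 MiB batch.
total_budget = 0.323 / 0.9
load_budget = total_budget - 0.323
print(round(total_budget, 3), round(load_budget, 3))  # 0.359 0.036
print(round(utilization(0.323, load_budget), 2))      # 0.9
```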
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;MLPerf Storage data flow&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;MLPerf Storage data flow&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h2&gt;MLPerf Storage v2.0 test principles and tuning&lt;/h2&gt;
&lt;p&gt;Before detailing specific model test results, it’s essential to understand the data access principles of distributed training. This helps readers grasp the causes of GPU utilization, storage throughput, and performance bottlenecks, enabling better comprehension of subsequent test results and tuning strategies.  &lt;/p&gt;
&lt;p&gt;Distributed &lt;a href="https://en.wikipedia.org/wiki/Machine_learning"&gt;machine learning&lt;/a&gt; typically uses data parallelism, where multiple parallel processes share the same dataset, and each process handles reading and processing its corresponding training batches.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;分布式训练数据访问原理示意图&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Distributed training data access principle&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;MLPerf Storage training tests follow this approach: each training process reads data from the storage system in batches and simulates computation to evaluate the storage system’s ability to sustain data supply.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;MLPerf Storage 训练数据流示意图&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;MLPerf Storage training data flow&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;To understand the source of performance during testing, it’s also necessary to understand the data processing path within the JuiceFS client.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS 客户端线程与数据流示意图&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS client threads and data flow&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;As illustrated, when testing with JuiceFS, the execution flow can be roughly divided into three parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Left side: Application-side I/O threads, such as fio or MLPerf Storage’s DataLoader threads, which initiate read/write requests and wait for completion.  &lt;/li&gt;
&lt;li&gt;Middle: The main goroutine in the FUSE daemon, which handles FUSE requests from kernel space, places file data into memory buffers and caches, and triggers backend metadata and object storage access.  &lt;/li&gt;
&lt;li&gt;Right side: Asynchronous goroutines for the Meta client and ObjectStore client, which interact with the backend MetaDB and ObjectStore clusters for data and metadata operations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a performance analysis perspective, we need to note two types of issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data copying, corresponding to steps like 2.1, 3, 4, 5, and 6 in the diagram. These steps introduce additional memory copy overhead and are often key areas for analyzing latency and CPU usage.  &lt;/li&gt;
&lt;li&gt;Synchronization and asynchronous boundaries. As shown, steps 1, 2, 3, 4, 5, and 6 are part of the synchronous path, where the request must wait for the current stage to complete before proceeding. Step 7 is part of the asynchronous path, handled by background goroutines interacting with backend storage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Test 1: 3D U-Net&lt;/h3&gt;
&lt;p&gt;In this test, the sample size was 146 MiB per image file, and we focused on large-block read performance. The test results showed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In a single-node environment, the system could stably run up to 5 GPUs, with GPU utilization at about 50%.  &lt;/li&gt;
&lt;li&gt;In a two-node scenario, it could support 10 GPUs, also with GPU utilization around 50%.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To improve data read efficiency, we optimized the training parameters: we increased the number of reader threads from 4 to 16 to accelerate data generation, and switched to direct I/O to reduce buffer and memory copy overhead.&lt;br&gt;
&lt;strong&gt;Operational metrics showed that when mounting 6 GPUs on a single node, GPU utilization dropped to 83%, corresponding to a bandwidth of about 15.1 GB/s. This fell short of the expected high utilization target.&lt;/strong&gt; Further testing with fio on the storage side revealed a similar bandwidth of about 15.1 GB/s. &lt;strong&gt;This indicated that the bottleneck had shifted to the JuiceFS client bandwidth rather than the GPU compute side.&lt;/strong&gt;&lt;/p&gt;
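&lt;p&gt;The first tuning step, raising the reader thread count, can be illustrated with a plain thread pool; this sketch stands in for MLPerf Storage's DataLoader workers and uses made-up file names and sizes (the direct I/O part, which requires &lt;code&gt;O_DIRECT&lt;/code&gt; and aligned buffers, is omitted here):&lt;/p&gt;

```python
# Concurrent sample reads with a configurable number of reader threads,
# mimicking the "4 reader threads raised to 16" tuning described above.
import concurrent.futures
import os
import tempfile

def read_file(path):
    with open(path, "rb") as f:
        return len(f.read())

def read_dataset(paths, num_threads):
    """Read all samples with num_threads concurrent readers; return bytes read."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(read_file, paths))

# Create a few small stand-in sample files, then read them with 16 threads.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(8):
    p = os.path.join(tmpdir, "sample%d.bin" % i)
    with open(p, "wb") as f:
        f.write(b"x" * 1024)
    paths.append(p)
total = read_dataset(paths, num_threads=16)
```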
&lt;h4&gt;Optimization analysis 1: CPU pinning&lt;/h4&gt;
&lt;p&gt;To further investigate the cause of the client bandwidth limitation, we pinned the process to a specific CPU (running on NUMA nodes 2 and 3). Monitoring showed that all 48 CPU cores were nearly fully utilized. Further analysis of &lt;code&gt;top-down&lt;/code&gt;, &lt;code&gt;memory&lt;/code&gt;, and &lt;code&gt;miss&lt;/code&gt; metrics revealed a clear memory-bound condition, with most time spent on memory copying. This indicated that in the CPU-pinned scenario, the performance bottleneck of JuiceFS primarily came from CPU processing capacity and the additional latency caused by cross-NUMA node memory copying.&lt;/p&gt;
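&lt;p&gt;For reference, process-to-core pinning of the kind used in this experiment can be done from Python on Linux with &lt;code&gt;os.sched_setaffinity&lt;/code&gt;. The article's pinning was at the NUMA-node level (more likely done with a tool such as numactl); this per-core sketch only illustrates the mechanism:&lt;/p&gt;

```python
# Pin the current process to one allowed core, verify, then restore.
# Linux-only: sched_getaffinity/sched_setaffinity are not available elsewhere.
import os

allowed = sorted(os.sched_getaffinity(0))  # cores this process may run on
os.sched_setaffinity(0, {allowed[0]})      # pid 0 means "current process"
assert os.sched_getaffinity(0) == {allowed[0]}
os.sched_setaffinity(0, set(allowed))      # restore the original affinity
```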
&lt;h4&gt;Optimization analysis 2: no CPU pinning&lt;/h4&gt;
&lt;p&gt;To understand the bandwidth limitations under more general conditions, we observed the scenario without CPU pinning. The results showed that while the CPU was not fully saturated, the &lt;code&gt;devkit tuner numafast&lt;/code&gt; metric indicated that remote memory access accounted for about 80% of total memory accesses. This meant a large number of memory accesses were crossing local NUMA nodes, potentially even across CPU sockets, introducing significant bandwidth loss and access latency.  &lt;/p&gt;
&lt;p&gt;From the perspective of hardware bandwidth, cross-socket memory access has inherent limitations. For example, on the Arm platform, the theoretical physical bandwidth across sockets was about 60 GB/s. Further measurements showed cross-socket copy bandwidth on Arm1 to be around 48 GB/s, while on two x86 platforms it was about 37 GB/s and 28 GB/s, respectively.  &lt;/p&gt;
&lt;p&gt;This suggested that in the scenario without CPU pinning, even though the compute cores were not fully exhausted, extensive cross-node, cross-socket remote memory access had become a major source of overhead. Therefore, we inferred that the inability to further increase JuiceFS bandwidth was likely not solely due to CPU compute power, but rather constrained by the bandwidth and latency of cross-socket memory access. &lt;strong&gt;In other words, the system bottleneck had shifted from “local CPU being too busy” to “remote memory access being too costly.”&lt;/strong&gt;  &lt;/p&gt;
&lt;p&gt;In summary, the reasons for the JuiceFS bandwidth limitation differed between the two scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;With CPU pinning, the bottleneck was primarily CPU resource consumption and the overhead of extensive memory copying.  &lt;/li&gt;
&lt;li&gt;Without CPU pinning, the bottleneck was largely due to a high proportion of non-local memory accesses, especially the bandwidth and latency penalties from cross-socket accesses.&lt;/li&gt;
&lt;/ul&gt;
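&lt;p&gt;The cross-socket copy figures above came from dedicated measurement tools; as a rough illustration of the method, a memory-copy bandwidth probe can be as simple as timing repeated large copies (combined with CPU pinning, the same probe distinguishes local from remote NUMA bandwidth):&lt;/p&gt;

```python
# Minimal memory-copy bandwidth probe: time repeated large buffer copies.
# Python adds interpreter overhead, so treat the result as a lower bound.
import time

def copy_bandwidth_gbps(size_mb=64, rounds=8):
    src = bytearray(size_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(rounds):
        dst = bytes(src)              # one full memory copy per round
    elapsed = time.perf_counter() - start
    del dst
    return size_mb * 1024 * 1024 * rounds / elapsed / 1e9

bw = copy_bandwidth_gbps()
```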
&lt;h3&gt;Test 2: ResNet-50&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://blog.roboflow.com/what-is-resnet-50/"&gt;ResNet-50&lt;/a&gt; uses small samples (about 150 KiB each), with each batch containing 400 samples totaling about 58.5 MiB. This I/O test focused on data loading efficiency and training throughput under high GPU concurrency. The system maintained high utilization at large GPU scales:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single node: 50 GPUs, 95% GPU utilization, about 9.2 GB/s bandwidth.  &lt;/li&gt;
&lt;li&gt;Two nodes: 96 GPUs, 90% GPU utilization, about 16.9 GB/s bandwidth.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During testing, we adjusted the &lt;code&gt;reader.read_threads&lt;/code&gt; parameter from 8 to 1. For this model (medium-sized images), a single thread sufficed for data supply.&lt;/p&gt;
&lt;h4&gt;Optimization analysis 1: single-node bottleneck and memory bandwidth impact&lt;/h4&gt;
&lt;p&gt;With 55 GPUs on a single node, GPU utilization dropped to 86% while bandwidth remained at about 9.2 GB/s. This indicated the bottleneck had shifted to JuiceFS client bandwidth.  &lt;/p&gt;
&lt;p&gt;Further analysis revealed ResNet-50 tests used buffer I/O mode. Beyond reading data, memory copies during dataset processing consumed part of the memory bandwidth.  &lt;/p&gt;
&lt;p&gt;System memory copy bandwidth depends on memory channel count, memory frequency, and CPU frequency. STREAM tests on nodes with different configurations showed that single-node sequential read bandwidth aligned with measured system memory bandwidth, indicating that read throughput largely depends on system memory bandwidth. &lt;strong&gt;For training tasks requiring high throughput and GPU utilization, selecting nodes with higher memory bandwidth is recommended to significantly enhance data supply capacity and training efficiency.&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Single-CPU memory copy bandwidth data&lt;/th&gt;
&lt;th style="text-align: left;"&gt;JuiceFS single-node deployment read bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Arm3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Arm3: 171 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;25.3 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Arm2&lt;/td&gt;
&lt;td style="text-align: left;"&gt;114 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;21.6 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Arm1&lt;/td&gt;
&lt;td style="text-align: left;"&gt;106 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;18.3 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;x862&lt;/td&gt;
&lt;td style="text-align: left;"&gt;90 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;17.9 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;x861&lt;/td&gt;
&lt;td style="text-align: left;"&gt;82 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;16.6 GiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Optimization analysis 2: two-node scaling bottlenecks and distributed limitations&lt;/h4&gt;
&lt;p&gt;In multi-node deployments, in addition to single-node performance limits, cross-node memory access, network transfer, and metadata latency become new bottlenecks. Therefore, two-node testing after single-node analysis helped identify these distributed constraints and guide system optimization.  &lt;/p&gt;
&lt;p&gt;In a two-node scenario, the system theoretically supported up to 100 GPUs, but actual testing sustained only 96. Analysis showed that per-operation read latency had increased. Although file data was already cached on local disks, metadata access latency became the primary limiting factor.  &lt;/p&gt;
&lt;p&gt;To address this issue, we made multiple optimizations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We grouped CPU cores to ensure training threads and I/O threads ran on the same NUMA node.  &lt;/li&gt;
&lt;li&gt;Pure data processing and metadata access were assigned to different CPU cores and storage paths.  &lt;/li&gt;
&lt;li&gt;We adjusted Redis cache and local cache policies to reduce latency under high-concurrency metadata access.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After these tunings, the two-node scenario stably supported 100 GPUs, with GPU utilization reaching the expected level.&lt;/p&gt;
&lt;h3&gt;Test 3: CosmoFlow&lt;/h3&gt;
&lt;p&gt;Compared with the previous models, CosmoFlow had a much smaller size per sample, which imposed higher demands on I/O and metadata access. In both single-node and two-node scenarios, the CosmoFlow test showed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single node: Stably supported up to 10 GPUs (occasionally up to 12 GPUs), GPU utilization around 75%, bandwidth about 5.6 GB/s.  &lt;/li&gt;
&lt;li&gt;Key parameter adjustment: &lt;code&gt;reader.read_threads&lt;/code&gt; was reduced from &lt;code&gt;4&lt;/code&gt; to &lt;code&gt;1&lt;/code&gt;, batch size was set to 2 MiB, and a single thread was sufficient to meet data supply requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Optimization analysis 1: single-node bottleneck – memory copy limiting GPU utilization&lt;/h4&gt;
&lt;p&gt;When we tried to increase the number of GPUs beyond 10, GPU utilization dropped. Log and performance data analysis revealed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data read time increased, while metadata access latency did not change significantly.  &lt;/li&gt;
&lt;li&gt;File data was cached on local disks, disk queues were not full, and latency was low, so the bottleneck was not in the storage device.  &lt;/li&gt;
&lt;li&gt;Profiling showed that the key bottleneck was memory copy (&lt;code&gt;memcpy&lt;/code&gt;) – cumulative delays from multiple copy operations in the data read path increased total read time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, we inferred that when the system demanded more memory bandwidth, memory copy latency became the main factor limiting read performance and GPU utilization.&lt;/p&gt;
&lt;h4&gt;Optimization analysis 2: two-node bottleneck – distributed synchronization and metadata latency&lt;/h4&gt;
&lt;p&gt;In the two-node scenario with 20 GPUs, the first round of testing showed significantly lower GPU utilization. Further analysis found:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One node had started training while the other was still performing dataset preprocessing (reading file lists and sharding).  &lt;/li&gt;
&lt;li&gt;Because CosmoFlow has a large data volume, reading high-index files took a long time. This caused the two nodes to start training out of sync, leading to lower GPU utilization in the first round.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To resolve this, we added a synchronization mechanism to ensure that all nodes completed dataset preprocessing before starting training. After this adjustment, the two-node test stably supported 20 GPUs, and GPU utilization reached the expected level.&lt;/p&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;The key findings and optimization insights from our tests are summarized as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mlcommons.org/working-groups/benchmarks/storage/"&gt;MLPerf Storage&lt;/a&gt; evaluates various file system capabilities through different combinations of sample sizes, file sizes, and batch sizes, including large/medium/small sequential read performance, file concurrency, total read bandwidth, metadata access latency, file read latency, and file operation stability. In read-only scenarios, fully utilizing high-speed near-end caches (including data and metadata caches) significantly improved read performance. Note that the smaller the file, the higher the requirements for IOPS and latency.  &lt;/li&gt;
&lt;li&gt;System memory and bandwidth have a decisive impact on performance. In memory‑copy‑intensive workloads, memory copies consume both bandwidth and CPU cycles, creating the illusion of "CPU busy" while the CPU actually spends most of its time waiting for data. Higher memory bandwidth directly leads to better storage throughput – a key reference for server selection.  &lt;/li&gt;
&lt;li&gt;The Go runtime has limited NUMA awareness. For large‑core deployments, performance may degrade compared to using fewer cores. Cross‑NUMA (especially cross‑socket) memory accesses should be avoided because cross‑socket bandwidth is typically low (tens of GB/s), increasing latency. In practice, allocate only enough CPU cores, not all, to prevent extra memory access delays.  &lt;/li&gt;
&lt;li&gt;System‑level optimizations exist. For memory‑copy‑intensive operations, newer Arm systems provide specialized instructions. We collaborated with the Arm community to push configuration improvements, achieving up to tens of percentage points higher bandwidth in some scenarios.  &lt;/li&gt;
&lt;li&gt;For operations involving heavy kernel‑userspace interaction (for example, file I/O and metadata processing), reducing unnecessary system calls lowers latency. Concentrating file processing within the same production node and avoiding cross‑NUMA/socket access further improves performance and stability.  &lt;/li&gt;
&lt;li&gt;Cache policy tuning matters. Under high single‑node load, adjusting JuiceFS memory cache policies to reduce invalid memory bandwidth usage effectively increases GPU utilization and storage throughput. Overall, MLPerf Storage Benchmark is a system engineering problem requiring coordinated optimization of file system, memory bandwidth, CPU scheduling, and caching strategies.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; or the &lt;a href="http://go.juicefs.com/discord"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 02 Apr 2026 12:45:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/engineering/arm-juicefs-performance-optimization-mlperf-tuning</guid></item><item><title>How D-Robotics Manages Massive Small Files in a Multi-Cloud Environment with JuiceFS</title><link>https://www.juicefs.com/en/blog/user-stories/multi-cloud-store-massive-small-files</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;a href="https://en.d-robotics.cc/"&gt;D-Robotics&lt;/a&gt;, founded in 2024 and spun off from &lt;a href="https://en.wikipedia.org/wiki/Horizon_Robotics"&gt;Horizon Robotics&lt;/a&gt;' robotics division, specializes in the research and development of foundational computing platforms for consumer-grade robots. In 2025, we released an &lt;a href="https://www.nvidia.com/en-us/glossary/embodied-ai/"&gt;embodied AI&lt;/a&gt; foundation model.  &lt;/p&gt;
&lt;p&gt;In robot data management, training, and inference, the sheer volume of data is immense. Using object storage presents challenges such as handling small files and managing multi-cloud data. After trying some solutions and replacing private MinIO with SSD storage, we still faced difficulties in addressing these challenges. Ultimately, we selected &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt; as our core storage solution.  &lt;/p&gt;
&lt;p&gt;JuiceFS' inherent adaptability for cross-cloud operations efficiently supports data sharing needs in multi-cloud environments. In training scenarios, JuiceFS' cache mechanism, specifically designed for small file data, effectively replaces traditional caching solutions while striking a balance between cost and efficiency, fully meeting storage performance requirements. Currently, we manage tens of millions of files.  &lt;/p&gt;
&lt;p&gt;In this article, we’ll share our application characteristics, storage pain points, solution selection, implementation practices, and production tuning experiences. We hope our experience offers useful insights for those facing similar challenges in the industry.&lt;/p&gt;
&lt;h2&gt;Storage pain points in the robotics industry&lt;/h2&gt;
&lt;p&gt;The cloud platform serves as our core technical hub, undertaking key application functions such as simulation environment setup, data generation and &lt;a href="https://www.ibm.com/think/topics/model-training"&gt;model training&lt;/a&gt;, model lightweighting and deployment, and visual verification. The data types involved in the platform are diverse, mainly including sensor image data, LiDAR point cloud data, model weights and configuration data, motor operational data, and map construction data.  &lt;/p&gt;
&lt;p&gt;While &lt;a href="https://en.wikipedia.org/wiki/Object_storage"&gt;object storage&lt;/a&gt; meets basic storage needs for massive data, its performance limitations become particularly obvious when handling the massive small files frequently encountered in robotics applications. Our storage system faced four challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Metadata performance bottleneck with massive small files:&lt;/strong&gt; Robot model training involves tens of millions to billions of sensor images, LiDAR data, and model files. Traditional object storage (like standard S3) exhibits significant metadata operation bottlenecks at this scale. The fixed API latency for routine operations like listing files or retrieving attributes is typically 10–30 ms. This directly constrains queries per second (QPS) performance during training and inference and impacts overall R&amp;amp;D efficiency.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inefficient &lt;a href="https://en.wikipedia.org/wiki/Multicloud"&gt;multi-cloud&lt;/a&gt; collaboration and data flow:&lt;/strong&gt; As robotics companies increasingly adopt multi-cloud architectures for their R&amp;amp;D and production applications, ensuring efficient data synchronization and sharing across different cloud platforms and geographical regions has become a common challenge for the industry. Traditional storage solutions typically suffer from low cross-cloud data transfer efficiency and are often deeply integrated with a single cloud provider. This leads to technical lock-in and makes it difficult to achieve flexible cross-cloud deployment and data collaboration.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The impossible trinity of performance, cost, and operations:&lt;/strong&gt; High-performance parallel file systems offer high throughput and low latency but typically rely on all-flash arrays or dedicated hardware. This leads to high hardware investment and ongoing operational costs, plus complex deployment. Low-cost object storage offers good elasticity but struggles to support the high-throughput I/O demands of GPU clusters in AI training scenarios. A common industry workaround is using a high-speed file system as a cache synchronized with S3. However, the extra data synchronization steps significantly reduce usability and fail to achieve efficient storage-compute synergy.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Difficulty in dataset version management:&lt;/strong&gt; The rapid iteration cycle of robot models requires efficient and granular management of multiple dataset versions. Using physical copies for version control directly leads to exponentially higher underlying storage consumption, significantly increasing costs. Moreover, the difficulty of retrieving, reusing, and maintaining multi-version data also increases substantially.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Storage selection: JuiceFS vs. MinIO/S3 vs. PFS&lt;/h2&gt;
&lt;p&gt;To address these storage challenges, we established a clear evaluation framework for storage selection. A comprehensive comparative test was conducted on mainstream storage solutions across seven core dimensions: storage architecture, protocol compatibility, metadata performance, scalability, multi-cloud adaptability, cost efficiency, and operational complexity.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;Comparison basis&lt;/th&gt;
&lt;th style="text-align: left;"&gt;JuiceFS&lt;/th&gt;
&lt;th style="text-align: left;"&gt;MinIO / Public Cloud S3&lt;/th&gt;
&lt;th style="text-align: left;"&gt;CephFS / Public Cloud FS (CPFS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Storage architecture&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Separation of metadata and data&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Unified object storage&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Metadata and data typically coupled, often with kernel-space parallel design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Protocol support&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Full compatibility: POSIX, HDFS, S3 API, Kubernetes CSI&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Primarily S3 API, with weak POSIX compatibility&lt;/td&gt;
&lt;td style="text-align: left;"&gt;POSIX-oriented; HDFS or S3 compatibility often requires plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Metadata performance&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Very high: sub-millisecond latency, supports hundreds of billions of files per volume&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Lower: high metadata overhead for massive small files; API call overhead about 10–30 ms&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Medium to high: performance bottlenecks and complexity challenges at ultra-large scale (100M+ files)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Scalability&lt;/td&gt;
&lt;td style="text-align: left;"&gt;High: horizontal scaling, supports tens to hundreds of billions of files per volume&lt;/td&gt;
&lt;td style="text-align: left;"&gt;High: near-infinite storage capacity, but small-file management efficiency degrades with scale&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Moderate: scaling limited by metadata nodes; operational complexity grows exponentially with scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Multi-cloud adaptability&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Native support&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Relies on sync tools; cross-cloud data flow inefficient; global unified view difficult&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Limited: often tightly bound to specific hardware or cloud provider; cross-cloud deployment is complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Cost efficiency&lt;/td&gt;
&lt;td style="text-align: left;"&gt;High performance-to-cost ratio&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Low (storage only): cheap storage, but low GPU utilization in high-throughput scenarios like AI training&lt;/td&gt;
&lt;td style="text-align: left;"&gt;High: often requires all-flash architecture or dedicated hardware; high operational labor cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Based on the comparison results above, JuiceFS demonstrates significant advantages in core performance, scalability, multi-cloud adaptability, and cost efficiency. This makes it the preferred choice for our unified storage solution.&lt;br&gt;
Furthermore, JuiceFS has been widely adopted in the &lt;a href="https://juicefs.com/en/blog?tag=AI%20storage"&gt;autonomous driving&lt;/a&gt; industry. Leading companies such as Horizon Robotics have leveraged JuiceFS to manage data at the exabyte scale. This demonstrates its maturity and effectiveness in large-scale production environments.&lt;br&gt;
For our specific application scenarios, JuiceFS offers the following core technical advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Decoupled architecture:&lt;/strong&gt; JuiceFS adopts a metadata-data separation architecture, persisting data in cost-effective object storage (like S3 or OSS) while storing metadata in databases like Redis or TiKV. This decoupled design enables elastic storage scaling and reduces dependence on any single cloud provider.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunking and caching mechanisms:&lt;/strong&gt; JuiceFS &lt;a href="https://juicefs.com/docs/community/architecture#how-juicefs-store-files"&gt;uses chunks, slices, and blocks&lt;/a&gt; to significantly improve small file read efficiency and enhance concurrent read/write performance. In addition, multi-level caching (memory, local SSD, distributed cache) reduces access latency for hot data. This meets the demands of high-throughput training workloads.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud-native adaptability:&lt;/strong&gt; By providing a &lt;a href="https://juicefs.com/docs/csi/introduction"&gt;CSI Driver&lt;/a&gt;, JuiceFS delivers persistent storage decoupled from compute nodes in Kubernetes environments, supporting stateless container deployment and cross-cloud migration. It enables data sharing, enhances application high availability and flexibility, and adapts to various Kubernetes deployment methods.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full-stack support for AI training:&lt;/strong&gt; JuiceFS fully supports POSIX, HDFS, and S3 API, and is compatible with mainstream AI frameworks such as PyTorch and TensorFlow. It can be integrated without code modifications, lowering the technical barrier for adoption.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-cloud support:&lt;/strong&gt; Its cross-cloud capabilities and high-performance metadata engine ensure efficient data flow, perfectly aligning with our strategy of "computing power on demand."&lt;/li&gt;
&lt;/ul&gt;
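&lt;p&gt;As a concrete picture of the chunk/slice/block layout mentioned above: per the JuiceFS documentation, files are split into 64 MiB chunks and persisted as blocks of up to 4 MiB (the default). Mapping a file offset to its chunk and block is simple arithmetic:&lt;/p&gt;

```python
# Map a file offset to JuiceFS chunk/block coordinates, using the documented
# defaults of 64 MiB chunks and 4 MiB blocks.
CHUNK = 64 * 1024 * 1024
BLOCK = 4 * 1024 * 1024

def locate(offset):
    """Return (chunk index, block index inside chunk, offset inside block)."""
    chunk_idx, within_chunk = divmod(offset, CHUNK)
    block_idx, within_block = divmod(within_chunk, BLOCK)
    return chunk_idx, block_idx, within_block

# Byte 200 MiB of a file falls in chunk 3, block 2.
assert locate(200 * 1024 * 1024) == (3, 2, 0)
```

&lt;p&gt;Since a small file occupies its own block-sized object, massive small files translate into massive object-storage requests; this is exactly where the multi-level cache pays off.&lt;/p&gt;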
&lt;p&gt;&lt;strong&gt;From a cost perspective, JuiceFS does not offer a significant cost advantage in the early stages of small-scale deployment. However, when data volume reaches the petabyte level—especially at the 10 PB or 100 PB scale—and is compared against all-flash storage solutions, its cost-efficient architecture built on object storage becomes fully evident.&lt;/strong&gt; In addition, JuiceFS requires minimal operational overhead. Currently, we need only one engineer to manage the entire cloud platform and storage system, a fraction of the personnel required by traditional solutions.&lt;/p&gt;
&lt;h2&gt;From Community Edition to Enterprise Edition: addressing larger-scale scenarios&lt;/h2&gt;
&lt;p&gt;As our application continued to expand, we encountered limitations when using Redis as the &lt;a href="https://juicefs.com/docs/community/databases_for_metadata/"&gt;metadata engine&lt;/a&gt;—specifically, physical memory capacity constrained data scalability. When the number of files approached the hundred-million level, metadata query latency increased significantly. This impacted the concurrency efficiency of training tasks. After using the clone feature, the metadata volume grew substantially. In addition, in cross-cloud scenarios, we faced higher demands for metadata synchronization and mirror file system capabilities. We also required more granular capacity controls and permission management at the directory level.  &lt;/p&gt;
&lt;p&gt;Considering these requirements—along with our desire to leverage local SSDs on GPU nodes to build a distributed cache layer for improved performance—we decided to deploy &lt;a href="https://juicefs.com/docs/cloud/"&gt;JuiceFS Enterprise Edition&lt;/a&gt; in parallel, migrating core scenarios such as ultra-large-scale directory management and multi-node collaborative training to this version. Through this scenario-based approach, we’ve effectively enhanced the adaptability of our overall storage system and established a solid foundation for future application growth. Below are the key features of the Enterprise Edition that we’ve applied in real-world scenarios.&lt;/p&gt;
&lt;h3&gt;High-performance metadata engine: solving the bottleneck of large-scale directory retrieval&lt;/h3&gt;
&lt;p&gt;For high-frequency operations such as traversing directories with hundreds of millions of files and deep pagination queries, we previously encountered the "slower as you query" problem with traditional storage solutions. When the number of files in a single directory exceeded 10 million, and the pagination offset surpassed 100,000 entries, response latency would spike from hundreds of milliseconds to several seconds. This severely impacted data filtering efficiency.  &lt;/p&gt;
&lt;p&gt;After switching to JuiceFS Enterprise Edition, its native tree-structured metadata storage architecture played a key role. Unlike flat key-value metadata storage, which keeps file metadata unordered, the tree structure allows direct navigation to directory levels, reducing the scope of metadata scans. In our actual tests, deep pagination queries (with an offset of 500,000 entries) in a directory containing 120 million files saw latency drop from 3.8 seconds to just 210 milliseconds. This fully met the retrieval needs of large-scale datasets. In addition, this engine supports storing hundreds of billions of files per volume, and we’ve already used it to manage three petabyte-scale training datasets stably, aligning with our application growth expectations.&lt;/p&gt;
&lt;h3&gt;Enterprise-grade distributed cache: improving data sharing efficiency in multi-node, multi-GPU training&lt;/h3&gt;
&lt;p&gt;In multi-node, multi-GPU training scenarios, we previously faced challenges such as low cache hit rates and cross-node bandwidth congestion. The open-source version only supports local caching on each node. This means that when multiple nodes pull the same dataset simultaneously, each node must access object storage independently. This resulted in single-node bandwidth utilization exceeding 90%, with average training job startup delays of up to 20 minutes.  &lt;/p&gt;
&lt;p&gt;With JuiceFS Enterprise Edition's &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/"&gt;distributed caching&lt;/a&gt; feature, we set up a distributed cache across a 12-node training cluster using just three commands. The dataset only needs to be pulled from object storage once and is cached in a pool built from local SSDs across the nodes. As a result, &lt;strong&gt;the cache hit rate for multi-node collaborative training increased from 45% to 92%, cross-node bandwidth utilization dropped to below 15%, and training job startup time was reduced to under three minutes&lt;/strong&gt;. This significantly improved compute utilization.&lt;/p&gt;
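The cache-group setup described above can be sketched roughly as follows. The volume name, cache paths, and sizes are hypothetical, and the exact flags may vary across Enterprise Edition releases:

```shell
# On each training node that contributes local SSD to the shared pool
# (volume name, cache directory, and size are placeholders):
juicefs mount myvolume /jfs \
  --cache-group=training-cluster \
  --cache-dir=/ssd/jfscache \
  --cache-size=1048576   # MiB of local SSD contributed by this node

# On a client that should read from the group cache without contributing:
juicefs mount myvolume /jfs --cache-group=training-cluster --no-sharing
```

Clients that mount with the same `--cache-group` name discover each other and serve cached blocks to one another, so a dataset pulled from object storage once becomes available to the whole cluster.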
&lt;h3&gt;Enhanced cross-cloud collaboration: building a low-operational-cost cross-cloud data foundation&lt;/h3&gt;
&lt;p&gt;Since our R&amp;amp;D environments are distributed across two cloud environments, we previously encountered challenges with &lt;strong&gt;slow cross-cloud data synchronization and high operational costs&lt;/strong&gt;. Using traditional synchronization tools to maintain data consistency between the two clouds required configuring eight scheduled tasks, with an average synchronization delay of four hours, and dedicated personnel needed to investigate sync failures weekly.  &lt;/p&gt;
&lt;p&gt;By using the JuiceFS sync tool combined with our internal AI operations tools, we achieved automated configuration of synchronization policies. The system automatically adjusts sync priorities based on data heat levels, keeping cross-cloud data latency within 10 minutes. In addition, tasks such as failure retries and log alerts for synchronization are fully automated, eliminating the need for dedicated monitoring. &lt;strong&gt;This has reduced operational overhead by 70%&lt;/strong&gt;, and we now stably support multiple training projects across two cloud platforms sharing the same dataset. Going forward, we plan to use the Enterprise Edition's mirror file system feature to further enhance cross-cloud data collaboration.&lt;/p&gt;
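As a rough sketch of the synchronization command itself (bucket name, endpoint, credentials, and mount path below are hypothetical; the priority scheduling lives in our internal tooling layered on top):

```shell
# Incrementally copy a dataset from a bucket on cloud A into a JuiceFS
# mount on cloud B; only missing or newer objects are transferred.
juicefs sync \
  --update \
  --threads 32 \
  --exclude '*.tmp' \
  s3://ACCESS_KEY:SECRET_KEY@datasets-cloud-a.s3.amazonaws.com/train/ \
  /jfs/train/
```

Run under a scheduler, a command like this gives incremental, restartable synchronization; retries and alerting can then be automated around its exit status and logs.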
&lt;h2&gt;JuiceFS optimization&lt;/h2&gt;
&lt;h3&gt;Client cache and write performance tuning&lt;/h3&gt;
&lt;p&gt;We need to pay attention to compatibility issues between caching strategies and Kubernetes resource limits. For example, using memory as a local cache path with improper configuration may lead to abnormal memory growth in the Mount Pod, or insufficient resource quota reservations may cause checkpoint loss or file handle write exceptions during long-running training tasks.  &lt;/p&gt;
&lt;p&gt;Regarding write performance tuning, enabling writeback mode can improve small file write throughput to some extent. However, considering production environment requirements for data consistency, we still adopt write-through synchronous mode to reduce data risks in extreme crash scenarios. It’s recommended to cautiously enable writeback mode only in scenarios with lower data reliability requirements, such as temporary computing or offline data cleaning, based on actual needs.&lt;/p&gt;
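In the Community Edition this trade-off maps to a single mount flag; the metadata URL and mount point below are placeholders:

```shell
# Default (safer): data is uploaded to object storage synchronously on
# flush/close, so a client crash cannot lose acknowledged writes.
juicefs mount redis://meta-host:6379/1 /jfs

# Writeback mode (faster small-file writes, weaker durability): writes are
# acknowledged once staged in the local cache and uploaded asynchronously.
# Enable only for temporary or reproducible data.
juicefs mount --writeback redis://meta-host:6379/1 /jfs
```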
&lt;h3&gt;Deployment and network topology optimization&lt;/h3&gt;
&lt;p&gt;For more stable performance, it’s strongly recommended to deploy the metadata engine and compute nodes within the same region during deployment. In actual operations, we observed that cross-region deployment could increase metadata operation latency by several to ten times. This significantly impacted I/O-intensive operations such as data decompression. Deploying metadata services and GPU computing resources within the same region helps maintain performance while controlling network transmission costs, improving overall resource utilization efficiency.&lt;/p&gt;
&lt;h3&gt;Data warm-up and cache optimization&lt;/h3&gt;
&lt;p&gt;In a 10-gigabit network environment, making full use of JuiceFS' data &lt;a href="https://juicefs.com/docs/cloud/reference/command_reference/#warmup"&gt;warm-up&lt;/a&gt; feature and adjusting data block sizes to fit the application scenario can better leverage network bandwidth and improve read throughput. Combined with the distributed cache architecture, this effectively enhances data sharing efficiency in multi-node concurrent scenarios and improves cache hit rates during high-concurrency reads, thereby optimizing the overall performance of large-scale AI training tasks.&lt;/p&gt;
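A warm-up run can be as simple as the following; the dataset path, manifest file, and thread count are placeholders:

```shell
# Pull a dataset into the cache ahead of training, using 16 concurrent
# download threads:
juicefs warmup --threads 16 /jfs/datasets/imagenet

# Warm up only the files listed in a manifest (one path per line):
juicefs warmup --file /tmp/filelist.txt
```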
&lt;h3&gt;Resource quotas and high availability guarantee&lt;/h3&gt;
&lt;p&gt;In enterprise-level multi-role operations and storage responsibility separation scenarios, to avoid operational risks caused by inconsistent configurations, it’s recommended to finely control resource quotas for &lt;a href="https://juicefs.com/docs/csi/introduction/"&gt;JuiceFS CSI Driver&lt;/a&gt; in Kubernetes environments. By appropriately setting CPU and memory request/limit for Mount Pods, Pod restarts or node anomalies caused by resource preemption can be reduced. In practice, resource reservation ratios can be dynamically adjusted based on cluster load.  &lt;/p&gt;
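One way to pin those request/limit values is through StorageClass parameters, assuming a recent JuiceFS CSI Driver that reads mount pod resources from `juicefs/mount-*` parameters. Names, namespaces, and sizes here are illustrative only:

```shell
# Mount Pods created for volumes of this StorageClass inherit these
# CPU/memory requests and limits, reducing eviction under node pressure.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  juicefs/mount-cpu-request: "1"
  juicefs/mount-cpu-limit: "4"
  juicefs/mount-memory-request: "2Gi"
  juicefs/mount-memory-limit: "8Gi"
EOF
```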
&lt;p&gt;In addition, for scenarios with high application continuity requirements, the automatic mount point recovery feature for Mount Pods can be enabled to achieve automated fault recovery for storage services, further ensuring underlying storage stability.&lt;/p&gt;
&lt;h3&gt;Multi-tenancy&lt;/h3&gt;
&lt;p&gt;We provide independent &lt;a href="https://en.wikipedia.org/wiki/File_system"&gt;file systems&lt;/a&gt; and storage buckets for large enterprise customers, while isolating small and medium-sized enterprises and end users through subdirectory-level isolation and permission control.  &lt;/p&gt;
&lt;p&gt;Large enterprises can flexibly scale throughput and capacity, avoiding performance bottlenecks associated with shared storage buckets. For small and medium-sized enterprises and end users, we ensure data security and independence through subdirectory isolation and permission control, while enabling accurate metering and billing.  &lt;/p&gt;
&lt;p&gt;This architecture ensures tenant isolation while flexibly allocating resources, improving system management efficiency.&lt;/p&gt;
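For the subdirectory-isolation tier, the mount-level scoping and per-directory limits described above look roughly like this; the metadata URL, tenant paths, and quota sizes are hypothetical:

```shell
# Expose only this tenant's subtree; the client cannot address other tenants:
juicefs mount --subdir /tenants/acme redis://meta-host:6379/1 /mnt/acme

# Cap the tenant's capacity (GiB) and file count, which also gives clean
# numbers for metering and billing (directory quotas require a recent version):
juicefs quota set redis://meta-host:6379/1 \
  --path /tenants/acme --capacity 100 --inodes 10000000
```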
&lt;h3&gt;Version management&lt;/h3&gt;
&lt;p&gt;Using the &lt;code&gt;juicefs clone&lt;/code&gt; command, copies of original datasets can be quickly created and modified independently without affecting the source data. The clone operation copies only file metadata; the underlying data blocks are shared with the source, and only subsequent changes consume additional storage space. This feature supports managing multiple versions, facilitating rollback and recovery and ensuring data security and version control.&lt;/p&gt;
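A typical versioning flow with `juicefs clone` looks like this; the dataset paths are hypothetical, and both source and destination must be inside the same mounted JuiceFS volume:

```shell
# Snapshot a dataset before an experiment; only metadata is copied,
# so this is fast and consumes no extra object storage up front:
juicefs clone /jfs/datasets/train-v1 /jfs/datasets/train-v1-exp42

# Modify the clone freely; the source is unaffected, and only changed
# files consume additional space:
rm -r /jfs/datasets/train-v1-exp42/bad-samples
```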
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;JuiceFS' characteristics in metadata performance, scalability, cross-cloud adaptability, and comprehensive cost efficiency have made it our choice for building a unified storage layer. Currently, we adopt both JuiceFS Community Edition and Enterprise Edition to accommodate different storage requirements across various application scenarios. &lt;/p&gt;
&lt;p&gt;In the future, we plan to further implement JuiceFS in the embodied intelligence field, addressing specific storage needs in this scenario. These include high-throughput processing of time-series data, precise multi-modal data alignment, edge-cloud collaborative storage, and integrated management of simulation and real-world data.  &lt;/p&gt;
&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 05 Mar 2026 09:22:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/user-stories/multi-cloud-store-massive-small-files</guid></item><item><title>The Design Journey of FUSE: From Kernel-Space to User-Space File Systems</title><link>https://www.juicefs.com/en/blog/engineering/design-fuse-kernel-user-space</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Filesystem_in_Userspace"&gt;Filesystem in Userspace&lt;/a&gt; (FUSE), born in 2001, enables users to create custom file systems in user space. By lowering the barrier to file system development, FUSE empowers developers to innovate without modifying kernel code. &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt;, a high-performance distributed file system, leverages FUSE’s flexibility and extensibility to deliver robust storage solutions.  &lt;/p&gt;
&lt;p&gt;In this article, we’ll explore FUSE’s architecture and advantages, tracing the evolution of kernel file systems and network file systems that laid the groundwork for FUSE. Finally, we’ll share JuiceFS’ practical insights into optimizing FUSE performance for &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence"&gt;AI&lt;/a&gt; workloads. Since FUSE requires switching between user space and kernel space, it incurs some overhead and may add I/O latency, so many people have doubts about its performance. From our practical experience, FUSE can meet performance requirements in most AI scenarios. We’ll elaborate on the relevant details below. &lt;/p&gt;
&lt;h2&gt;Standalone file systems: Kernel space and VFS&lt;/h2&gt;
&lt;p&gt;The file system, as a core underlying component of the operating system, is responsible for frequent operations on storage devices. It was initially designed entirely in kernel space. The &lt;em&gt;kernel&lt;/em&gt; concept emerged as computer hardware and software became increasingly complex, and operating systems separated the code for managing underlying resources from user programs.&lt;/p&gt;
&lt;h3&gt;Kernel space and user space&lt;/h3&gt;
&lt;p&gt;Kernel space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The kernel is code with super privileges. It manages a computer’s core resources, such as CPU, memory, storage, and network.  &lt;/li&gt;
&lt;li&gt;When kernel code runs, the program enters kernel space, enabling full access to and control over underlying hardware. Due to the kernel’s high privileges, its code must undergo stringent testing and verification, and ordinary users cannot modify it freely.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;User space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It’s the code of various applications we commonly use, such as browsers and games.  &lt;/li&gt;
&lt;li&gt;In user space, the permissions of programs are strictly limited and cannot directly access important underlying resources.&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Kernel space and user space&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Kernel space and user space&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;If an application needs to use a file system, it must access it through the interface designed by the operating system, such as the commonly used &lt;code&gt;OPEN&lt;/code&gt;, &lt;code&gt;READ&lt;/code&gt;, and &lt;code&gt;WRITE&lt;/code&gt;, which are system calls. The role of system calls is to build a bridge between user space and kernel space. Modern operating systems often define hundreds of system calls, each with its own clear name, number, and parameters.  &lt;/p&gt;
&lt;p&gt;When an application makes a system call, it enters a section of kernel space code and returns the results to user space after execution. It’s worth noting that this entire process from user space to kernel space and then back to user space belongs to the same process category.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;System call architecture&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;System call architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h3&gt;Virtual file systems&lt;/h3&gt;
&lt;p&gt;After understanding the background knowledge above, we’ll briefly explain how user space and kernel space interact when a user calls a &lt;a href="https://en.wikipedia.org/wiki/File_system"&gt;file system&lt;/a&gt; interface.  &lt;/p&gt;
&lt;p&gt;The kernel encapsulates a set of universal virtual file system interfaces through &lt;a href="https://en.wikipedia.org/wiki/Virtual_file_system"&gt;virtual file system&lt;/a&gt; (VFS), exposes them to user space via system calls, and provides programming interfaces to the underlying file systems. The underlying file systems need to implement their own file system interfaces according to the VFS format.   &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The standard process for user space to access the underlying file system:&lt;/strong&gt; a system call -&amp;gt; the VFS -&amp;gt; the underlying file system -&amp;gt; the physical device  &lt;/p&gt;
&lt;p&gt;For example, when we call open in an application, it carries a path as its parameter. After this call reaches the VFS layer, the VFS searches level by level in its tree structure based on this path. Ultimately, it finds a corresponding target and its affiliated underlying file system. This underlying file system also has its own implementation of the open method. Then, it passes this open call to the underlying file system.  &lt;/p&gt;
&lt;p&gt;The Linux kernel supports dozens of different file systems. For different storage media such as memory or networks, different file systems are used for management. &lt;strong&gt;The most critical point is that the extensibility of VFS enables the Linux system to easily support a variety of file systems to meet various complex storage needs; at the same time, this extensibility also provides a foundation for FUSE to implement kernel-space functions in user space later.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;Network file systems: Breaking kernel boundaries&lt;/h2&gt;
&lt;p&gt;With the growth of computing needs, a single computer gradually could no longer meet increasing computing and storage requirements. People began to introduce multiple computers to share the load and improve overall efficiency.  &lt;/p&gt;
&lt;p&gt;In this scenario, an application often needs to access data distributed on multiple computers. &lt;strong&gt;To solve this problem, people proposed the concept of introducing a virtual storage layer in the network, virtually mounting the remote computer's file system (such as a certain directory) through a network interface to the local computer's node. The purpose of this is to enable the local computer to seamlessly access remote computer data as if the data was stored locally.&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Network file system (NFS) architecture&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Network file system (NFS) architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Specifically, if a local computer needs to access remote data, a subdirectory of the remote computer can be virtually mounted to a node of the local computer through a network interface. In this process, the application does not need to make any modifications and can still access these paths through standard file system interfaces as if they were local data.&lt;/p&gt;
&lt;p&gt;When the application performs operations on these network paths (such as hierarchical directory lookup), these operations are converted into network requests and sent to the remote computer in the form of remote procedure call (RPC) for execution. After receiving these requests, the remote computer performs corresponding operations (such as finding files and reading data) and returns the results to the local computer.&lt;/p&gt;
&lt;p&gt;The process above is a simple implementation of the &lt;a href="https://en.wikipedia.org/wiki/Network_File_System"&gt;NFS&lt;/a&gt; protocol. As a network file system protocol, NFS provides an efficient solution for resource sharing between multiple computers. It allows users to mount and access remote file systems as conveniently as operating local file systems.&lt;/p&gt;
&lt;p&gt;Traditional file systems typically run entirely in the kernel space of a single node, while NFS was the first to break this limitation. The server-side implementation combines kernel space with user space. The subsequent design of FUSE was inspired by this approach.&lt;/p&gt;
&lt;h2&gt;FUSE: File system innovation from kernel to user space&lt;/h2&gt;
&lt;p&gt;With the continuous development of computer technology, many emerging application scenarios require using custom file systems. &lt;strong&gt;Traditional kernel-space file systems have high implementation difficulties and version compatibility issues. The NFS architecture first broke through the limitations of the kernel.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Based on this, someone proposed an idea: Can we transplant the NFS network protocol to a single node, transfer the server-side functionality to a user-space process, while retaining the client running in the kernel, and use system calls instead of network communication to realize file system functions in user space? This idea eventually led to the birth of FUSE.&lt;/p&gt;
&lt;p&gt;In 2001, Hungarian computer scientist Miklos Szeredi introduced FUSE, a framework that allows developers to implement file systems in user space. &lt;strong&gt;The core of FUSE is divided into two parts: the kernel module and the user-space library (&lt;a href="https://github.com/libfuse/libfuse"&gt;libfuse&lt;/a&gt;).&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Its kernel module, as part of the operating system kernel, interacts with VFS, forwarding file system requests from VFS to user space, and returning the processing results of user space to VFS. This design allows FUSE to implement custom file system functions without modifying kernel code.&lt;/p&gt;
&lt;p&gt;The FUSE user-space library (libfuse) provides an API library that interacts with the FUSE kernel module and helps users implement a daemon running in user space. The daemon handles file system requests from the kernel and implements specific file system logic.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;FUSE workflow&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;FUSE workflow&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;In specific implementations, the user-space daemon and kernel module collaborate through the following steps to complete file operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Request reception&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;1.1 The kernel module registers a character device (&lt;code&gt;/dev/fuse&lt;/code&gt;) as a communication channel. The daemon reads requests from this device by calling &lt;code&gt;read()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;1.2 If the FUSE request queue of the kernel is empty, &lt;code&gt;read()&lt;/code&gt; enters a blocking state. At this time, the daemon pauses execution and releases CPU until a new request appears in the queue (implemented through the kernel's wait queue mechanism).&lt;/p&gt;
&lt;p&gt;1.3 When an application initiates a file operation (such as &lt;code&gt;open&lt;/code&gt; and &lt;code&gt;read&lt;/code&gt;), the kernel module encapsulates the request into a specially formatted data packet and inserts it into the request queue, waking the blocked daemon.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;Request processing&lt;br&gt;
After the daemon reads the request data packet from the character device, it calls the corresponding user-space processing function according to the operation type (such as reading, writing, and creating a file).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Result returning&lt;br&gt;
After processing is complete, the daemon serializes the result (such as the content of the read file or error code) according to the FUSE protocol format and writes the data packet back to the character device through &lt;code&gt;write()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After the kernel module receives the response:&lt;/p&gt;
&lt;p&gt;3.1 It parses the data packet content and passes the results to the waiting application.&lt;/p&gt;
&lt;p&gt;3.2 It wakes up the system call blocked in the application to continue executing subsequent logic.&lt;/p&gt;
&lt;p&gt;The emergence of FUSE brought revolutionary changes to file system development. By migrating the implementation of file systems from kernel space to user space, FUSE significantly reduced development difficulty, improved system flexibility and extensibility, and was widely applied in various scenarios such as network file systems, encrypted file systems, and virtual file systems.&lt;/p&gt;
&lt;h2&gt;JuiceFS: A FUSE user-space distributed file system&lt;/h2&gt;
&lt;p&gt;In 2017, with the full entry of IT infrastructure into the cloud era, the architecture faced unprecedented challenges. In this background, JuiceFS was born. As a &lt;a href="https://www.geeksforgeeks.org/distributed-systems/what-is-dfsdistributed-file-system/"&gt;distributed file system&lt;/a&gt; based on object storage, it uses FUSE technology to build its file system architecture, using FUSE’s flexible extensibility to meet the diverse needs of cloud computing environments.&lt;/p&gt;
&lt;p&gt;Through FUSE, the JuiceFS file system can be mounted to servers in a &lt;a href="https://en.wikipedia.org/wiki/POSIX#:~:text=The%20Portable%20Operating%20System%20Interface,maintaining%20compatibility%20between%20operating%20systems."&gt;POSIX&lt;/a&gt;-compatible manner. It treats massive cloud storage as local storage. Common file system commands, such as &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;cp&lt;/code&gt;, and &lt;code&gt;mkdir&lt;/code&gt;, can be used to manage files and directories in JuiceFS.&lt;/p&gt;&lt;/div&gt;
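A minimal end-to-end sketch of that experience with the Community Edition; the object storage bucket and metadata engine address are placeholders:

```shell
# Create a file system backed by object storage, with metadata in Redis:
juicefs format --storage s3 \
  --bucket https://mybucket.s3.amazonaws.com \
  redis://192.168.1.6:6379/1 myjfs

# Mount it through FUSE; from here on it behaves like a local POSIX tree:
juicefs mount redis://192.168.1.6:6379/1 /mnt/jfs
mkdir -p /mnt/jfs/projects
cp report.txt /mnt/jfs/projects/
ls -l /mnt/jfs/projects
```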
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition architecture&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Let’s take a user mounting JuiceFS and then opening one of its files as an example. The request first goes through the kernel VFS, then is passed to the kernel's FUSE module, and communicates with the JuiceFS client process through &lt;code&gt;/dev/fuse&lt;/code&gt; device. The relationship between VFS and FUSE can be simply regarded as a client-server protocol, with VFS acting as the client requesting service, and the user-space JuiceFS acting as the server role, handling these requests.&lt;/p&gt;
&lt;p&gt;The workflow is as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;After JuiceFS is mounted, the &lt;code&gt;go-fuse&lt;/code&gt; module inside JuiceFS opens &lt;code&gt;/dev/fuse&lt;/code&gt; to obtain &lt;code&gt;mount fd&lt;/code&gt; and start several threads to read the FUSE requests of the kernel.  &lt;/li&gt;
&lt;li&gt;The user calls the &lt;code&gt;open&lt;/code&gt; function, enters the VFS layer through the C library and system call, and the VFS layer transfers the request to the kernel's FUSE module.  &lt;/li&gt;
&lt;li&gt;The kernel FUSE module puts the &lt;code&gt;open&lt;/code&gt; request into the queue corresponding to the &lt;code&gt;fd&lt;/code&gt; of &lt;code&gt;juicefs mount&lt;/code&gt; according to the protocol, wakes up a read thread of &lt;code&gt;go-fuse&lt;/code&gt;, and waits for the processing result.  &lt;/li&gt;
&lt;li&gt;The user-space &lt;code&gt;go-fuse&lt;/code&gt; module reads the FUSE request and calls the corresponding implementation of JuiceFS after parsing the request.  &lt;/li&gt;
&lt;li&gt;&lt;code&gt;go-fuse&lt;/code&gt; writes the processing result of this request into &lt;code&gt;mount fd&lt;/code&gt;, that is, into the FUSE result queue, and then wakes up the application waiting thread.  &lt;/li&gt;
&lt;li&gt;The application thread is awakened, gets the processing result of this request, and then returns to the upper layer.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Due to the frequent switching between user space and kernel space required by FUSE, many people have doubts about its performance. In fact, this is not entirely the case. We conducted a set of tests using JuiceFS.&lt;/p&gt;
&lt;p&gt;Testing environment: a 176-core Intel Xeon machine with 1.5 TB of RAM, reading a 512 GB sparse file on JuiceFS.&lt;/p&gt;
&lt;p&gt;We used fio to perform sequential read tests on it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mount parameters: &lt;code&gt;./cmd/mount/mount mount --no-update --conf-dir=/root/jfs/deploy/docker --cache-dir /tmp/jfsCache0 --enable-xattr --enable-acl -o allow_other test-volume /tmp/jfs&lt;/code&gt;  &lt;/li&gt;
&lt;li&gt;The fio command: &lt;code&gt;fio --name=seq_read --filename=/tmp/jfs/holefile  --rw=read --bs=4M  --numjobs=1  --runtime=60 --time_based --group_reporting&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the file is sparse and served entirely from memory, this excluded hardware disk constraints and measured the maximum bandwidth of the FUSE file system itself.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;all-memory&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS sequential read throughput in all-memory&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Testing results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The single-thread bandwidth under a single mount point reached 2.4 GiB/s.  &lt;/li&gt;
&lt;li&gt;As the number of threads increased, the bandwidth could grow linearly. At 20 threads, it reached 25.1 GiB/s. This throughput already meets most actual application scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In terms of the use of FUSE, JuiceFS has implemented the smooth upgrade feature. By ensuring the consistency of &lt;code&gt;mount fd&lt;/code&gt;, users can upgrade the JuiceFS version or modify the mount parameters without re-mounting the file system and interrupting the application. For details, see &lt;a href="https://juicefs.com/en/blog/engineering/smooth-upgrade"&gt;Smooth Upgrade: Implementation and Usage&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;FUSE also has some limitations. For example, processes accessing the FUSE device require high permissions, especially in container environments, usually requiring privileged mode to be enabled. In addition, containers are usually transient and stateless. If a container exits unexpectedly and data is not written to disk in time, there is a risk of data loss.  &lt;/p&gt;
&lt;p&gt;Therefore, for Kubernetes scenarios, the &lt;a href="https://juicefs.com/docs/csi/introduction/"&gt;JuiceFS CSI Driver&lt;/a&gt; allows applications to access the JuiceFS file system with non-privileged containers. The CSI driver manages the lifecycle of the FUSE process to ensure that data can be written to disk in time and will not be lost. For details, see &lt;a href="https://juicefs.com/en/blog/usage-tips/kubernetes-data-persistence-juicefs"&gt;K8s Data Persistence: Getting Started with JuiceFS CSI Driver&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;FUSE decouples user space from kernel space, providing developers with great flexibility and convenience in implementing file systems in user space. Especially in modern computing environments such as cloud computing and distributed storage, FUSE makes building and maintaining complex storage systems more efficient, customizable, and easy to expand. &lt;/p&gt;
&lt;p&gt;JuiceFS is based on FUSE and implements a high-performance distributed file system in user space. In the future, we’ll continue exploring optimization methods for FUSE and continuously improving the performance and reliability of file systems to meet increasingly complex storage needs and provide users with stronger data management capabilities.&lt;/p&gt;
&lt;p&gt;If you have any questions for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Sat, 14 Feb 2026 07:27:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/engineering/design-fuse-kernel-user-space</guid></item><item><title>JuiceFS Enterprise 5.3: 500B+ Files per File System &amp; RDMA Support</title><link>https://www.juicefs.com/en/blog/release-notes/juicefs-enterprise-5-3-rdma-support</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/"&gt;JuiceFS Enterprise Edition&lt;/a&gt; 5.3 has recently been released, achieving a milestone breakthrough by &lt;strong&gt;supporting over 500 billion files in a single file system&lt;/strong&gt;. This upgrade includes several key optimizations to the metadata multi-zone architecture and introduces remote direct memory access (RDMA) technology for the first time to enhance distributed caching efficiency. In addition, version 5.3 enhances write support for mirrors and provides data caching for objects imported across buckets. It aims to support high-performance requirements and multi-cloud application scenarios.  &lt;/p&gt;
&lt;p&gt;JuiceFS Enterprise Edition is designed for high-performance scenarios. Since 2019, it has been applied in machine learning and has become one of the core infrastructures in the AI industry. Its customers include large language model (LLM) companies such as &lt;a href="https://juicefs.com/en/blog/user-stories/minimax-foundation-model-ai-storage"&gt;MiniMax&lt;/a&gt; and &lt;a href="https://juicefs.com/en/blog/user-stories/artificial-intelligence-storage-large-language-model-multimodal"&gt;StepFun&lt;/a&gt;; AI infrastructure and applications like &lt;a href="https://fal.ai/"&gt;fal&lt;/a&gt; and &lt;a href="https://www.heygen.com/"&gt;HeyGen&lt;/a&gt;; autonomous driving companies like &lt;a href="https://www.momenta.cn/"&gt;Momenta&lt;/a&gt; and &lt;a href="https://en.horizon.auto/"&gt;Horizon Robotics&lt;/a&gt;; and numerous leading technology enterprises across various industries leveraging AI.&lt;/p&gt;
&lt;h2&gt;Single file system supports 500 billion+ files&lt;/h2&gt;
&lt;p&gt;The multi-zone architecture is one of JuiceFS' key technologies for handling hundreds of billions of files, ensuring high scalability and concurrent processing capabilities. &lt;strong&gt;To meet the growing demands of scenarios like &lt;a href="https://en.wikipedia.org/wiki/Self-driving_car"&gt;autonomous driving&lt;/a&gt;, version 5.3 introduces in-depth optimizations to the multi-zone architecture, increasing the zone limit to 1,024 and enabling a single file system to store and access at least 500 billion files&lt;/strong&gt; (each zone can store 500 million files, with a maximum of 2 billion).  &lt;/p&gt;
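The headline figure follows directly from the zone math; a quick sanity check:

```shell
# 1,024 zones at the base capacity of 500 million files each already
# exceeds the advertised 500 billion files per file system:
zones=1024
files_per_zone=500000000   # base capacity; each zone can hold up to 2 billion
echo $(( zones * files_per_zone ))   # 512000000000
```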
&lt;p&gt;The figure below shows JuiceFS Enterprise Edition architecture, with a single zone in the lower left corner:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Enterprise Edition architecture&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Enterprise Edition&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;This breakthrough presents exponentially increasing challenges in system performance, data consistency, and stability, backed by a series of complex underlying optimizations and R&amp;amp;D efforts.&lt;/p&gt;
&lt;h3&gt;Cross-zone hotspot balancing: automated monitoring and hotspot migration, with manual ops tools&lt;/h3&gt;
&lt;p&gt;In distributed systems, hotspots are a common challenge. Especially when data is distributed across multiple zones, some zones may experience higher loads than others. This leads to imbalance that impacts system performance.  &lt;/p&gt;
&lt;p&gt;When the number of zones reaches hundreds, hotspot issues become more prevalent. Particularly with smaller datasets and larger numbers of files, read/write hotspots exacerbate latency fluctuations.  &lt;/p&gt;
&lt;p&gt;We introduced an automated hotspot migration mechanism to move frequently accessed files to other zones, distributing the load and reducing pressure on specific zones. However, in practice, relying solely on automated migration cannot fully resolve all issues. In certain special or extreme scenarios, automated tools may not respond promptly. &lt;strong&gt;Therefore, alongside automated monitoring and migration, we added manual operational tools, allowing administrators to intervene in complex scenarios, perform manual analysis, and implement optimization solutions.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Large-scale migration: improved migration speed, small-batch concurrent migration&lt;/h3&gt;
&lt;p&gt;Facing zones with excessive hotspots, early migration operations were simple. However, as the system scale expanded, migration efficiency gradually decreased. &lt;strong&gt;To address this, we introduced a small-batch concurrent migration strategy&lt;/strong&gt;, breaking down high-access directories into smaller chunks and migrating them in parallel to multiple lower-load zones. This quickly scatters hotspots and restores normal application access.&lt;/p&gt;
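&lt;p&gt;The idea behind small-batch concurrent migration can be sketched as follows. This is an illustrative toy, not JuiceFS code: a hot directory is split into small chunks, and each chunk is assigned in parallel to whichever target zone currently carries the least load.&lt;/p&gt;

```python
# Toy sketch of small-batch concurrent migration (illustrative only):
# split a hot directory into chunks, then move chunks in parallel to
# the least-loaded target zones.
from concurrent.futures import ThreadPoolExecutor
import threading

def migrate_hot_directory(files, zone_loads, chunk_size=100):
    lock = threading.Lock()
    placement = {}

    def migrate_chunk(chunk):
        with lock:
            # Pick the currently least-loaded zone; selection and the load
            # update happen atomically so concurrent workers stay balanced.
            target = min(zone_loads, key=zone_loads.get)
            zone_loads[target] += len(chunk)
        for f in chunk:
            placement[f] = target  # stand-in for the actual data move

    chunks = [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(migrate_chunk, chunks))
    return placement

files = [f"file-{i}" for i in range(1000)]
loads = {"zone-a": 0, "zone-b": 0, "zone-c": 0}
placement = migrate_hot_directory(files, loads)
print(sorted(loads.values()))  # chunks end up evenly spread: [300, 300, 400]
```

&lt;p&gt;Because each chunk is small, migration progress is incremental and hotspots are scattered across several zones quickly instead of being moved wholesale to a single target.&lt;/p&gt;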
&lt;h3&gt;Enhanced reliability self-checks: automatic repair and cleanup of intermediate migration states&lt;/h3&gt;
&lt;p&gt;In large-scale clusters, the probability of distributed transaction failures increases significantly, especially during extensive migration processes. To address this, &lt;strong&gt;we enhanced reliability detection mechanisms, adding periodic background checks to scan cross-zone file states, particularly focusing on intermediate state issues, and automatically performing repairs and cleanup&lt;/strong&gt;.  &lt;/p&gt;
&lt;p&gt;Previously, the system encountered issues with residual intermediate state data. While these did not affect operations in the short term, over time they could lead to errors. Through enhanced self-check mechanisms, we ensure the background periodically scans and promptly handles intermediate state issues, improving system stability and reliability.  &lt;/p&gt;
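&lt;p&gt;A periodic self-check of this kind can be pictured as a scan over cross-zone file records that repairs entries stuck in an intermediate state past a timeout. The record schema and repair action below are assumed for illustration:&lt;/p&gt;

```python
# Toy sketch of a background self-check (assumed semantics, not JuiceFS
# internals): repair records stuck in the "migrating" state too long.
def self_check(records, now, timeout=3600):
    repaired = []
    for rec in records:
        stuck = rec["state"] == "migrating" and now - rec["started"] > timeout
        if stuck:
            rec["state"] = "stable"   # stand-in for repair/cleanup logic
            repaired.append(rec["path"])
    return repaired

records = [
    {"path": "/a", "state": "stable", "started": 0},
    {"path": "/b", "state": "migrating", "started": 0},      # stuck
    {"path": "/c", "state": "migrating", "started": 9_500},  # still in progress
]
print(self_check(records, now=10_000))  # only /b is repaired
```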
&lt;p&gt;Beyond the three key optimizations above, we also made multiple improvements to the console to better adapt to managing more zones. We optimized concurrent processing, operational tasks, and query displays, enhancing overall performance and user experience. Specifically, we refined UI design to better showcase system states in large-scale zone environments.&lt;/p&gt;
&lt;h3&gt;Performance stress test for hundreds of billions of files&lt;/h3&gt;
&lt;p&gt;We conducted large-scale tests using a custom &lt;a href="https://github.com/llnl/mdtest"&gt;mdtest&lt;/a&gt; tool on Google Cloud, deploying 60 nodes, each with over 1 TB of memory. In terms of software configuration, we increased the number of zones to 1,024. The deployment method was similar to previous setups, but to reduce memory consumption, we deployed only one service process, with two others as cold backups.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Stress test&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Enterprise Edition 5.3 test&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;JuiceFS Enterprise Edition 5.3 test:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Test duration: Approximately 20 hours  &lt;/li&gt;
&lt;li&gt;Total files written: About 400 billion files  &lt;/li&gt;
&lt;li&gt;Write speed: 5 million files per second  &lt;/li&gt;
&lt;li&gt;Memory usage: About 35% to 40%  &lt;/li&gt;
&lt;li&gt;Disk usage: 40% to 50%, primarily for metadata persistence, with good utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on our experience, if using a configuration with one service process, one hot backup, and one cold backup, memory usage increases by 20% to 30%.  &lt;/p&gt;
&lt;p&gt;Due to limited cloud resources, this test only wrote up to 400 billion files. During stress testing, the system performed stably, with hardware resources still remaining. We’ll continue to attempt larger-scale tests in the future.&lt;/p&gt;
&lt;h2&gt;Support for RDMA: increased bandwidth cap, reduced CPU usage&lt;/h2&gt;
&lt;p&gt;This new version introduces support for &lt;a href="https://en.wikipedia.org/wiki/Remote_direct_memory_access"&gt;RDMA&lt;/a&gt; technology for the first time. Its basic architecture is shown in the diagram below. RDMA allows direct access to remote node memory, bypassing the operating system's network protocol stack. This significantly improves data transfer efficiency.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;rdma&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;RDMA principle architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;The main advantages of RDMA include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Low latency:&lt;/strong&gt; By enabling direct memory-to-memory transfers and bypassing the OS network protocol layers, it reduces CPU interrupts and context switches. This lowers latency.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High throughput:&lt;/strong&gt; RDMA uses hardware for direct data transfer, better utilizing the bandwidth of network interface cards (NICs).  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduced CPU usage:&lt;/strong&gt; In RDMA, data copying is almost entirely handled by the NIC, with the CPU only processing control messages. This allows the NIC to handle hardware transfers, freeing up CPU resources.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In JuiceFS, network request messages between clients and metadata services are small, and existing TCP configurations already meet the needs. However, in &lt;a href="https://en.wikipedia.org/wiki/Distributed_cache"&gt;distributed caching&lt;/a&gt;, file data is transferred between clients and cache nodes. Using RDMA can effectively improve transfer efficiency and reduce CPU consumption.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Network comparison&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;CPU usage comparison: TCP vs. RDMA&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;We conducted 1 MB random read tests using 160 Gbps NICs, comparing versions 5.1, 5.2 (using TCP networking) with version 5.3 (RDMA), and observed CPU usage.  &lt;/p&gt;
&lt;p&gt;Tests showed that RDMA effectively reduces CPU usage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In version 5.2, CPU usage was nearly 50%.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In version 5.3, with RDMA optimization, CPU usage dropped to about one-third. Client and cache node CPU usage decreased to 8 cores and 5 cores respectively, with bandwidth reaching 20 GiB/s.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In previous tests, we found that while TCP ran stably on 200G NICs, fully saturating bandwidth was challenging, typically achieving only 85%-90% utilization. &lt;strong&gt;For customers requiring higher bandwidth (such as 400G NICs), TCP could not meet demands. However, RDMA can more easily reach hardware bandwidth limits, providing better transfer efficiency.&lt;/strong&gt;  &lt;/p&gt;
&lt;p&gt;If customers have RDMA-capable hardware and high bandwidth requirements (for example, NICs greater than 100G) and wish to reduce CPU usage, RDMA is a technology worth trying. Currently, our RDMA feature is in public testing and has not yet been widely deployed in production environments.&lt;/p&gt;
&lt;h2&gt;Enhanced write support for mirrors&lt;/h2&gt;
&lt;p&gt;Initially, &lt;a href="https://juicefs.com/docs/cloud/guide/mirror/"&gt;mirror&lt;/a&gt; clusters were primarily used for read-only mirroring in enterprise products. As users requested capabilities like writing temporary files (such as training data) in mirrors, we provided write support for mirrors.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Enterprise Edition mirror system&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Enterprise Edition’s mirror file system architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;The mirror client implements a read-write separation mechanism. When reading data, the client prioritizes fetching from the mirror cluster to reduce latency. When writing data, it still writes to the source cluster to ensure data consistency. By recording and comparing metadata version numbers, we ensure strong consistency between the mirror client and source cluster client views of the data.  &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To improve availability, version 5.3 introduces a fallback mechanism. When the mirror becomes unavailable, client read requests automatically fall back to the source cluster.&lt;/strong&gt; This ensures application continuity and avoids interruptions caused by mirror cluster failures. We also optimized deployments in multi-mirror environments. Previously, the mirror end required two hot backup nodes to ensure high availability. Now, with the improved fallback feature, deploying a single mirror node can achieve similar effects. This ensures application continuity and reduces costs, especially beneficial for users requiring multiple mirrors.  &lt;/p&gt;
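&lt;p&gt;The mirror read path with fallback can be sketched like this. All class and method names are invented for illustration; the real client logic is internal to JuiceFS. Reads prefer the mirror when its metadata version is current, and fall back to the source cluster when the mirror is unavailable:&lt;/p&gt;

```python
# Hedged sketch of mirror reads with version check and fallback
# (hypothetical names; not the JuiceFS client API).
def read_block(path, mirror, source):
    try:
        if mirror.meta_version >= source.meta_version:
            return mirror.read(path), "mirror"
    except ConnectionError:
        pass                      # mirror unavailable: fall through to source
    return source.read(path), "source"

class FakeCluster:
    def __init__(self, meta_version, data, down=False):
        self.meta_version, self.data, self.down = meta_version, data, down
    def read(self, path):
        if self.down:
            raise ConnectionError("cluster unreachable")
        return self.data[path]

source = FakeCluster(5, {"/f": b"v5"})
fresh_mirror = FakeCluster(5, {"/f": b"v5"})
down_mirror = FakeCluster(5, {"/f": b"v5"}, down=True)
print(read_block("/f", fresh_mirror, source))  # served by the mirror
print(read_block("/f", down_mirror, source))   # falls back to the source
```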
&lt;p&gt;Through this improvement, we not only reduced hardware costs but also found a balance between high availability and low cost. For users deploying mirrors in multiple locations, reducing metadata replicas further lowers overall costs.&lt;/p&gt;
&lt;h2&gt;Simplified operations &amp;amp; increased flexibility: providing cross-bucket data cache for imported objects&lt;/h2&gt;
&lt;p&gt;In JuiceFS, users can use the &lt;code&gt;import&lt;/code&gt; command to bring existing files from &lt;a href="https://en.wikipedia.org/wiki/Object_storage"&gt;object storage&lt;/a&gt; under unified management. This is convenient for users already storing large amounts of data (for example, tens of petabytes). However, in previous versions, this feature only supported caching for objects within the same data bucket. This meant imported objects had to reside in the same bucket as the existing file system data. This limitation had certain practical constraints.  &lt;/p&gt;
&lt;p&gt;In version 5.3, we improved this feature. &lt;strong&gt;Users can now provide caching capability for any imported objects, regardless of whether they come from the same data bucket.&lt;/strong&gt; This allows users more flexibility in managing objects across different data buckets, avoiding strict bucket restrictions and enhancing data management freedom.  &lt;/p&gt;
&lt;p&gt;In addition, previously, if users had data distributed across multiple buckets and wanted to provide caching for that data, they needed to create a new file system for each bucket. In version 5.3, users only need to create one file system (volume) to uniformly manage data from multiple buckets and provide caching for all buckets.&lt;/p&gt;
&lt;h2&gt;Other important optimizations&lt;/h2&gt;
&lt;h3&gt;Trace feature&lt;/h3&gt;
&lt;p&gt;We added a trace feature based on the tracing support built into the Go runtime. With it, advanced users can perform tracing and performance analysis, gathering more information to locate issues quickly.&lt;/p&gt;
&lt;h3&gt;Trash recovery&lt;/h3&gt;
&lt;p&gt;In previous versions, especially with multiple zones, sometimes the paths recorded in the trash were incomplete. This led to anomalies during recovery, where files were not restored to the expected locations. To address this, in version 5.3, when deleting files, we record the original file path, ensuring more reliable recovery capabilities.&lt;/p&gt;
&lt;h3&gt;Python SDK improvements&lt;/h3&gt;
&lt;p&gt;In earlier versions, we released the &lt;a href="https://juicefs.com/docs/cloud/deployment/python-sdk/"&gt;Python SDK&lt;/a&gt;, providing basic read/write functionalities for Python users to interface with our system. In version 5.3, we not only strengthened basic read/write functions but also added support for operational subcommands. For example, users can directly call commands like &lt;code&gt;juicefs info&lt;/code&gt; or &lt;code&gt;warmup&lt;/code&gt; via the SDK without relying on external system commands. This simplifies coding efforts and avoids potential performance bottlenecks from frequently calling external commands.&lt;/p&gt;
&lt;h3&gt;The Windows client&lt;/h3&gt;
&lt;p&gt;We previously launched a beta version of the Windows client and have received some user feedback. After improvements, the current version shows significant enhancements in mount reliability, performance, and compatibility with Linux systems. In the future, we plan to further refine the Windows client, providing an experience closer to Linux for users reliant on Windows.&lt;/p&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Compared to expensive dedicated hardware, JuiceFS helps users balance performance and cost when addressing data growth by flexibly utilizing cloud or existing customer storage resources. In version 5.3, by optimizing the metadata zone architecture, a single file system can support over 500 billion files. The first-time introduction of RDMA technology significantly improves distributed caching bandwidth and data access efficiency, reduces CPU usage, and further optimizes system performance. In addition, we enhanced features like write support for mirrors and caching, improving the performance and operational efficiency of large-scale clusters and optimizing user experience.  &lt;/p&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/"&gt;Cloud service&lt;/a&gt; users can now directly experience JuiceFS Enterprise Edition 5.3 online, while on-premises deployment users can obtain upgrade support through official channels. We’ll continue to focus on high-performance storage solutions, partnering with enterprises to tackle challenges brought by continuous data growth.  &lt;/p&gt;
&lt;p&gt;If you have any questions for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 04 Feb 2026 02:59:53 +0000</pubDate><guid>https://www.juicefs.com/en/blog/release-notes/juicefs-enterprise-5-3-rdma-support</guid></item><item><title>How Just Two Cache Nodes Achieved 1.45 TB/s Throughput</title><link>https://www.juicefs.com/en/blog/solutions/cache-nodes-support-high-throughput</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;As application demands for large-scale concurrent reads and data distribution increase, such as in film and television rendering scenarios, traditional storage solutions like NAS often require significant investment in additional cache resources when the number of concurrent clients grows. To improve response times, data warm-up is also typically necessary. This not only incurs extra time overhead but also further increases resource strain.  &lt;/p&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt;, a distributed file system built on &lt;a href="https://en.wikipedia.org/wiki/Object_storage"&gt;object storage&lt;/a&gt;, uses its high-performance architecture to aggregate throughput and reduce latency via distributed caching. It provides efficient support for large-scale concurrent client reads.  &lt;/p&gt;
&lt;p&gt;In this article, we’ll share a recent real-world test case, demonstrating how we successfully aggregated 1.45 TB/s of bandwidth using 4,000 application nodes and, in the process, ensured system stability by introducing a two-level cache pool configured with only two independent cache nodes.  &lt;/p&gt;
&lt;p&gt;Through this article, we aim to provide a practical solution for storage bottlenecks in high-concurrency, high-throughput scenarios and hope it sparks further discussion and exploration of storage optimization methods.&lt;/p&gt;
&lt;h2&gt;Traditional NAS: more nodes, slower storage&lt;/h2&gt;
&lt;p&gt;Our customer in this case was a film and television render farm user, where thousands of Windows nodes are launched simultaneously for daily rendering jobs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Application characteristics:&lt;/strong&gt; Each node needs to read a batch of files (mostly reusable assets) to its local storage during rendering.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Original pain points:&lt;/strong&gt; When using the original public cloud NAS storage (mounted via SMB), increasing the number of nodes forced continuous addition of backend SMB service nodes to handle the surge in traffic and IOPS. This led to a steep rise in management complexity and cost. When concurrent nodes exceeded 1,000, the storage system often became overwhelmed.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Critical requirement:&lt;/strong&gt; An urgent need for a capability beyond the storage foundation to shoulder the pressure of application throughput.&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Multi-node NAS architecture&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Multi-node NAS architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;In the &lt;a href="https://en.wikipedia.org/wiki/Server_Message_Block"&gt;SMB&lt;/a&gt; service model, each client reads the full data volume from the SMB storage. This results in sustained high traffic at the server side. Administrators must continuously monitor the health of the SMB services. Once the load approaches maximum capacity, they need to promptly scale out, providing storage capability that matches the cluster size. This significantly increases operational pressure.&lt;/p&gt;
&lt;h2&gt;Utilizing idle resources: using numerous application nodes as distributed cache&lt;/h2&gt;
&lt;p&gt;JuiceFS released a new Windows client after &lt;a href="https://juicefs.com/docs/community/introduction"&gt;Community Edition&lt;/a&gt; 1.3 and &lt;a href="https://juicefs.com/docs/cloud/"&gt;Enterprise Edition&lt;/a&gt; 5.2. It supports the mounting of the file system as a local drive via a &lt;code&gt;.exe&lt;/code&gt; process, with usage similar to Linux.&lt;/p&gt;
&lt;p&gt;However, in scenarios with massive numbers of clients, simply switching the application to the standard JuiceFS mount point -&amp;gt; distributed cache -&amp;gt; object storage chain could concentrate traffic on the independent cache layer. This potentially creates a new performance bottleneck. Instead of continuously scaling dedicated cache nodes, a shift in perspective is more effective: use the idle bandwidth and disk space of the vast number of application nodes, pooling them into a massive distributed cache pool.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;JuiceFS distributed cache mode (&lt;a href="https://en.wikipedia.org/wiki/Peer-to-peer"&gt;P2P&lt;/a&gt; mode):&lt;/strong&gt; A file only needs to be read once within the cluster; subsequent requests from other nodes fetch it directly from the neighboring nodes in the P2P cache pool.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Object storage side:&lt;/strong&gt; The back-to-source traffic is extremely low. After the initial cold read of a file, subsequent traffic is almost entirely handled by the cache pool.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource requirements:&lt;/strong&gt; No dedicated cache hardware is needed, only requiring each application node to contribute a portion of its disk and bandwidth.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this solution, we did not configure a single independent cache node. All application nodes act as both consumers and providers (P2P mode).&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS distributed cache&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS distributed cache deployment architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h2&gt;Case study: 4,000 app nodes aggregate to 1.45 TB/s throughput&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Test task:&lt;/strong&gt; Each Windows node, without any warm-up (cold read), read 16 large files of 2 GB each. The total time for all nodes to finish reading was measured, observing the variance in time per node and checking for long-tail effects.  &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Configuration strategy:&lt;/strong&gt; The 4,000 nodes were divided into multiple subgroups (500 nodes per group). The 16 files of 2 GB each were distributed across the nodes within a group using hashing, so that not all nodes would request data from the object storage at once, which would cause congestion.  &lt;/p&gt;
&lt;p&gt;The cold read process for the JuiceFS client:  &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Windows client read the 16 files of 2 GB each. Using a consistent hashing topology, it located the corresponding nodes for these data blocks within its 500-node cache group and sent requests to them.  &lt;/li&gt;
&lt;li&gt;Upon receiving a data block request, the cache node found a local cache miss (cold read), so it fetched the data block from the object storage. After retrieval, it returned the data to the client and wrote it to its local cache for reuse by subsequent requests.  &lt;/li&gt;
&lt;li&gt;Once the client had retrieved complete data blocks from all cache service nodes, the test ended. The time distribution (maximum, minimum) for all clients to finish reading was compiled to evaluate variance and long-tail situations.  &lt;/li&gt;
&lt;/ol&gt;
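&lt;p&gt;The consistent-hashing lookup in step 1 can be sketched as follows. This is a deliberate simplification with made-up hashing details; JuiceFS' actual topology also uses virtual nodes and the subgroup mechanism mentioned above. The key property is that every client computes the same owner for a block, so each block is fetched from object storage at most once per cache group:&lt;/p&gt;

```python
# Minimal consistent-hashing ring: map a data block to its owning cache
# node within a 500-node group (simplified; no virtual nodes).
import bisect
import hashlib

def h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)
        self.keys = [k for k, _ in self.ring]
    def node_for(self, block_id):
        # First node clockwise from the block's hash position, wrapping around.
        i = bisect.bisect(self.keys, h(block_id)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing([f"node-{i}" for i in range(500)])  # one 500-node cache group
owner = ring.node_for("file-7/chunk-3")
assert owner == ring.node_for("file-7/chunk-3")     # deterministic placement
print(owner)
```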
&lt;p&gt;The results for reading this batch of 16*2 GB files with different numbers of client nodes are as follows:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;Number of clients&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Peak aggregated throughput&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Total time (range/average)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;2,000&lt;/td&gt;
&lt;td style="text-align: left;"&gt;729 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;92s ~ 136s / Avg 107s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;2,500&lt;/td&gt;
&lt;td style="text-align: left;"&gt;921 GB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;87s ~ 109s / Avg 98s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;3,000&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1.11 TB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;93s ~ 121s / Avg 106s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;3,500&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1.34 TB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;89s ~ 112s / Avg 100s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;4,000&lt;/td&gt;
&lt;td style="text-align: left;"&gt;1.45 TB/s&lt;/td&gt;
&lt;td style="text-align: left;"&gt;92s ~ 115s / Avg 101s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The figure below shows aggregated throughput performance of JuiceFS with different client counts:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;I/O aggregation performance&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;I/O aggregation performance&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;The results met expectations in all aspects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stability: Whether with 2,000 or 4,000 nodes, the total time to read the data remained stable at around 100 seconds.&lt;/strong&gt;  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability: The 4,000 nodes successfully aggregated an ultra-high bandwidth of 1.45 TB/s.&lt;/strong&gt; Theoretically, within the limits of metadata capacity, this architecture can achieve continuous horizontal scaling, potentially supporting cache node clusters at the scale of tens of thousands.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reference mount parameters for application nodes:&lt;/p&gt;
&lt;pre class="codehilite"&gt;&lt;code class="language-bash"&gt;juicefs.exe mount juice-fs X: --cache-group=primary --buffer-size=4096 --enable-kernel-cache --as-root --subgroups=8  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Thus, without using a single independent cache service node, we aggregated 1.45 TB/s of read capability using the customer’s application nodes. By offloading the vast majority of traffic from the object storage to this distributed cache layer formed by client nodes, we alleviated the burden on the underlying storage at zero additional hardware cost.  &lt;/p&gt;
&lt;p&gt;In actual application scenarios, such extremely high throughput might not always be achievable, because cache efficiency depends on how much data is shared across nodes; in practice, the files read by each node are not entirely the same. Nonetheless, this solution is an effective method for storage scaling: even a partial improvement can yield significant benefits at almost no additional cost.&lt;/p&gt;
&lt;h2&gt;Enhanced stability: two-level distributed caching&lt;/h2&gt;
&lt;p&gt;While the caching effect was impressive, the customer expressed concerns about system stability. For example, in some scenarios, application nodes might be destroyed immediately after completing their tasks, and these application nodes also served as cache nodes. When a large number of application nodes went offline suddenly, the cache stored on these nodes was also lost. This could cause a massive surge of traffic to fall back to the object storage, turning it into a bottleneck and affecting overall stability. To address caching performance issues caused by application node volatility, we proposed a two-level distributed caching solution, as shown in the architecture diagram below:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS two-level&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS two-level cache architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;The second-level (L2) cache pool sits between the first-level (L1) cache pool and the object storage. On an L1 cache miss, data is first attempted to be retrieved from L2. If it's also a miss in L2, it falls back to the object storage. This effectively mitigates the impact of L1 cache node churn. Since L2 only handles the fallback traffic from L1 misses (including cold reads and warm-up), its capacity and performance planning only need to cover the available throughput of the object storage side. In this test, configuring just two independent cache nodes as L2 was sufficient to meet the demand.  &lt;/p&gt;
&lt;p&gt;With the addition of the L2 cache pool, the read process changes as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Windows client reads the 16 files of 2 GB each. It determines the specific node for the data block within its L1 cache group via the consistent hashing topology and simultaneously requests data from the L2 cache group.  &lt;/li&gt;
&lt;li&gt;Due to a cold read, there is no data in the L2 cache group. The L2 cache node fetches it from the object storage and fills the L2 cache pool.  &lt;/li&gt;
&lt;li&gt;The data block is returned from the L2 cache pool to the L1 cache pool for population, and then distributed P2P within the L1 cache pool.  &lt;/li&gt;
&lt;li&gt;At this point, most traffic is concentrated within the L1 cache pool. L2 only handles the minimal traffic falling back to object storage. Therefore, even with only a few L2 nodes, they do not become a performance bottleneck. The role of the L2 cache pool is to act as a low-latency local substitute for the object storage.&lt;/li&gt;
&lt;/ol&gt;
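&lt;p&gt;The read path above can be sketched with dict-based stand-ins for the cache tiers. This is an illustration of the lookup order and fill direction, not the actual cache implementation:&lt;/p&gt;

```python
# Sketch of the L1 -> L2 -> object storage read path: misses fall through
# one level at a time, and fetched data back-fills the tiers above.
def read(block, l1, l2, object_store):
    if block in l1:
        return l1[block], "L1"
    if block in l2:
        l1[block] = l2[block]          # populate L1 from L2
        return l1[block], "L2"
    data = object_store[block]         # cold read: fall back to object storage
    l2[block] = data                   # fill L2 first...
    l1[block] = data                   # ...then populate L1
    return data, "object storage"

l1, l2, store = {}, {}, {"blk": b"data"}
print(read("blk", l1, l2, store))  # cold read: served by object storage
print(read("blk", l1, l2, store))  # now served from L1
l1.clear()                         # simulate L1 nodes going offline
print(read("blk", l1, l2, store))  # still only one hop away, in L2
```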
&lt;p&gt;Even if a large number of L1 application nodes go offline, changing the cache topology and forcing data blocks to be re-downloaded, data can still be fetched from the nearby L2 cache pool. As long as the proportion of nodes going offline is kept within reason, application operations are barely affected.&lt;/p&gt;
&lt;p&gt;L2 cache group mount parameters:&lt;/p&gt;
&lt;pre class="codehilite"&gt;&lt;code class="language-bash"&gt;juicefs mount juice-fs /jfs-cache --cache-group=secondary --cache-size=-1 --cache-dir=/data* --free-space-ratio=0.01 --buffer-size=10240 --max-downloads=400  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;L1 (application node) mount parameters:&lt;/p&gt;
&lt;pre class="codehilite"&gt;&lt;code class="language-bash"&gt;juicefs.exe mount juice-fs X: --cache-group=primary --second-group=secondary --buffer-size=4096 --enable-kernel-cache --as-root --subgroups=8  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Furthermore, the two-level distributed caching is highly suitable for scenarios requiring the reuse of existing cache pools.  &lt;/p&gt;
&lt;p&gt;For example, consider a batch of cache-pool applications located in Seattle, with a total capacity of 2 PiB, named &lt;code&gt;cache-group-st&lt;/code&gt;. Suddenly, applications in Chicago also need to use the same data, which is almost the same as the Seattle data.  &lt;/p&gt;
&lt;p&gt;Instead of warming up the 2 PiB of data from Seattle’s object storage, we can configure the Chicago cache group with &lt;code&gt;--second-group=cache-group-st&lt;/code&gt;. When a Chicago application requests data, it prioritizes reading from the Seattle cache pool over a dedicated line, achieving fast and stable access (within 2 ms latency). This eliminates the complex process of repeated data warm-up and lets the Chicago applications launch directly, which is extremely convenient.&lt;/p&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Through this extreme stress test with 4,000 nodes, we successfully transformed the large-scale idle resources of a compute cluster into a storage pool with up to 1.45 TB/s of throughput.&lt;/strong&gt; The introduction of a secondary cache effectively addressed "last-mile" stability concerns. By employing JuiceFS' storage software architecture, the potential of client clusters can be fully unlocked, achieving significant performance improvements without increasing additional hardware costs.  &lt;/p&gt;
&lt;p&gt;This solution is applicable to scenarios such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High-concurrency repeated read scenarios:&lt;/strong&gt; Such as &lt;a href="https://www.ibm.com/think/topics/model-training"&gt;model training&lt;/a&gt;/inference data fetching, container image distribution, and film and television rendering. The more nodes, the greater the P2P cache benefit.  &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Elastic computing scenarios:&lt;/strong&gt; Where application nodes frequently scale in and out on a large scale (such as spot instances). Using a two-level cache architecture ensures continuity and stability of data access. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hybrid cloud / multi-cloud architectures:&lt;/strong&gt; Leveraging the secondary caching mechanism allows for the reuse of cache pool resources across different regions, minimizing object storage calls and transfer costs associated with repeated warm-up.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; or the &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 29 Jan 2026 12:45:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/solutions/cache-nodes-support-high-throughput</guid></item><item><title>Juicedata Joins the Agentic AI Foundation as a Silver Member</title><link>https://www.juicefs.com/en/blog/company/juicedata-join-agentic-ai-foundation</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;Juicedata is pleased to announce that we have joined the &lt;strong&gt;Agentic AI Foundation (AAIF)&lt;/strong&gt; as a &lt;strong&gt;Silver Member&lt;/strong&gt;. We are excited to collaborate with a growing global community to advance open, interoperable, and production-ready foundations for the agentic AI era.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;What is AAIF?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;AAIF is a Linux Foundation–hosted initiative created to provide &lt;strong&gt;vendor-neutral governance&lt;/strong&gt; and a shared home for collaboration on &lt;strong&gt;open standards, protocols, and projects&lt;/strong&gt; that enable agentic AI systems to work reliably across environments and vendors.&lt;/p&gt;
&lt;p&gt;AAIF is anchored by several widely discussed building blocks for agentic systems, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; – a standard approach for connecting models/agents to tools and context in a consistent way  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;goose&lt;/strong&gt; – an open project focused on agentic runtime workflows   &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AGENTS.md&lt;/strong&gt; – a lightweight, practical standard intended to improve how agents interact with codebases and developer workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Why Juicedata joined AAIF: the file system is critical infrastructure&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Agentic AI shifts systems from “single prompt → single response” into &lt;strong&gt;continuous, tool-using, multi-step execution&lt;/strong&gt;. That evolution raises the bar for the data layer. In practice, nearly every production-grade agentic system depends on a modern AI data pipeline that must handle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data processing and feature generation&lt;/strong&gt;: massive parallel reads/writes, high metadata churn, and mixed file sizes  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-training&lt;/strong&gt;: high-throughput sequential scans, streaming datasets, and large-scale checkpointing  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Post-training (alignment, SFT, RLHF/RLAIF)&lt;/strong&gt;: frequent dataset versioning, sampling, and experiment tracking  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inference and model distribution&lt;/strong&gt;: fast model artifact delivery, cold-start mitigation, and predictable loading  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-region deployment&lt;/strong&gt;: mirrored datasets and model artifacts, replication workflows, and consistency guarantees across regions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words: &lt;strong&gt;the file system is on the critical path&lt;/strong&gt; for agentic applications and modern AI pipelines. If storage is slow, inconsistent, or operationally fragile, the entire agentic stack becomes unreliable—regardless of how good the model or agent framework is.&lt;/p&gt;
&lt;p&gt;Juicedata joined AAIF because we believe the agentic ecosystem needs not only protocols and agent runtimes, but also a &lt;strong&gt;robust, open, and scalable data foundation&lt;/strong&gt; that works across clouds, regions, and heterogeneous compute.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Juicedata and JuiceFS: built for modern AI data pipelines&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Juicedata builds infrastructure software for data-intensive workloads. Our flagship product, &lt;strong&gt;JuiceFS&lt;/strong&gt;, is a cloud-native distributed file system designed to provide a &lt;strong&gt;unified namespace&lt;/strong&gt; and &lt;strong&gt;high-performance access&lt;/strong&gt; for massive-scale datasets across hybrid and multi-cloud environments.&lt;/p&gt;
&lt;p&gt;JuiceFS is commonly adopted where teams need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High-throughput read/write performance&lt;/strong&gt; with predictable latency  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strong metadata capabilities&lt;/strong&gt; for billions of files and high-concurrency workloads  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elastic scale&lt;/strong&gt; with object storage economics and cloud-native operations  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational simplicity&lt;/strong&gt; for heterogeneous compute (Kubernetes clusters, GPU fleets, autoscaling inference, etc.)  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data mobility&lt;/strong&gt; across regions and clouds, including replication and mirrored distribution patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities map directly to the needs of agentic systems, which increasingly behave like always-on dataflow engines: continuously reading context, writing artifacts, caching intermediate results, and distributing models and datasets across dynamic compute.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Where JuiceFS is used today&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;We are proud that JuiceFS supports a broad range of AI and data-intensive scenarios in production. Organizations we work with include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foundation Models&lt;/strong&gt;: Zhipu GLM, MiniMax  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Applications&lt;/strong&gt;: HeyGen, fal.ai, Loveart, Gensmo, RunComfy, PixVerse  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NeoCloud and MaaS&lt;/strong&gt;: Baseten, Cerebrium, GMICloud  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Autonomous Driving&lt;/strong&gt;: Momenta, Horizon Robotics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This diversity matters. Agentic systems are not confined to one “AI app” shape—foundation model builders, AI-native product teams, MaaS platforms, and autonomous driving pipelines all face different operational constraints, but they share a common requirement: &lt;strong&gt;fast, dependable data access at scale&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;How we plan to contribute to AAIF&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;By joining AAIF, Juicedata aims to be an active and constructive participant in the community. Specifically, we intend to collaborate on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reference architectures&lt;/strong&gt; for agentic AI data foundations: best practices for dataset layout, caching strategy, and multi-region artifact distribution  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational patterns&lt;/strong&gt; for large-scale agentic deployments: reliability, observability, and performance tuning of the storage layer under agent-driven workloads  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ecosystem integrations&lt;/strong&gt;: improving the “plumbing” between agentic tooling (including emerging standards like MCP) and the data layer that agents depend on  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Community knowledge-sharing&lt;/strong&gt;: publishing benchmark methodologies, lessons learned, and production playbooks drawn from real-world workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AAIF’s emphasis on open governance and interoperability aligns with Juicedata’s belief that the agentic era will be won by ecosystems—not silos.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Looking ahead&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Agentic AI is moving quickly from experimentation to production. As that happens, infrastructure decisions that once looked “implementation-specific” become strategic. Data layout, replication, cache coherence, and model distribution are no longer secondary concerns—they are core determinants of product reliability and user experience.&lt;/p&gt;
&lt;p&gt;Juicedata joined AAIF to work closely with the global community and help ensure that &lt;strong&gt;the data foundation for agentic systems is open, scalable, and production-grade&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We look forward to collaborating with fellow members and contributors—and to building the agentic AI era in the open.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 21 Jan 2026 03:16:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/company/juicedata-join-agentic-ai-foundation</guid></item><item><title>From GlusterFS to JuiceFS: Lightillusions Achieved 2.5x Faster 3D AIGC Data Processing</title><link>https://www.juicefs.com/en/blog/user-stories/aigc-storage-glusterfs-cephfs-vs-juicefs</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;a href="https://www.lightillusions.com/"&gt;Lightillusions&lt;/a&gt; is a company specializing in spatial intelligence technology, integrating 3D vision, computer graphics, and generative models to build innovative 3D foundation models. Our company is led by Ping Tan, a professor at the Hong Kong University of Science and Technology (HKUST) and Director of the HKUST-BYD Joint Laboratory.  &lt;/p&gt;
&lt;p&gt;Unlike 2D models, a single 3D model can be several gigabytes in size, especially complex models like point clouds. When our data volume reached petabyte scales, management and storage became significant challenges. &lt;strong&gt;After trying solutions like NFS and GlusterFS, we chose &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt;, an open-source high-performance distributed file system, to build a unified storage platform.&lt;/strong&gt; This platform now serves multiple scenarios, supports cross-platform access including Windows and Linux, &lt;strong&gt;manages hundreds of millions of files, improves data processing speed by 200%–250%&lt;/strong&gt;, enables efficient storage scaling, and greatly simplifies operations and maintenance. This allows us to focus more on core research.  &lt;/p&gt;
&lt;p&gt;In this article, we’ll break down the unique storage demands of 3D AIGC, share why we selected JuiceFS over CephFS, and walk through the architecture of our JuiceFS-based storage platform.&lt;/p&gt;
&lt;h2&gt;Storage requirements for 3D AIGC&lt;/h2&gt;
&lt;p&gt;Our research focuses on perception and generation. In the 3D domain, task complexity differs from that of image and text processing, which places higher demands on our AI models, algorithms, and infrastructure.  &lt;/p&gt;
&lt;p&gt;We illustrate the complexity of 3D data processing through a typical pipeline. On the left side of the diagram below is a 3D model containing texture (top-left) and geometry (bottom-right) information. First, we generate &lt;a href="https://en.wikipedia.org/wiki/Rendering_(computer_graphics)"&gt;rendered&lt;/a&gt; images. Each model has text labels describing its content, geometric features, and texture features, which are tightly coupled with the model. In addition, we process geometry data, such as sampled points and necessary numerical values obtained from data preprocessing, like signed distance fields (SDFs). It's important to note that 3D model file formats are highly diverse, and image formats are also different.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;3D data processing pipeline&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;3D data processing pipeline&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Our work spans language models, image/video models, and 3D models. As data volume grows, so does the storage burden. The main characteristics of data usage in these scenarios are as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Language models: Data typically consists of a vast number of small files. Although individual text files are small, the total file count can reach millions or even tens of millions as data volume increases. This makes the management of such a large number of files a primary storage challenge.  &lt;/li&gt;
&lt;li&gt;Image and video data: High-resolution images and long videos are usually large. A single image can range from hundreds of kilobytes to several megabytes, while video files can reach gigabytes. During preprocessing—such as data augmentation, resolution adjustment, and frame extraction—data volume increases significantly. Especially in video processing, where each video is typically decomposed into a large number of image frames, managing these massive file collections adds considerable complexity.  &lt;/li&gt;
&lt;li&gt;3D models: Individual models, especially complex ones like point clouds, can be several gigabytes in size. &lt;strong&gt;3D data preprocessing is more complex than other data types, involving steps like texture mapping and geometry reconstruction, which consume great computational resources and can increase data volume.&lt;/strong&gt; Furthermore, 3D models often consist of multiple files, leading to a large total file count. As data grows, managing these files becomes increasingly difficult.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on the storage characteristics discussed above, when we chose a storage platform solution, we expected it to meet the following requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Diverse data formats and cross-node sharing:&lt;/strong&gt; Different models use different data formats, especially the complexity and cross-platform compatibility issues of 3D models. The storage system must support multiple formats and effectively manage data sharing across nodes and platforms.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handling data models of different sizes:&lt;/strong&gt; Whether it's small files for language models, large-scale image/video data, or large files for 3D models, the storage system must be highly scalable to meet rapidly growing storage demands and handle the storage and access of large-size data efficiently.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Challenges of &lt;a href="https://www.virtana.com/glossary/what-is-cross-cloud/"&gt;cross-cloud&lt;/a&gt; and cluster storage:&lt;/strong&gt; As data volume increases, especially with petabyte-level storage needs for 3D models, cross-cloud and cluster storage issues become more prominent. The storage system must support seamless cross-region, cross-cloud data access and efficient cluster management.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Easy scaling:&lt;/strong&gt; The need for scaling is constant, whether for language, image/video, or 3D models, and is particularly high for 3D model storage and processing.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simple operations and maintenance:&lt;/strong&gt; The storage system should provide easy-to-use management interfaces and tools. Especially for 3D model management, operational requirements are higher, making automated management and fault tolerance essential.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Storage solutions: from NFS, GlusterFS, CephFS to JuiceFS&lt;/h2&gt;
&lt;h3&gt;Initial solution: NFS mount&lt;/h3&gt;
&lt;p&gt;Initially, we tried the simplest solution—using &lt;a href="https://en.wikipedia.org/wiki/Network_File_System"&gt;NFS&lt;/a&gt; for mounting. However, in practice, we found that the training cluster and rendering cluster required independent clusters for mount operations. Maintaining this setup was very cumbersome, especially when adding new data, because we needed to configure mount points separately for each new dataset. &lt;strong&gt;When the data volume reached about 1 million objects, we could no longer sustain this approach and abandoned it.&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Storage architecture based on NFS: difficult scaling, complex operations&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Storage architecture based on NFS: difficult scaling, complex operations&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h3&gt;Mid-term solution: GlusterFS&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_glusterfs"&gt;GlusterFS&lt;/a&gt; was an easy-to-start-with choice, offering simple installation and configuration, acceptable performance, and no need for multiple mount points—just add new nodes.  &lt;/p&gt;
&lt;p&gt;While GlusterFS greatly reduced our workload in the early stages, we also discovered issues with its ecosystem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many GlusterFS maintenance operations required custom scheduled tasks. Adding new storage, in particular, imposed extra constraints, such as expanding the node count in specific multiples.  &lt;/li&gt;
&lt;li&gt;Support for operations like cloning and data synchronization was weak, so we frequently had to consult the documentation.  &lt;/li&gt;
&lt;li&gt;Many operations were unstable. For example, when we used tools like fio for speed testing, the results were not always reliable.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A more serious problem was that GlusterFS performance declined drastically once the number of small files reached a certain scale.&lt;/strong&gt; For example, one model might generate 100 images; with 10 million models, that would produce 1 billion images. At that scale, GlusterFS struggled with file lookups, leading to significant performance drops and even system crashes.&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Storage architecture based on GlusterFS&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Storage architecture based on GlusterFS&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h3&gt;Final selection: CephFS vs. JuiceFS&lt;/h3&gt;
&lt;p&gt;As storage demands grew, we decided to use a more sustainable solution. After evaluating various options, we compared &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/"&gt;CephFS and JuiceFS&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;Although Ceph is widely used, through our own practice and reviewing documentation, we found Ceph's operational and management costs to be very high. Especially for a small team like ours, handling such complex operational tasks proved particularly difficult.  &lt;/p&gt;
&lt;p&gt;JuiceFS had two native features that strongly aligned with our needs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The client data cache.&lt;/strong&gt; For our model training clusters, which are typically equipped with high-performance NVMe storage, fully utilizing client caching could significantly accelerate model training and reduce pressure on the JuiceFS storage backend.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JuiceFS' S3 compatibility was crucial for us.&lt;/strong&gt; As we had developed some visualization platforms based on storage for data annotation, organization, and statistics, S3 compatibility allowed us to rapidly develop web interfaces supporting visualization, data statistics, and other features.&lt;/li&gt;
&lt;/ul&gt;
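Both features can be exercised on the same volume. Below is a minimal sketch using JuiceFS Community Edition commands; the Redis metadata URL, cache path, cache size, credentials, and listen address are all illustrative:

```shell
# Mount with a local NVMe cache; --cache-size is in MiB (~500 GiB here)
juicefs mount redis://meta.example.com/1 /jfs \
    --cache-dir /nvme/jfscache \
    --cache-size 512000

# Expose the same volume through the built-in S3-compatible gateway,
# so web tools can access it with any S3 SDK
export MINIO_ROOT_USER=admin
export MINIO_ROOT_PASSWORD=change-me-12345
juicefs gateway redis://meta.example.com/1 localhost:9000
```

Visualization and annotation services can then point any S3 client at `localhost:9000`, while training jobs read the same data through the POSIX mount and its NVMe cache.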
&lt;p&gt;The table below &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/"&gt;compares basic features of CephFS and JuiceFS&lt;/a&gt;:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-table"&gt;

&lt;table&gt;
    
    
        &lt;thead&gt;
            &lt;tr&gt;
                
                    
                        
                        
                            &lt;th scope="col"  &gt;
                                
                                    
                                        Comparison basis
                                    
                                
                            &lt;/th&gt;
                        
                    
                
                    
                        
                        
                            &lt;th scope="col"  &gt;
                                
                                    
                                        CephFS
                                    
                                
                            &lt;/th&gt;
                        
                    
                
                    
                        
                        
                            &lt;th scope="col"  &gt;
                                
                                    
                                        JuiceFS
                                    
                                
                            &lt;/th&gt;
                        
                    
                
            &lt;/tr&gt;
        &lt;/thead&gt;
    
    &lt;tbody&gt;
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                File chunking
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Metadata transactions
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Strong consistency
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Kubernetes CSI Driver
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Hadoop-compatible
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                ✓
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    &lt;td&gt;Data compression&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Data encryption&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Snapshot&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Client data caching&lt;/td&gt;
                    &lt;td&gt;✕&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Hadoop data locality&lt;/td&gt;
                    &lt;td&gt;✕&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;S3-compatible&lt;/td&gt;
                    &lt;td&gt;✕&lt;/td&gt;
                    &lt;td&gt;✓&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Quota&lt;/td&gt;
                    &lt;td&gt;Directory level quota&lt;/td&gt;
                    &lt;td&gt;Directory level quota&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;Languages&lt;/td&gt;
                    &lt;td&gt;C++&lt;/td&gt;
                    &lt;td&gt;Go&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                    &lt;td&gt;License&lt;/td&gt;
                    &lt;td&gt;LGPLv2.1 &amp;amp; LGPLv3&lt;/td&gt;
                    &lt;td&gt;Apache License 2.0&lt;/td&gt;
                &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h2&gt;Storage platform practice based on JuiceFS&lt;/h2&gt;
&lt;h3&gt;Metadata engine selection and topology&lt;/h3&gt;
&lt;p&gt;JuiceFS employs a metadata-data separation architecture with several metadata engine options. We first quickly validated the &lt;a href="https://juicefs.com/docs/community/redis_best_practices/"&gt;Redis storage solution&lt;/a&gt;, which is well documented by the JuiceFS team. Redis' advantage is its lightweight setup: configuration typically takes half a day to a day, and data migration is smooth. &lt;strong&gt;However, once the number of small files exceeded 100 million, Redis' performance declined significantly&lt;/strong&gt;.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;JuiceFS 架构图（第四版）-第 2 页-winfsp (2)&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS Community Edition architecture&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;As mentioned earlier, each model might render 100 images. With other miscellaneous files, the number of small files increased dramatically. While we could mitigate the issue by packing small files, performing modifications or visualization on packed data greatly increased complexity. Therefore, we preferred to retain the original small image files for subsequent processing&lt;/p&gt;
&lt;p&gt;As the file count grew and soon exceeded Redis' capacity, we decided to migrate the metadata engine to &lt;a href="https://tikv.org/"&gt;TiKV&lt;/a&gt; running on Kubernetes (K8s). &lt;strong&gt;TiKV on K8s gave us a more highly available metadata store&lt;/strong&gt;. Benchmarking also showed that although TiKV's performance was slightly lower than Redis', the gap was not significant, and it handled large numbers of small files better. The JuiceFS engineers we consulted additionally noted that Redis scales poorly in cluster mode. Therefore, we switched to TiKV.&lt;/p&gt;
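&lt;p&gt;In practice, the switch only changes the metadata URL passed to the JuiceFS client. This is a sketch following the community-edition &lt;code&gt;tikv://&lt;/code&gt; URL format; the PD endpoints and the &lt;code&gt;/jfs&lt;/code&gt; key prefix are placeholders:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Same volume layout as before, but with TiKV as the metadata engine.
# List the PD endpoints of the TiKV cluster, with an optional key prefix
# so multiple volumes can share one cluster.
juicefs format \
    --storage s3 \
    --bucket https://mybucket.s3.amazonaws.com \
    "tikv://192.168.1.1:2379,192.168.1.2:2379,192.168.1.3:2379/jfs" \
    myjfs

juicefs mount -d \
    "tikv://192.168.1.1:2379,192.168.1.2:2379,192.168.1.3:2379/jfs" /mnt/jfs
&lt;/code&gt;&lt;/pre&gt;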
&lt;p&gt;The table below shows read/write performance test results for different metadata engines:&lt;/p&gt;&lt;/div&gt;
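&lt;p&gt;Numbers like these can be gathered with JuiceFS' built-in benchmark command, which writes and reads both large and small files against a mounted volume; the concurrency value here is illustrative:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Benchmark a mounted JuiceFS volume
# (-p sets the number of concurrent processes).
juicefs bench -p 4 /mnt/jfs
&lt;/code&gt;&lt;/pre&gt;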
&lt;div class="block-table"&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th scope="col"&gt;&lt;/th&gt;
            &lt;th scope="col"&gt;Redis-always&lt;/th&gt;
            &lt;th scope="col"&gt;Redis-every second&lt;/th&gt;
            &lt;th scope="col"&gt;MySQL&lt;/th&gt;
            &lt;th scope="col"&gt;PostgreSQL&lt;/th&gt;
            &lt;th scope="col"&gt;TiKV&lt;/th&gt;
            &lt;th scope="col"&gt;etcd&lt;/th&gt;
            &lt;th scope="col"&gt;FoundationDB&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    
    &lt;tbody&gt;
        
            
        &lt;tr&gt;
            &lt;td&gt;Write big files&lt;/td&gt;
            &lt;td&gt;730.84 MiB/s&lt;/td&gt;
            &lt;td&gt;731.93 MiB/s&lt;/td&gt;
            &lt;td&gt;729.00 MiB/s&lt;/td&gt;
            &lt;td&gt;744.47 MiB/s&lt;/td&gt;
            &lt;td&gt;730.01 MiB/s&lt;/td&gt;
            &lt;td&gt;746.07 MiB/s&lt;/td&gt;
            &lt;td&gt;744.70 MiB/s&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Read big files&lt;/td&gt;
            &lt;td&gt;923.98 MiB/s&lt;/td&gt;
            &lt;td&gt;892.99 MiB/s&lt;/td&gt;
            &lt;td&gt;905.93 MiB/s&lt;/td&gt;
            &lt;td&gt;895.88 MiB/s&lt;/td&gt;
            &lt;td&gt;918.19 MiB/s&lt;/td&gt;
            &lt;td&gt;939.63 MiB/s&lt;/td&gt;
            &lt;td&gt;948.81 MiB/s&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Write small files&lt;/td&gt;
            &lt;td&gt;95.20 files/s&lt;/td&gt;
            &lt;td&gt;109.10 files/s&lt;/td&gt;
            &lt;td&gt;82.30 files/s&lt;/td&gt;
            &lt;td&gt;86.40 files/s&lt;/td&gt;
            &lt;td&gt;101.20 files/s&lt;/td&gt;
            &lt;td&gt;95.80 files/s&lt;/td&gt;
            &lt;td&gt;94.60 files/s&lt;/td&gt;
        &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Read small files
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1242.80 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                937.30 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                752.40 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1857.90 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                681.50 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1229.10 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1301.40 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Stat files
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                12313.80 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                11989.50 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                3583.10 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                7845.80 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                4211.20 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                2836.60 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                3400.00 files/s
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                FUSE operations
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.41 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.40 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.46 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.44 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.41 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.41 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                0.44 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
            
                &lt;tr&gt;
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                Update metadata
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                2.45 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1.76 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                2.46 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                1.78 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                3.76 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                3.40 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                        
                            
                            
                                
                                    &lt;td  &gt;
                                        
                                            
                                                2.87 ms/op
                                            
                                        
                                    &lt;/td&gt;
                                
                            
                        
                    
                &lt;/tr&gt;
            
        
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h3&gt;Latest architecture: JuiceFS+TiKV+SeaweedFS&lt;/h3&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;strong&gt;We use JuiceFS to manage the object storage layer, with a metadata engine built on TiKV running in K8s and SeaweedFS serving as the object storage.&lt;/strong&gt; This allows us to scale storage capacity quickly and provides fast access for both small and large files. In addition, our object storage is distributed across multiple backends, including local storage and platforms like R2 and Amazon S3. JuiceFS integrates these different storage systems and presents them through a unified interface.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Storage architecture- JuiceFS+TiKV+SeaweedFS&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Storage architecture- JuiceFS+TiKV+SeaweedFS&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
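&lt;div class="block-markdown"&gt;&lt;p&gt;As a rough sketch of how such a volume is created (the PD endpoints, bucket address, volume name, and credentials below are placeholders, not our production configuration), JuiceFS points its metadata engine at TiKV and its data store at SeaweedFS' S3-compatible gateway:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create a JuiceFS volume: metadata in TiKV, data in SeaweedFS (S3 gateway)
juicefs format \
    --storage s3 \
    --bucket http://seaweedfs-s3:8333/myjfs \
    --access-key ACCESS_KEY \
    --secret-key SECRET_KEY \
    tikv://pd-0:2379,pd-1:2379,pd-2:2379/myjfs \
    myjfs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the metadata URL and the object storage backend are specified independently, the same mount interface works unchanged whether a bucket lives in SeaweedFS, R2, or Amazon S3.&lt;/p&gt;&lt;/div&gt;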
&lt;div class="block-markdown"&gt;&lt;p&gt;To better manage system resources, we built a resource monitoring platform on K8s. The current system consists of about 60 Linux nodes and several Windows nodes handling rendering and data processing tasks. We monitored read stability, and the results show that even with multiple heterogeneous servers performing simultaneous read operations, the overall system I/O performance remains stable, able to fully utilize the bandwidth resources.&lt;/p&gt;
&lt;h3&gt;Problems we encountered&lt;/h3&gt;
&lt;p&gt;While optimizing the storage solution, we initially tried an &lt;a href="https://en.wikipedia.org/wiki/Erasure_code"&gt;erasure code&lt;/a&gt; (EC) storage scheme to reduce storage requirements and improve efficiency. However, during large-scale data migration, EC computation proved slow, and its performance was unsatisfactory in high-throughput scenarios with frequent data changes; combined with SeaweedFS in particular, it became a bottleneck. Based on these issues, we abandoned EC storage and switched to a replication-based scheme.  &lt;/p&gt;
&lt;p&gt;We set up independent servers and configured scheduled tasks for large-volume metadata backups. In TiKV, we adopted a multi-replica scheme to ensure data integrity, and for object storage we used dual replicas to further enhance reliability. Although replication effectively ensures redundancy and high availability, storage costs remain high because we process petabyte-scale data with massive daily increments. In the future, we may further optimize the storage scheme to reduce costs.  &lt;/p&gt;
&lt;p&gt;In addition, we found that using all-flash servers with JuiceFS did not bring significant performance improvements. The bottleneck mainly appeared in network bandwidth and latency. Therefore, we plan to consider using InfiniBand to connect storage servers and training servers to maximize resource utilization efficiency.&lt;/p&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;When using GlusterFS, we could process at most 200,000 models per day. &lt;strong&gt;After switching to JuiceFS, processing capacity increased significantly: our daily data processing capacity grew by 2.5 times, small-file throughput improved notably, and the system remained stable even at 70% storage utilization.&lt;/strong&gt; Furthermore, scaling became very convenient, whereas scaling the previous architecture was troublesome.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;NFS vs. JuiceFS&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;NFS vs. JuiceFS&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;Finally, let's summarize the advantages JuiceFS has demonstrated in 3D generation tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Small file performance:&lt;/strong&gt; Small file handling is a critical point, and JuiceFS provides an excellent solution.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-platform features:&lt;/strong&gt; Cross-platform support is very important. We found that some data can only be opened in Windows software, so we need to process the same data on both Windows and Linux systems and perform read/write operations on the same mount point. This requirement makes cross-platform features particularly crucial, and JuiceFS' design addresses this well.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low operational cost:&lt;/strong&gt; JuiceFS' operational cost is extremely low. After configuration, only simple testing and node management (for example, discarding certain nodes and monitoring robustness) are needed. We spent about half a year migrating data and have not encountered major issues so far.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local cache mechanism:&lt;/strong&gt; Previously, to use local cache, we needed to manually implement local caching logic in our code. JuiceFS provides a very convenient local caching mechanism, optimizing performance for training scenarios by setting mount parameters.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Low migration cost:&lt;/strong&gt; Especially when migrating small files, we found using JuiceFS for metadata and object storage migration to be convenient, saving us a lot of time and effort. In contrast, migrating with other storage systems was very painful.&lt;/li&gt;
&lt;/ul&gt;
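&lt;p&gt;To illustrate the local caching point above, here is a minimal mount sketch (the metadata URL, cache path, and sizes are illustrative, not the values we used):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Mount with a large local read cache for training workloads
juicefs mount redis://meta-server:6379/1 /mnt/jfs \
    --cache-dir /nvme/jfs-cache \
    --cache-size 102400 \
    --buffer-size 1024
# --cache-dir:   keep cached data blocks on a local NVMe disk
# --cache-size:  local cache capacity in MiB (here ~100 GiB)
# --buffer-size: read/write buffer size in MiB
&lt;/code&gt;&lt;/pre&gt;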
&lt;p&gt;In summary, JuiceFS performs excellently in large-scale data processing, providing an efficient and stable storage solution. It not only simplifies storage management and scaling but also significantly improves system performance, allowing us to focus more on advancing core tasks. In addition, the JuiceFS tools are very convenient. For example, we used the &lt;code&gt;sync&lt;/code&gt; tool for small-file migration with extremely high efficiency: without additional performance tuning, we migrated 500 TB of data, including a massive number of small data files and images, in less than 5 days, exceeding our expectations.  &lt;/p&gt;
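&lt;p&gt;For reference, a &lt;code&gt;sync&lt;/code&gt; invocation of the kind described above looks like the following (the bucket address, credentials, paths, and thread count are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Copy objects from an S3 bucket into a mounted JuiceFS directory,
# with high concurrency for massive numbers of small files
juicefs sync --threads 100 \
    s3://ACCESS_KEY:SECRET_KEY@mybucket.s3.amazonaws.com/renders/ \
    /mnt/jfs/renders/
&lt;/code&gt;&lt;/pre&gt;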
&lt;p&gt;If you have any questions for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 08 Jan 2026 10:09:00 +0000</pubDate><guid>https://www.juicefs.com/en/blog/user-stories/aigc-storage-glusterfs-cephfs-vs-juicefs</guid></item><item><title>JuiceFS 2025 Recap: Sustaining Fast Growth, Scaling to Hundreds of Billions of Files</title><link>https://www.juicefs.com/en/blog/company/2025-recap-artificial-intelligence-storage</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;As we reflect on the journey of the past year, we're thrilled to share that 2025 ushered in the ninth year for &lt;a href="https://juicefs.com/docs/cloud/"&gt;JuiceFS Enterprise Edition&lt;/a&gt; and a significant fifth anniversary for our open-source &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;Community Edition&lt;/a&gt;. Our focus remained unchanged: building a high-performance, easy-to-use file system.&lt;/p&gt;
&lt;p&gt;All key metrics continued the growth momentum from the previous year. &lt;strong&gt;The data volume managed by the Community Edition grew by 89%, exceeding 1.3 EiB&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In 2025, the JuiceFS Community Edition continued to prioritize versatility, especially in supporting diverse AI workloads. We released Python SDK, improved Windows client usability, and strengthened integration with cloud-native ecosystems. Metadata engines like SQL databases and TiKV also received targeted optimizations. &lt;strong&gt;This year, alongside community contributors, we drove continuous iteration with 60 contributors, 305 new issues, and 601 merged pull requests (PRs).&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;During the development of the Enterprise Edition, &lt;strong&gt;our greatest challenge this year was managing hyperscale data&lt;/strong&gt;. As AI technologies like autonomous driving become integrated into daily life, data volume growth is unprecedented. Managing hundreds of billions of files introduces exponentially increasing complexity in metadata management and data consistency. To tackle these challenges, the Enterprise Edition underwent comprehensive upgrades in core features like metadata partitioning and network performance.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-5-2-windows-client"&gt;JuiceFS Enterprise Edition 5.2&lt;/a&gt;, released in the first half of the year, already supports a single volume at the hundred-billion-file scale. The upcoming 5.3 version will push this limit to 500 billion files. This allows users to no longer worry about data scale, with JuiceFS' performance and stability providing solid assurance.&lt;/p&gt;
&lt;p&gt;Let’s take a closer look at JuiceFS’ achievements in 2025.&lt;/p&gt;
&lt;h2&gt;Community Edition: Python SDK support and Windows client improvements&lt;/h2&gt;
&lt;p&gt;Since its open-source release, JuiceFS has been extensively validated in enterprise production environments, with its core features becoming increasingly stable. We released 9 versions throughout the year, with &lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-1-3-python-sdk-backup-sql-windows-optimization"&gt;version 1.3&lt;/a&gt; being the fourth major release since its 2021 open-source debut and designated as a long-term support (LTS) version. Key optimizations in this version include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python SDK Support&lt;/strong&gt;, enhancing flexibility and performance in AI and data science scenarios.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Windows client optimizations&lt;/strong&gt;, improving tool support and system service mounting capabilities.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Backup mechanism enhancement&lt;/strong&gt;, enabling minute-level backups for 100 million files.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration with Apache Ranger&lt;/strong&gt;, allowing JuiceFS to support fine-grained permission management in big data scenarios.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance improvements for SQL and TiKV metadata engines&lt;/strong&gt;, delivering more efficient performance in hyperscale scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the second half of the year, we began preparing for version 1.4. Planned new features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for user and group quotas  &lt;/li&gt;
&lt;li&gt;Redis client caching  &lt;/li&gt;
&lt;li&gt;Least recently used (LRU) cache support  &lt;/li&gt;
&lt;li&gt;SMB/CIFS support  &lt;/li&gt;
&lt;li&gt;Hadoop Kerberos support  &lt;/li&gt;
&lt;li&gt;S3 Gateway optimizations  &lt;/li&gt;
&lt;li&gt;Resumable &lt;code&gt;sync&lt;/code&gt; tool transfers  &lt;/li&gt;
&lt;li&gt;Support for commercial data encryption algorithms  &lt;/li&gt;
&lt;li&gt;Readahead strategy optimization  &lt;/li&gt;
&lt;li&gt;Batch deletion improvements  &lt;/li&gt;
&lt;li&gt;Related tool optimizations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All aimed at further boosting system performance and stability.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;11&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS contributors in 2025&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;a href="https://juicefs.com/docs/csi/introduction/"&gt;JuiceFS CSI Driver&lt;/a&gt; released 18 versions over the past year, continuously optimizing JuiceFS' storage efficiency and stability in environments like Kubernetes. New features include volume path status detection, shared Mount Pods for the same file system, support for native Kubernetes Sidecar, and Dashboard cache group management. In addition, we made performance and reliability optimizations, not only improving stability but also enhancing compatibility with multi-pod configurations and containerized applications.  &lt;/p&gt;
&lt;p&gt;&lt;a href="https://juicefs.com/docs/csi/guide/juicefs-operator/"&gt;JuiceFS Operator&lt;/a&gt; added a scheduled cache warm-up feature to improve performance for application data access. It now supports cache groups deployed by replica, achieving cache high availability. It also introduced a Sync feature for efficient data synchronization within Kubernetes environments, ensuring consistency.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;csi contributor&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;JuiceFS CSI Driver contributors in 2025&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h2&gt;Enterprise Edition: Single-volume hundred-billion-file scale with robust performance and stability&lt;/h2&gt;
&lt;p&gt;In the first half of 2025, &lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-5-2-windows-client"&gt;JuiceFS Enterprise Edition 5.2&lt;/a&gt; was released, breaking through the hundred-billion-file scale for a single file system and significantly enhancing stability for hyperscale clusters and the network performance of distributed caching. To achieve this, we spent a lot of time and effort, particularly in optimizing performance when handling massive datasets and high-concurrency access. This version has been validated in production environments across several enterprises, &lt;strong&gt;maintaining metadata latency at the 1-millisecond level even at the single-volume hundred-billion-file scale&lt;/strong&gt;.&lt;br&gt;
At the same time, we &lt;a href="https://juicefs.com/en/blog/engineering/terabyte-aggregate-bandwidth-distributed-cache-network"&gt;optimized distributed cache network performance&lt;/a&gt;, greatly reducing CPU overhead in TCP networks while improving network bandwidth utilization. &lt;strong&gt;In a test environment with 100 GCP 100Gbps nodes, aggregate read bandwidth reached 1.2 TB/s, close to full utilization of TCP/IP network bandwidth&lt;/strong&gt;.&lt;br&gt;
Furthermore, &lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-1-3-python-sdk"&gt;Python SDK&lt;/a&gt; achieved fsspec compatibility and on-demand import of object storage files, enabling easier access to existing data in object storage. This resolves read amplification issues in specific scenarios and enhances global QoS capabilities, thereby increasing system flexibility and performance.&lt;br&gt;
The multi-zone architecture is a key technology enabling JuiceFS to handle hundreds of billions of files, ensuring high scalability and high-concurrency processing capabilities. In the second half of the year, we focused on developing version 5.3, which delivered comprehensive optimizations to this architecture. &lt;strong&gt;The zone limit was increased from 256 to 1,024, enabling a single volume to support the storage and access demands of over 500 billion files&lt;/strong&gt;.&lt;br&gt;
This achievement involved a series of complex tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Systematically refining cross-zone link implementations and establishing a background self-check mechanism to improve cluster reliability and stability  &lt;/li&gt;
&lt;li&gt;Developing hotspot monitoring and automatic migration tools for efficient hotspot handling  &lt;/li&gt;
&lt;li&gt;Optimizing distributed cache management to reduce cache conflicts and improve concurrent performance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To further enhance distributed network performance, this version introduces RDMA support for the first time (currently experimental). Initial tests show it outperforms TCP in both stability and CPU usage. Version 5.3 is scheduled for release in January 2026. Stay tuned for more details.&lt;/p&gt;
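The fsspec compatibility mentioned for the Python SDK means JuiceFS data can be reached through fsspec's uniform filesystem interface. A minimal sketch of that interface follows; it uses fsspec's built-in "memory" backend so it runs anywhere, since the JuiceFS protocol name and connection options are SDK-specific and not shown here:

```python
import fsspec

# fsspec exposes one filesystem API across many backends; the JuiceFS
# Python SDK plugs into the same interface. The "memory" backend stands in
# here so the sketch is self-contained (assumption: with the JuiceFS SDK
# installed, you would construct its filesystem instead -- see the SDK docs).
fs = fsspec.filesystem("memory")

# Write and read back a small object through the generic interface.
with fs.open("/datasets/sample.txt", "wb") as f:
    f.write(b"hello juicefs")

data = fs.cat("/datasets/sample.txt")
print(data)  # b'hello juicefs'
```

Because the interface is uniform, code written this way also works with pandas, PyArrow, and other fsspec-aware libraries without storage-specific changes.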
&lt;h2&gt;Community growth: Rapid expansion, total data volume exceeding 1.3 EiB&lt;/h2&gt;
&lt;p&gt;Currently, JuiceFS has 12.6k+ GitHub stars. Our downloads have surpassed 50,000, while JuiceFS CSI Driver downloads have exceeded 5 million. The Slack community has reached about 1,000 members.&lt;br&gt;
The fifth year since the open-source release of the Community Edition marks another year of rapid growth. User-reported data shows continued upward trends across key JuiceFS metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;File systems: 590k+, an 82% increase&lt;/strong&gt;  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Active clients: 150k+, a 46% increase&lt;/strong&gt;  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;File count: 400+ billion, a 43% increase&lt;/strong&gt;  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total data volume: 1.3+ EiB, an 89% increase&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;Number of file systems and data volume&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;Number of file systems and data volume&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;p&gt;This year, we made our mark at industry conferences by participating in events such as KubeCon + CloudNativeCon North America, Open Source Summit Japan, and SNIA Developer Conference (SDC).&lt;br&gt;
To better support our users, we hosted five Office Hours sessions to introduce new features, answer questions, and help users across various industries confidently deploy JuiceFS in production environments. Use cases span fields including autonomous driving, generative AI, AI infrastructure platforms, quantitative investing, and biopharmaceuticals. (View all &lt;a href="https://juicefs.com/en/blog/user-stories"&gt;case studies&lt;/a&gt;)&lt;br&gt;
A special thanks to the users who shared their experiences this year—their practical insights have provided invaluable reference for the community:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/quantitative-storage-artificial-intelligence-solution"&gt;JuiceFS+MinIO: Ariste AI Achieved 3x Faster I/O and Cut Storage Costs by 40%+&lt;/a&gt;, by Yutang Gao at Ariste AI  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/multi-cloud-storage-autonomous-driving"&gt;Zelos Tech Manages Hundreds of Millions of Files for Autonomous Driving with JuiceFS&lt;/a&gt;, by Junyu Deng at Zelos Tech  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/multi-cloud-storage-artificial-intelligence-training"&gt;Why Gaoding Technology Chose JuiceFS for AI Storage in a Multi-Cloud Architecture&lt;/a&gt;, by Jia Ke at Gaoding Technology  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/artificial-intelligence-storage-large-language-model-multimodal"&gt;StepFun Built an Efficient and Cost-Effective LLM Storage Platform with JuiceFS&lt;/a&gt;, by Changxin Miao at StepFun  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/artificial-intelligence-model-training-unified-storage-solution"&gt;INTSIG Built Unified Storage Based on JuiceFS to Support Petabyte-Scale AI Training&lt;/a&gt;, by Yifan Tang at INTSIG  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/glusterfs-vs-juicefs-ai-computing"&gt;vivo Migrated from GlusterFS to a Distributed File System Built on JuiceFS&lt;/a&gt;, by Xiangyang Yu at vivo   &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/ai-storage-platform-large-language-model-training-inference"&gt;NFS to JuiceFS: Building a Scalable Storage Platform for LLM Training &amp;amp; Inference&lt;/a&gt;, by Wei Sun at a leading research institution in China  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/ai-storage-life-sciences-solution-juicefs-vs-lustre-alluxio"&gt;BioMap Cut AI Model Storage Costs by 90% Using JuiceFS&lt;/a&gt;, by Zedong Zheng at BioMap  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/large-language-model-artificial-intelligence-storage-cost-effective"&gt;JuiceFS at Trip.com: Managing 10 PB of Data for Stable and Cost-Effective LLM Storage&lt;/a&gt;, by Songlin Wu at Trip.com  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/cloud-storage-artificial-intelligence-juicefs-vs-efs"&gt;How Lepton AI Cut Cloud Storage Costs by 98% for AI Workflows with JuiceFS&lt;/a&gt;, by Cong Ding at Lepton AI  &lt;/li&gt;
&lt;li&gt;&lt;a href="https://juicefs.com/en/blog/user-stories/juicefs-vs-cephfs-distributed-file-system-artificial-intelligence-storage"&gt;Tongcheng Travel Chose JuiceFS over CephFS to Manage Hundreds of Millions of Files&lt;/a&gt;, by Chuanhai Wei at Tongcheng Travel &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We've shared a remarkable year together. JuiceFS has evolved from an emerging open source project into a trusted solution powering AI-driven businesses today. We extend our deepest gratitude to every one of you for your active participation and steadfast support—whether through answering questions, sharing real-world experiences, or contributing code to the project.&lt;/p&gt;
&lt;p&gt;In the coming year, JuiceFS will continue to deliver a more efficient and seamless experience for your work.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 31 Dec 2025 08:39:41 +0000</pubDate><guid>https://www.juicefs.com/en/blog/company/2025-recap-artificial-intelligence-storage</guid></item><item><title>AI Data Storage: Challenges, Capabilities, and Comparative Analysis</title><link>https://www.juicefs.com/en/blog/solutions/ai-data-storage-challenges-capabilities-solution-comparison</link><description>&lt;div class="block-markdown"&gt;&lt;p&gt;&lt;em&gt;Note: This article was first published on &lt;a href="https://dzone.com/articles/ai-data-storage-challenges-capabilities-comparison"&gt;DZone&lt;/a&gt; and featured on its homepage.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The explosion in the popularity of ChatGPT has once again ignited a surge of excitement in the &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence"&gt;AI&lt;/a&gt; world. Over the past five years, AI has advanced rapidly and has found applications in a wide range of industries. As a storage company, we’ve had a front-row seat to this expansion, watching more and more AI startups and established players emerge across fields like autonomous driving, protein structure prediction, and quantitative investment.&lt;br&gt;
AI scenarios have introduced new challenges to the field of data storage. Existing storage solutions are often inadequate to fully meet these demands.&lt;/p&gt;
&lt;p&gt;In this article, we’ll take a deep dive into the storage challenges in AI scenarios, the storage capabilities critical to them, and a comparative analysis of Amazon S3, Alluxio, Amazon EFS, Azure, GCP Filestore, Lustre, Amazon FSx for Lustre, GPFS, BeeGFS, and JuiceFS Cloud Service. I hope this post helps you make informed choices in AI and data storage.&lt;/p&gt;
&lt;h2&gt;Storage challenges for AI&lt;/h2&gt;
&lt;p&gt;AI scenarios have brought new data patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/High-throughput"&gt;&lt;strong&gt;High throughput&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;data access challenges:&lt;/strong&gt; In AI scenarios, the growing use of GPUs by enterprises has outpaced the I/O capabilities of underlying storage systems. Enterprises require storage solutions that can provide high-throughput data access to fully leverage the computing power of GPUs. For instance, in smart manufacturing, where high-precision cameras capture images for defect detection models, the training dataset may consist of only 10,000 to 20,000 high-resolution images. Each image has several gigabytes in size, resulting in a total dataset size of 10 TB. If the storage system lacks the required throughput, it becomes a bottleneck during GPU training.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Managing storage for billions of files:&lt;/strong&gt; AI scenarios need storage solutions that can handle and provide quick access to datasets with billions of files. For example, in autonomous driving, a single training set comprises tens of millions of small images, each several hundred kilobytes in size, with each image treated as an individual file. The total training data amounts to billions or even 10 billion files. This creates a major challenge in effectively managing large numbers of small files.  &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalable throughput for hot data:&lt;/strong&gt; In areas like &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)"&gt;quantitative investing&lt;/a&gt;, financial market data is small compared to computer vision datasets. However, this data must be shared among many research teams, creating hotspots where disk throughput is fully saturated yet still cannot satisfy the application's needs. This calls for storage solutions that can elastically scale throughput for hot data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The basic computing environment has also changed a lot.&lt;br&gt;
These days, with cloud computing and Kubernetes getting so popular, more and more AI companies are setting up their data pipelines on &lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt;-based platforms. Algorithm engineers request resources on the platform, write code in Notebook to debug algorithms, use workflow engines like Argo and Airflow to plan data processing workflows, use Fluid to manage datasets, and use BentoML to deploy models into apps. &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing"&gt;&lt;strong&gt;Cloud-native&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;technologies have become a standard consideration when building storage platforms.&lt;/strong&gt; As cloud computing matures, AI businesses are increasingly relying on large-scale distributed clusters. With a significant increase in the number of nodes in these clusters, &lt;strong&gt;storage systems face new challenges related to handling concurrent access from tens of thousands of Pods within Kubernetes clusters.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;IT professionals managing the underlying infrastructure face significant changes brought about by the evolving business scenarios and computing environments. Existing hardware-software coupled storage solutions often suffer from several pain points, such as no elasticity, no distributed high availability, and constraints on cluster scalability. &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems"&gt;Distributed file systems&lt;/a&gt; like GlusterFS, CephFS, and those designed for HPC such as Lustre, BeeGFS, and GPFS, are typically designed for physical machines and bare-metal disks. While they can deploy large capacity clusters, they cannot provide elastic capacity and flexible throughput, especially when dealing with storage demands in the order of tens of billions of files.&lt;/p&gt;
&lt;h2&gt;Key capabilities for AI data storage&lt;/h2&gt;
&lt;p&gt;Considering these challenges, we’ll outline essential storage capabilities critical for AI scenarios, helping enterprises make informed decisions when selecting storage products.&lt;/p&gt;
&lt;h3&gt;POSIX compatibility and data consistency&lt;/h3&gt;
&lt;p&gt;In the AI/ML domain, &lt;a href="https://en.wikipedia.org/wiki/POSIX"&gt;POSIX&lt;/a&gt; is the most common API for data access. Previous-generation distributed file systems, except HDFS, are also POSIX-compatible, but products on the cloud in recent years have not been consistent in their POSIX support:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compatibility:&lt;/strong&gt; Users should not rely solely on the description "POSIX-compatible product" to assess compatibility. You can use pjdfstest and the Linux Test Project (LTP) framework for testing. We’ve done a &lt;a href="https://juicefs.com/en/blog/engineering/posix-compatibility-comparison-among-four-file-system-on-the-cloud"&gt;POSIX compatibility test of cloud file systems&lt;/a&gt; for your reference.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strong data consistency guarantee:&lt;/strong&gt; This is fundamental to ensuring computational correctness. Storage systems have various consistency implementations, with object storage systems often adopting eventual consistency, while file systems typically adhere to strong consistency. Careful consideration is needed when selecting a storage system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User mode or kernel mode:&lt;/strong&gt; Early developers favored kernel mode due to its potential for optimized I/O operations. However, in recent years, we’ve witnessed a growing number of developers "escaping" from kernel mode for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Kernel-mode usage ties the file system client to specific kernel versions, while GPU and high-performance network card drivers often require compatibility with specific kernel versions as well. This combination places a significant burden on kernel version selection and maintenance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Exceptions in kernel-mode clients can freeze the host operating system, which is highly unfavorable for Kubernetes platforms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The user-mode FUSE library has undergone continuous iteration, yielding significant performance improvements. It has served &lt;a href="https://juicefs.com/docs/community/introduction/"&gt;JuiceFS&lt;/a&gt; customers well across business needs such as autonomous driving perception model training and quantitative investment strategy training, demonstrating that in AI scenarios the user-mode FUSE library is no longer a performance bottleneck.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
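Suites like pjdfstest and LTP exercise thousands of POSIX semantics cases. As a toy illustration of the kind of guarantees they check (not a substitute for the real suites), here are two probes sketched in Python:

```python
import os
import tempfile

def check_posix_basics(root):
    """Two toy checks in the spirit of pjdfstest. A real suite runs
    thousands of cases; this only probes two POSIX guarantees."""
    results = {}

    # open() with O_CREAT|O_EXCL must fail if the file already exists.
    path = os.path.join(root, "excl")
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY))
    try:
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        results["o_excl"] = False
    except FileExistsError:
        results["o_excl"] = True

    # rename() must atomically replace an existing target (POSIX semantics).
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    with open(src, "w") as f:
        f.write("new")
    with open(dst, "w") as f:
        f.write("old")
    os.rename(src, dst)
    with open(dst) as f:
        results["rename_replace"] = (f.read() == "new")
    return results

with tempfile.TemporaryDirectory() as d:
    print(check_posix_basics(d))  # both True on a POSIX-compliant local fs
```

Running such probes against a mounted file system quickly reveals gaps that a "POSIX-compatible" label can hide, such as non-atomic rename on some object-storage gateways.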
&lt;div class="block-ImageWithCaption"&gt;&lt;dl&gt;
    &lt;dt&gt;image&lt;/dt&gt;
    &lt;dd&gt;221&lt;/dd&gt;
    &lt;dt&gt;caption&lt;/dt&gt;
    &lt;dd&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-markdown"&gt;&lt;h3&gt;Linear scalability of throughput&lt;/h3&gt;
&lt;p&gt;Different file systems employ different principles for scaling throughput. Previous-generation distributed storage systems like GlusterFS, CephFS, the HPC-oriented Lustre, BeeGFS, and GPFS primarily use all-flash solutions to build their clusters. &lt;strong&gt;In these systems, peak throughput equals the total performance of the disks in the cluster. To increase cluster throughput, users must scale the cluster by adding more disks.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;However, when users have imbalanced needs for capacity and throughput, &lt;strong&gt;traditional file systems require scaling the entire cluster, leading to capacity wastage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;For example, a 500 TB capacity cluster using 8 TB hard drives with 2 replicas needs 126 drives, each with a throughput of 150 MB/s. The theoretical maximum throughput of the cluster is about 18.9 GB/s (126 × 150 MB/s). If the application demands 60 GB/s throughput, there are two options: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Switching to 2 TB HDDs (with 150 MB/s throughput) and requiring 504 drives&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Switching to 8 TB SATA SSDs (with 500 MB/s throughput) while maintaining 126 drives&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first solution increases the number of drives by four times, necessitating a corresponding increase in the number of cluster nodes. The second solution, upgrading to SSDs from HDDs, also results in a significant cost increase.&lt;/p&gt;
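The trade-off above can be checked with a quick back-of-the-envelope calculation. This is a simplified model that assumes throughput scales linearly with drive count; the article's figures (126 and 504 drives) provision a couple of spare drives beyond these minimums:

```python
import math

def drives_needed(usable_tb, replicas, drive_tb, drive_mbps, target_gbps):
    """Return the minimum drive count satisfying both capacity and throughput.

    Simplified model: raw capacity = usable * replicas, and cluster
    throughput = drive count * per-drive throughput.
    """
    raw_tb = usable_tb * replicas
    for_capacity = math.ceil(raw_tb / drive_tb)
    for_throughput = math.ceil(target_gbps * 1000 / drive_mbps)
    return max(for_capacity, for_throughput)

# 500 TB usable, 2 replicas, 8 TB HDDs at 150 MB/s:
print(drives_needed(500, 2, 8, 150, 0))    # 125 drives just for capacity
print(126 * 150 / 1000)                    # 18.9 GB/s theoretical ceiling

# Reaching 60 GB/s instead:
print(drives_needed(500, 2, 2, 150, 60))   # 2 TB HDDs: 500 drives (capacity-bound)
print(drives_needed(500, 2, 8, 500, 60))   # 8 TB SSDs: 125 drives (capacity-bound)
```

Either way, throughput and capacity are coupled: hitting the throughput target forces you to buy roughly 4x the drives or switch to much more expensive media.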
&lt;p&gt;As you can see, it’s difficult to balance capacity, performance, and cost. Capacity planning based on these three perspectives becomes a challenge, because we cannot predict the development, changes, and details of the real business.&lt;/p&gt;
&lt;p&gt;Therefore, &lt;strong&gt;decoupling storage capacity from performance scaling would be a more effective approach for businesses to address these challenges. When we designed JuiceFS, we considered this requirement&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In addition, handling hot data is a common problem in AI scenarios. JuiceFS employs a cache grouping mechanism to automatically distribute hot data to different cache groups. This means that JuiceFS automatically creates multiple copies of hot data during computation to achieve higher disk throughput, and these cache spaces are automatically reclaimed after computation.&lt;/p&gt;
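JuiceFS' actual cache-group mechanism also handles replicating hot data and reclaiming cache space, as described above. As a rough, hypothetical illustration of only the placement part, hashing can deterministically map data blocks to cache nodes:

```python
import hashlib
from collections import defaultdict

def pick_cache_node(block_id, nodes):
    """Map a data block to a cache node by hashing its ID.

    Illustrative stand-in only: JuiceFS' real cache groups additionally
    replicate hot blocks and reclaim space after computation.
    """
    digest = hashlib.md5(block_id.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["cache-a", "cache-b", "cache-c"]
placement = defaultdict(list)
for i in range(12):
    block = f"chunk-{i}"
    placement[pick_cache_node(block, nodes)].append(block)

# Every client computes the same mapping, so reads for a given block
# converge on one node and its local disk acts as a shared cache.
for node, blocks in sorted(placement.items()):
    print(node, blocks)
```

The key property is determinism: any client can locate a cached block without a central directory, while aggregate cache throughput grows with the number of nodes.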
&lt;h3&gt;Managing massive amounts of files&lt;/h3&gt;
&lt;p&gt;Efficiently managing a large number of files, such as 10 billion, places three demands on the storage system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Elastic scalability:&lt;/strong&gt; In real JuiceFS user scenarios, deployments grow from tens of millions of files to hundreds of millions, and then to billions. Such growth cannot be absorbed by adding just a few machines. Storage clusters need to add nodes to achieve &lt;a href="https://www.virtana.com/glossary/what-is-horizontal-scaling/"&gt;horizontal scaling&lt;/a&gt;, enabling them to support business growth effectively.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data distribution during horizontal scaling:&lt;/strong&gt; During system scaling, data distribution rules based on directory name prefixes may lead to uneven data distribution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaling complexity:&lt;/strong&gt; As the number of files increases, the ease of system scaling, stability, and the availability of tools for managing storage clusters become vital considerations. Some systems become more fragile as file numbers reach billions. Ease of management and high stability are crucial for business growth.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Concurrent load capacity and feature support in Kubernetes environments&lt;/h3&gt;
&lt;p&gt;Some storage system specifications state a maximum limit on concurrent access, so users need to conduct stress testing based on their own workloads. When there are many clients, &lt;a href="https://en.wikipedia.org/wiki/Quality_of_service"&gt;quality of service&lt;/a&gt; (QoS) management is required, including traffic control for each client and temporary read/write blocking policies.&lt;/p&gt;
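Per-client traffic control is commonly implemented with a token bucket, which caps sustained throughput while allowing short bursts. The sketch below is illustrative only, not JuiceFS' actual QoS implementation:

```python
import time

class TokenBucket:
    """Per-client rate limiter: tokens refill at a fixed rate up to a
    burst capacity; each I/O spends tokens equal to its byte count.
    (Illustrative sketch, not any specific storage system's QoS code.)"""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, nbytes):
        # Refill tokens for the elapsed time, capped at burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.tokens:
            return False  # over budget: caller should block or back off
        self.tokens -= nbytes
        return True

# 100 MiB/s sustained, 4 MiB burst.
bucket = TokenBucket(rate_bytes_per_s=100 * 2**20, burst_bytes=4 * 2**20)
print(bucket.allow(1 * 2**20))  # True: within the burst allowance
print(bucket.allow(8 * 2**20))  # False: exceeds remaining tokens
```

A storage service typically runs one such bucket per client (or per volume), turning a noisy neighbor's excess I/O into backpressure instead of cluster-wide latency.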
&lt;p&gt;We must also note the design and supported features of the CSI driver in Kubernetes: for example, the deployment method of the mount process, and whether it supports &lt;code&gt;ReadWriteMany&lt;/code&gt;, &lt;code&gt;subPath&lt;/code&gt; mounting, quotas, and hot updates.&lt;/p&gt;
&lt;h3&gt;Cost analysis&lt;/h3&gt;
&lt;p&gt;Cost analysis is multifaceted: hardware and software procurement costs are often overshadowed by operational and maintenance expenses. As AI businesses scale, data volume grows significantly, so storage systems must offer both capacity and throughput scalability and be easy to adjust.&lt;/p&gt;
&lt;p&gt;In the past, the procurement and scaling of systems like Ceph, Lustre, and BeeGFS in data centers involved lengthy planning cycles. It took months for hardware to arrive, be configured, and become operational. Time costs, notably ignored, were often the most significant expenditures. &lt;strong&gt;Storage systems that enable elastic capacity and performance adjustments equate to faster time-to-market.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Another frequently underestimated cost is efficiency. In AI workflows, the data pipeline is extensive, involving multiple interactions with the storage system. Each step, from data collection, cleansing, conversion, labeling, feature extraction, training, and backtesting to production deployment, is affected by the storage system's efficiency. &lt;/p&gt;
&lt;p&gt;However, businesses typically utilize only a fraction (often less than 20%) of the entire dataset actively. This subset of hot data demands high performance, while warm or cold data may be infrequently accessed or not accessed at all. &lt;strong&gt;It’s difficult to satisfy both requirements in systems like Ceph, Lustre, and BeeGFS.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Consequently, many teams adopt multiple storage systems to cater to diverse needs. A common strategy is to employ an &lt;a href="https://en.wikipedia.org/wiki/Object_storage"&gt;object storage&lt;/a&gt; system for archival purposes to achieve large capacity and low costs. However, object storage is not typically known for high performance, and it may handle data ingestion, preprocessing, and cleansing in the data pipeline. While this may not be the most efficient method for data preprocessing, it's often the pragmatic choice due to the sheer volume of data. Engineers then have to wait for a substantial period to transfer the data to the file storage system used for model training.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Therefore, in addition to hardware and software costs of storage systems, total cost considerations should account for time costs invested in cluster operations (including procurement and supply chain management) and time spent managing data across multiple storage systems.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Storage system comparison&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Here's a comparative analysis of the storage products mentioned earlier for your reference:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style="text-align: left;"&gt;Category&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Product&lt;/th&gt;
&lt;th style="text-align: left;"&gt;POSIX compatibility&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Elastic capacity&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Maximum supported file count&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Performance&lt;/th&gt;
&lt;th style="text-align: left;"&gt;Cost (USD)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Amazon S3&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Partially compatible through S3FS&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Yes&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Hundreds of billions&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Medium to Low&lt;/td&gt;
&lt;td style="text-align: left;"&gt;About $0.02/GB/ month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Alluxio&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://docs.alluxio.io/os/user/stable/en/api/POSIX-API.html#assumptions-and-limitations"&gt;Partial compatibility&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://www.alluxio.io/blog/store-1-billion-files-in-alluxio-20/"&gt;1 billion&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Depends on cache capacity&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Cloud file storage service&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Amazon EFS&lt;/td&gt;
&lt;td style="text-align: left;"&gt;NFSv4.1 compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Yes&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html"&gt;Depends on the data size. Throughput up to 3 GB/s, maximum 500 MB/s per client&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://aws.amazon.com/efs/pricing/?nc1=h_ls"&gt;$0.043~0.30/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Azure&lt;/td&gt;
&lt;td style="text-align: left;"&gt;SMB &amp;amp; NFS for Premium&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Yes&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets"&gt;100 million&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Performance scales with data capacity. See &lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets"&gt;details&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/storage/files/"&gt;$0.16/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;GCP Filestore&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://cloud.google.com/architecture/filers-on-compute-engine#summary_of_file_server_options"&gt;NFSv3 compatible&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://cloud.google.com/filestore?hl=en#section-12"&gt;Maxmium 63.9 TB&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://cloud.google.com/filestore/docs/limits"&gt;Up to 67,108,864 files per 1 TiB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Performance scales with data capacity. See &lt;a href="https://cloud.google.com/filestore/docs/performance"&gt;details&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://cloud.google.com/filestore/pricing?hl=zh-cn"&gt;$0.36/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;Lustre&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Lustre&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;No&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Amazon FSx for Lustre&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Manual scaling, 1,200 GiB increments&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://www.amazonaws.cn/en/?nc1=h_ls"&gt;Multiple performance types of 50 MB~200 MB/s per 1 TB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://aws.amazon.com/fsx/lustre/pricing/"&gt;$0.073~0.6/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;GPFS&lt;/td&gt;
&lt;td style="text-align: left;"&gt;GPFS&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;No&lt;/td&gt;
&lt;td style="text-align: left;"&gt;10 billion&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;BeeGFS&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;No&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Billions&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td style="text-align: left;"&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="text-align: left;"&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;&lt;a href="https://juicefs.com/docs/cloud/"&gt;JuiceFS Cloud Service&lt;/a&gt;&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Compatible&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Elastic capacity, no maximum limit&lt;/td&gt;
&lt;td style="text-align: left;"&gt;10 billion&lt;/td&gt;
&lt;td style="text-align: left;"&gt;Depends on cache capacity&lt;/td&gt;
&lt;td style="text-align: left;"&gt;JuiceFS $0.02/GiB/month + AWS S3 $0.023/GiB/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Over the last decade, &lt;a href="https://en.wikipedia.org/wiki/Cloud_computing"&gt;cloud computing&lt;/a&gt; has rapidly evolved. Previous-generation storage systems designed for data centers couldn't harness the advantages brought by the cloud, notably elasticity. Object storage, a newcomer, offers unparalleled scalability, availability, and cost-efficiency. Still, it exhibits limitations in AI scenarios.&lt;/p&gt;
&lt;p&gt;File storage, on the other hand, presents invaluable benefits for AI and other computational use cases. Leveraging the cloud and its infrastructure efficiently to design the next-generation file storage system is a new challenge, and this is precisely what JuiceFS has been doing over the past five years.&lt;/p&gt;
&lt;p&gt;If you have any questions for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 18 Dec 2025 08:59:33 +0000</pubDate><guid>https://www.juicefs.com/en/blog/solutions/ai-data-storage-challenges-capabilities-solution-comparison</guid></item></channel></rss>