
File Import and Conversion

By default, JuiceFS stores files in blocks and separates metadata from data. This storage format and separation architecture enable JuiceFS to be a high-performance and strongly consistent file system.

However, in some rare scenarios, users prefer to store original files directly in object storage, so that the files can be used independently of JuiceFS metadata. Alternatively, they may want to import a large number of existing files from object storage into JuiceFS, so that the files can be accessed via POSIX and benefit from JuiceFS' powerful caching capabilities. Storing complete files in object storage and using them in JuiceFS is referred to as the "compatible format," as opposed to the default "optimized format."

Starting from version 5.0, JuiceFS has significantly improved support for the compatible format, providing the following features to meet the aforementioned requirements:

  • Object storage import, via the juicefs import command. This command has been available for some time; starting from version 5.0, imported files also support read caching.
  • The convert feature, which reassembles optimized-format blocks in JuiceFS back to original files and uploads them to object storage. This allows you to directly access the original files in object storage, with caching support.

Import existing object storage files

juicefs import scans the given object storage address and writes metadata for the objects found there into JuiceFS' metadata engine, so that these files can be accessed in JuiceFS. This operation does not copy any data; the files remain as they are in object storage. This storage format is therefore called the compatible format, meaning it is compatible with object storage.

When you use imported files, please note:

  • Imported files also occupy file system space, contribute to directory quotas, and are included in billing.
  • You can modify file names and permissions, but you cannot modify the object storage data. In other words, no matter what operation you perform, the original objects in object storage will remain unchanged.
  • Deleting these files will only delete their metadata and will not actually delete the source files in object storage.
  • Imported files do not support the trash feature. If you delete imported files in JuiceFS, you will not find them in the trash; to recover them, you can only re-import them.
  • Files imported into JuiceFS cannot be easily distinguished from regular files. If you need to check, use the juicefs info command and look at the object field (rather than a chunks table) to determine whether a file is stored in compatible format.

Cache for import

Starting from JuiceFS 5.0, imported files also support local cache and distributed cache. Although imported files are not actually written to the JuiceFS file system and are never converted into JuiceFS' sharded format, they are still split into data blocks (sized according to the file system's block size) when cached to local disk. Therefore, the usage and management of cache for imported files are no different from those of normal files written to JuiceFS.
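As a rough illustration of the block splitting described above, the sketch below computes which cache blocks a read of an imported file touches. The block size is assumed to be 4 MiB here; check your file system's actual block size. This is a conceptual model of fixed-size splitting, not JuiceFS' cache code.

```python
# Conceptual model only: imported files are assumed to be cached in
# fixed-size blocks starting at offset 0.
BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size (4 MiB)


def total_blocks(file_length: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of cache blocks needed for the whole file (ceiling division)."""
    return -(-file_length // block_size)


def cache_blocks(
    file_length: int, offset: int, length: int, block_size: int = BLOCK_SIZE
) -> list[tuple[int, int]]:
    """Return (block_index, block_length) for every block a read touches."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    end = min(offset + length, file_length)  # reads past EOF are truncated
    if offset >= end:
        return []
    first = offset // block_size
    last = (end - 1) // block_size
    blocks = []
    for index in range(first, last + 1):
        start = index * block_size
        # The final block of a file may be shorter than block_size.
        blocks.append((index, min(block_size, file_length - start)))
    return blocks


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # a 10 MiB imported object
    print(total_blocks(size))  # 3 blocks: 4 MiB + 4 MiB + 2 MiB
    print(cache_blocks(size, 3 * 1024 * 1024, 2 * 1024 * 1024))
```

A 2 MiB read that straddles a block boundary touches two blocks, which is why cache usage for imported files is accounted in whole blocks, just like for natively written files.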

tip

Caching isn't supported for external buckets. To get caching support, the import source must be the same bucket as the JuiceFS volume, and the import command must omit the bucket name, as in the following format:

# URI doesn't include bucket name, caching is supported
juicefs import / /jfs/imported
juicefs import /prefix /jfs
# If the URI contains a bucket name, caching is no longer supported
# In the example below, caching is unavailable even if BUCKET is the same as the JuiceFS volume's bucket
juicefs import BUCKET/prefix /jfs
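Based only on the examples above, the rule can be summarized as follows: a source starting with / carries no bucket name, so it resolves inside the volume's own bucket and is cacheable; a source starting with a bucket name is treated as external and is not cached. The helper below is a hypothetical sketch of that rule for scripts that generate import commands. It is not part of JuiceFS and does not cover every source form the CLI may accept.

```python
def import_supports_cache(source: str) -> bool:
    """Hypothetical helper mirroring the examples above.

    A source starting with "/" has no bucket name, so it is resolved in the
    JuiceFS volume's own bucket and imported files can be cached. Anything
    else (e.g. "BUCKET/prefix") is treated as an external bucket, where
    caching is unavailable even if BUCKET equals the volume's bucket.
    """
    return source.startswith("/")


if __name__ == "__main__":
    for src, dst in [("/", "/jfs/imported"), ("/prefix", "/jfs"), ("BUCKET/prefix", "/jfs")]:
        state = "cacheable" if import_supports_cache(src) else "not cacheable"
        print(f"juicefs import {src} {dst} -> {state}")
```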

When you use JuiceFS' cache feature to speed up reads of imported files, be aware of consistency: the imported objects are not managed by JuiceFS, so if an object is modified but not re-imported, stale cache may still be served and there is no guarantee that the latest data will be read. Therefore, if objects change after being imported into JuiceFS, they need to be re-imported. Upon re-import, existing cache data is invalidated automatically based on the modification time of the imported objects, which ensures the modified data can be read.
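One way to picture this invalidation rule is a cache keyed by object path, the modification time recorded at import, and block index: re-importing a modified object records a new modification time, so lookups stop matching the old entries. The class below is a simplified in-memory model under that assumption; it is not JuiceFS' actual cache layout, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImportedFileCache:
    """Simplified model: cache entries are keyed by (path, mtime, block)."""

    # mtime recorded in JuiceFS metadata at the last (re-)import
    imported_mtime: dict[str, float] = field(default_factory=dict)
    blocks: dict[tuple[str, float, int], bytes] = field(default_factory=dict)

    def import_object(self, path: str, mtime: float) -> None:
        # (Re-)importing records the object's current modification time.
        self.imported_mtime[path] = mtime

    def put(self, path: str, block: int, data: bytes) -> None:
        self.blocks[(path, self.imported_mtime[path], block)] = data

    def get(self, path: str, block: int) -> Optional[bytes]:
        # Entries cached under an older mtime no longer match after re-import.
        return self.blocks.get((path, self.imported_mtime[path], block))


if __name__ == "__main__":
    cache = ImportedFileCache()
    cache.import_object("mybucket/b", mtime=100.0)
    cache.put("mybucket/b", 0, b"old")
    # Suppose the object is now modified in object storage. Until it is
    # re-imported, the old block is still served (stale read).
    print(cache.get("mybucket/b", 0))  # b'old'
    cache.import_object("mybucket/b", mtime=200.0)  # re-import
    print(cache.get("mybucket/b", 0))  # None: the next read fetches new data
```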

For objects that need to be modified repeatedly, we recommend migrating the data into JuiceFS entirely, for example by writing it with juicefs sync. Because JuiceFS is POSIX-compatible, any other copy tool works as well.

Observation

Depending on how you use JuiceFS, a file system can contain both files written in the native JuiceFS sharded format and files imported directly from object storage. Ideally, manage them in separate directories to avoid confusion; otherwise, run the juicefs info command to tell them apart:

Files written in the native JuiceFS format are stored in blocks, so the objects field is a table listing all the associated blocks, all under the chunks directory, like this:

$ juicefs info a
a :
inode: 51
files: 1
dirs: 0
length: 2 Bytes
size: 4.00 KiB (4096 Bytes)
path: /a
objects:
+------------+-------------------------------+------+--------+--------+
| chunkIndex | objectName                    | size | offset | length |
+------------+-------------------------------+------+--------+--------+
| 0          | poc/chunks/B5/54/54450357_0_2 | 2    | 0      | 2      |
...
+------------+-------------------------------+------+--------+--------+

If the file was imported directly from object storage, there is no table; instead, a single object field shows the file's path in object storage:

$ juicefs info b
b :
inode: 26
files: 1
dirs: 0
length: 36 Bytes
size: 4.00 KiB (4096 Bytes)
path: /imported/b
object: /mybucket/b
mtime: 2025-02-12 14:52:19 +0800 CST
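If you need to check many files in a script, the distinction can be drawn from the text output shown above: native files print an objects: table, while imported files print a single object: line. The parser below is a sketch that relies only on these two sample outputs; the output format may differ between client versions, so verify it against your own client before relying on it.

```python
from typing import Optional


def storage_format(info_output: str) -> str:
    """Classify `juicefs info` text as 'compatible', 'sharded' or 'unknown'.

    Assumes the formats shown above: an imported file prints a single
    'object: <path>' line; a natively written file prints an 'objects:'
    header followed by a chunk table.
    """
    for raw in info_output.splitlines():
        line = raw.strip()
        if line.startswith("object:"):
            return "compatible"
        if line.startswith("objects:"):
            return "sharded"
    return "unknown"


def object_path(info_output: str) -> Optional[str]:
    """Return the object storage path of an imported file, if present."""
    for raw in info_output.splitlines():
        line = raw.strip()
        if line.startswith("object:"):
            return line.split(":", 1)[1].strip()
    return None


SAMPLE_IMPORTED = """b :
inode: 26
path: /imported/b
object: /mybucket/b
"""

SAMPLE_NATIVE = """a :
inode: 51
path: /a
objects:
| 0          | poc/chunks/B5/54/54450357_0_2 | 2    | 0      | 2      |
"""

if __name__ == "__main__":
    print(storage_format(SAMPLE_IMPORTED), object_path(SAMPLE_IMPORTED))
    print(storage_format(SAMPLE_NATIVE))
```

In practice the input would be the standard output of juicefs info PATH, captured for example with subprocess.run([...], capture_output=True, text=True).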

Import on demand

Starting from 5.1.3, JuiceFS supports import-on-demand as an experimental feature. Built on top of the import feature above, import-on-demand maps an entire object storage bucket as a JuiceFS file system; the difference is that it only scans object storage and builds metadata when a directory is accessed. Compared with periodically running a plain import, this offers the following benefits:

  • Significantly reduces LIST pressure on object storage, because listing only happens when a directory is accessed.
  • Metadata is generated only when a directory is accessed in JuiceFS, so only the accessed data is billed in JuiceFS. Unaccessed data in the object storage bucket is not scanned and will not be billed.
  • When an object storage bucket is very large, a full import can put stress on the node's memory. Import-on-demand lets you access the whole bucket while minimizing memory overhead.

warning

This is still an experimental feature, which means its usage, and even its design, is subject to change in the future. If you'd like to evaluate it, contact a Juicedata engineer to discuss the process with us.

Synopsis

Import-on-demand can only access objects inside the linked bucket, so you must associate the source bucket with the file system from the very beginning (via the JuiceFS Web Console). Only then can you mount the file system on a server.

In the following command, --source=/ maps the object storage bucket root onto the file system root. Currently this value is fixed; you cannot pass values other than /.

juicefs mount myjfs /jfs --source=/

After a successful mount, run ls to list the top-level directories:

$ cd /jfs
$ ls -alh
...
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir1
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir2
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir3

Note the 777 permission above. In an import-on-demand file system, directory permission carries a special meaning: a 777 directory hasn't been accessed yet, so there's no metadata underneath it. Once the directory is accessed by any means, the JuiceFS Client scans the object storage and quickly generates the corresponding directory structure, after which the directory permission changes:

# Access dir1 by any means (cd, ls, or just read directly)
$ ls dir1/file.txt

# After the directory has been accessed, permission changes from 777 to 555
$ ls -alh
...
dr-xr-xr-x 5 root root 16K Nov 11 17:47 dir1
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir2
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir3

Hence, in an import-on-demand workflow, the metadata load state is indicated by directory permission: 777 means metadata is not yet loaded; any other permission means metadata is already loaded and available for reads or cache warmup. Do not tamper with directory permissions.
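Because the load state is encoded in the directory mode, a script can inspect it by calling stat on each directory rather than listing its contents. The sketch below classifies subdirectories by their mode bits using only the standard library. That a stat call alone does not trigger a scan is an assumption drawn from the ls -alh example above, where listing /jfs leaves dir2 and dir3 at 777.

```python
import os
import stat


def is_loaded(path: str) -> bool:
    """True if an import-on-demand directory already has metadata.

    Convention described above: mode 777 means not yet accessed (no
    metadata); any other mode (typically 555) means metadata is loaded.
    Only the directory's own attributes are read, not its contents.
    """
    mode = os.stat(path).st_mode
    if not stat.S_ISDIR(mode):
        raise NotADirectoryError(path)
    return stat.S_IMODE(mode) != 0o777


def classify_children(root: str) -> dict[str, bool]:
    """Map each immediate subdirectory of root to its load state."""
    result = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                result[entry.name] = is_loaded(entry.path)
    return result
```

Against the mount in the example above, classify_children("/jfs") would be expected to report dir1 as loaded and dir2 and dir3 as not loaded.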

As with the "import" feature above, you cannot modify object storage files via JuiceFS. Even if you delete files from your JuiceFS volume, only the metadata index is deleted, which has no effect on the object storage bucket.

That said, an import-on-demand file system can still write new files just like any other JuiceFS file system. However, new files are not uploaded to object storage as-is; they are split and written in the standard JuiceFS sharded format under the /[volume-name]/chunks/ prefix in the object storage bucket. So if you create new files in an import-on-demand file system, the bucket will contain two types of files: the original bucket objects, and the split files written via JuiceFS.

Because this is hard to manage, we do not recommend writing to an import-on-demand file system. Besides the confusion of distinguishing imported objects from natively written files, there is a worse problem: when metadata expires, the next scan rebuilds the metadata from the bucket and simply erases any extra files, which means all files written via JuiceFS will be deleted.

Metadata expiration

After import, objects in the object storage may still be added, deleted, or modified. You can control how long the imported metadata remains valid with the --refresh-interval parameter; it defaults to 1 minute (--refresh-interval=1m). In practice, this means that if a directory was last accessed over one minute ago, running ls again may experience a pause while JuiceFS rescans the corresponding object storage path and rebuilds metadata.

With the default 1-minute expiration, rescans can frequently block normal file system access. You can increase the validity period to trade freshness for a smoother experience:

juicefs mount myjfs /jfs --source=/ --refresh-interval=1h

With this setting, imported metadata expires after one hour. Accessing the directory after expiration (for example running ls or looking up a non-existent file) will trigger a metadata refresh. When multiple clients attempt scanning at the same time, a file lock ensures only one client performs the scan to avoid repeated LIST pressure on the object storage. Depending on the amount of data and the object storage list performance, the scan can still cause file system requests to hang, until metadata is rebuilt. Therefore, users should evaluate the bucket's update frequency and set the refresh interval according to their application's freshness requirements.

If your workload is sensitive to access latency and cannot tolerate the pause on the first access after expiration, you can enable asynchronous refresh: accesses then trigger a refresh in the background without waiting for the rebuild to complete.

juicefs mount myjfs /jfs --source=/ --refresh-interval=1h --refresh-in-background

Note that when asynchronous refresh is enabled, the freshness guarantees defined by --refresh-interval no longer apply. For example, with the one-hour setting above and asynchronous refresh enabled, an expired directory may not immediately show files added within the last hour when running ls. Only synchronous refresh mode can guarantee that the newly added files are visible immediately after the refresh interval elapses.
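The interaction between --refresh-interval and --refresh-in-background can be summarized as a small decision model for a single directory: metadata older than the interval is stale, and an access either blocks on a rescan (the default) or is answered immediately with existing metadata while a rescan is scheduled (background mode). The class and method names below are hypothetical; this illustrates the behavior described above and is not JuiceFS code. It also does not model the file lock that serializes scans across clients.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryMetadata:
    """Hypothetical model of one directory in an import-on-demand mount."""

    refresh_interval: float  # seconds, e.g. 3600 for --refresh-interval=1h
    refresh_in_background: bool = False  # models --refresh-in-background
    last_scan: Optional[float] = None  # None: never accessed (the 777 state)
    pending_refresh: bool = False
    scans: int = 0

    def _scan(self, now: float) -> None:
        # Stands in for LIST on object storage plus metadata rebuild.
        self.scans += 1
        self.last_scan = now
        self.pending_refresh = False

    def access(self, now: float) -> bool:
        """Access the directory at time `now`; return True if the caller blocked."""
        if self.last_scan is None:
            # Assumption: a never-scanned directory has no metadata to serve,
            # so the first access waits for the scan.
            self._scan(now)
            return True
        if now - self.last_scan < self.refresh_interval:
            return False  # metadata still valid, no LIST issued
        if self.refresh_in_background:
            # Existing (possibly stale) metadata is served immediately; the
            # rescan happens later, in run_background_refresh().
            self.pending_refresh = True
            return False
        self._scan(now)  # synchronous mode: the caller waits for the rebuild
        return True

    def run_background_refresh(self, now: float) -> None:
        if self.pending_refresh:
            self._scan(now)
```

With a one-hour interval, an access 3599 seconds after the last scan is served from existing metadata. An access at 3600 seconds either blocks on a rescan (the default) or returns immediately with the old listing (background mode). That is why, in background mode, newly added objects may not be visible right away.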

Billing (Import-on-demand pricing)

Import-on-demand uses a true "pay-as-you-use" billing model: although the entire object storage bucket is mapped into the file system, only the parts for which JuiceFS has built metadata are counted toward file system usage and billed.

For example, if a bucket contains 1 TB of data spread across multiple subdirectories (more precisely: prefixes, since object storage has no directory concept) but only a 50 GB directory is actually accessed, the file system usage is 50 GB and billing is based on that 50 GB.

If those 50 GB of data are no longer needed, deleting the corresponding subdirectory from the file system immediately stops billing for it.

Deleting the directory does not change anything in the object storage bucket; it only makes the directory invisible in JuiceFS. To restore the directory in the file system, wait for the metadata to expire and ls its parent directory to trigger a new scan and rebuild. After rebuilding, the directory will return to the initial 777 state, which does not consume file system space, and will not be billed. Only when the directory is actually opened will the objects inside be scanned and metadata rebuilt.
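The billing rule amounts to summing only the directories whose metadata has been built. The sketch below reproduces the 1 TB / 50 GB example above; the directory names and the way the remaining 950 GB is split are hypothetical, and decimal units (1 TB = 1000 GB) are used for simplicity.

```python
from enum import Enum


class DirState(Enum):
    UNLOADED = "777, never accessed"  # mapped but not scanned: not billed
    LOADED = "metadata built"         # counted toward usage: billed
    DELETED = "deleted in JuiceFS"    # invisible until rescan: not billed


def billed_gb(dirs: dict[str, tuple[float, DirState]]) -> float:
    """Sum the sizes (GB) of directories whose metadata has been built."""
    return sum(size for size, state in dirs.values() if state is DirState.LOADED)


if __name__ == "__main__":
    # Hypothetical 1 TB bucket split across four prefixes.
    bucket = {
        "dir1": (50.0, DirState.LOADED),  # the only directory accessed
        "dir2": (450.0, DirState.UNLOADED),
        "dir3": (300.0, DirState.UNLOADED),
        "dir4": (200.0, DirState.UNLOADED),
    }
    print(billed_gb(bucket))  # 50.0 out of 1000.0 GB in the bucket

    # Deleting dir1 in JuiceFS stops billing; the objects stay in the bucket.
    # After metadata expires and the parent is listed, dir1 returns as UNLOADED.
    bucket["dir1"] = (50.0, DirState.DELETED)
    print(billed_gb(bucket))  # 0
```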

In summary, for import-on-demand file systems, you can safely delete data you don't need without affecting the object storage bucket. If you need that data again later, wait for the metadata to expire and then access the path again; JuiceFS will rebuild the metadata.

Cache and warmup

Running the juicefs warmup command builds cache only for data whose metadata has been imported, i.e. directories with 555 permission. If juicefs warmup is run against a 777 directory, nothing happens because its metadata hasn't been imported yet. This design ensures that cache is also built on demand, so you must first access the desired directories to build their metadata.

To demonstrate:

$ ls -alh
...
dr-xr-xr-x 5 root root 16K Nov 11 17:47 dir1
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir2
drwxrwxrwx 2 root root 4.0K Nov 11 17:35 dir3

# Warming up a 777 directory does nothing
$ juicefs warmup dir3

# Only accessed directories can be warmed up
$ juicefs warmup dir1

Similar to the import command, files imported on demand support local caching and distributed caching as well; see "Cache for import" above to learn more.
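Since warming up a 777 directory is a no-op, a script that warms several directories can filter them first and report which ones still need to be accessed. The helper below is a hypothetical sketch that builds argument lists for the juicefs warmup PATH form shown above (no other flags are assumed) and does not run them.

```python
import os
import stat


def plan_warmup(paths: list[str]) -> tuple[list[list[str]], list[str]]:
    """Split directories into warmup commands and ones still at 777.

    Returns (commands, not_loaded). Each command is argv for
    `juicefs warmup PATH`. Directories still at mode 777 have no metadata
    yet, so warming them up would do nothing; access them first.
    """
    commands, not_loaded = [], []
    for path in paths:
        if stat.S_IMODE(os.stat(path).st_mode) == 0o777:
            not_loaded.append(path)
        else:
            commands.append(["juicefs", "warmup", path])
    return commands, not_loaded
```

Each returned command could then be executed with subprocess.run(cmd, check=True). The directories in not_loaded need to be accessed first (for example with ls) so that their metadata is built.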

Convert

When the convert feature is enabled, files written to JuiceFS are reassembled into complete files and stored in object storage after a specified period.


Typical use cases

  • Files are initially written by JuiceFS, but later need to be accessed directly from object storage to integrate with cloud ecosystems. The emphasis here is on direct access: JuiceFS itself provides an S3 API through S3 Gateway, so if you just need an S3 API for your file system, use S3 Gateway instead.

  • Utilize the archiving capabilities of object storage to archive cold data. The archived data can be taken out and accessed without JuiceFS metadata.

    Still, the emphasis is on use without JuiceFS metadata, because JuiceFS natively supports separation of cold and hot data, simply use --storage-class to specify a storage class, which is much simpler.

  • Compliance with data regulations that require files to be stored in their original, intact format.

  • Other scenarios that demand data be separated from JuiceFS metadata, and can be taken out to use without JuiceFS.

Forbidden use cases

Convert is an experimental feature designed for very special occasions. If your use case isn't listed above, do not use this feature, because it imposes important limitations on the file system (for example, converted files are read-only); read on to learn more.

The following use cases are easily mistaken for good fits, but should not use the convert feature:

  • You need an S3 endpoint to access your JuiceFS file system. S3 Gateway is built specifically for this, and the convert feature shouldn't be involved at all.
  • You need to separate hot and cold data. The JuiceFS Client can specify a storage class (via --storage-class) during juicefs auth, so that different clients can handle files destined for different storage classes.

Synopsis

The effect of conversion on the object storage file list is as follows:

# Before conversion
mybucket/
├── chunks
│ ├── 41
│ │ └── 1
│ │ ├── 1000001_0_4194304
│ │ └── 1000001_10_4194304
│ ├── 43
│ │ └── 1
│ │ ├── 1000003_0_4194304
...

# After conversion, files are written into object storage as is, preserving the directory structure. The original sharded-format data blocks are deleted.
mybucket/
├── bigfile1.tar.gz
├── chunks
├── dir/bigfile2.tar.gz

Because the purpose of conversion is to decouple data from the JuiceFS sharded format and store it as-is in object storage, files are stored according to their directory structure in the file system. Therefore, if the convert feature is enabled, the file system must have exclusive use of the object storage bucket. To avoid conflicts and potential data loss, the bucket must not be used for other purposes or shared with other JuiceFS file systems.
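Because converted files land at keys derived from their file system paths, two file systems converting into the same bucket could write the same keys. The sketch below assumes the mapping suggested by the listing above (the path relative to the volume root becomes the object key, with the directory structure preserved) and reports the keys two volumes would share. The mapping is an assumption inferred from that example, not a documented specification, and the function names are hypothetical.

```python
import posixpath


def converted_key(fs_path: str) -> str:
    """Object key for a converted file, assuming key = path relative to root.

    For example "/dir/bigfile2.tar.gz" -> "dir/bigfile2.tar.gz", matching
    the listing above. Hypothetical mapping inferred from that example.
    """
    normalized = posixpath.normpath("/" + fs_path.lstrip("/"))
    return normalized.lstrip("/")


def colliding_keys(volume_a: list[str], volume_b: list[str]) -> set[str]:
    """Keys that two file systems would both write if they shared a bucket."""
    keys_a = {converted_key(p) for p in volume_a}
    keys_b = {converted_key(p) for p in volume_b}
    return keys_a & keys_b


if __name__ == "__main__":
    print(converted_key("/dir/bigfile2.tar.gz"))
    print(colliding_keys(["/bigfile1.tar.gz", "/dir/a"], ["/dir/a", "/other"]))
```

Any non-empty result means one volume's converted files could overwrite or be deleted along with the other's, which is the conflict the exclusive-bucket requirement avoids.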

After conversion, file contents can no longer be modified, and write operations will result in permission errors. Although converted files cannot be edited, they can be moved with the mv command. In JuiceFS, this is interpreted as a "cross-device copy + delete": the file is read normally from the compatible format, written back to JuiceFS in sharded format as a new file, and the original file is deleted. Because mv converts the file from the compatible format back to the sharded format, the file can be edited again until the specified period elapses, at which point it can be converted again.
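The lifecycle described above can be modeled as a small state machine: a file starts in sharded format, becomes read-only once it is converted after the configured delay, and mv of a converted file produces a new sharded file whose conversion timer starts over. The class below is a hypothetical illustration; the delay value and method names are assumptions, and in reality conversion is carried out by JuiceFS, not by the caller.

```python
from dataclasses import dataclass


@dataclass
class JfsFile:
    """Hypothetical model of one file in a convert-enabled file system."""

    path: str
    created_at: float  # seconds; the conversion timer starts here
    converted: bool = False

    def maybe_convert(self, now: float, delay: float) -> None:
        # After `delay` seconds the file is reassembled into one object.
        if not self.converted and now - self.created_at >= delay:
            self.converted = True

    def write(self, data: bytes) -> None:
        if self.converted:
            raise PermissionError(f"{self.path} is converted and read-only")
        # Sharded-format files remain writable (data handling omitted).

    def mv(self, new_path: str, now: float) -> "JfsFile":
        if not self.converted:
            # Plain rename for sharded files (timer behavior not modeled).
            self.path = new_path
            return self
        # Converted file: cross-device copy + delete. The content is written
        # back as a new sharded-format file, so the timer restarts; the caller
        # drops the old object, which models deleting the original.
        return JfsFile(path=new_path, created_at=now)
```

For example, with a 60-second delay, a file created at t=0 becomes read-only at t=60; moving it at t=100 yields a writable file that is converted again at t=160.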

For file systems with the convert feature enabled, directories (including empty ones) can no longer be moved (mv) once some time has passed since their creation. They can only be deleted and then recreated.

warning

Converted files do not support the trash feature. Once deleted, they do not appear in the trash and cannot be recovered. The corresponding objects in object storage are also cleaned up asynchronously by client background tasks.

The convert feature is currently in beta. If you want to evaluate it, contact a Juicedata engineer and we'll enable it for your file system. Once it is enabled, you can navigate to the settings page and adjust the relevant settings there.

Cache for conversion

Files in the compatible format support local cache and distributed cache. Even though converted files are no longer in sharded format, when cached to local storage, they are still split into data blocks (sized according to the file system's block size). Therefore, the usage and management of caching for converted files is no different from sharded-format files.

However, it is important to note that after conversion, existing cache is invalidated due to metadata changes, and the files need to be warmed up again to reestablish local cache.