Data replication

JuiceFS supports cross-cloud and cross-region data replication, which asynchronously replicates data to an object storage service in another region or with another cloud service provider.

For convenience, we refer to the object storage region specified when creating the file system as the "primary region". To enable data replication, you need to specify a "target region" for the file system. The target region can be on the same cloud service provider as the primary region, or a different one.

Featured scenarios

This feature applies to the following scenarios:

  • Cross-region data sharing: If a file system needs to be accessed from two different regions, consider using the replication feature to copy data to the other region in real time (asynchronously), which boosts performance in the remote region. Keep in mind that replication only helps with object storage performance; metadata performance still depends mostly on network latency. If you also need excellent metadata performance, consider using a mirror file system.
  • Object storage replication for mirror volumes: When using a mirror file system, a replicated object storage bucket in the local region can improve performance. Note that you should enable replication on the mirror file system, not the main file system; the section below explains why.
  • Object storage service disaster recovery: If the object storage service in the primary region fails, you can manually switch to the object storage in the target region (via the --bucket option, followed by restarting all JuiceFS clients) to restore service quickly.
  • Seamless object storage migration: If you need to change the underlying object storage service of a file system, you can use this feature to write to two buckets at the same time, achieving a seamless migration.
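For the disaster recovery scenario, the switch can be sketched with the client CLI. This is a sketch only: the file system name, bucket URL, and credentials below are placeholders, and the exact flags may differ by client version, so check your own deployment before running anything.

```shell
# Point the file system at the replica bucket in the target region
# ("myjfs" and the bucket URL are placeholders for illustration)
juicefs auth myjfs \
  --bucket=https://myjfs-replica.s3.us-west-2.amazonaws.com \
  --access-key=xxx --secret-key=xxx

# Remount so the client picks up the new bucket
juicefs mount myjfs /jfs
```

Repeat the remount on every host that mounts the volume; clients still running with the old settings will keep trying the failed bucket.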

For now, data replication is one-to-one; one-to-many is not supported. If you do need to replicate data to multiple regions, use juicefs sync instead, or create multiple mirror regions to achieve one-to-many replication.
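As a sketch of the juicefs sync workaround, a periodic bucket-to-bucket copy can distribute data to an extra region. The bucket URLs and credentials below are placeholders; adapt them to your actual object storage endpoints.

```shell
# One-way copy from the primary bucket to an additional region
# (run periodically, e.g. from cron; URLs and keys are placeholders)
juicefs sync \
  s3://ACCESS_KEY:SECRET_KEY@myjfs-primary.s3.cn-north-1.amazonaws.com/ \
  s3://ACCESS_KEY:SECRET_KEY@myjfs-extra.s3.eu-west-1.amazonaws.com/
```

Unlike built-in replication, this copies objects out-of-band and does not integrate with the metadata service, so schedule it frequently enough for your freshness requirements.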

How it works

Taking writes in the primary region and reads in the target region as an example, data replication works as shown below:

replication

As shown in the above diagram, clients across regions all access the same metadata service (both read and write).

  • For writing, data is written to the object storage of the current region first. After the write succeeds, the data is asynchronously copied to the remote object storage.
  • For reading, data is preferentially read from the object storage of the current region. If it does not exist there (not yet synchronized), it is read from the object storage in the remote region. Performance suffers under poor network conditions; tune the cache config according to your use case and see if it helps.
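When reads frequently fall through to the remote region, enlarging the local cache means the remote-read penalty is paid only once per block. A minimal sketch using the standard --cache-dir and --cache-size mount options (the path and size are placeholders to adjust for your hosts):

```shell
# Cache up to 100 GiB of blocks on local disk (--cache-size is in MiB),
# so blocks fetched from the remote region are served locally afterwards
juicefs mount myjfs /jfs --cache-dir=/var/jfsCache --cache-size=102400
```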

With a mirror file system, the data replication mechanism is different (as shown in the diagram below): mirror clients subscribe to metadata changes, and then carry out replication via background jobs. In comparison, when replication is enabled on a non-mirror file system, replication happens immediately and asynchronously when the client initiates a write request. Due to this difference, if a mirror file system is not replicating data fast enough, you can increase the number of mount points to accelerate it.

mirror-replication

Since a source file system must exist before a mirror is created, when you need replication in a mirror scenario, you should enable it on the mirror side instead of the source side, for these reasons:

  • If replication were enabled on the source's side, then in order to replicate all changes, mount options would have to be adjusted globally (add --access-key2 and --secret-key2 and restart). For production environments this is often inconvenient and costly.
  • A file system can build mirrors in multiple regions, forming a one-to-many relationship, but replication is always one-to-one, so replication on the source file system alone cannot cover multiple regions. This also forces replication onto the mirror's side in order to achieve one-to-many replication.

Enable data replication

Open the JuiceFS Console, navigate to the volume settings page, click "Enable replication", select a target cloud service and region, and save the settings. Since replication is executed asynchronously alongside write requests, all clients must authenticate again and restart for the change to take effect; if any client that writes data does not restart with the new settings, its changes will not be replicated.

The needed commands are:

# Adjust authentication options and add the credentials for the 2nd bucket
juicefs auth myjfs --access-key=xxx --secret-key=xxx --access-key2=xxx --secret-key2=xxx

# Restart seamlessly to take effect
juicefs mount myjfs /jfs

The process is the same with the CSI Driver: update the file system credentials accordingly, and new Mount Pods will run with replication enabled.
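As a sketch of the CSI case, assuming your volume credentials live in a Kubernetes Secret with access-key/secret-key entries (the Secret name, namespace, and key names below are assumptions for illustration; check your CSI Driver's documentation for the actual layout):

```shell
# Add credentials for the 2nd bucket to the volume's credentials Secret
# (name "juicefs-secret", namespace, and key names are illustrative only)
kubectl -n kube-system patch secret juicefs-secret --type merge \
  -p '{"stringData":{"access-key2":"xxx","secret-key2":"xxx"}}'
```

After updating the Secret, recreate the application Pods so that freshly started Mount Pods authenticate with the new credentials.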

Data consistency

Both regions use the same Metadata Service, so there are no metadata inconsistencies. However, the target region operates with higher latency and may see worse performance; consider using a mirror file system to improve metadata performance.

As for object storage data consistency, keep in mind that blocks managed by JuiceFS are immutable. With replication enabled, the mirror metadata service watches for all data modifications in the Raft changelog and dispatches data synchronization tasks to the clients (in the form of background jobs); each client then pulls data from the source object storage and uploads it to the target object storage.

In addition to real-time, incremental synchronization, clients periodically (weekly by default) carry out a full, bidirectional synchronization.

Billing Notes

Data replication is available to all users for free. When enabled, no additional metadata is generated, so replication has no impact on JuiceFS billing. You only need to pay attention to your object storage service provider's billing.