Use JuiceFS on AWS
Amazon Web Services (AWS) is a leading global cloud platform that offers a wide range of computing, storage, and database services. Its extensive product line provides flexible options for creating and using JuiceFS file systems.
Where can JuiceFS be used?
JuiceFS provides a rich set of access interfaces. On AWS, it is typically used with the following products:
- Amazon EC2: Mount the JuiceFS file system with the JuiceFS client
- Amazon Elastic Kubernetes Service (EKS): Use the JuiceFS CSI Driver
- Amazon EMR: Use the JuiceFS Hadoop Java SDK
Preparation
A JuiceFS file system consists of two parts:
- Object Storage: Used for data storage.
- Metadata Engine: A database used for storing metadata.
Depending on your requirements, you can use fully managed databases and S3 object storage on AWS, or deploy both components yourself on EC2 or EKS.
This article focuses on creating a JuiceFS file system with AWS fully managed services. For self-hosted scenarios, please refer to the "JuiceFS Supported Metadata Engines" and "JuiceFS Supported Object Storage" guides, as well as the documentation of the software you deploy.
Object storage
Amazon S3 is the object storage service provided by AWS. You can create a bucket in the region you need ahead of time, or grant the JuiceFS client permission through an IAM role so that it creates the bucket automatically.
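As a minimal sketch, a bucket can be created ahead of time with the AWS CLI. The bucket name myjfs and the region ap-east-1 are placeholders that match the examples later in this article, and the AWS CLI is assumed to be installed and configured with credentials that are allowed to create buckets.

```shell
# Placeholder values; replace them with your own bucket name and region.
BUCKET=myjfs
REGION=ap-east-1

# Outside us-east-1, the region must also be passed as a LocationConstraint.
aws s3api create-bucket \
    --bucket "$BUCKET" \
    --region "$REGION" \
    --create-bucket-configuration LocationConstraint="$REGION"

# Endpoint URL in the form used for the --bucket option of juicefs format.
BUCKET_URL="https://s3.${REGION}.amazonaws.com/${BUCKET}"
echo "$BUCKET_URL"
```

The printed URL is the value passed to --bucket when the file system is formatted.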
Amazon S3 provides various storage classes, for example:
- S3 Standard: Standard storage, suitable for general-purpose storage with frequent data access, offering real-time access with no retrieval costs.
- S3 Standard-IA: Infrequent Access (IA) storage, suitable for data that is accessed less frequently but needs to be stored for the long term, offering real-time access with retrieval costs.
- S3 Glacier: Archive storage, suitable for data that is rarely accessed. Objects must be restored (thawed) before they can be read.
You can set the storage class when creating or mounting the JuiceFS file system; please refer to the documentation for details. It is recommended to start with the standard storage class: other storage classes may have lower unit storage prices, but they often come with minimum storage duration requirements and retrieval costs.
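As a rough sketch, the storage class can be passed when the file system is formatted. This assumes a client version that provides the --storage-class option (check the output of juicefs format --help before relying on it). The bucket and metadata URL are the example values used later in this article.

```shell
# Assumption: the installed client provides --storage-class;
# verify with `juicefs format --help`.
STORAGE_CLASS=STANDARD_IA   # S3 Standard-IA
META_URL="rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1"

juicefs format --storage s3 \
    --bucket https://s3.ap-east-1.amazonaws.com/myjfs \
    --storage-class "$STORAGE_CLASS" \
    "$META_URL" \
    myjfs
```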
Furthermore, accessing object storage requires authentication with an Access Key (a.k.a. access key ID) and a Secret Key (a.k.a. secret access key). You can refer to the document "Managing access keys for IAM users" for creating them and the necessary policies. When accessing S3 from an EC2 instance, you can instead assign an IAM role to the instance so that the S3 API can be called without access keys.
Database
AWS offers various network-based fully managed databases that can be used to build the JuiceFS metadata engine, mainly including:
- Amazon MemoryDB for Redis (hereinafter referred to as MemoryDB): A durable Redis in-memory database service that provides extremely fast performance.
- Amazon RDS: Fully managed databases such as MariaDB, MySQL, PostgreSQL, and more.
Amazon ElastiCache for Redis (hereinafter referred to as ElastiCache) is also compatible with the Redis protocol, but unlike MemoryDB it does not provide a strong consistency guarantee, so MemoryDB is recommended.
Using JuiceFS on EC2
Installing the JuiceFS client
Please refer to the Installation documentation to install the latest JuiceFS Community Edition client based on the operating system used by your EC2 instance.
For example, if you are using a Linux system, you can use the one-liner installation script to automatically install the client:
curl -sSL https://d.juicefs.com/install | sh -
Creating a File System
Preparing object storage
You can assign an IAM role with the AmazonS3FullAccess policy to your EC2 instance, allowing it to create and use S3 buckets directly without an Access Key and Secret Key.
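One way to set this up is with the AWS CLI, sketched below. The role name, instance profile name, and instance ID are hypothetical placeholders, and the CLI is assumed to be configured with permissions to manage IAM and EC2. The same result can be achieved in the AWS console.

```shell
# Hypothetical names and a placeholder instance ID; replace with your own.
ROLE_NAME=juicefs-ec2-role
PROFILE_NAME=juicefs-ec2-profile
INSTANCE_ID=i-0123456789abcdef0

# Trust policy that lets the EC2 service assume the role.
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the managed S3 policy named in this article.
aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# EC2 uses roles through an instance profile.
aws iam create-instance-profile --instance-profile-name "$PROFILE_NAME"
aws iam add-role-to-instance-profile \
    --instance-profile-name "$PROFILE_NAME" --role-name "$ROLE_NAME"
aws ec2 associate-iam-instance-profile \
    --instance-id "$INSTANCE_ID" \
    --iam-instance-profile Name="$PROFILE_NAME"
```

AmazonS3FullAccess grants access to every bucket in the account. A narrower custom policy limited to the JuiceFS bucket is a reasonable alternative once the setup works.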
Preparing the database
This section uses MemoryDB as an example. Please refer to "Redis Best Practices" and the AWS documentation to create a database.
To allow EC2 instances to access the Redis cluster, either create the cluster in the same VPC as the instances, or add rules to the security group of the Redis cluster that allow access from the EC2 instances.
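The security group option can be sketched with the AWS CLI as follows. Both security group IDs are placeholders, and port 6379 is taken from the metadata URL used in this article.

```shell
# Placeholder IDs; replace with the real security groups.
REDIS_SG=sg-0aaaaaaaaaaaaaaaa   # security group of the MemoryDB cluster
EC2_SG=sg-0bbbbbbbbbbbbbbbb     # security group of the EC2 instance
REDIS_PORT=6379

# Allow inbound TCP traffic on the Redis port from members of the EC2 group.
aws ec2 authorize-security-group-ingress \
    --group-id "$REDIS_SG" \
    --protocol tcp \
    --port "$REDIS_PORT" \
    --source-group "$EC2_SG"
```

Referencing the source security group rather than an IP range keeps the rule valid when the instance's private address changes.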
If the cluster runs Redis 7.0, the client must be JuiceFS 1.1 or above.
Formatting file system
juicefs format --storage s3 \
--bucket https://s3.ap-east-1.amazonaws.com/myjfs \
rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1 \
myjfs
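To make the metadata URL in the command above easier to read, the snippet below rebuilds it from its parts. The values are copied from that command; only the variable names are introduced here for illustration.

```shell
# Components of the metadata URL (values copied from the command above).
SCHEME=rediss      # "rediss" connects over TLS; plain "redis" does not
ENDPOINT=clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com
PORT=6379
DB=1               # database number, as written at the end of the URL

META_URL="${SCHEME}://${ENDPOINT}:${PORT}/${DB}"
echo "$META_URL"
# prints rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1
```

The last argument of juicefs format, myjfs, is the name of the file system. It happens to match the bucket name in this example.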
Mounting file system
sudo juicefs mount -d \
rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1 \
/mnt/myjfs
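After mounting, a quick check such as the following can confirm that the file system is available. This is only a sketch that reuses the metadata URL and mount point from the commands above.

```shell
META_URL="rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1"
MOUNT_POINT=/mnt/myjfs

# Show the file system settings and active sessions stored in the metadata engine.
juicefs status "$META_URL"

# Confirm that the mount point is backed by JuiceFS.
df -h "$MOUNT_POINT"
```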
If the file system was created with S3 access authorized through an IAM role, mounting and using it from outside of AWS requires adding the Access Key and Secret Key to the file system with juicefs config:
juicefs config \
--access-key=<your-access-key> \
--secret-key=<your-secret-key> \
rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1
Mounting at boot
Please refer to the document Mount JuiceFS at Boot for details on how to automatically mount JuiceFS at boot.
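As a hedged illustration of what that document describes for Linux, the snippet below builds an /etc/fstab entry for the file system used in this article. It only prints the line. The commands that apply it are left as comments because they need root, and the client path /usr/local/bin/juicefs is an assumption about where the installer placed the binary.

```shell
META_URL="rediss://clustercfg.myredis.hc79sw.memorydb.ap-east-1.amazonaws.com:6379/1"
MOUNT_POINT=/mnt/myjfs

# fstab fields: <metadata URL> <mount point> <type> <options> <dump> <pass>
# _netdev delays the mount until the network is available.
FSTAB_LINE="${META_URL} ${MOUNT_POINT} juicefs _netdev 0 0"
echo "$FSTAB_LINE"

# To apply it, mount(8) needs a helper named mount.juicefs:
#   sudo ln -s /usr/local/bin/juicefs /sbin/mount.juicefs
#   echo "$FSTAB_LINE" | sudo tee -a /etc/fstab
#   sudo mount -a    # test the entry without rebooting
```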
Using JuiceFS on Amazon EKS
Amazon EKS supports three types of nodes:
- EKS managed node groups: Use Amazon EC2 as compute nodes
- Self-managed nodes: Use Amazon EC2 as compute nodes
- Fargate: A serverless compute engine
JuiceFS CSI Driver is not currently supported on Fargate. Please create a cluster using "EKS managed node groups" or "self-managed nodes" to use JuiceFS CSI Driver.
Amazon EKS is a standard Kubernetes cluster and can be managed using tools such as eksctl, kubectl, and helm. For installation and usage instructions, please refer to the JuiceFS CSI Driver documentation.
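A possible end-to-end flow with these tools is sketched below. The cluster name, node group name, node count, and Helm release name are hypothetical. The chart repository URL and the pod label are taken from the JuiceFS CSI Driver documentation as best understood here, so verify them against that documentation before use.

```shell
# Hypothetical names; replace with your own.
CLUSTER=jfs-demo
REGION=ap-east-1
NODEGROUP=jfs-nodes

# Create a cluster backed by an EKS managed node group (not Fargate).
eksctl create cluster \
    --name "$CLUSTER" \
    --region "$REGION" \
    --managed \
    --nodegroup-name "$NODEGROUP" \
    --nodes 2

# Install the JuiceFS CSI Driver with Helm.
helm repo add juicefs https://juicedata.github.io/charts/
helm repo update
helm install juicefs-csi-driver juicefs/juicefs-csi-driver -n kube-system

# Check that the driver pods are running.
kubectl -n kube-system get pods -l app.kubernetes.io/name=juicefs-csi-driver
```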
Using JuiceFS on Amazon EMR
Please refer to the document "Using JuiceFS in Hadoop Ecosystem" for instructions.