Customers who use AWS can transfer their data with AWS DataSync. This section explains, step by step, how to use DataSync to transfer data from Moloco Storage to your Amazon S3 bucket.

1. What is AWS DataSync?

AWS DataSync is an online data movement and discovery service that simplifies data migration and helps you quickly, easily, and securely transfer file or object data to, from, and between AWS storage services. It works not only with on-premises storage systems and AWS storage services such as Amazon S3 and Amazon EFS, but also with other cloud storage services such as Google Cloud Storage (GCS) and Microsoft Azure Blob Storage.

2. Overview

Moloco exports and stores customer data in GCS, so this guide shows how to use AWS DataSync to migrate customer data from a GCS bucket to Amazon S3.

Because DataSync integrates with the Google Cloud Storage XML API, you can copy objects into Amazon S3 without writing code. The DataSync agent can be deployed on either Google Cloud or AWS; this guide deploys it on AWS.

The following diagram illustrates the transfer.

  1. You deploy a DataSync agent in a virtual private cloud (VPC) in your AWS environment.
  2. The agent reads your Google Cloud Storage bucket by using a Hash-based Message Authentication Code (HMAC) key.
  3. The objects from your Google Cloud Storage bucket move securely through TLS 1.3 into the AWS Cloud by using a private VPC endpoint.
  4. The DataSync service writes the data to your S3 bucket.

3. Prerequisites

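Before you begin, do the following if you haven't already:

  • Sign up for an AWS account.
  • Create an Amazon S3 bucket to receive your data, in the AWS Region where you plan to deploy the DataSync agent.
  • Install the AWS CLI and configure a profile for the account that owns your S3 bucket.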

4. Steps to create a DataSync task

Step 1: Get an HMAC key for your Google Cloud Storage bucket

DataSync authenticates with Google Cloud Storage by using an HMAC key that's associated with a Google service account. To get the HMAC key's access ID and secret, contact Moloco Customer Support.

Step 2: Configure your network

You need a VPC with an interface VPC endpoint for DataSync. The agent communicates with the DataSync service privately through this endpoint.

To configure your network for a VPC endpoint:

  1. If you don't have one, create a VPC in the same AWS Region as your S3 bucket.
  2. Create a private subnet for your VPC.
  3. Create a VPC endpoint for DataSync by using AWS PrivateLink.
  4. Configure your network to allow DataSync transfers through a VPC endpoint. To make the necessary configuration changes, you can modify the security group that's associated with your VPC endpoint. DataSync requires the following ports for your agent to use a VPC endpoint.
| From | To | Protocol | Port |
|---|---|---|---|
| Your web browser | Your DataSync agent | TCP | 80 (HTTP) |
| DataSync agent | Your DataSync VPC endpoint (see note 1) | TCP | 1024–1064 |
| DataSync agent | Your task's network interfaces (see note 2) | TCP | 443 (HTTPS) |
| DataSync agent | Your DataSync VPC endpoint | TCP | 22 (Support channel) |

Note 1: To find the correct IP address, open the Amazon VPC console at https://console.aws.amazon.com/vpc/ and choose Endpoints in the left navigation pane. Choose the DataSync endpoint, and check the Subnets list to find the private IP address that corresponds to the subnet that you chose for your VPC endpoint setup. For more information, see step 5 in https://docs.aws.amazon.com/datasync/latest/userguide/datasync-in-vpc.html#create-agent-steps-vpc.

Note 2: To find the related IP addresses, open the Amazon EC2 console and choose Network Interfaces in the left navigation pane. To see the four network interfaces for the task, enter your task ID in the search filter. For more information, see step 9 in https://docs.aws.amazon.com/datasync/latest/userguide/datasync-in-vpc.html#create-agent-steps-vpc.
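If you script the network setup instead of using the console, the port requirements in the table above can be expressed as EC2 API parameters. The following is a minimal sketch, not an official AWS or Moloco tool: it only builds request dictionaries whose keys follow the boto3 EC2 `create_vpc_endpoint` and `authorize_security_group_ingress` parameters, it assumes each rule is scoped to a single source IP address (`/32`), and every Region, ID, and IP address you pass in is a hypothetical placeholder. Nothing here calls AWS.

```python
"""Sketch: Step 2 network settings as EC2 API request parameters.

Assumption: Regions, IDs, and IP addresses passed in are placeholders for
your own values. No AWS calls are made here.
"""

# Port requirements from the table above: (source, destination, from_port, to_port).
PORT_RULES = [
    ("browser", "agent", 80, 80),                    # HTTP, used for agent activation
    ("agent", "vpc_endpoint", 1024, 1064),
    ("agent", "task_network_interfaces", 443, 443),  # HTTPS
    ("agent", "vpc_endpoint", 22, 22),               # support channel
]


def endpoint_request(region, vpc_id, subnet_id, security_group_id):
    """Arguments for EC2 CreateVpcEndpoint: an interface endpoint for DataSync."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }


def ingress_permissions(source, destination, source_ip):
    """IpPermissions entries that let `source` (at source_ip) reach `destination` over TCP."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": f"{source_ip}/32"}],
        }
        for src, dst, from_port, to_port in PORT_RULES
        if src == source and dst == destination
    ]
```

For example, `ingress_permissions("agent", "vpc_endpoint", "10.0.1.25")` returns the 1024–1064 and 22 rules, which you could pass as `IpPermissions` to `authorize_security_group_ingress` for the endpoint's security group. Keeping the table as data makes it easy to check that no row was missed.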

Step 3: Create a DataSync Agent

You need a DataSync agent that can access and read your Google Cloud Storage bucket. The agent runs as an Amazon EC2 instance in a VPC that's associated with your AWS account.

To create an agent:

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
  2. In the left navigation pane, choose Agents, and then choose Create agent.
  3. Complete each section of the Create agent page as described in the following steps.

Step 3-1: Deploy agent

In the Deploy agent section, choose Amazon EC2, and then follow these steps.

📘

Info

For more details, see the AWS guide "Deploy your agent on Amazon EC2".

  1. Open a terminal. Make sure that your AWS CLI profile is configured to use the account that's associated with your S3 bucket.
  2. Copy the following command. Replace vpc-region with the AWS Region where your VPC resides (for example, us-east-1).
aws ssm get-parameter --name /aws/service/datasync/ami --region vpc-region
  3. Run the command. In the output, note the "Value" property. This value is the DataSync Amazon Machine Image (AMI) ID for the Region that you specified. For example, an AMI ID could look like ami-1234567890abcdef0.
  4. Copy the following URL. Again, replace vpc-region with the AWS Region where your VPC resides. Then replace ami-id with the AMI ID that you noted in the previous step.
https://console.aws.amazon.com/ec2/v2/home?region=vpc-region#LaunchInstanceWizard:ami=ami-id
  5. Paste the URL into a browser. The Amazon EC2 instance launch page opens in the AWS Management Console.
  6. For Instance type, choose one of the following instance types:
    1. m5.2xlarge: for task executions that work with up to 20 million files, objects, or directories.
    2. m5.4xlarge: for task executions that work with more than 20 million files, objects, or directories.
  7. For Key pair, choose an existing key pair, or create a new one.
  8. For Network settings, choose the VPC and subnet where you want to deploy the agent.
  9. Choose Launch instance.
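The AMI lookup, launch URL, and instance-type choice above can also be scripted. This is a minimal sketch under stated assumptions: it expects the JSON text printed by the `aws ssm get-parameter` command above, and the function names and the `M5_2XLARGE_MAX_ITEMS` constant are hypothetical helpers, not part of any AWS tool.

```python
import json


def ami_id_from_ssm_output(cli_output):
    """Return the "Value" property (the DataSync AMI ID) from `aws ssm get-parameter` JSON output."""
    return json.loads(cli_output)["Parameter"]["Value"]


def launch_url(vpc_region, ami_id):
    """Fill in vpc-region and ami-id in the EC2 launch-wizard URL shown above."""
    return (
        f"https://console.aws.amazon.com/ec2/v2/home?region={vpc_region}"
        f"#LaunchInstanceWizard:ami={ami_id}"
    )


# Item-count guideline from the instance-type step above (hypothetical constant name).
M5_2XLARGE_MAX_ITEMS = 20_000_000


def agent_instance_type(item_count):
    """m5.2xlarge for up to 20 million files, objects, or directories; m5.4xlarge above that."""
    return "m5.2xlarge" if item_count <= M5_2XLARGE_MAX_ITEMS else "m5.4xlarge"
```

Alternatively, adding the AWS CLI's `--query Parameter.Value --output text` options to the command prints only the AMI ID, so no JSON parsing is needed.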

Step 3-2: Choose service endpoint

📘

Info

For more details, see the AWS guide "Choose a service endpoint for your AWS DataSync agent".

  1. On the same Create agent page, go to the Service endpoint section.
  2. Choose VPC endpoints using AWS PrivateLink.
  3. Choose the VPC endpoint that you configured in Step 2.
  4. For Subnet and Security group, choose the subnet and security group that you configured in Step 2.

Step 3-3: Activate your agent

📘

Info

For more details, see the AWS guide "Activate your AWS DataSync agent".

  1. On the same Create agent page, go to the Activation key section.

  2. Choose one of the following options to activate your agent:

    • Automatically get the activation key from your agent – This option requires that your browser can reach the agent on port 80. After activation, the agent closes the port.
      • For Agent address, enter the agent's public IP address or domain name, and then choose Get key.
      • Your browser connects to the IP address and gets a unique activation key from your agent. If activation fails, check your network configuration.

    If you don't want your browser to connect to the agent, use the Manually enter your agent's activation key option instead:

    • Manually enter your agent's activation key
      • Get the key from the agent's local console.
      • Back in the DataSync console, enter the key in the Activation key field.
      📘

        Info

        Agent activation keys expire in 30 minutes if unused.

  3. (Optional) For Agent name, enter a name for your agent.

  4. (Optional) For Tags, enter values for the Key and Value fields to tag your agent. Tags help you manage, filter, and search for your AWS resources.

  5. Choose Create agent.

  6. On the Agents page, verify that your service endpoint is correct.
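If you create the agent through the API instead of the console, the settings from Steps 3-2 and 3-3 map onto the DataSync CreateAgent operation. The following minimal sketch builds arguments whose keys follow boto3's `create_agent` parameters and checks the 30-minute activation-key lifetime before you use a key. It assumes you already have an activation key from one of the activation options above; the helper names, the agent name, the tags, and all IDs and ARNs are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# From the note above: an unused activation key expires after 30 minutes.
ACTIVATION_KEY_LIFETIME = timedelta(minutes=30)


def activation_key_is_fresh(obtained_at, now=None):
    """True if a key obtained at `obtained_at` (timezone-aware) is younger than 30 minutes."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - obtained_at < ACTIVATION_KEY_LIFETIME


def create_agent_request(activation_key, vpc_endpoint_id, subnet_arn,
                         security_group_arn, agent_name=None, tags=None):
    """Arguments for DataSync CreateAgent with the PrivateLink VPC endpoint from Step 2."""
    request = {
        "ActivationKey": activation_key,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [subnet_arn],
        "SecurityGroupArns": [security_group_arn],
    }
    if agent_name:  # optional, like the Agent name field
        request["AgentName"] = agent_name
    if tags:  # optional; a dict of key/value pairs becomes the Tags list
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request
```

Checking freshness first avoids a confusing activation failure when a key has sat unused for more than 30 minutes; if it has expired, get a new key.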

Step 4: Create a DataSync source location for the Moloco MCM Google Cloud Storage bucket

To set up a DataSync location for your Google Cloud Storage bucket, you need the access ID and secret of the HMAC key that you obtained in Step 1.
To create the DataSync source location:

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
  2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.
  3. For Location type, choose Object storage.
  4. For Agents, choose the agent that you created in Step 3.
  5. For Server, enter storage.googleapis.com.
  6. For Bucket name, enter the name of your Google Cloud Storage bucket.
  7. Expand Additional settings. For Server protocol, choose HTTPS. For Server port, choose 443.
  8. Scroll down to the Authentication section. Make sure that the Requires credentials check box is selected, and then do the following:
    • For Access key, enter your HMAC key's access ID.
    • For Secret key, enter your HMAC key's secret.
  9. Choose Create location.
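The console settings above correspond to the parameters of the DataSync CreateLocationObjectStorage operation. This minimal sketch builds those arguments with keys following boto3's `create_location_object_storage`; the agent ARN, bucket name, and HMAC values are placeholders (the real access ID and secret come from Moloco Customer Support), and in practice you would load the secret from a secure store rather than hard-code it.

```python
def gcs_source_location_request(agent_arn, bucket_name, hmac_access_id, hmac_secret):
    """Arguments for DataSync CreateLocationObjectStorage that point at the GCS XML API."""
    return {
        "ServerHostname": "storage.googleapis.com",  # Server
        "ServerProtocol": "HTTPS",                   # Additional settings: Server protocol
        "ServerPort": 443,                           # Additional settings: Server port
        "BucketName": bucket_name,                   # your Google Cloud Storage bucket
        "AccessKey": hmac_access_id,                 # HMAC key access ID
        "SecretKey": hmac_secret,                    # HMAC key secret
        "AgentArns": [agent_arn],                    # the agent created in Step 3
    }
```

With boto3, the result could be passed as `boto3.client("datasync").create_location_object_storage(**request)`, which returns the new location's ARN under `LocationArn`.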

Step 5: Create a DataSync destination location for your S3 bucket

You also need a DataSync location for the destination of your data.
To create the DataSync destination location:

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
  2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.
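  3. For Location type, choose Amazon S3, and then choose your S3 bucket. For IAM role, choose an existing role that DataSync can use to access the bucket, or choose Autogenerate. Then choose Create location.
    If you deployed the DataSync agent in your VPC, this guide assumes that the S3 bucket is in the same AWS Region as your VPC and DataSync agent.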
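The same destination can be described as arguments for the DataSync CreateLocationS3 operation. This is a minimal sketch whose keys follow boto3's `create_location_s3`; it assumes the commercial `aws` partition (the `arn:aws:s3:::` prefix), and the bucket name, role ARN, and optional subdirectory are hypothetical placeholders.

```python
def s3_bucket_arn(bucket_name):
    """S3 bucket ARN in the "aws" partition; bucket ARNs carry no Region or account ID."""
    return f"arn:aws:s3:::{bucket_name}"


def s3_destination_location_request(bucket_name, bucket_access_role_arn, subdirectory=None):
    """Arguments for DataSync CreateLocationS3.

    bucket_access_role_arn: an IAM role that DataSync can assume to write to the
    bucket (the role the console's Autogenerate option would create for you).
    """
    request = {
        "S3BucketArn": s3_bucket_arn(bucket_name),
        "S3Config": {"BucketAccessRoleArn": bucket_access_role_arn},
    }
    if subdirectory:  # optional prefix inside the bucket
        request["Subdirectory"] = subdirectory
    return request
```

Leaving `subdirectory` unset writes objects at the bucket root; a prefix such as `/moloco/` keeps the transferred data separate from other content in the bucket.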

Step 6: Create and start a DataSync task

With your source and destination locations configured, you can start moving your data into AWS.
To create and start the DataSync task:

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

  2. In the left navigation pane, expand Data transfer, then choose Tasks, and then choose Create task.

  3. On the Configure source location page, do the following:

    1. Choose Choose an existing location.
    2. Choose the source location that you created in Step 4, then choose Next.
  4. On the Configure destination location page, do the following:

    1. Choose Choose an existing location.
    2. Choose the destination location that you created in Step 5, then choose Next.
  5. On the Configure settings page, do the following:

    1. Under Data transfer configuration, expand Additional settings and clear the Copy object tags check box.
      🚧

      Important

      Because DataSync communicates with Google Cloud Storage by using the Amazon S3 API, there's a limitation that might cause your DataSync task to fail if you try to copy object tags.

    2. Configure any other task settings that you want, and then choose Next.
  6. On the Review page, review your settings, and then choose Create task.

  7. On the task's details page, choose Start, and then choose one of the following:

    1. To run the task without modification, choose Start with defaults.
    2. To modify the task before running it, choose Start with overriding options.

When your task finishes, you'll see the objects from your Google Cloud Storage bucket in your S3 bucket.
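Creating and starting the task can also be done through the DataSync CreateTask and StartTaskExecution operations. In this minimal sketch (keys follow boto3's `create_task` and `start_task_execution`; the helper names, task name, and ARNs are hypothetical), setting `ObjectTags` to `NONE` plays the role of clearing the Copy object tags check box, and `OverrideOptions` corresponds to Start with overriding options.

```python
def create_task_request(source_location_arn, destination_location_arn, name=None):
    """Arguments for DataSync CreateTask from the Step 4 source to the Step 5 destination."""
    request = {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        # Don't copy object tags: copying them from Google Cloud Storage can fail.
        "Options": {"ObjectTags": "NONE"},
    }
    if name:  # optional task name
        request["Name"] = name
    return request


def start_task_request(task_arn, override_options=None):
    """Arguments for StartTaskExecution; omit overrides to start with defaults."""
    request = {"TaskArn": task_arn}
    if override_options:
        request["OverrideOptions"] = override_options
    return request
```

With boto3, a possible flow is `client = boto3.client("datasync")`, then `task_arn = client.create_task(**create_task_request(src_arn, dst_arn))["TaskArn"]`, and finally `client.start_task_execution(**start_task_request(task_arn))`.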

5. References