Transfer file from Azure Blob Storage to Google Cloud Storage programmatically


Did you think about using Azure Data Factory custom activity support, which is used for data transformation? If you go with an ADF custom activity, you can use Azure Batch on the back end for downloading, updating, and uploading your files into Google Storage.

We have migrated about 3 TB of files from Azure to Google Storage. We started a cheap Linux server with a few TB of local disk in Google Compute Engine, transferred the Azure files to the local disk with blobxfer, then copied the files from the local disk to Google Storage with gsutil rsync (gsutil cp works too).
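
A rough sketch of that two-step approach (the storage account, container, bucket, and staging directory names are placeholders, and the exact blobxfer flags depend on the version you install):

# Step 1: pull the Azure container down to the local disk with blobxfer
blobxfer download \
    --storage-account mystorageaccount \
    --sas "<sas-token>" \
    --remote-path mycontainer \
    --local-path /mnt/staging

# Step 2: push the local copy to Google Cloud Storage in parallel
gsutil -m rsync -r /mnt/staging gs://my-gcs-bucket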

You can use other tools to transfer files from Azure; you could even start a Windows server in GCE and use gsutil on Windows.

It took a few days, but it was simple and straightforward.

I know it's a bit late to answer this question for you, but it might help others who are trying to migrate data from Azure Blob Storage to Google Cloud Storage.

Google Cloud Storage and Azure Blob Storage are both storage services and do not themselves provide a command-line interface where we can simply run transfer commands. For that, we need an intermediate compute instance that can actually run the required commands. We will follow the steps below to achieve the cloud-to-cloud transfer.

First and foremost, create a Compute Instance in Google Cloud Platform. You needn't create a computationally powerful instance; all you need is a Debian machine with a 10 GB disk, a 2-core CPU, and 4 GB of memory.
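
If you prefer the command line over the Console, a gcloud command along these lines creates a matching instance (the instance name and zone are placeholders; e2-medium gives you 2 vCPUs and 4 GB of memory):

gcloud compute instances create transfer-vm \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --image-family=debian-11 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB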

In the early days, you would have downloaded the data to the Compute Instance in GCP and then moved it on to Google Cloud Storage. But now, with the introduction of gcsfuse, we can simply mount a Google Cloud Storage bucket as a file system.

Once the compute instance is created, simply log in to it over SSH from the Google Console and install the following packages.

Install Google Cloud Storage Fuse

# Add the gcsfuse apt repository and its signing key
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

# Install gcsfuse
sudo apt-get update -y
sudo apt-get install gcsfuse -y

# Create local folder 
mkdir local_folder_name

# Mount the Cloud Storage bucket onto the local folder
gcsfuse <bucket_name> <local_folder_path>
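
To confirm the mount worked, list the folder; once the transfer is finished you can unmount it with fusermount (both commands reuse the placeholders above):

# The bucket's objects should now appear as ordinary files
ls <local_folder_path>

# Unmount the bucket when you are done
fusermount -u <local_folder_path>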

Install AzCopy

# Download and extract the latest AzCopy v10 release for Linux
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
sudo cp ./azcopy_linux_amd64_*/azcopy /usr/bin/
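
You can verify the installation by printing the version:

azcopy --version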

Once these packages are installed, the next step is to create a Shared Access Signature (SAS). If you have Azure Storage Explorer, just right-click on the storage account name in the directory tree and select Generate Shared Access Signature.
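
If you would rather stay on the command line, the Azure CLI can generate a container-level SAS as well (the account name, container name, and expiry below are placeholders):

az storage container generate-sas \
    --account-name mystorageaccount \
    --name mycontainer \
    --permissions rl \
    --expiry 2024-12-31T23:59:00Z \
    --https-only \
    --output tsv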

Now you will have to create a URL to your blob objects. To achieve this, simply right-click on any of your blob objects, select Properties, and copy the URL from the dialog box.

Your final URL should look like this:

<https://URL_to_file> + <SAS Token>

https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D

Now, use the following command to start copying the files from Azure to GCP storage.

azcopy cp --recursive=true "<-source url->" "<-destination path->"
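
The destination here is the gcsfuse-mounted folder created earlier, so everything AzCopy writes to it lands in your Cloud Storage bucket. A filled-in example might look like this (the account, container, SAS token, and folder path are placeholders):

azcopy cp --recursive=true \
    "https://myaccount.blob.core.windows.net/sascontainer?<sas-token>" \
    "/home/user/local_folder_name"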

In case your job fails, you can list your jobs using:

azcopy jobs list

and to resume failed jobs:

azcopy jobs resume <job-id> --source-sas="<sas-token>"

You can collate all the steps into one bash script and leave it running until your data transfer is complete.
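
A rough sketch of such a script, assuming the placeholder bucket name, mount folder, and SAS URL used above (adjust them before running):

#!/usr/bin/env bash
set -euo pipefail

BUCKET_NAME="my-gcs-bucket"                  # placeholder: your GCS bucket
MOUNT_POINT="$HOME/local_folder_name"        # placeholder: local mount folder
SOURCE_URL="https://myaccount.blob.core.windows.net/sascontainer?<sas-token>"  # placeholder

# Mount the GCS bucket as a local file system
mkdir -p "$MOUNT_POINT"
gcsfuse "$BUCKET_NAME" "$MOUNT_POINT"

# Copy everything from Azure Blob Storage into the mounted bucket
azcopy cp --recursive=true "$SOURCE_URL" "$MOUNT_POINT"

# Unmount once the transfer is complete
fusermount -u "$MOUNT_POINT"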

And that's all! I hope this helps others.
