Using AWS Lambda to archive specific files stored on AWS S3

AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you. An overview of the concept, how it works, pricing and the like has already appeared on the hub (habrahabr.ru/company/epam_systems/blog/245949), so here I'll try to show a practical example of using the service.

So, as the title of the post implies, we will use AWS Lambda to create an archive of specified files stored on AWS S3. Let's go!



Creating a new function in the AWS console


AWS Lambda is currently in preview, so if you are using it for the first time you will need to submit a request and wait a few days for access.

To create a new function in the AWS console, click the Create a Lambda function button; this opens a form for configuring the new function.
First of all, we are asked for the name and description of the new function.

Then comes the function code.
The code can either be written directly in the editor or uploaded as a specially prepared zip archive. The first option only suits code without additional dependencies, and the second, at the time of writing, did not work through the web console. So at this stage we create the function with the example code in the editor, unchanged, and later upload the code we need, with all its dependencies, programmatically.

The Role name determines what access rights the function will have to various AWS resources. I will not dwell on this now; I'll just note that the rights offered by default when creating a new role include access to AWS S3 and are therefore sufficient for this example.
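
For reference, if you were to write such a policy by hand, something along these lines should be enough (a sketch only; it assumes the bucket names used later in this example):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::from-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
      "Resource": "arn:aws:s3:::to-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}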

You must also specify the amount of memory to allocate and the execution timeout.
The allocated memory affects the price of every invocation (the more memory, the more expensive), but the share of CPU resources is tied to it as well. Since creating an archive is heavily CPU-bound, we select the maximum available memory: the higher price is more than offset by the reduction in processing time.

After filling out the form, click Create Lambda function and leave the AWS console to move on to actually writing our function.

Function code, packaging, and uploading to AWS


To solve our problem we will use several third-party libraries, as well as the grunt-aws-lambda plugin to make developing, packaging, and uploading the finished function more convenient.

Create package.json as follows:
{
  "name": "zip-s3",
  "description": "AWS Lamda Function",
  "version": "0.0.1",
  "private": "true",
  "devDependencies": {
    "aws-sdk": "^2.1.4",
    "grunt": "^0.4.5",
    "grunt-aws-lambda": "^0.3.0"
  },
  "dependencies": {
    "promise": "^6.0.1",
    "s3-upload-stream": "^1.0.7",
    "archiver": "^0.13.1"
  },
  "bundledDependencies": [
    "promise",
    "s3-upload-stream",
    "archiver"
  ]
}

and install the dependencies:
npm install

The bundledDependencies array in package.json lists the dependencies that will be packaged together with our function when it is uploaded.

After that, create the index.js file that will hold the function code.
First, let's look at what a function that does nothing looks like:
exports.handler = function (data, context) {
    context.done(null, '');
}

A call to context.done signals that the function has finished; at that point AWS Lambda stops its execution, records the time consumed, and so on.

The data object contains the parameters passed to the function. In our case it has the following structure:
{
    bucket : 'from-bucket', 
    keys : ['/123/test.txt', '/456/test2.txt'],
    outputBucket : 'to-bucket',
    outputKey : 'result.zip'
}

Now let's write the function code itself.
We require the necessary libraries:
var AWS = require('aws-sdk');
var Promise = require('promise');
var s3Stream = require('s3-upload-stream')(new AWS.S3());
var archiver = require('archiver');
var s3 = new AWS.S3();

We create the objects that will archive the files and stream the resulting archive up to AWS S3:
var archive = archiver('zip');
var upload = s3Stream.upload({
	"Bucket": data.outputBucket,
	"Key": data.outputKey
});
archive.pipe(upload);

We create a promise that calls context.done once the upload of the result has finished:
var allDonePromise = new Promise(function(resolveAllDone) {
	upload.on('uploaded', function (details) {
		resolveAllDone();
	});
});
allDonePromise.then(function() {
	context.done(null, ''); 
});
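
In a real deployment it would also be worth handling upload failures; s3-upload-stream emits an 'error' event, so a minimal sketch (not part of the original example) might look like this:
upload.on('error', function (err) {
	// signal the failure to AWS Lambda; the example above omits error handling
	context.done(err);
});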

We fetch the files by the given keys and add them to the archive; the keys arrive URL-encoded (see the client code below), so we decode them first. Once all the files have been fetched, we finalize the archive:
var getObjectPromises = [];
for(var i in data.keys) {
	(function(itemKey) {
		itemKey = decodeURIComponent(itemKey).replace(/\+/g,' ');
		var getPromise = new Promise(function(resolveGet) {
			s3.getObject({
				Bucket: data.bucket,
				Key : itemKey
			}, function(err, fileData) {
				if (err) {
					// log and skip files that could not be fetched
					console.log(itemKey, err, err.stack);
					resolveGet();
				}
				else {
					// strip the path, keeping only the file name for the archive entry
					var itemName = itemKey.substr(itemKey.lastIndexOf('/') + 1);
					archive.append(fileData.Body, { name: itemName });
					resolveGet();
				}
			});
		});
		getObjectPromises.push(getPromise);
	})(data.keys[i]);
}
Promise.all(getObjectPromises).then(function() {
	archive.finalize();
});
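
As a side note, buffering each file's Body in memory can get expensive for large files. The AWS SDK also allows streaming an object straight into the archive via createReadStream; a sketch of this alternative, reusing the s3, archive, itemKey, and itemName variables from above:
// stream the S3 object into the archive instead of buffering the whole body in memory
var itemStream = s3.getObject({
	Bucket: data.bucket,
	Key: itemKey
}).createReadStream();
archive.append(itemStream, { name: itemName });
With this approach, completion has to be tracked through the archive's own events rather than through the getObject callbacks.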

The complete function code

var AWS = require('aws-sdk');
var Promise = require('promise');
var s3Stream = require('s3-upload-stream')(new AWS.S3());
var archiver = require('archiver');
var s3 = new AWS.S3();
exports.handler = function (data, context) {
	var archive = archiver('zip');
	var upload = s3Stream.upload({
	  "Bucket": data.outputBucket,
	  "Key": data.outputKey
	});
	archive.pipe(upload);
	var allDonePromise = new Promise(function(resolveAllDone) {
		upload.on('uploaded', function (details) {
			resolveAllDone();
		});
	});
	allDonePromise.then(function() {
		context.done(null, ''); 		
	});
	var getObjectPromises = [];
	for(var i in data.keys) {
		(function(itemKey) {
			itemKey = decodeURIComponent(itemKey).replace(/\+/g,' ');
			var getPromise = new Promise(function(resolveGet) {
				s3.getObject({
					Bucket: data.bucket,
					Key: itemKey
				}, function(err, fileData) { // renamed from "data" to avoid shadowing the handler's argument
					if (err) {
						// log and skip files that could not be fetched
						console.log(itemKey, err, err.stack);
						resolveGet();
					}
					else {
						// strip the path, keeping only the file name for the archive entry
						var itemName = itemKey.substr(itemKey.lastIndexOf('/') + 1);
						archive.append(fileData.Body, { name: itemName });
						resolveGet();
					}
				});
			});
			getObjectPromises.push(getPromise);
		})(data.keys[i]);
	}
	Promise.all(getObjectPromises).then(function() {
		archive.finalize();
	});
};


To package the function and upload it to AWS, create Gruntfile.js with the following contents:
module.exports = function(grunt) {
	grunt.initConfig({
		lambda_invoke: {
			default: {				
			}
		},
		lambda_package: {
			default: {
			}
		},
		lambda_deploy: {
			default: {
				function: 'zip-s3'
			}
		}
	});
	grunt.loadNpmTasks('grunt-aws-lambda');
};
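
Before deploying, the function can be tested locally. By default, the lambda_invoke task reads its test event from an event.json file in the project root (an assumption based on the plugin's documented defaults), so we can put a sample payload there:
{
    "bucket": "from-bucket",
    "keys": ["/123/test.txt", "/456/test2.txt"],
    "outputBucket": "to-bucket",
    "outputKey": "result.zip"
}

and run:
grunt lambda_invoke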

And the ~/.aws/credentials file with your AWS access keys:
[default]
aws_access_key_id = ...
aws_secret_access_key = ...

We package our function and upload it to AWS Lambda:
grunt lambda_package lambda_deploy

Calling the created function from our application


We will call the function from a Java application.
To do this, prepare the data:
JSONObject requestData = new JSONObject();
requestData.put("bucket", "from-bucket");
requestData.put("outputBucket","to-bucket");
requestData.put("outputKey", "result.zip");
JSONArray keys = new JSONArray();
keys.put(URLEncoder.encode("/123/файл1.txt","UTF-8"));
keys.put(URLEncoder.encode("/456/файл2.txt","UTF-8"));
requestData.put("keys", keys);

And directly call the function:
AWSCredentials myCredentials = new BasicAWSCredentials(accessKeyID, secretKey);
AWSLambdaClient awsLambda = new AWSLambdaClient(myCredentials);
InvokeAsyncRequest req = new InvokeAsyncRequest();
req.setFunctionName("zip-s3");
req.setInvokeArgs(requestData.toString());
InvokeAsyncResult res = awsLambda.invokeAsync(req);

The function executes asynchronously: we immediately receive a response saying that the invocation request was accepted by AWS Lambda, while the execution itself takes some time.
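
For completeness, the same asynchronous invocation from Node.js would look roughly like this (a sketch using the aws-sdk Lambda client of that era; invokeAsync was later deprecated in favor of invoke):
var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

// the same payload the Java code builds above
var requestData = {
	bucket: 'from-bucket',
	keys: [encodeURIComponent('/123/test.txt'), encodeURIComponent('/456/test2.txt')],
	outputBucket: 'to-bucket',
	outputKey: 'result.zip'
};

lambda.invokeAsync({
	FunctionName: 'zip-s3',
	InvokeArgs: JSON.stringify(requestData)
}, function (err, res) {
	// a res.Status of 202 means the request was accepted for execution
});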
