Integrating Lucene Index and Amazon AWS

Question

I have an existing set of Lucene index files and the Java code to perform search functions on it.

What I would like to do is run the same search on a server, so that users of an app can simply pass a query, which the Java program takes as an input parameter and runs against the existing index to return the documents in which it occurs.

The whole implementation has been tested on my local PC, but what I need to do is make it available to an Android app.

So far I have read around and concluded that the way to go is to port the code to AWS Lambda, store the index files in S3, and read the S3 objects from Lambda.

Is this the right approach? Any resources that cover this approach, or alternative suggestions, are also appreciated.

Thanks in advance.

Answer 1:

Every time your Android app sends a request to AWS Lambda (via AWS API Gateway I assume) the Lambda function will have to download the entire index file from S3 to the Lambda /tmp directory (where Lambda has a 512MB limit) and then perform a search against that index file. This seems extremely inefficient, and depending on how large your index file is, it might perform terribly or it might not even fit into the space you have available on Lambda.
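To gauge whether the index would fit at all, one option is to sum the on-disk size of the index directory and compare it with the /tmp limit quoted above. The following is a minimal sketch using only the JDK; the class name `IndexSizeCheck` and its methods are hypothetical helpers, not part of Lucene or the AWS SDK, and the 512 MB constant is simply the figure given in this answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: not part of Lucene or the AWS SDK.
final class IndexSizeCheck {
    // The /tmp limit quoted in this answer (512 MB), expressed in bytes.
    static final long TMP_LIMIT_BYTES = 512L * 1024 * 1024;

    // Sums the sizes of all regular files under indexDir (recursively).
    static long totalBytes(Path indexDir) {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(IndexSizeCheck::sizeOf)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when an index of indexBytes fits within limitBytes.
    static boolean fitsInTmp(long indexBytes, long limitBytes) {
        return indexBytes >= 0 && indexBytes <= limitBytes;
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running `IndexSizeCheck.fitsInTmp(IndexSizeCheck.totalBytes(indexDir), IndexSizeCheck.TMP_LIMIT_BYTES)` against the local index directory gives a quick yes/no before any code is ported. Even when the answer is yes, the per-request download cost described above still applies.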

I would suggest looking into the AWS Elasticsearch Service. This is a fully managed search engine service, based on Lucene, that you should be able to query directly from your Android application.

Answer 2:

As you already have your index files in S3, you can point your Lucene index reader directly at a location on S3.

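// Needs java.net.URI, java.nio.file.Path, java.util.HashMap/Map, and Lucene's DirectoryReader, IndexReader and FSDirectory.
Map<String, ?> env = new HashMap<>(); // may hold client credentials; left empty when the Lambda role supplies them
String index = "/<BUCKET_NAME>/<INDEX_LOCATION>/";
String endpoint = "s3://s3.amazonaws.com/";
Path path = new com.upplication.s3fs.S3FileSystemProvider().newFileSystem(URI.create(endpoint), env).getPath(index);
IndexReader reader = DirectoryReader.open(FSDirectory.open(path));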

You can either pass client credentials in env or assign a role to your Lambda function.
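As a rough illustration of the two options, the env map could be built like this. This is only a sketch: the class `S3fsEnv` is hypothetical, and the key names `s3fs_access_key` and `s3fs_secret_key` are an assumption based on the s3fs library's README, so check them against the version you use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for building the env map passed to newFileSystem.
final class S3fsEnv {
    // Assumed property names; verify against your s3fs version.
    static final String ACCESS_KEY = "s3fs_access_key";
    static final String SECRET_KEY = "s3fs_secret_key";

    // Returns a map holding both credentials, or an empty map when either
    // one is missing (the case where a Lambda role is expected to apply).
    static Map<String, Object> build(String accessKey, String secretKey) {
        Map<String, Object> env = new HashMap<>();
        if (accessKey != null && secretKey != null) {
            env.put(ACCESS_KEY, accessKey);
            env.put(SECRET_KEY, secretKey);
        }
        return env;
    }
}
```

Calling `S3fsEnv.build(null, null)` yields an empty map for the role-based setup, while passing both values covers the explicit-credentials case. Avoid hard-coding secrets in source.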

Ref: https://github.com/prathameshjagtap/aws-lambda-s3-index-search/blob/master/lucene-s3-searcher/src/com/printlele/SearchFiles.java

Answer 3:

For Lucene indices smaller than 512MB you can experiment with lucene-s3directory.

As Mark said, on AWS Lambda you are limited to 512MB on /tmp. I think having a completely serverless search service is very desirable but until that limit is gone, we're stuck with EC2 for production deployments. Once you go with running Lucene on EC2, storing the index on S3 becomes pointless as you have access to EBS or ephemeral storage.

In case you want to try out S3Directory, here's how to get started:

S3Directory dir = new S3Directory("my-lucene-index");
dir.create();
// use it in your code in place of FSDirectory, for example
dir.close();
dir.delete();

Source: https://stackoverflow.com/questions/38817050/integrating-lucene-index-and-amazon-aws

Tags

java

amazon-web-services

amazon-s3

lucene

aws-lambda