aws-lambda

Are parquet files created with pyarrow and pyspark compatible?

…衆ロ難τιáo~ submitted on 2020-02-25 06:03:40
Question: I have to convert analytics data from JSON to Parquet in two steps. For the large amount of existing data I am writing a PySpark job and doing df.repartition(*partitionby).write.partitionBy(partitionby).mode("append").parquet(output, compression=codec); however, for incremental data I plan to use AWS Lambda. PySpark would probably be overkill for it, so I plan to use PyArrow instead (I am aware that it unnecessarily involves Pandas, but I couldn't find a better alternative). So,

AWS Lambda and S3 - uploaded pdf file is blank/corrupt

守給你的承諾、 submitted on 2020-02-25 02:08:47
Question: I have a Spring app (running on AWS Lambda) which receives a file and uploads it to AWS S3, via Amazon API Gateway. The Spring controller sends a MultipartFile to my method, where it is uploaded to AWS S3: public static void uploadFile(MultipartFile mpFile, String fileName) throws IOException { String dirPath = System.getProperty("java.io.tmpdir", "/tmp"); File file = new File(dirPath + "/" + fileName); OutputStream ops = new FileOutputStream(file); ops.write(mpFile.getBytes()); s3client
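A common cause of blank or corrupt file uploads through API Gateway is that the multipart body is decoded as UTF-8 text rather than passed through as binary; the usual fix is to register the content type as a binary media type. A hedged sketch, assuming a Serverless Framework deployment (these are real Serverless Framework keys, but whether they apply depends on how the API was provisioned):

```
# serverless.yml fragment: tell API Gateway to pass multipart bodies
# through as binary instead of mangling them as text.
provider:
  name: aws
  apiGateway:
    binaryMediaTypes:
      - 'multipart/form-data'
```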

S3 Implementation for org.apache.parquet.io.InputFile?

自作多情 submitted on 2020-02-24 12:07:55
Question: I am trying to write a Scala-based AWS Lambda to read Snappy-compressed Parquet files stored in S3. The process will write them back out as partitioned JSON files. I have been trying to use the org.apache.parquet.hadoop.ParquetFileReader class to read the files... the non-deprecated way to do this appears to be passing it an implementation of the org.apache.parquet.io.InputFile interface. There is one for Hadoop (HadoopInputFile)... but I cannot find one for S3. I also tried some of the deprecated

Accessing Oracle from AWS Lambda in Python

為{幸葍}努か submitted on 2020-02-23 07:51:10
Question: I am writing (hopefully) a simple AWS Lambda that will run an RDS Oracle SQL SELECT and email the results. So far I have been using the Lambda Management Console, but all the examples I've run across talk about making a Lambda deployment package. So my first question is: can I do this from the Lambda Management Console? My next question is what to import for the Oracle DB API. In all the examples I have seen, they download and build a package with pip, but that would then seem to imply
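On the driver question, a sketch using the python-oracledb package, whose default "thin" mode needs no Oracle client libraries, so a plain pip-built deployment package or layer is enough (the DSN, credentials, and query below are placeholders):

```python
def rows_to_email_body(columns, rows):
    """Format a SELECT result as a tab-separated plain-text email body."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

def handler(event, context):
    # python-oracledb ("pip install oracledb") runs in thin mode by default,
    # with no Oracle client libraries required in the package.
    import oracledb

    # Placeholder connection details; real values would come from
    # environment variables or Secrets Manager.
    with oracledb.connect(user="app", password="secret",
                          dsn="dbhost.example.com/ORCLPDB1") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM items")  # placeholder query
            columns = [d[0] for d in cur.description]
            return rows_to_email_body(columns, cur.fetchall())
```

The returned body could then be handed to SES (e.g. boto3's send_email) to do the emailing step.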

AWS Lambda Layer

孤街浪徒 submitted on 2020-02-23 07:06:35
Question: I am trying to add the pandas library to my AWS Lambda layer, but it gives an error saying it cannot import lambda.function: no module named numpy. Can someone explain what the problem is with pandas and AWS? When I try to run it locally in PyCharm using SAM, it throws the same error. Answer 1: Lambda limits the unzipped deployment package (code plus layers) to 250 MB, and pandas/NumPy are large libraries that can push past that limit. Hence 1) If the part of your code that is
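As for the "no module named numpy" error itself: pandas depends on numpy, so the layer must contain both. A sketch of the usual layer-build recipe, assuming a Linux build environment matching the Lambda runtime (the layer name is arbitrary):

```shell
# Build on Linux (e.g. an Amazon Linux container) so the compiled
# numpy/pandas wheels match the Lambda runtime.
mkdir -p layer/python
# pip resolves and installs numpy automatically as a pandas dependency.
pip install pandas -t layer/python
# Layers are unpacked under /opt, and Python looks in /opt/python,
# so the zip must contain a top-level python/ directory.
(cd layer && zip -r ../pandas-layer.zip python)
```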

SES email not sending

a 夏天 submitted on 2020-02-23 06:06:32
Question: I am using the AWS SES service to send email, with a verified test email address in SES used as the Source. I am trying to send email to another email address but am unable to; it gives me the error "Email address is not verified. The following identities failed the check in region US-EAST-1". Reference code for sending email: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-send-email-ses/ I have read in the AWS documentation (https://docs.aws.amazon.com/ses/latest/DeveloperGuide
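While an account is in the SES sandbox, every recipient, not just the Source, must be a verified identity (or the account must be moved out of the sandbox via a support request). A minimal boto3 sketch; both addresses are placeholders that would need verifying in us-east-1:

```python
def build_email_request(source, to_addresses, subject, body):
    """Assemble kwargs for ses.send_email; split out so it can be tested offline."""
    return {
        "Source": source,
        "Destination": {"ToAddresses": to_addresses},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    }

def send(source, to_addresses, subject, body):
    # In the sandbox, SES raises "Email address is not verified" if any
    # address in to_addresses is not itself a verified identity.
    import boto3
    ses = boto3.client("ses", region_name="us-east-1")
    return ses.send_email(**build_email_request(source, to_addresses, subject, body))
```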
