aws-lambda

Are parquet files created with pyarrow and pyspark compatible?

…衆ロ難τιáo~ submitted on 2020-02-25 06:03:40
Question: I have to convert analytics data from JSON to Parquet in two steps. For the large amount of existing data I am writing a PySpark job and doing df.repartition(*partitionby).write.partitionBy(partitionby).mode("append").parquet(output, compression=codec); however, for incremental data I plan to use AWS Lambda. PySpark would probably be overkill for it, so I plan to use PyArrow instead (I am aware that it unnecessarily involves Pandas, but I couldn't find a better alternative). So,

AWS Lambda and S3 - uploaded pdf file is blank/corrupt

守給你的承諾、 submitted on 2020-02-25 02:08:47
Question: I have a Spring app (running on AWS Lambda) which receives a file and uploads it to AWS S3, via Amazon API Gateway. The Spring controller sends a MultipartFile to my method, where it is uploaded to AWS S3: public static void uploadFile(MultipartFile mpFile, String fileName) throws IOException { String dirPath = System.getProperty("java.io.tmpdir", "/tmp"); File file = new File(dirPath + "/" + fileName); OutputStream ops = new FileOutputStream(file); ops.write(mpFile.getBytes()); s3client
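A common cause of blank or corrupt file uploads through API Gateway is that the multipart body is decoded as UTF-8 text rather than passed through as binary; the usual fix is to register the content type as a binary media type. A hedged sketch, assuming a Serverless Framework deployment (these are real Serverless Framework keys, but whether they apply depends on how the API was provisioned):

```
# serverless.yml fragment: tell API Gateway to pass multipart bodies
# through as binary instead of mangling them as text.
provider:
  name: aws
  apiGateway:
    binaryMediaTypes:
      - 'multipart/form-data'
```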

S3 Implementation for org.apache.parquet.io.InputFile?

自作多情 submitted on 2020-02-24 12:07:55
Question: I am trying to write a Scala-based AWS Lambda to read Snappy-compressed Parquet files stored in S3. The process will write them back out as partitioned JSON files. I have been trying to use the org.apache.parquet.hadoop.ParquetFileReader class to read the files... the non-deprecated way to do this appears to be passing it an implementation of the org.apache.parquet.io.InputFile interface. There is one for Hadoop (HadoopInputFile)... but I cannot find one for S3. I also tried some of the deprecated

Accessing Oracle from AWS Lambda in Python

為{幸葍}努か submitted on 2020-02-23 07:51:10
Question: I am writing (hopefully) a simple AWS Lambda that will run an RDS Oracle SQL SELECT and email the results. So far I have been using the Lambda Management Console, but all the examples I've run across talk about making a Lambda deployment package. So my first question is: can I do this from the Lambda Management Console? My next question is what to import for the Oracle DB API. In all the examples I have seen, they download and build a package with pip, but that would then seem to imply
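On the driver question, a sketch using the python-oracledb package, whose default "thin" mode needs no Oracle client libraries, so a plain pip-built deployment package or layer is enough (the DSN, credentials, and query below are placeholders):

```python
def rows_to_email_body(columns, rows):
    """Format a SELECT result as a tab-separated plain-text email body."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

def handler(event, context):
    # python-oracledb ("pip install oracledb") runs in thin mode by default,
    # with no Oracle client libraries required in the package.
    import oracledb

    # Placeholder connection details; real values would come from
    # environment variables or Secrets Manager.
    with oracledb.connect(user="app", password="secret",
                          dsn="dbhost.example.com/ORCLPDB1") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM items")  # placeholder query
            columns = [d[0] for d in cur.description]
            return rows_to_email_body(columns, cur.fetchall())
```

The returned body could then be handed to SES (e.g. boto3's send_email) to do the emailing step.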

AWS Lambda Layer

孤街浪徒 submitted on 2020-02-23 07:06:35
Question: I am trying to add the pandas library to my AWS Lambda layer, but it gives an error saying it cannot import lambda.function: no module named numpy. Can someone explain what the problem is with pandas and AWS? When I try to run it locally in PyCharm using SAM, it throws the same error. Answer 1: Lambda limits the unzipped deployment package (code plus layers) to 250 MB, and pandas/NumPy are large libraries that can push past that limit. Hence 1) If the part of your code that is
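As for the "no module named numpy" error itself: pandas depends on numpy, so the layer must contain both. A sketch of the usual layer-build recipe, assuming a Linux build environment matching the Lambda runtime (the layer name is arbitrary):

```shell
# Build on Linux (e.g. an Amazon Linux container) so the compiled
# numpy/pandas wheels match the Lambda runtime.
mkdir -p layer/python
# pip resolves and installs numpy automatically as a pandas dependency.
pip install pandas -t layer/python
# Layers are unpacked under /opt, and Python looks in /opt/python,
# so the zip must contain a top-level python/ directory.
(cd layer && zip -r ../pandas-layer.zip python)
```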

SES email not sending

a 夏天 submitted on 2020-02-23 06:06:32
Question: I am using the AWS SES service to send email, with a verified test email address in SES used as the Source. I am trying to send email to another email address but am unable to; it gives me the error "Email address is not verified. The following identities failed the check in region US-EAST-1". Reference code for sending email: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-send-email-ses/ I have read in the AWS documentation (https://docs.aws.amazon.com/ses/latest/DeveloperGuide
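While an account is in the SES sandbox, every recipient, not just the Source, must be a verified identity (or the account must be moved out of the sandbox via a support request). A minimal boto3 sketch; both addresses are placeholders that would need verifying in us-east-1:

```python
def build_email_request(source, to_addresses, subject, body):
    """Assemble kwargs for ses.send_email; split out so it can be tested offline."""
    return {
        "Source": source,
        "Destination": {"ToAddresses": to_addresses},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    }

def send(source, to_addresses, subject, body):
    # In the sandbox, SES raises "Email address is not verified" if any
    # address in to_addresses is not itself a verified identity.
    import boto3
    ses = boto3.client("ses", region_name="us-east-1")
    return ses.send_email(**build_email_request(source, to_addresses, subject, body))
```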
