Loading data (incrementally) into Amazon Redshift, S3 vs DynamoDB vs Insert

轻奢々 2020-12-23 11:58

I have a web app that needs to send reports on its usage. I want to use Amazon Redshift as the data warehouse for that purpose. How should I collect the data?

Every

5 answers
  •  天命终不由人
    2020-12-23 12:49

    Being a little selfish here, I'll describe exactly what Snowplow, an event analytics platform, does. It has a neat way of collecting event logs from the client and aggregating them on S3.

    They use CloudFront for this. You can host a pixel in an S3 bucket and put that bucket behind a CloudFront distribution as its origin. Then enable logging for that distribution, with the logs written to an S3 bucket.

    You can send logs as URL parameters whenever you request that pixel from your client (similar to Google Analytics). These logs can then be enriched and loaded into a Redshift database using COPY.
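As a rough sketch of the pixel idea (not Snowplow's actual code), the snippet below builds a pixel URL carrying an event as query parameters, then recovers those events from CloudFront-style log lines. The domain `example.cloudfront.net`, the path `/pixel.gif`, and both function names are hypothetical. It assumes the log file has a `#Fields:` header naming a `cs-uri-query` column and tab-separated rows; real logs may encode some characters differently, so treat the parsing as a starting point.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical CloudFront distribution serving the pixel from S3.
PIXEL_URL = "https://example.cloudfront.net/pixel.gif"


def build_pixel_url(event: dict) -> str:
    """Encode an event as query parameters on the pixel URL."""
    return f"{PIXEL_URL}?{urlencode(event)}"


def extract_events(log_lines) -> list:
    """Pull the query-string events back out of access-log lines.

    The column position is read from the '#Fields:' header instead of
    being hard-coded, so the function does not depend on field order.
    """
    query_idx = None
    events = []
    for raw in log_lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            names = line[len("#Fields:"):].split()
            query_idx = names.index("cs-uri-query")
            continue
        # Skip blank lines, other comment lines, and rows seen before the header.
        if not line or line.startswith("#") or query_idx is None:
            continue
        query = line.split("\t")[query_idx]
        if query == "-":  # "-" marks a request with no query string
            continue
        # parse_qs returns lists of values; keep the first value per key.
        events.append({k: v[0] for k, v in parse_qs(query).items()})
    return events
```

The client would request `build_pixel_url({"event": "page_view", "user": "42"})` on each tracked action, and a batch job would run `extract_events` over the log files before loading the result.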

    This takes care of log aggregation; the setup handles all of it for you.
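For the final load into Redshift, the COPY statement could look roughly like the one produced below. This is only a sketch: the table name, S3 prefix, and IAM role ARN are hypothetical placeholders, and it assumes the enriched events were written to S3 as JSON objects whose keys match the table's column names.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads JSON files from S3.

    Values are interpolated directly into the SQL text, so pass only
    trusted, hard-coded identifiers (never user input).
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON 'auto';"
    )


# Hypothetical names, for illustration only.
sql = build_copy_statement(
    table="events",
    s3_prefix="s3://my-enriched-events/2020/12/23/",
    iam_role_arn="arn:aws:iam::123456789012:role/redshift-copy-role",
)
print(sql)
```

Running one COPY per batch of files is what makes the load incremental: each run only points at the S3 prefix holding the new data.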

    You can also look into Piwik, an open source analytics service, and see whether you can adapt it to your needs.
