default_storage.exists extremely slow and frequently times out

Submitted by 南楼画角 on 2019-12-06 13:46:16

I encountered similar problems on a production site as it scaled up. What I'd recommend is using a storage backend that can maintain its own copy of all meta-information about your S3 files. The best such project is probably MimicDB, though you can also check out what I've done with a modified django-storages. That way, metadata queries like .exists(), .url, etc. are answered instantly from the local cache.
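For illustration, here is a minimal sketch of the caching idea (not MimicDB itself, and the class name is made up): a storage subclass that memoizes exists() results in Django's cache, assuming the older s3boto backend of django-storages.

from django.core.cache import cache
from storages.backends.s3boto import S3BotoStorage

_MISS = object()  # sentinel, since a cached False is a legitimate answer

class CachedExistsS3BotoStorage(S3BotoStorage):
    """Answer exists() from the local cache instead of a round trip to S3."""

    def exists(self, name):
        key = 's3-exists:%s' % name
        cached = cache.get(key, _MISS)
        if cached is not _MISS:
            return cached
        result = super(CachedExistsS3BotoStorage, self).exists(name)
        cache.set(key, result, 300)  # remember the answer for five minutes
        return result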

Also, make sure that you are generally just getting the URL or other metadata of the image objects, and not using any code that would cause your server to needlessly fetch the actual image data. What I like to do when setting this sort of thing up is modify the S3 wrapper (e.g. boto) so it will log every raw S3 REST request, and then test the site and make sure that simply viewing web pages on the site doesn't cause any S3 requests from the web server.
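If you are on classic boto you may not even need to patch it: it can emit its HTTP activity through the standard logging module, which is enough to spot stray S3 requests while you click around the site. A sketch (adjust the logger name for your boto version):

import logging
import boto

# At DEBUG level classic boto logs its HTTP activity, so unexpected S3
# requests show up in the console while you browse the site.
boto.set_stream_logger('boto', level=logging.DEBUG)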

Two-Bit Alchemist

Switching to CloudFront completely solved this issue, and it was relatively easy (no code changes, just more monkeying around in the Amazon console), so I decided to answer my own question.

tl;dr Do not serve files directly from S3; set up CloudFront.


Serving an S3 Bucket via CloudFront

Step 0: If you haven't already, make sure your bucket name complies with the "best practices" for naming buckets. Amazon doesn't make this obvious everywhere it should, but a bad bucket name can completely break the bucket's interoperability with other AWS services. The safest choice is an all-lowercase name using only letters, numbers, and hyphens, and not too long (the hard limit is 63 characters).

Step 1: To get CloudFront to serve files from your bucket, you need to set the bucket up as if it were serving a static website. You can do this in the AWS console under the bucket's Properties (Static Website Hosting). Amazon documents this in several places; IMO the clearest are these. IMPORTANT: Make sure you set the index document (the Default Root Object) to index.html -- that file doesn't even have to exist, but the setting does.
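For the console-averse, the same step can be done with classic boto; a sketch, where 'my-bucket' is a placeholder bucket name:

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')
bucket.configure_website(suffix='index.html')  # the index document setting
print(bucket.get_website_endpoint())           # use this hostname as the origin in Step 2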

Step 1.5 [possibly optional]: Make sure the permissions on your bucket are correct. Even though S3 had been serving the files without a problem, switching to CloudFront turned everything into a 403 Forbidden error. If in doubt, and your files are not sensitive, you can right-click folders of your bucket in the AWS console and click Make Public. WARNING: This can be a very time-intensive process, and for some stupid reason (even though it's server side) your browser session has to stay open. Do this first and don't close your session. For our bucket, it took about 16 hours. :/
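A much faster alternative to clicking Make Public on every folder is to attach a bucket policy that allows public reads of every object. Here is a sketch with classic boto ('my-bucket' is a placeholder, and again, only do this for non-sensitive content):

import json
import boto

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-bucket/*",  # every object in the bucket
    }],
}

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')
bucket.set_policy(json.dumps(policy))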

Step 2: Go to the CloudFront section of the AWS console and click the Create Distribution button. Make it a web distribution (the default) and use the domain generated by the static website hosting setup from the previous step as the origin. Again, IMO, these are the clearest and most straightforward instructions in the AWS docs. You can leave just about everything else at its default. Once it's created, just wait until the console lists it as "Deployed".
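The console is the easy route here, but for completeness, a rough sketch of the same step with classic boto, using the website endpoint from Step 1 as a custom origin (treat the exact arguments as an assumption and check them against your boto version):

import boto
from boto.cloudfront.origin import CustomOrigin

c = boto.connect_cloudfront()
origin = CustomOrigin('my-bucket.s3-website-us-east-1.amazonaws.com',
                      origin_protocol_policy='http-only')
distro = c.create_distribution(origin=origin, enabled=True,
                               comment='Media distribution')
print(distro.domain_name)  # something like dxxxxxxxxxxxxxx.cloudfront.net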

Step 3: Configure your app to serve from CloudFront rather than S3. This is the easiest part, because the URLs map transparently from https://bucketname.s3.amazonaws.com/path to https://somerandomstring.cloudfront.net/path (bonus: you can set up the latter behind a CNAME record such as media.yourdomain.tld; we didn't do this, so I won't go into it here). Since I'm using Django with a combination of django-storages and s3-boto, this ended up being a simple matter of setting the CloudFront domain in settings.py:

AWS_S3_CUSTOM_DOMAIN = 'd2ynhpzeiwwiom.cloudfront.net'
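For context, that line sits alongside the usual django-storages settings, roughly like this (a sketch; the bucket name is a placeholder and the storage class path assumes the older s3boto backend):

# settings.py
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'
AWS_S3_CUSTOM_DOMAIN = 'd2ynhpzeiwwiom.cloudfront.net'  # the CloudFront domain
MEDIA_URL = 'https://%s/' % AWS_S3_CUSTOM_DOMAIN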

And that's it! With these changes, all of our speed woes went away, and our media-rich pages (6-20 MP worth of images per page) suddenly load faster than ever!

Make sure AWS_PRELOAD_METADATA is False (the default) so that django-storages does not load the metadata of every file in the bucket into memory. This not only cuts request latency, it also reduces memory usage: in my case memory dropped from ~1.5 GB to 5-6 MB, and the request time from about 60 s to 4-5 s.
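In other words, in settings.py:

AWS_PRELOAD_METADATA = False  # django-storages setting; False is the default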

Original answer: Why does default_storage.exists() with django-storages with the S3Boto backend cause a memory error with a large S3 bucket?
