Best practices for mixed usage of RDBMS and files on filesystem

Submitted by 妖精的绣舞 on 2019-12-08 04:48:16

Question


In one of the tables in the schema I am working on, I need to deal with a couple of thousand "data-sheets", which are mostly PDF documents and sometimes image files such as PNG or JPG. The schema models an electronics distributor's portal, where new products are added to the portfolio frequently.

These documents (data-sheets) are added when a new product is introduced, but they need updates from time to time (for example, when a newer version of the document, not of the product itself, is released), so I'd expect the update to be an asynchronous procedure.

Given this, should I keep only the file name/path of the data-sheets (and similar documents) in my table, with the actual files on the filesystem, or should I take the BLOB approach? I am almost certain that it should be the former, but I still wanted to get the community's advice and see whether there are pitfalls to watch out for.
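
To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (datasheet_by_path, datasheet_by_blob, file_path, content) and the sample path are hypothetical, chosen only for illustration; a production RDBMS would use its own BLOB type and DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option A: the table holds only a path; the bytes live on the filesystem.
conn.execute(
    "CREATE TABLE datasheet_by_path ("
    " product_id INTEGER PRIMARY KEY,"
    " file_path TEXT NOT NULL,"
    " version INTEGER NOT NULL DEFAULT 1)"
)

# Option B: the table holds the document bytes in a BLOB column.
conn.execute(
    "CREATE TABLE datasheet_by_blob ("
    " product_id INTEGER PRIMARY KEY,"
    " content BLOB NOT NULL,"
    " version INTEGER NOT NULL DEFAULT 1)"
)

pdf_bytes = b"%PDF-1.4 placeholder"  # stand-in for a real data-sheet

conn.execute(
    "INSERT INTO datasheet_by_path (product_id, file_path) VALUES (?, ?)",
    (1, "datasheets/1.pdf"),  # hypothetical path relative to a document root
)
conn.execute(
    "INSERT INTO datasheet_by_blob (product_id, content) VALUES (?, ?)",
    (1, pdf_bytes),
)

path_row = conn.execute(
    "SELECT file_path, version FROM datasheet_by_path WHERE product_id = 1"
).fetchone()
blob_row = conn.execute(
    "SELECT content, version FROM datasheet_by_blob WHERE product_id = 1"
).fetchone()
```

With option A the database cannot guarantee that the file at file_path exists or matches the row; with option B the bytes are covered by the same transactions and backups as the rest of the row.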


Answer 1:


For completeness, let me just mention that some databases allow you to have a "hybrid" of these two approaches, for example Oracle BFILE or MS SQL Server FILESTREAM.

There is also an interesting discussion at Ask Tom on storing files in Oracle BLOBs (in a nutshell: "BLOBs are better than files").


BTW, you don't necessarily need to choose one over the other. If you can afford the storage overhead and you are operating in a read-mostly environment, you could store the "master" data in the BLOB for integrity but "cache" that same data in a file for quick read-only access. Some considerations:

  • You'd need to make sure the file is updated/removed if the BLOB is updated/removed.
  • Consider creating/updating the file on-demand.
  • Consider evicting old files from the "cache" even if the corresponding BLOBs still exist.
  • Consider using several "caches" (e.g. if you have a middle tier that is distributed across multiple physical machines, each machine could have its own file cache).
  • And finally, you'd need to make sure all this works robustly in a concurrent environment.
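
The considerations above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for the real RDBMS, and the datasheet table, its version column, the helper names (put_datasheet, cached_path, evict_stale) and the "<id>.v<version>.bin" file naming are all hypothetical. Putting the version in the file name means an updated BLOB never maps to a stale file, and writing to a temporary file followed by os.replace keeps concurrent readers from seeing a half-written file.

```python
import os
import sqlite3
import tempfile


def open_store():
    """Create an in-memory database holding the master copies as BLOBs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasheet ("
        " id INTEGER PRIMARY KEY,"
        " version INTEGER NOT NULL,"
        " content BLOB NOT NULL)"
    )
    return conn


def put_datasheet(conn, doc_id, content):
    """Insert or update the master copy, bumping the version on update."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE datasheet SET version = version + 1, content = ? WHERE id = ?",
            (content, doc_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO datasheet (id, version, content) VALUES (?, 1, ?)",
                (doc_id, content),
            )


def cached_path(conn, cache_dir, doc_id):
    """Return the path of a cached file copy, creating it on demand."""
    row = conn.execute(
        "SELECT version FROM datasheet WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    path = os.path.join(cache_dir, f"{doc_id}.v{row[0]}.bin")
    if not os.path.exists(path):
        # Cache miss: read version and bytes together so they stay consistent.
        version, content = conn.execute(
            "SELECT version, content FROM datasheet WHERE id = ?", (doc_id,)
        ).fetchone()
        path = os.path.join(cache_dir, f"{doc_id}.v{version}.bin")
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        os.replace(tmp, path)  # atomic rename within the same directory
    return path


def evict_stale(conn, cache_dir, doc_id):
    """Delete cached files for doc_id that do not match the current version."""
    row = conn.execute(
        "SELECT version FROM datasheet WHERE id = ?", (doc_id,)
    ).fetchone()
    current = f"{doc_id}.v{row[0]}.bin" if row else None
    for name in os.listdir(cache_dir):
        if name.startswith(f"{doc_id}.v") and name != current:
            os.remove(os.path.join(cache_dir, name))


conn = open_store()
cache_dir = tempfile.mkdtemp()
put_datasheet(conn, 1, b"first revision")
p1 = cached_path(conn, cache_dir, 1)   # file created on demand
put_datasheet(conn, 1, b"second revision")
p2 = cached_path(conn, cache_dir, 1)   # new version, new file
evict_stale(conn, cache_dir, 1)        # old version removed from the cache
```

Each middle-tier machine could run the same logic against its own cache_dir. A real deployment would still need a size- or age-based eviction policy and handling for rows deleted between the two queries.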

So, this is not the simplest approach but, depending on your needs, it may be a good tradeoff between integrity, performance and implementation effort.



Source: https://stackoverflow.com/questions/7963656/best-practices-for-mixed-usage-of-rdbms-and-files-on-filesystem
