Is CKAN capable of dealing with 100k+ files and TB of data?

终归单人心 2021-02-09 11:21

We want to create a local data repository so that our lab members can organize, search, access, catalog, and reference our data, etc. I feel that CKAN can do all

2 answers
  •  情话喂你
    2021-02-09 12:00

    Yes :)

    But there are extensions to use or build.

    Take a look at the extensions built for CKAN Galleries (http://datashades.com/ckan-galleries/). We built that specifically for image and video assets that are referenced in the record level of a dataset resource.

    There is an S3 cloud connector for object storage if needed.
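As a rough illustration of keeping large files in object storage while CKAN holds only the catalog entry, the sketch below registers an S3 object as a link-type resource through CKAN's `resource_create` action. It does not use the S3 connector extension mentioned above; it is a plain Action API call. The site URL, token, dataset name, bucket, and key are all made-up placeholders, and the bucket is assumed to be readable by whoever follows the link.

```python
import json
import urllib.parse
import urllib.request


def build_resource_payload(package_id, bucket, key, region="us-east-1"):
    """Describe an S3 object as a link-type CKAN resource (nothing is uploaded)."""
    # Virtual-hosted-style S3 URL; the key is percent-encoded but "/" is kept.
    url = "https://{}.s3.{}.amazonaws.com/{}".format(
        bucket, region, urllib.parse.quote(key)
    )
    return {
        "package_id": package_id,          # existing dataset name or id
        "url": url,                        # CKAN stores only this link
        "name": key.rsplit("/", 1)[-1],    # display name taken from the file name
    }


def build_request(ckan_url, api_token, payload):
    """Build (but do not send) a POST to CKAN's resource_create action."""
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_token, "Content-Type": "application/json"},
    )


# All values below are hypothetical placeholders.
payload = build_resource_payload("lab-imaging", "lab-raw-data", "run 42/scan.tif")
request = build_request("https://ckan.example.org/", "HYPOTHETICAL-TOKEN", payload)
# Sending is left out on purpose: urllib.request.urlopen(request) would submit it.
```

With this pattern the terabytes stay in S3 and CKAN only indexes metadata, which is usually what keeps a catalog with 100k+ entries manageable.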

    We've started to look at various ways to extend CKAN so it can provide enterprise data storage and management for all types of data: very large, real-time, IoT-specific, Linked Data, etc.

    I think in some cases these will be addressed by adding the concept of 'resource containers' to CKAN. In a sense, both the FileStore and the DataStore are already examples of such resource container extensions.
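To make the 'resource container' idea a little more concrete, here is one way it could be pictured in code. This is purely a hypothetical sketch, not an existing CKAN interface: `ResourceContainer` and the three classes are invented names. Only the URL shapes (the resource download path and the `datastore_search` action) follow CKAN's usual conventions.

```python
from typing import Protocol
from urllib.parse import urlencode


class ResourceContainer(Protocol):
    """Hypothetical interface: something that knows where a resource's data lives."""

    def access_url(self, resource: dict) -> str: ...


class FileStoreContainer:
    """Files uploaded to CKAN itself, served from the resource download path."""

    def __init__(self, site_url: str) -> None:
        self.site_url = site_url.rstrip("/")

    def access_url(self, resource: dict) -> str:
        return "{}/dataset/{}/resource/{}/download/{}".format(
            self.site_url, resource["package_id"], resource["id"], resource["name"]
        )


class DataStoreContainer:
    """Tabular data queried row by row through the datastore_search action."""

    def __init__(self, site_url: str, limit: int = 100) -> None:
        self.site_url = site_url.rstrip("/")
        self.limit = limit

    def access_url(self, resource: dict) -> str:
        query = urlencode({"resource_id": resource["id"], "limit": self.limit})
        return "{}/api/3/action/datastore_search?{}".format(self.site_url, query)


class ExternalContainer:
    """Data held elsewhere (for example object storage); CKAN keeps only the link."""

    def access_url(self, resource: dict) -> str:
        return resource["url"]


# Placeholder resource record and site URL, for illustration only.
resource = {"id": "r1", "package_id": "p1", "name": "scan.tif",
            "url": "https://storage.example.org/scan.tif"}
containers = [FileStoreContainer("https://ckan.example.org"),
              DataStoreContainer("https://ckan.example.org", limit=5),
              ExternalContainer()]
urls = [c.access_url(resource) for c in containers]
```

The point of the shared method is that a client asks every container the same question and never needs to know which backend actually holds the bytes.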

    We are looking at using AWS's API Gateway service to expose data held in external third-party systems through request methods that look no different from those of other CKAN resources.

    Although not everyone is there just yet, when you use infrastructure as software, which AWS enables, you can build some really neat stuff which looks like software running on a traditional web stack but is actually making use of S3, Lambda, temporary relational DBs and API Gateway to do some very heavy lifting.

    We aim to open source the approach taken for such work as open architecture as it matures. We've started this already by publishing scripts used to build supercomputer clusters on AWS. You can find those here: https://github.com/DataShades/awscloud-hpc
