How to store a sparse queryable matrix on disk or database?

£可爱£侵袭症+ 提交于 2020-01-03 08:44:12

问题


I need to store a sparse matrix on disk. It is like a database table with millions of rows and thousands of columns, where many or most columns are null. It needs to be queryable, like a SQL SELECT with a WHERE on some of the columns.

My specific requirement is on Java. I first thought of using Berkeley DB for Java to simulate a table, but then it does not support querying based on values.

Then, I thought about using a regular SQL database. For example, creating a schema with only a Row ID, a Column ID, and the value. The virtual row will be all the actual rows with the same ID. But then, this looks like database abuse.

Any ideas?


回答1:


The first thing that came to my mind when reading the question heading was a database row per (x,y) as you suggested in your next to last paragraph.

The other thing to note is that databases often compress the rows, particularly for NULLs, so the straightforward representation may not waste as much space as you think.




回答2:


It depends on your definition of "many or most columns are null", but that sounds like a very reasonable approach, assuming that you really need the random access.

If you can do everything by sequential processing (e.g. a scan in row order) then a flat file would be another reasonable option to consider.




回答3:


the Intersystems Cache database uses structures internally for storing data, which are sparse multidimensional arrays. Maybe check that out. You can query it, and map it to SQL tables. I'm not sure if you can directly access the multidimensional arrays in Intersystems Cache from java though.




回答4:


If you only need to save the data as you say "on disk", read it back & make certain queries, you might want to consider simply serializing the class of your own design and choice. Eliminates all need for database infrastructure, but obviously also excludes many of the things that a database would actually help you with,




回答5:


Recently, I've become interested in key-value type storage, and came across this blog entry describing how FriendFeed uses MySQL to implement a similar requirement:

http://bret.appspot.com/entry/how-friendfeed-uses-mysql



来源:https://stackoverflow.com/questions/598089/how-to-store-a-sparse-queryable-matrix-on-disk-or-database

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!