Create a Test Database copy of Production with only a sample set of Data

霸气de小男生 提交于 2019-12-22 08:22:17

问题


We have 2 Oracle databases. One is our Production customer database. The other is a test database. The Test database is a copy of production without any data...just empty tables. I am trying to come up with an efficient and effective way to copy a sample set of data from the Production db to the Test.

I need to start with a sample list of PKs from our Customer table and write a process to populate the test database Customer table with that sample. Then, I need to populate child tables with relevant data based on the sample.

We used to have a Java process that would use a prepared statement to select data from Production...build a file based on the result set...then use an update statement to get it into the Test db. It was horribly inefficient.

I am not a DBA so I don't know much of that realm. I tried to Google some info about Database Links but couldn't find very good examples.

Does anyone know of a way to do this...or, at least give me a starting point?


回答1:


If you have enough space, I would suggest copying the entire production data to test. It will be a lot easier to administer, it could also be a good oportunity to test your backup (restore from backup to a new instance).

From a developer point of view, you won't be able to test the performance of your application reliably without a representative set of data. This data set should have the same properties as the production data (data volume, physical distribution...). The easiest way to achieve this is to have the same data in test as in production.

If you can afford downtime you could stop the production db, copy the file to the test server and mount both databases. If you can't afford downtime it might be a good idea to pickup some DBA skills (and eventually learn about hot backup then restore to a new instance).


Update: if physically copying your database is not feasible, you should look into bulk copying data with expdp and impdp (or the old exp/imp). You can either copy all schemas or filter the data on export. You would choose appropriate WHERE clause by hand in this case. Exporting and importing in bulk will be orders of magnitude faster than copying data row by row.




回答2:


i just want to point out the industry scenario and products .. not a solution but an input for reference as @aintnoprophet mentioned that

"I am not a DBA so I don't know much of that realm. I tried to Google some info about Database Links but couldn't find very good examples."

This use case is related to test data management systems. This concept deals with the fact that we need to have a way to manage test data, that goes into those specific test cases for some specific test scenarios.It will be very difficult to manually create the subset of data from large amounts of data, you significantly increase the chance of human error.So repeated data sub-setting for multiple test iterations quickly becomes unmanageable.

there are many use cases, especially in enterprise examples -

http://www.compuware.com/resources/mainframe/Fileaid_clientserver.pdf

http://www.erphome.net/wdb/upload/forum1_f_8007_1227768589.pdf

these are for huge enterprise data.

there is an open source tdm tool http://sourceforge.net/projects/opentdm/ but not in java (i m not sure whether its good or not)

if you are looking for products (jut for reference) some among are

  • Informatica Test Data Management Solution

  • Informatica Data Subset

  • IBM Optim




回答3:


A test databases that are full copies of production will work much better than a subset for flushing out bugs. Most people subset because they don't have the time and resources to make full copies but all that has changed with virtual data aka copy data management. In such systems you can ingest one copy of production and make thin clone copies in minutes for almost no extra storage. This systems also generally compress the the data so the ingest copy is 1/3 size of the original and have the advantage of fast refresh in minutes which is great in destructive testing. They also tend to come integrated with masking since we probably don't want sensitive data coming out of the production environment into dev and test. There are a number of vendors in this area. Check out Netapp SMO, Actifio, Oracle Snap Clone and Delphix for starters. Most of these require some specialized hardware which can be an obstacle. Delphix is fully software and even has a free version called Delphix Express that you can download. Similar technology is being use by Rubrick and Cohesity for VM images.

Best Wishes Kyle Hailey




回答4:


this looks like what you look for is Jailer. It exports consistent, referentially intact row-sets from relational databases.



来源:https://stackoverflow.com/questions/5719881/create-a-test-database-copy-of-production-with-only-a-sample-set-of-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!