I just completed a project creating 3,500,000+ health insurance claim lines. Due to HIPPA and PHI restrictions, using even scrubbed real data is a PITA. I used a tool called Datatect for this (http://www.datatect.com/).
Some of the things I like about this tool:
- Uses ODBC so you can generate data into any ODBC data source. I've used this for Oracle, SQL and MS Access databases, flat files, and Excel spreadsheets.
- Extensible via VBScript. You can write hooks at various parts of the data generation workflow to extend the abilities of the tool. I used this feature to "sync up" dependent columns in the database, and to control the frequency distribution of values to align with real world observed frequencies.
- Referentially aware. When populating foreign key columns, pulls valid keys from parent table.