Creating star schema from csv files using Python

梦想的初衷 提交于 2019-12-11 17:40:33

问题


I have 6 dimension tables, all in the form of csv files. I have to form a star schema using Python. I'm not sure how to create the fact table using Python. The fact table (theoretically) has at least one column that is common with a dimension table.

How can I create the fact table, keeping in mind that quantities from multiple dimension tables should correspond correctly in the fact table?

I am not allowed to reveal the code or exact data, but I'll add a small example. File 1 contains the following columns: student_id, student_name. File 2 contains : student_id, department_id, department_name, sem_id. Lastly File 3 contains student_id, subject_code, subject_score. The 3 dimension tables are in the form of csv files. I now need the fact table to contain: student_id, student_name, department_id, subject_code. How can I form the fact table in that form? Thank you for your help.


回答1:


Reading certain blogs look like it is not a good way to handle such cases in python in memory but still if the below post make sense you cn use it

Fact Loading

The first step in DW loading is dimensional conformance. With a little cleverness the above processing can all be done in parallel, hogging a lot of CPU time. To do this in parallel, each conformance algorithm forms part of a large OS-level pipeline. The source file must be reformatted to leave empty columns for each dimension's FK reference. Each conformance process reads in the source file and writes out the same format file with one dimension FK filled in. If all of these conformance algorithms form a simple OS pipe, they all run in parallel. It looks something like this.

src2cvs source | conform1 | conform2 | conform3 | load At the end, you use the RDBMS's bulk loader (or write your own in Python, it's easy) to pick the actual fact values and the dimension FK's out of the source records that are fully populated with all dimension FK's and load these into the fact table.




回答2:


Would you like to add any code you're currently stuck on? Please add a Minimal, Complete, and Verifiable example including the file content and expected output



来源:https://stackoverflow.com/questions/51151263/creating-star-schema-from-csv-files-using-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!