I have a dataset that has over 10 million variables. It has 25 columns for diagnosis code. Each row represents one patient. Based on admission to the hospital Diagnosis code