This is a follow-up question to this one:
Python DictReader - Skipping rows with missing columns?
Turns out I was being silly, and using the wrong ID field.
You will probably need to iterate over the employees to find the data. I assume you don't want an extra dict that can get out of date, so it isn't worth trying to store everything keyed on internal ids.
Try this on for size:
def lookup_supervisor(manager_internal_id, employees):
    # Return the directory id of the employee whose internal id matches, or None.
    if manager_internal_id is not None and manager_internal_id != "":
        manager_dir_ids = [dir_id for dir_id in employees if employees[dir_id].get('internal_id') == manager_internal_id]
        assert len(manager_dir_ids) <= 1
        if len(manager_dir_ids) == 1:
            return manager_dir_ids[0]
    return None

def tidy_data(employees):
    # Copy each supervisor's contact details onto the employee's own record.
    for emp_data in employees.values():
        manager_dir_id = lookup_supervisor(emp_data.get('manager_internal_id'), employees)
        for (field, sup_key) in [('Email', 'mail'), ('FirstName', 'givenName'), ('Surname', 'sn')]:
            emp_data['Supervisor'+field] = (employees[manager_dir_id][sup_key] if manager_dir_id is not None else 'Supervisor Not Found')
And you're definitely right that a class is the answer for passing employees around. In fact, I'd recommend against storing the 'Supervisor' keys in the employee dict, and suggest instead getting the supervisor dict fresh whenever you need it, perhaps with a get_supervisor_data method.
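Here is a rough sketch of what that could look like. It assumes the employees dict is keyed on directory id and that each record has 'internal_id' and 'manager_internal_id' fields, as in the functions above. The class name EmployeeDirectory and the sample records are made up for illustration.

```python
class EmployeeDirectory:
    """Hypothetical wrapper around the employees dict (keyed on directory id)."""

    def __init__(self, employees):
        self.employees = employees

    def lookup_supervisor(self, manager_internal_id):
        """Return the directory id of the matching manager, or None."""
        if not manager_internal_id:
            return None
        matches = [dir_id for dir_id, data in self.employees.items()
                   if data.get('internal_id') == manager_internal_id]
        assert len(matches) <= 1, 'internal ids should be unique'
        return matches[0] if matches else None

    def get_supervisor_data(self, dir_id):
        """Look the supervisor's record up fresh; None if there isn't one."""
        manager_internal_id = self.employees[dir_id].get('manager_internal_id')
        manager_dir_id = self.lookup_supervisor(manager_internal_id)
        if manager_dir_id is None:
            return None
        return self.employees[manager_dir_id]


# Made-up sample records, for illustration only.
directory = EmployeeDirectory({
    'jbloggs': {'internal_id': '1001', 'manager_internal_id': '',
                'mail': 'jbloggs@example.com', 'givenName': 'Joe', 'sn': 'Bloggs'},
    'asmith': {'internal_id': '1002', 'manager_internal_id': '1001',
               'mail': 'asmith@example.com', 'givenName': 'Anne', 'sn': 'Smith'},
})
supervisor = directory.get_supervisor_data('asmith')
print(supervisor['mail'] if supervisor else 'Supervisor Not Found')
```

Because the supervisor record is looked up on demand, nothing is copied into the employee's own dict, so there is no second copy to go stale when a manager's details change.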
Your new OO version all looks reasonable except for the changes I already mentioned and some tweaks to clean_phone_number.
import re  # at module level

def clean_phone_number(self, original_telephone_number):
    # The first group's name (country_code) is a guess; it is never read below.
    phone_re = re.compile(r'^\+(?P<country_code>\d{2})\((?P<extra_zero>0?)(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<hyph>-?)(?P<local_second_half>\d{4})')
    result = phone_re.search(original_telephone_number)
    if result is None:
        return '', "Number didn't match format. Original text is: " + original_telephone_number
    msg = ''
    if result.group('extra_zero'):
        msg += 'Extra zero in area code - ask user to remediate. '
    if not result.group('hyph'):  # Note: can have both errors at once
        msg += 'Missing hyphen in local component - ask user to remediate. '
    return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), msg
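To see how the single regex handles the different cases, here is a standalone sketch of the same logic as a plain function. It assumes the expected format is +CC(A)NNNN-NNNN, so an absent hyphen is what gets reported. The group name country_code is a guess (that group is never read), and the sample numbers are made up.

```python
import re

# 'country_code' is an assumed name; the other group names are the ones the
# method reads via result.group(...).
PHONE_RE = re.compile(
    r'^\+(?P<country_code>\d{2})\((?P<extra_zero>0?)(?P<area_code>\d)\)'
    r'(?P<local_first_half>\d{4})(?P<hyph>-?)(?P<local_second_half>\d{4})'
)


def clean_phone_number(original_telephone_number):
    """Return (cleaned_number, message); message is '' when nothing is wrong."""
    result = PHONE_RE.search(original_telephone_number)
    if result is None:
        return '', "Number didn't match format. Original text is: " + original_telephone_number
    msg = ''
    if result.group('extra_zero'):
        msg += 'Extra zero in area code - ask user to remediate. '
    if not result.group('hyph'):  # assumption: the hyphen is required
        msg += 'Missing hyphen in local component - ask user to remediate. '
    cleaned = ('0' + result.group('area_code') + result.group('local_first_half')
               + result.group('local_second_half'))
    return cleaned, msg


# Made-up numbers: well-formed, both problems at once, and no match at all.
for sample in ['+61(2)1234-5678', '+61(02)12345678', 'not a number']:
    print(repr(sample), '->', clean_phone_number(sample))
```

Making the extra zero and the hyphen optional groups means one pattern covers every variant, and the messages simply accumulate when a number has both problems.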
You could definitely make an individual object for each employee, but seeing how you're using the data and what you need from it, I'm guessing it wouldn't have that much payoff.