I have a pandas dataframe df which looks like this:
| source_num | source_date | text | category | location | source |
+---------+------------+-------