I have a dataframe that looks like the following, but with more rows. For each document in the first column, there are some similar labels in the second column and some string