pandas: calculate Jaccard similarity for every row based on the value in another column
Question: I have a dataframe like the following, only with more rows:

    import pandas as pd

    data = {'First': ['First value', 'Second value', 'Third value'],
            'Second': [['old', 'new', 'gold', 'door'],
                       ['old', 'view', 'bold', 'door'],
                       ['new', 'view', 'world', 'window']]}
    df = pd.DataFrame(data, columns=['First', 'Second'])

To calculate the Jaccard similarity, I found this piece online (not my solution):

    def lexical_overlap(doc1, doc2):
        words_doc1 = set(doc1)
        words_doc2 = set(doc2)
        intersection = words_doc1.intersection(words
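The Jaccard similarity of two word lists is the number of distinct words they share divided by the number of distinct words in either list. Below is a minimal sketch of a complete Jaccard function, along with one way to apply it to every pair of rows in the dataframe above. It assumes the goal is a square similarity matrix comparing the `Second` word lists, with rows and columns labeled by `First`. The helper names `jaccard` and `pairwise_jaccard` are hypothetical and are not part of pandas. Returning 0.0 when both lists are empty is also an assumption, added to avoid dividing by zero.

```python
import pandas as pd


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two iterables of words (hypothetical helper)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both inputs empty: assume similarity 0.0 rather than dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


def pairwise_jaccard(df, label_col, words_col):
    """Square DataFrame of Jaccard similarities between every pair of rows (hypothetical helper)."""
    labels = df[label_col].tolist()
    word_sets = [set(words) for words in df[words_col]]
    # Compare every row with every other row, including itself (the diagonal).
    matrix = [[jaccard(a, b) for b in word_sets] for a in word_sets]
    return pd.DataFrame(matrix, index=labels, columns=labels)


# Usage with the example data from the question.
data = {'First': ['First value', 'Second value', 'Third value'],
        'Second': [['old', 'new', 'gold', 'door'],
                   ['old', 'view', 'bold', 'door'],
                   ['new', 'view', 'world', 'window']]}
df = pd.DataFrame(data, columns=['First', 'Second'])

sim = pairwise_jaccard(df, 'First', 'Second')
# 'First value' vs 'Second value': 2 shared words ('old', 'door') out of 6 distinct -> 1/3
print(sim.round(3))
```

The nested comprehension makes n² comparisons. That is fine for a few thousand rows but grows quickly as the frame gets larger. Because `jaccard(a, b) == jaccard(b, a)`, the resulting matrix is symmetric, and each row or column of `sim` gives the similarity of one `First` value to all the others.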