pairwise

Finding index of pairwise elements

孤人 posted on 2019-12-03 14:25:21
Given the target ('b', 'a') and the inputs:

    x0 = ('b', 'a', 'z', 'z')
    x1 = ('b', 'a', 'z', 'z')
    x2 = ('z', 'z', 'a', 'a')
    x3 = ('z', 'b', 'a', 'a')

The aim is to find the index at which the contiguous pair ('b', 'a') occurs and get the output:

    >>> find_ba(x0)
    0
    >>> find_ba(x1)
    0
    >>> find_ba(x2)
    None
    >>> find_ba(x3)
    1

Using the pairwise recipe:

    from itertools import tee

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

I could do this to get the desired output:

    def find_ba(x, target=('b', 'a')):
        try:
            return next(i for i, pair in
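A minimal sketch of how such a lookup could be written with the pairwise recipe, shown only to illustrate the idea. It is not necessarily the asker's version: using the default argument of `next` in place of a `try`/`except` is an assumption made here for brevity.

```python
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)          # advance the second iterator by one element
    return zip(a, b)

def find_ba(x, target=('b', 'a')):
    # Index of the first adjacent pair equal to target.
    # The second argument to next() is returned when no pair matches (assumed design).
    return next(
        (i for i, pair in enumerate(pairwise(x)) if pair == target),
        None,
    )

x0 = ('b', 'a', 'z', 'z')
x2 = ('z', 'z', 'a', 'a')
x3 = ('z', 'b', 'a', 'a')
print(find_ba(x0), find_ba(x2), find_ba(x3))
```

Because the generator is lazy, the scan stops at the first match instead of materializing every pair.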

Spark Python: How to calculate Jaccard Similarity between each line within an RDD?

泄露秘密 posted on 2019-12-01 23:21:35
I have a table of around 50k distinct rows and 2 columns. You can think of each row as a movie and the columns as attributes of that movie: "ID", the id of the movie, and "Tags", some content tags of the movie in the form of a list of strings. The data looks something like this: movie_1, ['romantic','comedy','English']; movie_2, ['action','kongfu','Chinese'] My goal is to first calculate the Jaccard similarity between each pair of movies based on their corresponding tags, and once that's done, I will be able to know for each movie (for example I choose movie_1), what are the other top 5 most
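To make the quantity concrete, here is a small in-memory sketch in plain Python (not Spark) of the Jaccard similarity between tag lists and a top-k lookup for one movie. The helper names `jaccard` and `top_similar`, the dictionary layout, and the `movie_3` row are hypothetical, added only for illustration. On an RDD the pairing step would have to be distributed instead, for example with `RDD.cartesian`, and roughly 50k rows imply on the order of a billion pairs.

```python
from heapq import nlargest

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_similar(movies, movie_id, k=5):
    """Return up to k (other_id, score) pairs most similar to movie_id."""
    tags = set(movies[movie_id])
    scores = (
        (jaccard(tags, set(other_tags)), other_id)
        for other_id, other_tags in movies.items()
        if other_id != movie_id
    )
    # nlargest keeps only the k best (score, id) tuples
    return [(other_id, score) for score, other_id in nlargest(k, scores)]

movies = {
    "movie_1": ["romantic", "comedy", "English"],
    "movie_2": ["action", "kongfu", "Chinese"],
    "movie_3": ["romantic", "drama", "English"],  # hypothetical extra row
}
print(top_similar(movies, "movie_1"))
```

Here movie_1 and movie_3 share 2 of 4 distinct tags, giving a similarity of 0.5, while movie_1 and movie_2 share none.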