DataFrame
df = pd.DataFrame({'A': [['gener'], ['gener'], ['system'], ['system'], ['gutter'], ['gutter'], ['gutter'], ['gutter'
Just use the apply function supported by pandas; it's great.
Since you may have more than two columns to intersect, you can write a helper function like the one below and apply it with DataFrame.apply (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html). Note the axis option: axis=1 applies the function to each row, that is, across the columns, while axis=0 applies it to each column, that is, along one series (a series here is just one column of the data frame). With axis=1, each row is passed to the function as an iterable Series object.
def intersect(ss):
    ss = iter(ss)
    s = set(next(ss))
    for t in ss:
        s.intersection_update(t)  # `t` need not be a `set` here; a `list` or any `Iterable` is OK
    return s
res = df.apply(intersect, axis=1)
>>> res
0 {}
1 {}
2 {system}
3 {system}
4 {gutter}
5 {gutter}
6 {gutter}
7 {gutter}
8 {gutter}
9 {gutter}
10 {aluminum}
11 {aluminum}
12 {aluminum}
13 {aluminum}
14 {aluminum}
15 {aluminum}
16 {aluminum}
17 {aluminum}
18 {aluminum}
19 {aluminum, toledo}
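If you want to try this on something small first, here is a minimal self-contained sketch. The frame `demo` and its columns `A` and `B` are made up for illustration; only `intersect` and the `apply(..., axis=1)` call come from the answer above.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of words.
demo = pd.DataFrame({
    "A": [["gener"], ["system"], ["gutter", "rain"]],
    "B": [["system"], ["system", "gener"], ["gutter"]],
})


def intersect(ss):
    """Intersect all iterables in one row (the row arrives as a Series)."""
    ss = iter(ss)
    s = set(next(ss))             # start from the first cell of the row
    for t in ss:
        s.intersection_update(t)  # t can be any iterable, not only a set
    return s


# axis=1: intersect is called once per row, across columns A and B.
res = demo.apply(intersect, axis=1)
print(res.tolist())  # [set(), {'system'}, {'gutter'}]
```

The first row has no word in common, so it ends up as an empty set.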
You can chain further operations onto the result of the helper function, or write similar variations of it.
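As one possible variation (a sketch, not something from the question), the same pattern gives a row-wise union, and the resulting sets can be post-processed with Series.map. The names `demo`, `union`, `merged`, `sizes` and `as_lists` are hypothetical.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of words.
demo = pd.DataFrame({
    "A": [["gener"], ["system"], ["gutter", "rain"]],
    "B": [["system"], ["system", "gener"], ["gutter"]],
})


def union(ss):
    """Collect every word that appears in any cell of one row."""
    s = set()
    for t in ss:
        s.update(t)  # t can be any iterable
    return s


merged = demo.apply(union, axis=1)  # one set per row
sizes = merged.map(len)             # number of distinct words per row
as_lists = merged.map(sorted)       # sets turned into sorted lists
```

Turning the sets into sorted lists is handy if you need a stable order, for example before writing the result to a file.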
Hope this helps.