Finding the most common subarray within a numpy array

Posted by 一曲冷凌霜 on 2021-02-09 08:29:34

Question


Example data:

array(
  [[ 1.,  1.],
   [ 2.,  1.],
   [ 0.,  1.],
   [ 0.,  0.],
   [ 0.,  0.]])

with a desired result of

>>> [0.,0.]

i.e., the most common pair.

Approaches that don't seem to work:

Using the statistics module, because numpy arrays are unhashable.

Using scipy.stats.mode, because it returns the mode along each axis separately; e.g., for our example it gives

mode=array([[ 0.,  1.]])
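
Both obstacles can be reproduced with a short sketch. It assumes only numpy is installed; the per-column mode is computed by hand with np.unique rather than by calling scipy.stats.mode, and column_mode is a hypothetical helper introduced only for illustration.

```python
import numpy as np

a = np.array([[1., 1.], [2., 1.], [0., 1.], [0., 0.], [0., 0.]])

# Rows of a 2-D array are themselves ndarrays, which cannot be hashed,
# so hash-based counting fails on them.
try:
    hash(a[0])
    row_is_hashable = True
except TypeError:
    row_is_hashable = False

# Hypothetical helper: the most frequent value within a single column.
def column_mode(col):
    values, counts = np.unique(col, return_counts=True)
    return values[counts.argmax()]

# A per-axis mode treats each column independently, so the result
# need not be a row that actually occurs in the data.
per_column = np.array([column_mode(a[:, j]) for j in range(a.shape[1])])

print(row_is_hashable)  # False
print(per_column)       # [0. 1.]
```

Here [0., 1.] does happen to be a row of the array, but it occurs only once, so it is not the most common pair.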

Answer 1:


You can do this efficiently with numpy using the unique function:

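import numpy as np

# a is the example array from the question; axis=0 treats each row as one item (NumPy >= 1.13)
pairs, counts = np.unique(a, axis=0, return_counts=True)
print(pairs[counts.argmax()])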

Returns: [ 0. 0.]
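
A self-contained version of the same idea might look like the sketch below. The function name most_common_row is hypothetical, and the code assumes a 2-D input and a NumPy version that supports the axis argument of np.unique.

```python
import numpy as np

def most_common_row(arr):
    """Return the most frequent row of a 2-D array together with its count."""
    # Each distinct row becomes one entry in rows; counts[i] is how often rows[i] occurs.
    rows, counts = np.unique(arr, axis=0, return_counts=True)
    best = counts.argmax()  # argmax picks the first maximum if several rows tie
    return rows[best], int(counts[best])

a = np.array([[1., 1.], [2., 1.], [0., 1.], [0., 0.], [0., 0.]])

row, count = most_common_row(a)
print(row, count)
```

Because np.unique returns the distinct rows in sorted order, a tie would presumably be resolved in favour of the row that sorts first, not the row that appears first in the input.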




Answer 2:


One way via the standard library is to use collections.Counter.

This gives you both the most common pair and its count. Index the result of Counter.most_common() with [0] to retrieve the entry with the highest count.

import numpy as np
from collections import Counter

A = np.array(
  [[ 1.,  1.],
   [ 2.,  1.],
   [ 0.,  1.],
   [ 0.,  0.],
   [ 0.,  0.]])

c = Counter(map(tuple, A)).most_common()[0]

# ((0.0, 0.0), 2)

The only complication is that each row must be converted to a tuple, because Counter only accepts hashable objects.
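
If the result is needed as a numpy array rather than a tuple, one possible variation is sketched below. It passes 1 to most_common so that only the top entry is requested, and converts the winning tuple back with np.array; the variable names are illustrative.

```python
import numpy as np
from collections import Counter

A = np.array([[1., 1.], [2., 1.], [0., 1.], [0., 0.], [0., 0.]])

# Convert each row to a hashable tuple, count them, and ask only for the top entry.
pair, count = Counter(map(tuple, A)).most_common(1)[0]

# Turn the winning tuple back into an array so it matches the desired result.
result = np.array(pair)

print(result, count)
```

Unlike the np.unique approach, Counter keeps elements in first-seen order, so on Python 3.7+ a tie should go to the pair that appears earliest in the array.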



Source: https://stackoverflow.com/questions/49694879/finding-the-most-common-subarray-within-a-numpy-array
