How to find all variables with identical id?

Submitted by 巧了我就是萌 on 2019-12-24 17:12:18

Question


Let's say I have a numpy array a and create b like this:

import numpy as np

a = np.arange(3)
b = a

If I now change b e.g. like this

b[0] = 100

and print a, b, their ids and .flags

print(a)
print(a.flags)
print(b)
print(b.flags)
print(id(a))
print(id(b))

I obtain

[100   1   2]

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

[100   1   2]

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

139767698376944
139767698376944

So, a and b look the same and their ids are identical as expected.

When I now do the same using copy()

c = np.arange(3)
d = c.copy()

d[0] = 20

print(c)
print(c.flags)
print(id(c))

print(d)
print(d.flags)
print(id(d))

I get

[0 1 2]

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

139767698377344

[20  1  2]

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

139767698376864

In this case c and d differ and so do their ids; also as expected.

However, what confuses me is the output I obtain from .flags: In all cases, OWNDATA is set to True. When I read the documentation, I find:

OWNDATA (O) The array owns the memory it uses or borrows it from another object.

My main question is now:

What would be the easiest way to find all variables that point to the same id (in the example above a and b) i.e. to check whether another variable with the same id exists? I thought OWNDATA would be of help for that but apparently it is not.

Related question:

What is OWNDATA actually used for, and in which cases is OWNDATA set to False?


Answer 1:


There are two issues: how do you identify the variables you want to compare, and how do you compare them?

Take the second first.

My version (1.8.2) does not have a np.shares_memory function. It does have a np.may_share_memory.

https://github.com/numpy/numpy/pull/6166 is the pull request that adds shares_memory; it's dated last August. So you'd have to have a brand-new numpy to use it. Note that a definitive test is potentially hard, and it may raise a 'TOO HARD' error. I imagine, for example, that there are some slices that share memory but are hard to identify by simply comparing buffer starting points.
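Here is a minimal sketch of the gap between the two functions. It assumes a NumPy version that provides np.shares_memory. Two interleaved slices of one buffer have overlapping address ranges, so the bounds-only np.may_share_memory reports a possible overlap. The exact np.shares_memory finds that no element is actually shared.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # elements 0, 2, 4, 6, 8
odds = a[1::2]   # elements 1, 3, 5, 7, 9

# Bounds check only: the two slices span overlapping address ranges.
print(np.may_share_memory(evens, odds))  # True

# Exact check: no single element is addressed by both slices.
print(np.shares_memory(evens, odds))     # False

# A view that really does alias part of a's buffer.
print(np.shares_memory(a, a[2:]))        # True
```

np.shares_memory also accepts an optional max_work argument that limits how much effort it spends. When the exact problem needs more work than that, NumPy raises an error instead of guessing. This is the "too hard" case mentioned above.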

https://github.com/numpy/numpy/blob/97c35365beda55c6dead8c50df785eb857f843f0/numpy/core/tests/test_mem_overlap.py is the unit test for these memory_overlap functions. Read it if you want to see what a daunting task it is to think of all the possible overlap conditions between 2 known arrays.

I like to look at the array's .__array_interface__. One item in that dictionary is 'data', which is a pointer to the data buffer. An identical pointer means the data is shared, but a view might start somewhere further along the buffer. I wouldn't be surprised if shares_memory looks at this pointer.
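Below is a small sketch of the pointer comparison. It assumes a contiguous 1-D source array, and the helper name data_address is hypothetical. The first entry of __array_interface__['data'] is the address of the array's first element. A view that starts further along the buffer is therefore offset by start_index * itemsize bytes.

```python
import numpy as np

def data_address(arr):
    """Hypothetical helper: address of arr's first element, read from the array interface."""
    return arr.__array_interface__['data'][0]

a = np.arange(10)
b = a          # another name for the same object
c = a[:]       # view starting at the same element
d = a.copy()   # separate buffer
e = a[2:]      # view starting two elements in

print(data_address(b) == data_address(a))                   # True
print(data_address(c) == data_address(a))                   # True
print(data_address(d) == data_address(a))                   # False
print(data_address(e) - data_address(a) == 2 * a.itemsize)  # True
```

arr.ctypes.data returns the same address as a plain integer. As e shows, a different start address does not rule out sharing.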

Identical id means 2 variables reference the same object, but different array objects can share a data buffer.
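To make that distinction concrete, here is a minimal example that assumes a NumPy version with np.shares_memory. It contrasts object identity (is, which is equivalent to comparing id() of live objects) with buffer sharing.

```python
import numpy as np

a = np.arange(5)
b = a        # same object, so same id
c = a[1:4]   # new array object viewing part of a's buffer

print(b is a, id(b) == id(a))  # True True
print(c is a)                  # False: different objects...
print(np.shares_memory(a, c))  # True: ...but one shared buffer
print(c.base is a)             # True: the view keeps a reference to its source
```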

All these tests require looking at specific references, so you still need some sort of list of references. Look at locals()? globals()? And what about unnamed references, such as a list of arrays or some user-defined dictionary?

An example IPython run:

Some variables and references:

In [1]: a=np.arange(10)
In [2]: b=a           # reference
In [3]: c=a[:]        # view
In [4]: d=a.copy()    # copy
In [5]: e=a[2:]       # another view
In [6]: ll=[a, a[:], a[3:], a[[1,2,3]]]  # list 

Compare id:

In [7]: id(a)
Out[7]: 142453472
In [9]: id(b)
Out[9]: 142453472

None of the others share the id, except ll[0].

In [10]: np.may_share_memory(a,b)
Out[10]: True
In [11]: np.may_share_memory(a,c)
Out[11]: True
In [12]: np.may_share_memory(a,d)
Out[12]: False
In [13]: np.may_share_memory(a,e)
Out[13]: True
In [14]: np.may_share_memory(a,ll[3])
Out[14]: False

That's about what I'd expect; views share memory, copies do not.

In [15]: a.__array_interface__
Out[15]: 
{'version': 3,
 'data': (143173312, False),
 'typestr': '<i4',
 'descr': [('', '<i4')],
 'shape': (10,),
 'strides': None}
In [16]: a.__array_interface__['data']
Out[16]: (143173312, False)
In [17]: b.__array_interface__['data']
Out[17]: (143173312, False)
In [18]: c.__array_interface__['data']
Out[18]: (143173312, False)
In [19]: d.__array_interface__['data']
Out[19]: (151258096, False)            # copy - diff buffer
In [20]: e.__array_interface__['data'] 
Out[20]: (143173320, False)            # differs by 8 bytes
In [21]: ll[1].__array_interface__['data']
Out[21]: (143173312, False)            # same point

Just with this short session I have 76 items in locals(). But I can search it for a matching id with:

In [26]: [(k,v) for k,v in locals().items() if id(v)==id(a)]
Out[26]: 
[('a', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])),
 ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]

Same for the other tests.
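For example, the same comprehension can use a buffer test instead of id. This is a minimal sketch with a hypothetical helper name, names_sharing_memory. A plain dict stands in for locals(), and only top-level ndarray values are inspected.

```python
import numpy as np

def names_sharing_memory(target, namespace):
    """Hypothetical helper: sorted names of top-level ndarrays whose data overlaps target's."""
    return sorted(
        name for name, value in namespace.items()
        if isinstance(value, np.ndarray) and np.shares_memory(value, target)
    )

a = np.arange(10)
namespace = {
    'a': a,
    'b': a,             # reference
    'c': a[:],          # view
    'd': a.copy(),      # copy
    'e': a[2:],         # another view
    'f': a[[1, 2, 3]],  # fancy indexing returns a copy
    'n': 42,            # not an array, so skipped
}
print(names_sharing_memory(a, namespace))  # ['a', 'b', 'c', 'e']
```

In an interactive session you would pass locals() or globals() as namespace. Swapping in np.may_share_memory gives the cheaper, bounds-only version of the same search.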

I can search ll in the same way:

In [28]: [n for n,l in enumerate(ll) if id(l)==id(a)]
Out[28]: [0]

And I could add a layer to the locals() search by testing if an item is a list or dictionary, and doing a search within that.
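One way to add that layer is a recursive walk. This sketch assumes that only lists, tuples and dict values need to be descended into; other containers and object attributes are ignored. The function name find_refs and its path format are hypothetical. Matching uses is rather than ==, so no elementwise array comparison is triggered.

```python
import numpy as np

def find_refs(target, namespace):
    """Hypothetical helper: paths to every occurrence of the object target
    (matched by identity) inside namespace, descending into lists, tuples and dicts."""
    found = []
    seen = set()  # ids of containers already walked, so cycles and shared containers are visited once

    def walk(obj, path):
        if obj is target:
            found.append(path)
            return
        if id(obj) in seen:
            return
        if isinstance(obj, (list, tuple)):
            seen.add(id(obj))
            for i, item in enumerate(obj):
                walk(item, f"{path}[{i}]")
        elif isinstance(obj, dict):
            seen.add(id(obj))
            for key, value in obj.items():
                walk(value, f"{path}[{key!r}]")

    for name, value in namespace.items():
        walk(value, name)
    return found

a = np.arange(10)
namespace = {'a': a, 'b': a, 'd': a.copy(), 'll': [a, a[:], a[3:]], 'cfg': {'x': (1, a)}}
print(find_refs(a, namespace))  # ['a', 'b', 'll[0]', "cfg['x'][1]"]
```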

So even if we settle on the testing method, it isn't trivial to search for all possible references.

I think the best approach is to understand your own use of variables, so that you can clearly identify references, views and copies. In selected cases you can perform tests like may_share_memory or compare data buffers. But there isn't an inexpensive, definitive test. When in doubt, it is cheaper to make a copy than to risk overwriting something. In my years of numpy use I've never felt the need for a definitive answer to this question.


I don't find the OWNDATA flag very useful. Consider the above variables

In [35]: a.flags['OWNDATA']
Out[35]: True
In [36]: b.flags['OWNDATA']   # ref
Out[36]: True
In [37]: c.flags['OWNDATA']   # view
Out[37]: False
In [38]: d.flags['OWNDATA']   # copy
Out[38]: True
In [39]: e.flags['OWNDATA']   # view
Out[39]: False

While I can predict the OWNDATA value in these simple cases, its value doesn't say much about shared memory, or shared id. False suggests it was created from another array, and thus may share memory. But that's just a 'may'.
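When OWNDATA is False, the .base attribute is usually more informative than the flag itself, because it holds a reference to the object the view borrows from. The minimal sketch below uses a hypothetical helper name, owner. It follows the chain of ndarray bases back to the array that holds the memory.

```python
import numpy as np

def owner(arr):
    """Hypothetical helper: follow .base links while they point at ndarrays."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

a = np.arange(10)
c = a[:]       # view
d = a.copy()   # copy
e = a[2:][1:]  # view of a view

print(a.flags['OWNDATA'], owner(a) is a)  # True True
print(c.flags['OWNDATA'], owner(c) is a)  # False True
print(d.flags['OWNDATA'], owner(d) is d)  # True True
print(e.flags['OWNDATA'], owner(e) is a)  # False True
```

Like OWNDATA, this only tells you where one array's memory came from. It cannot tell you which other variables still refer to that memory.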

I often create a sample array by reshaping a range.

In [40]: np.arange(3).flags['OWNDATA']
Out[40]: True
In [41]: np.arange(4).reshape(2,2).flags['OWNDATA']
Out[41]: False

There's clearly no other reference to the data, but the reshaped array does not 'own' its own data. Same would happen with

temp = np.arange(4); temp = temp.reshape(2,2)

I'd have to do

temp = np.arange(4); temp.shape = (2,2)

to keep OWNDATA true. False OWNDATA means something right after creating the new array object, but it doesn't change if the original reference is redefined or deleted. It easily becomes out of date.
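The sketch below illustrates both points. reshape returns a new view object whose OWNDATA is False, while assigning to .shape changes the existing object in place. Deleting the original name does not update the view's flag. Recent NumPy documentation discourages setting .shape directly in favour of reshape.

```python
import numpy as np

temp = np.arange(4).reshape(2, 2)
print(temp.flags['OWNDATA'])  # False: reshape returned a view

temp = np.arange(4)
temp.shape = (2, 2)           # reshape the same object in place
print(temp.flags['OWNDATA'])  # True

source = np.arange(4)
view = source.reshape(2, 2)
del source                    # drop the only other name...
print(view.flags['OWNDATA'])  # False: ...but the flag is unchanged
print(view.base)              # [0 1 2 3]: the view keeps the 1-D array alive
```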




Answer 2:


The assignment b=a does not create a view on the original array a but simply creates a reference to it. In other words, b is just a different name for a. Both variables a and b refer to the same array which owns its data such that the OWNDATA flag is set. Modifying b will modify a.

The assignment b=a.copy() creates a copy of the original array. That is, a and b refer to separate arrays which both own their data such that the OWNDATA flag is set. Modifying b will not modify a.

However, if you make the assignment b=a[:], you will create a view of the original array and b will not own its data. Modifying b will modify a.
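Here is a short check of the three cases above (reference, copy and view). It shows which modifications propagate back to a and how OWNDATA is set.

```python
import numpy as np

a = np.arange(3)

b = a          # reference: just another name
b[0] = 100     # a changes too
print(a, b is a, b.flags['OWNDATA'])  # b is a: True, OWNDATA: True

b = a.copy()   # copy: separate array with its own buffer
b[1] = 200     # a is unaffected
print(a, b.flags['OWNDATA'])          # OWNDATA: True

b = a[:]       # view: new array object on a's buffer
b[2] = 300     # a changes too
print(a, b.flags['OWNDATA'])          # OWNDATA: False
```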

The shares_memory function is what you are looking for. It does what it says on the box: it checks whether two arrays a and b share memory and can thus affect each other.



Source: https://stackoverflow.com/questions/33467477/how-to-find-all-variables-with-identical-id
