I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:
class Datas
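If you're not already using make_shared, could you give that a go? It allocates the reference count and the object in a single block of memory, so you save one heap allocation per object and may see a performance gain from better cache locality. Note that TR1 itself does not provide make_shared; it is available as boost::make_shared or, in C++11, std::make_shared. Worth a try anyway.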
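To illustrate the difference, here is a minimal sketch assuming a C++11 compiler with std::shared_ptr and std::make_shared (with TR1 you would probably reach for boost::make_shared instead). Payload and the two helper functions are hypothetical names made up for this example, since the members of the class in the question are not shown.

```cpp
#include <memory>

// Hypothetical stand-in for the class in the question; its real members are not shown.
struct Payload {
    explicit Payload(int v) : value(v) {}
    int value;
};

// Two heap allocations: one for the Payload (the explicit new),
// and a second one made by shared_ptr for its control block (reference counts).
std::shared_ptr<Payload> make_with_new(int v) {
    return std::shared_ptr<Payload>(new Payload(v));
}

// Typically a single heap allocation that holds both the control block
// and the Payload, so the counts and the object sit next to each other.
std::shared_ptr<Payload> make_with_make_shared(int v) {
    return std::make_shared<Payload>(v);
}
```

Both functions hand back an ordinary shared_ptr, so calling code does not change; only the construction site does. Whether the gain is measurable depends on how often you create these objects, so it is worth profiling before and after.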