问题
I am creating multiple indexes (ie, that use different keys) into a large collection of objects. The objects can change, and the collection can shrink and grow. My thoughts so far:
Keep multiple collections of some kind of pointers to the objects. Use set instead of map for better encapsulation. Use unordered_set to scale well with large data sets. The pointers should ideally all be some form of smart pointer.
I can start off fairly easily with a master collection of unique_ptrs, which manage all allocations, and secondary indexes that use "raw" pointers (I'll leave out the supporting functions for the moment, but note that the index is a multiset as its key won't be unique across the set):
typedef boost::unordered_set< boost::unique_ptr<MyObject>,myobject_hash,myobjects_equal > MyObjects;
typedef boost::unordered_multiset<const MyObject*,myobject_index2_hash,myobject_index2_equal > MyObjectsIndex2;
Usage is straightforward:
MyObjects my_objects;
MyObjectsIndex2 my_objects_index2;
auto it_mo = my_objects.insert(
boost::unique_ptr<MyObject>(
new MyObject(...)
)
);
const MyObject* p_mo = it_mo.first->get();
my_objects_index2.insert(p_mo);
I am considering putting in extra effort to replace the index's use of raw pointers with const references to the unique_ptrs of the master collection. I'm not sure I can though, at least not easily. I thought I would ask if anyone else had gone that route already, or had alternative suggestions.
UPDATE
Lessons learned so far:
- Datastore classes are cool
- reference_wrappers are cool
- A xx_set with a "key" object datastore member var is more space-efficient than xx_map. BUT... you can't easily use a unique_ptr as a key in c++11. c++14 apparently may have better set functionality via
std::set<Key>::find
. See here for more details. So for now, a datastore that manages raw allocations seems to make more sense here than trying to force use of a unique_ptr as a set key, or increasing keyspace storage with maps. - Remember to force key values to be const for the life of the object (use const values provided in constructor)
回答1:
Here's one way.
std::vector<unique_ptr>
to hold the data items (to ensure that the addresses don't change when the vector resizes) and then containers holding reference_wrappers (copyable references) to make the indexes.
compilable example:
#include <map>
#include <vector>
#include <set>
#include <string>
#include <functional>
#include <memory>
#include <iostream>
struct Thing {
Thing(std::string name, int value)
: _name { std::move(name) }
, _value { value }
{}
const std::string& name() const {
return _name;
}
void write(std::ostream& os) const {
os << "{ " << _name << " : " << _value << " }";
}
private:
std::string _name;
int _value;
};
inline std::ostream& operator<<(std::ostream& os, const Thing& t) {
t.write(os);
return os;
}
struct multi_index
{
using multi_by_name_index = std::multimap<std::string, std::reference_wrapper<Thing>>;
void add_thing(std::string name, int value) {
// todo: checks to ensure that indexes won't be violated
// add a new thing to the main store
_main_store.emplace_back(new Thing{std::move(name), value});
// store a reference to it in each index
auto& new_thing = *(_main_store.back().get());
_name_index.emplace(new_thing.name(), new_thing);
}
using multi_by_name_range = std::pair<multi_by_name_index::const_iterator, multi_by_name_index::const_iterator>;
multi_by_name_range get_all_by_name(const std::string name) const
{
return _name_index.equal_range(name);
}
private:
std::vector<std::unique_ptr<Thing>> _main_store;
std::multimap<std::string, std::reference_wrapper<Thing>> _name_index;
};
using namespace std;
int main()
{
multi_index mi;
mi.add_thing("bob", 8);
mi.add_thing("ann", 4);
mi.add_thing("bob", 6);
auto range = mi.get_all_by_name("bob");
for( ; range.first != range.second ; ++range.first) {
cout << range.first->second << endl;
}
return 0;
}
expected output:
{ bob : 8 }
{ bob : 6 }
回答2:
I recognize that your use case is probably different from the one I contrived for my example, and without a lot more details I won't be able to make one that matches (I also think that if you had a lot of details you would be able to find a solution yourself).
#include <iostream>
#include <map>
#include <set>
#include <memory>
#include <stdexcept>
using namespace std;
class Thing
{
public:
Thing() = default;
Thing(const Thing &other) = default;
Thing(int i, string p, string d) : id(i), desc(d), part(p) {}
int id;
string desc;
string part;
};
ostream &operator<<(ostream &out, const Thing &t)
{
if (&t == NULL) out << "(NULL)"; // don't judge me
else out << t.id << ": " << t.part << " (" << t.desc << ")";
}
class Datastore
{
public:
Datastore() = default;
shared_ptr<const Thing> Add(const Thing &t)
{
if (!(index_bydesc.find(t.desc) == index_bydesc.end() &&
index_bypart.find(t.part) == index_bypart.end() &&
index_byid.find(t.id) == index_byid.end()))
throw runtime_error("Non-unique insert");
shared_ptr<const Thing> newt = make_shared<const Thing>(t);
weak_ptr<const Thing> weak = weak_ptr<const Thing>(newt);
index_bydesc[newt->desc] = weak;
index_bypart[newt->part] = weak;
index_byid[newt->id] = weak;
store.insert(newt);
return newt;
}
void Remove(const Thing &t)
{
shared_ptr<const Thing> p = FindBy_Desc(t.desc);
store.erase(p);
index_bydesc.erase(p->desc);
index_bypart.erase(p->part);
index_byid.erase(p->id);
}
shared_ptr<const Thing> FindBy_Desc(string desc)
{
map<string, weak_ptr<const Thing> >::iterator iter = index_bydesc.find(desc);
if (iter == index_bydesc.end()) return shared_ptr<const Thing>();
return iter->second.lock();
}
// index accessors for part and quantity omitted
private:
std::set<shared_ptr<const Thing> > store;
std::map<string, weak_ptr<const Thing> > index_bydesc;
std::map<string, weak_ptr<const Thing> > index_bypart;
std::map<int, weak_ptr<const Thing> > index_byid;
};
int main() {
Datastore d;
d.Add(Thing(1, "TRNS-A", "Automatic transmission"));
d.Add(Thing(2, "SPKPLG", "Spark plugs"));
d.Add(Thing(3, "HOSE-S", "Small hoses"));
d.Add(Thing(4, "HOSE-L", "Large hoses"));
d.Add(Thing(5, "BATT-P", "Primary battery (14.5v nominal)"));
d.Add(Thing(6, "BATT-S", "Secondary batteries (1.5v nominal)"));
d.Add(Thing(7, "CRKSFT", "Crank shaft"));
d.Add(Thing(8, "REAC-F", "Fusion reactor power source"));
cout << *d.FindBy_Desc("Crank shaft") << endl;
d.Remove(*d.FindBy_Desc("Crank shaft"));
cout << *d.FindBy_Desc("Crank shaft") << endl;
return 0;
}
Shortcomings:
- Storage structure is read-only. This is a necessary shortcoming, because the index will become outdated if you modify the indexed fields of an object while it's in the datastore. To modify an object, remove it and then re-add another one.
- All fields must be unique. This is easily changeable, but you need to keep maps containing
list<Thing>
as indices for non-unique fields, not just maps containingThing
. - Performance problems relating to using
std::map
.std::unordered_map
is an alternative with better (constant amortized) access times for huge data structures (the same asstd::unordered_set
).
Deviations:
- Given that you have a clear key-value relationship here, I think you'd be better off with a map than a set.
- In order to solve performance problems relating to reference counting, if you are careful to maintain the internal consistency at all times, you could forgo all smart pointers for raw ones, and return values via references, and you could achieve further improvements by using unsafe object ownership semantics when filling it (i.e., pass it pointers to heap objects that the Datastore then takes ownership of). More complex, but ultimately fewer copies and less runtime overhead.
来源:https://stackoverflow.com/questions/30132951/create-multiple-indexes-into-a-large-collection-of-objects-with-smart-pointers