I have the following class:
class Singleton
{
private:
    static Singleton *p_inst;
    Singleton();
public:
    static Singleton * instance()
You can sidestep the threading issues entirely by creating such objects (in whatever way you choose) before any additional threads are started. This is not always possible because of design constraints (for example, the singleton is used from other statics, or you genuinely need lazy allocation), but it is simple and gives you control over the creation sequence. Tracking down problems caused by the order and timing of allocation of such objects can be a hassle, and this is an easy way to avoid it.
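As a rough illustration of that idea, here is a minimal sketch. It assumes a singleton shaped like the one in the question; the `create()` function, the `value_` member, and the `run_workers()` helper are hypothetical names added only for this example and are not part of the original class.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical completion of a singleton like the one in the question.
// create(), value_ and value() are assumptions made for illustration.
class Singleton
{
private:
    static Singleton *p_inst;
    int value_;
    Singleton() : value_(42) {}
public:
    // Call once, from the main thread, before any worker thread exists.
    static void create()
    {
        if (!p_inst)
            p_inst = new Singleton();
    }
    // No locking and no lazy creation here: the object already exists.
    static Singleton *instance()
    {
        assert(p_inst && "create() must run before threads start");
        return p_inst;
    }
    int value() const { return value_; }
};

Singleton *Singleton::p_inst = nullptr;

// Creates the singleton first, then starts n workers that only read it.
// Returns how many workers saw the same instance as the calling thread.
int run_workers(int n)
{
    Singleton::create();                 // still single-threaded here
    Singleton *expected = Singleton::instance();

    std::vector<int> ok(n, 0);           // one slot per thread, no sharing
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&ok, expected, i] {
            ok[i] = (Singleton::instance() == expected);
        });
    for (auto &t : workers)
        t.join();

    int count = 0;
    for (int v : ok)
        count += v;
    return count;
}
```

Calling `Singleton::create()` (or a helper such as `run_workers`) near the top of `main` is enough: starting a `std::thread` synchronizes with the beginning of the thread function, so every worker sees the fully constructed object without any lock in `instance()`.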
P.S. - I know this doesn't directly answer your question, but it may be a practical solution to a real problem without adding complexity.