I am creating 3 EC2 instances, and subsequently iterating and tagging each of them. Sometimes the tag request fails, although the instance later appears to be running.
AWS has since added more detailed documentation on Troubleshooting API Request Errors, including a section on Eventual Consistency, which basically confirms the analysis in my initial answer below:
The Amazon EC2 API follows an eventual consistency model, due to the distributed nature of the system supporting the API. This means that the result of an API command you run that affects your Amazon EC2 resources might not be immediately visible to all subsequent commands you run. [...]
[...] For example, [...] if you run a command to modify or describe the resource that you just created, its ID might not have propagated throughout the system, and you will get an error responding that the resource does not exist.
To manage eventual consistency, you can do the following:
Confirm the state of the resource before you run a command to modify it. Run the appropriate Describe command using an exponential backoff algorithm to ensure that you allow enough time for the previous command to propagate through the system. [...]
Add wait time between subsequent commands, even if a Describe command returns an accurate response. Apply an exponential backoff algorithm starting with a couple of seconds of wait time, and increase gradually up to about five minutes of wait time.
[emphasis mine]
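The backoff advice quoted above can be sketched roughly as follows. This is a minimal illustration, not an AWS-provided helper: `retry_with_backoff`, its parameters, and the `is_retryable` predicate are hypothetical names, and the defaults simply mirror the "couple of seconds" up to "about five minutes" guidance.

```python
import time


def retry_with_backoff(operation, is_retryable, initial_delay=2.0,
                       max_delay=300.0, max_attempts=10, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each
    retryable failure (capped at max_delay seconds)."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or once attempts are exhausted.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Hypothetical usage with boto, retrying only the "does not exist (yet)" error:
# retry_with_backoff(
#     lambda: conn.create_tags([instance.id], {'Name': 'web'}),
#     lambda exc: getattr(exc, 'error_code', None) == 'InvalidInstanceID.NotFound')
```

Restricting the retry to the not-found error keeps genuine failures, such as missing permissions, from being retried for several minutes.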
Please note: most AWS SDKs now apply these suggestions automatically and offer options to adjust the default retry policy or even plug in a custom implementation. See Error Retries and Exponential Backoff in AWS for guidance on how to implement it yourself, if need be.
The eventually consistent design of the AWS API is increasingly encountered by large-scale AWS users, who naturally need to look deeper and work around it; see for example the following articles:
As already commented by @datasage, the AWS APIs apparently need to be treated as eventually consistent in general. This is certainly unexpected when first encountered, but in hindsight not too surprising for a large-scale service: it is an engineering and operational tradeoff to address the CAP theorem.
See also my comment on Alex Ciminian's question Implementing idempotency for AWS Spot Instance Requests, where he discusses his test results regarding similar consistency issues:
Interesting issue - [...] I've encountered various similar API delays in the context of the Bamboo AWS Plugin and concluded that the AWS API needs to be treated as eventually consistent across the board; e.g., I've even encountered cases where I received a resource id from a create call and could tag the resource based on its id, but still could not describe it thereafter, because it supposedly doesn't exist (yet).
For details on the mentioned cases you might want to look into Frequent polling of AWS API causes throttle limit, where I summarize our analysis and our approach to improving the handling via the available but limited retry/backoff functionality within the AWS SDK for Java. The solution is far from ideal, but it seems to improve things considerably for the time being.
On a similar note, the redesigned AWS SDK for PHP 2 introduced dedicated “Waiter” objects that address the problem by letting you poll a resource until it is in a desired state; see the Waiters section of the Quick Start for details:
One of the high-level abstractions provided by the SDK is the concept of “waiters”. Waiters help make it easier to work with eventually consistent systems by providing an easy way to wait on a resource to enter into a particular state by polling the resource. [...] Any @method tag that starts with “waitUntil” will utilize a waiter.
$client->waitUntil('BucketExists', array('Bucket' => 'my-bucket'));
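For Python, a comparable approach might look like the sketch below. It assumes boto3 (the successor to boto, not mentioned in the original answer) and its EC2 `instance_exists` waiter; the function name `tag_when_visible` and its parameters are illustrative only.

```python
def tag_when_visible(ec2_client, instance_ids, tags):
    """Wait until the instances are visible to the API, then tag them.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client('ec2').
    tags is a plain dict such as {'Name': 'web'}.
    """
    # The 'instance_exists' waiter polls DescribeInstances until the IDs resolve.
    waiter = ec2_client.get_waiter('instance_exists')
    waiter.wait(InstanceIds=instance_ids)
    ec2_client.create_tags(
        Resources=instance_ids,
        Tags=[{'Key': key, 'Value': value} for key, value in tags.items()],
    )
```

Because of the eventual consistency described above, a successful describe does not strictly guarantee that the following tag call succeeds, so combining the waiter with a retry on the tag request may still be worthwhile.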
A tag cannot be created until the instance has launched. You might try passing key_name (the name of the EC2 key pair) when creating the instance. If you are using boto, that could be done with:
reservation = conn.run_instances('ami-12345678', min_count=1, max_count=1, instance_type='m1.small', key_name='samplename')  # 'ami-12345678' is a placeholder image ID
The instances can then be retrieved by filtering on key_name, and once they are in the running state you can tag them.
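Waiting for the running state before tagging could be sketched like this with boto (v2). The helper name `tag_when_running`, the polling interval, and the timeout are assumptions for illustration; `instance` is expected to be a boto Instance such as one taken from `reservation.instances`.

```python
import time


def tag_when_running(instance, tags, poll_seconds=5, timeout_seconds=300,
                     sleep=time.sleep):
    """Poll a boto (v2) Instance until it is 'running', then apply tags."""
    waited = 0
    # Instance.update() refreshes the object and returns its current state.
    while instance.update() != 'running':
        if waited >= timeout_seconds:
            raise RuntimeError('instance %s not running after %d s'
                               % (instance.id, timeout_seconds))
        sleep(poll_seconds)
        waited += poll_seconds
    for key, value in tags.items():
        instance.add_tag(key, value)


# Hypothetical usage:
# for instance in reservation.instances:
#     tag_when_running(instance, {'Name': 'samplename'})
```

Note that right after launch the update() call itself may fail with a not-found error while the instance ID is still propagating, so it may need to be retried as well.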