Question
I have about 55 EMR clusters (all of them terminated) and have been trying to retrieve all 55 using the list_clusters method in boto. I searched for examples of paginating the result set in boto but couldn't find any. Given this statement:
emr_object.list_clusters(cluster_states=["TERMINATED"], marker="what_should_i_use_here").clusters
I keep getting an InvalidRequestException error:
boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
<ErrorResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
<Error>
<Type>Sender</Type>
<Code>InvalidRequestException</Code>
<Message>Marker 'what_should_i_use_here' is not valid.</Message>
</Error>
<RequestId>555b91bd-c122-11e3-8e31-abc75abdb39d</RequestId>
</ErrorResponse>
What should I pass as the marker parameter so that I can paginate the query properly?
Thanks!
Answer 1:
I tried this instead:
emr_object.describe_jobflows(states=["TERMINATED"])
and it works! This method returned all of the clusters.
Answer 2:
You can pass in None the first time round.
If the result object you get back has a marker attribute, more results are available and you can pass that marker in on the next call, e.g.
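m = None
while True:
    try:
        # list_clusters is the call that accepts a marker; None requests the first page
        cluster_list_result = emr_object.list_clusters(cluster_states=['TERMINATED'], marker=m)
        # ... do whatever you need with cluster_list_result.clusters ...
        m = cluster_list_result.marker  # raises AttributeError when there are no more pages
    except AttributeError:
        break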
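The same marker loop can be wrapped in a generator so that callers just iterate over clusters. This is only a minimal sketch: iter_clusters is a hypothetical helper name, not part of boto, and it assumes (as the answer above describes) that the result of list_clusters exposes a clusters attribute and carries a marker attribute only when more pages remain.

```python
def iter_clusters(emr_conn, cluster_states=None):
    """Yield every cluster summary, following markers until none is returned.

    `emr_conn` is assumed to be any object with a boto-style
    list_clusters(cluster_states=..., marker=...) method.
    """
    marker = None  # None asks for the first page
    while True:
        result = emr_conn.list_clusters(cluster_states=cluster_states,
                                        marker=marker)
        # `clusters` may be missing or None on an empty page, so fall back to [].
        for cluster in getattr(result, "clusters", None) or []:
            yield cluster
        # Assumption: the marker is absent (or empty) on the last page.
        marker = getattr(result, "marker", None)
        if not marker:
            break
```

With the connection from the question, usage would look like all_terminated = list(iter_clusters(emr_object, ["TERMINATED"])). Using getattr instead of catching AttributeError keeps an unrelated AttributeError raised while processing a page from silently ending the loop.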
Source: https://stackoverflow.com/questions/23003190/unable-to-paginate-emr-cluster-using-boto