Terraform stucks when instance_count is more than 2 while using remote-exec provisioner

蹲街弑〆低调 提交于 2019-12-24 04:29:08

问题


  • I am trying to provision multiple Windows EC2 instance with Terraform's remote-exec provisioner using null_resource.

$ terraform -v Terraform v0.12.6 provider.aws v2.23.0 provider.null v2.1.2

  • Originally, I was working with three remote-exec provisioners (Two of them involved rebooting the instance) without null_resource and for a single instance, everything worked absolutely fine.
  • I then needed to increase the count and based on several links, ended up using null_resource. So, I have reduced the issue to the point where I am not even able to run one remote-exec provisioner for more than 2 Windows EC2 instances using null_resource.

Terraform template to reproduce the error message:

//VARIABLES

variable "aws_access_key" {
  default = "AK"
}
variable "aws_secret_key" {
  default = "SAK"
}
variable "instance_count" {
  default = "3"
}
variable "username" {
  default = "Administrator"
}
variable "admin_password" {
  default = "Password"
}
variable "instance_name" {
  default = "Testing"
}
variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  key_name      = "ec2_key"
  subnet_id     = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"
  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }
   provisioner "remote-exec" {
     inline = [
       "powershell.exe Write-Host Instance_No=${count.index}"
     ]
   }
//   provisioner "local-exec" {
//     command = "powershell.exe Write-Host Instance_No=${count.index}"
//   }
//   provisioner "file" {
//       source      = "testscript"
//       destination = "D:/testscript"
//   }
}
resource "aws_security_group" "ec2instance-sg" {
  name        = "${var.instance_name}-sg"
  vpc_id      = "${var.vpc_id}"


//   RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
    }

//   WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
    }

  tags = {
    Name        = "${var.instance_name}-sg"
  }

}
//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}

Observations:

  • With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, it's unpredictable that all the provisioners will run everytime on all the instances. However one thing is for sure that Terraform never completes and does not show the output variables. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
  • For the local-exec provisioner - Everything works fine. Tested with count's value as 1, 2 and 7.
  • For file provisioner its working fine for 1, 2 and 3 however does not finish for 7 but the file was copied on all the 7 instances. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
  • Also, in every attempt, remote-exec provisioner is able to connect to the instances irrespective of count's value and it's just that, it's doesnt trigger the inline command and randomly chooses to skip that and starts showing "Still creating..." message.
  • I have been stuck with this issue for quite some time now. Couldnt find anything significant in debug logs as well. I know Terraform is not recommended to be used as a config mgmt tool however, everything's working fine even with complex provisioning scripts if the instance count is just 1 (Even without null_resource) which indicates that it should be easily possible for Terraform to handle such a basic provisioning requirement.
  • TF_DEBUG logs:
  • count=2, TF completes successfully and shows Apply complete!.
  • count=3, TF runs the remote-exec on all the three instances however does not complete and doesn't not show the outputs variables. Stuck at "Still creating..."
  • count=3, TF runs the remote-exec only on two instances and skips on nullresource[1] , does not complete and doesn't not show the outputs variables. Stuck at "Still creating..."
  • Any pointers will be greatly appreciated!

回答1:


Update: what eventually did the trick was downgrading Terraform to v11.14 as per this issue comment.

A few things you can try:

  1. Inline remote-exec:
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  # ...
  provisioner "remote-exec" {
    connection {
      # ...
    }
    inline = [
      # ...
    ]
  }
}

Now you can refer to self inside the connection block to get the instance's private IP.

  1. Add triggers to null_resource:
resource "null_resource" "nullresource" {
  triggers {
    host    = "${element(aws_instance.ec2instance.*.private_ip, count.index)}" # Rerun when IP changes
    version = "${timestamp()}" # ...or rerun every time
  }
  # ...
}

You can use the triggers attribute to recreate null_resource and thus re-execute remote-exec.



来源:https://stackoverflow.com/questions/57368506/terraform-stucks-when-instance-count-is-more-than-2-while-using-remote-exec-prov

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!