Contribute to Open Source

Investigating a problem with vSphere Provider for Terraform

Stéphane Este-Gracias
6 min readMar 27, 2021
Always Room to Grow
Photo by Kyle Glenn on Unsplash

Many people on my network ask me, “How to contribute to Open Source?”.
A good entry point is FreeCodeCamp’s list of resources for new people to contribute to Open Source.

For sure, you should find an interest in contributing to a project. An interesting article about this topic is The key to open source sustainability is good old-fashioned self-interest.

Here, my interest was to find the root cause of an issue when using the vSphere Provider for Terraform. The investigation that followed this issue leads me to share it to show that the contribution to an Open Source project can go through something other than code. Here, my contribution was to support the progress of an issue.

You don’t need to code to contribute to Open Source!

🕵️ The case

I’m using Infrastructure as Code employing HashiCorp Packer to create machine images and HashiCorp Terraform to deploy these images on various infrastructures such as VMware SDDC with the vSphere Provider. After upgrading the provider version, I encountered the following error when applying my Terraform configuration to create a set of VMs.

Invalid operation for device ‘0’

vsphere_virtual_machine.vm: Creating...
vsphere_virtual_machine.vm: Still creating... [10s elapsed]
Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.

📚RTFM

Probably, I need to read either the changelog or the documentation carefully to check if some breaking changes happened. I don’t find any associated information after reading them.

🤔 Has anyone else ever had the problem?

The first instinct is to search on websites like Reddit, Stack Overflow… to find someone who has the same problem, or search on the issue tracker from the project itself.

⚠️ The issue tracker is the preferred channel for bug reports and features requests but not for your personal support requests.

After searching, I find the following issue that explained the same problem: https://github.com/hashicorp/terraform-provider-vsphere/issues/1016

🕵️ Investigations

After reading this issue, the problem seems to appear after the 1.15.0 release. Let’s force the version to 1.15.0 in my Terraform configuration.

terraform {
required_providers {
vsphere = {
source = "hashicorp/vsphere"
version = "1.15.0"
}
}

Launch the following commands to download and initialize the selected version and apply the Terraform configuration.

$ terraform init -upgradeInitializing the backend...Initializing provider plugins...
- Finding hashicorp/vsphere versions matching "1.15.0"...
- Installing hashicorp/vsphere v1.15.0...
- Installed hashicorp/vsphere v1.15.0 (signed by HashiCorp)
...
$ terraform apply

Voila, it works well, but I need some features that are implemented only in the next releases. So, I have to investigate a step further. So, I need to destroy created resources with terraform destroy command beforehand.

🧐 Debug logs (1.15.0)

Is there any way to activate debug logs on this provider?
Let’s have a look at the vSphere provider’s documentation.
Yes, there is! The client_debug option exists. So, update the provider’s configuration to add this option.

provider "vsphere" {
user = var.vsphere-username
password = var.vsphere-password
vsphere_server. = var.vsphere-server
allow_unverified_ssl = var.vsphere-insecure
client_debug = true
}

After running terraform apply, the $HOME/.govmomi/debug folder contains a timestamped folder per session.

Each folder contains an XML file per request with the related response,

$ ls -1 $HOME/.govmomi/debug/YYYY-MM-DDTHH-MM-SS.SSSSSS
1-0001.req.headers
1-0001.req.xml
1-0001.res.headers
1-0001.res.xml
1-0002.req.headers
1-0002.req.xml
1-0002.res.headers
1-0002.res.xml
1-0003.req.headers
1-0003.req.xml
1-0003.res.headers
1-0003.res.xml
...
1-0107.req.headers
1-0107.req.xml
1-0107.res.headers
1-0107.res.xml
1-client.log

and a summary file 1-client.log.

# 1-client.log (1.15.0)
...
0042: 132ms (*methods.WaitForUpdatesExBody)
0043: 36ms (*methods.DestroyPropertyCollectorBody)
0044: 34ms (*methods.RetrievePropertiesBody)
0045: 61ms (*methods.RetrievePropertiesBody)
0046: 33ms (*methods.DVSManagerLookupDvPortGroupBody)
0047: 47ms (*methods.RetrievePropertiesBody)
0048: 46ms (*methods.ReconfigVM_TaskBody)
...

These logs will serve as a reference for the next step.

🧐 Debug logs (1.16.0)

In the changelog, I find that the next version after 1.15.0 is 1.16.0.
Let’s follow the same instructions as before, but with the 1.16.0 version instead.

As expected, the error occurs, but I got some debug logs now!

vsphere_virtual_machine.vm: Creating...
vsphere_virtual_machine.vm: Still creating... [10s elapsed]
Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.

Let’s search Invalid operation in the SOAP messages.

$ cd $HOME/.govmomi/debug/YYYY-MM-DDTHH-MM-SS.SSSSSS
$ grep "Invalid operation" *
1-0056.res.xml:<WaitForUpdatesExResponse>....

Found in 1–0056.res.xml file!
Let’s open it (only relevant information is shown).

# 1–00056.res.xml
<WaitForUpdatesExResponse xmlns="urn:vim25">
...
<val xsi:type="TaskInfo">
<key>task-669130</key>
<task type="Task">task-669130</task>
<description>
<key>com.vmware.vim.vpxd.vpx.vmprov.ReconfigureVm</key>
<message>Reconfiguring Virtual Machine on destination host</message>
</description>
<name>ReconfigVM_Task</name>
<descriptionId>VirtualMachine.reconfigure</descriptionId>
...
<error>
<fault xsi:type="InvalidDeviceOperation">
<property>virtualDeviceSpec.device</property>
<deviceIndex>0</deviceIndex>
<badOp>edit</badOp>
</fault>
<localizedMessage>Invalid operation for device &apos;0&apos;.</localizedMessage>
</error>
<reason xsi:type="TaskReasonUser">
<userName>xxxx</userName>
</reason>
</val>
</WaitForUpdatesExResponse>

Basically, 1–00056.res.xml response shows an error from the WaitForUpdatesExResponse of the ReconfigVM_Task request.

Let’s search ReconfigVM_Task request in the SOAP messages from 1.16.0.

$ grep "ReconfigVM_Task" *
1-0052.req.xml:<Envelope>...

Let’s open the corresponding file.
Four deviceChange are present, and the first deviceChange (index 0) is for Hard disk 1.

# 1–0052.req.xml
<ReconfigVM_Task ...>
<_this type="VirtualMachine">vm-xxxx</_this>
<spec>
...
<deviceChange ...>
<operation>edit</operation>
<device ...>
<key>2000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Hard disk 1</label>
<summary>20,480,000 KB</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>edit</operation>
<device XMLSchema-instance:type="VirtualVmxnet3">
<key>4000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Network adapter 1</label>
<summary>DVSwitch: ...</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>remove</operation>
<device ...>
<key>3000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>CD/DVD drive 1</label>
<summary>Remote device</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
...
</deviceChange>
....
</spec>
</ReconfigVM_Task>

Let’s search ReconfigVM_Task request in the SOAP messages from 1.15.0 debug logs to compare with the request done for 1.16.0.
After checking 1-client.log (1.15.0), the request is 0048.

# 1-client.log (1.15.0)
...
0048: 46ms (*methods.ReconfigVM_TaskBody)
...

Let’s open the corresponding request file.
Only three deviceChange are present, and they’re the same as the last of the request done from 1.16.0.

#1-0048-req.xml
<ReconfigVM_Task ...>
<_this type="VirtualMachine">vm-xxxx</_this>
<spec>
...
<deviceChange ...>
<operation>edit</operation>
<device XMLSchema-instance:type="VirtualVmxnet3">
<key>4000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Network adapter 1</label>
<summary>DVSwitch: ...</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>remove</operation>
<device ...>
<key>3000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>CD/DVD drive 1</label>
<summary>Remote device</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
...
</deviceChange>
....
</spec>
</ReconfigVM_Task>

🤔 Suspect found: Hard disk 1 reconfiguration

deviceChange parameter on Hard disk 1 is only present in 1.16.0, and it’s the first (index ‘0’) in the ReconfigVM_Task request.

So, I understand better the error message.

Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.

The first device in the ReconfigVM_Task request is invalid, and it’s the Hard disk 1.

💡Solution

The changelog for the 1.16.0 version shows
New Data Source: storage_policy.

Let’s go to the Data Source: storage policy documentation.

The vsphere_storage_policy data source can be used to discover the UUID of a vSphere storage policy. This can then be used with resources or data sources that require a storage policy.

In virtual_machine documentation, storage_policy_id appears at VM and disk level. Let’s add the required data source and storage_policy_id in the Terraform configuration file.

data "vsphere_storage_policy" "vm" {
name = <VM storage policy name>
}

data "vsphere_storage_policy" "disk" {
name = <Disk storage policy name>
}


resource "vsphere_virtual_machine" "vm" {
name = "vm"
...
storage_policy_id = data.vsphere_storage_policy.vm.id

disk {
...
storage_policy_id = data.vsphere_storage_policy.disk.id
}
}

After running terraform applywith this new configuration, the VM creation is successful. After checking the debug logs, the ReconfigVM_Task request is the same as 1.15.0.

Photo by Tim Mossholder on Unsplash

Conclusion

During this investigation, I helped myself and others by sharing 😃 the solution within the issue tracker. On top of this, I discovered and learned new things:

  • By reading the changelog and the documentation, I discovered useful improvements and new features on the vSphere Provider such as OVA/OVF deployment, vApp properties support
  • govmomi project supported by VMware, including an interesting vSphere API mock framework called vcsim. I will investigate and check if it could be useful for testing our Terraform modules.
  • How to debug Terraform plugins even without the need of coding a single line of code.
Photo by Markus Winkler on Unsplash

Resources

--

--