Contribute to Open Source
Investigating a problem with vSphere Provider for Terraform
Many people on my network ask me, “How to contribute to Open Source?”.
A good entry point is FreeCodeCamp’s list of resources for new people to contribute to Open Source.
For sure, you should find an interest in contributing to a project. An interesting article about this topic is The key to open source sustainability is good old-fashioned self-interest.
Here, my interest was to find the root cause of an issue when using the vSphere Provider for Terraform. The investigation that followed this issue leads me to share it to show that the contribution to an Open Source project can go through something other than code. Here, my contribution was to support the progress of an issue.
You don’t need to code to contribute to Open Source!
🕵️ The case
I’m using Infrastructure as Code employing HashiCorp Packer to create machine images and HashiCorp Terraform to deploy these images on various infrastructures such as VMware SDDC with the vSphere Provider. After upgrading the provider version, I encountered the following error when applying my Terraform configuration to create a set of VMs.
Invalid operation for device ‘0’
vsphere_virtual_machine.vm: Creating...
vsphere_virtual_machine.vm: Still creating... [10s elapsed]Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.
📚RTFM
Probably, I need to read either the changelog or the documentation carefully to check if some breaking changes happened. I don’t find any associated information after reading them.
🤔 Has anyone else ever had the problem?
The first instinct is to search on websites like Reddit, Stack Overflow… to find someone who has the same problem, or search on the issue tracker from the project itself.
⚠️ The issue tracker is the preferred channel for bug reports and features requests but not for your personal support requests.
After searching, I find the following issue that explained the same problem: https://github.com/hashicorp/terraform-provider-vsphere/issues/1016
🕵️ Investigations
After reading this issue, the problem seems to appear after the 1.15.0 release. Let’s force the version to 1.15.0 in my Terraform configuration.
terraform {
required_providers {
vsphere = {
source = "hashicorp/vsphere"
version = "1.15.0"
}
}
Launch the following commands to download and initialize the selected version and apply the Terraform configuration.
$ terraform init -upgradeInitializing the backend...Initializing provider plugins...
- Finding hashicorp/vsphere versions matching "1.15.0"...
- Installing hashicorp/vsphere v1.15.0...
- Installed hashicorp/vsphere v1.15.0 (signed by HashiCorp)
...$ terraform apply
Voila, it works well, but I need some features that are implemented only in the next releases. So, I have to investigate a step further. So, I need to destroy created resources with terraform destroy
command beforehand.
🧐 Debug logs (1.15.0)
Is there any way to activate debug logs on this provider?
Let’s have a look at the vSphere provider’s documentation.
Yes, there is! The client_debug
option exists. So, update the provider’s configuration to add this option.
provider "vsphere" {
user = var.vsphere-username
password = var.vsphere-password
vsphere_server. = var.vsphere-server
allow_unverified_ssl = var.vsphere-insecure
client_debug = true
}
After running terraform apply
, the $HOME/.govmomi/debug
folder contains a timestamped folder per session.
Each folder contains an XML file per request with the related response,
$ ls -1 $HOME/.govmomi/debug/YYYY-MM-DDTHH-MM-SS.SSSSSS
1-0001.req.headers
1-0001.req.xml
1-0001.res.headers
1-0001.res.xml
1-0002.req.headers
1-0002.req.xml
1-0002.res.headers
1-0002.res.xml
1-0003.req.headers
1-0003.req.xml
1-0003.res.headers
1-0003.res.xml
...
1-0107.req.headers
1-0107.req.xml
1-0107.res.headers
1-0107.res.xml
1-client.log
and a summary file 1-client.log
.
# 1-client.log (1.15.0)
...
0042: 132ms (*methods.WaitForUpdatesExBody)
0043: 36ms (*methods.DestroyPropertyCollectorBody)
0044: 34ms (*methods.RetrievePropertiesBody)
0045: 61ms (*methods.RetrievePropertiesBody)
0046: 33ms (*methods.DVSManagerLookupDvPortGroupBody)
0047: 47ms (*methods.RetrievePropertiesBody)
0048: 46ms (*methods.ReconfigVM_TaskBody)
...
These logs will serve as a reference for the next step.
🧐 Debug logs (1.16.0)
In the changelog, I find that the next version after 1.15.0 is 1.16.0.
Let’s follow the same instructions as before, but with the 1.16.0 version instead.
As expected, the error occurs, but I got some debug logs now!
vsphere_virtual_machine.vm: Creating...
vsphere_virtual_machine.vm: Still creating... [10s elapsed]Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.
Let’s search Invalid operation
in the SOAP messages.
$ cd $HOME/.govmomi/debug/YYYY-MM-DDTHH-MM-SS.SSSSSS
$ grep "Invalid operation" *
1-0056.res.xml:<WaitForUpdatesExResponse>....
Found in 1–0056.res.xml
file!
Let’s open it (only relevant information is shown).
# 1–00056.res.xml
<WaitForUpdatesExResponse xmlns="urn:vim25">
...
<val xsi:type="TaskInfo">
<key>task-669130</key>
<task type="Task">task-669130</task>
<description>
<key>com.vmware.vim.vpxd.vpx.vmprov.ReconfigureVm</key>
<message>Reconfiguring Virtual Machine on destination host</message>
</description>
<name>ReconfigVM_Task</name>
<descriptionId>VirtualMachine.reconfigure</descriptionId>
...
<error>
<fault xsi:type="InvalidDeviceOperation">
<property>virtualDeviceSpec.device</property>
<deviceIndex>0</deviceIndex>
<badOp>edit</badOp>
</fault>
<localizedMessage>Invalid operation for device '0'.</localizedMessage>
</error>
<reason xsi:type="TaskReasonUser">
<userName>xxxx</userName>
</reason>
</val>
</WaitForUpdatesExResponse>
Basically, 1–00056.res.xml
response shows an error from the WaitForUpdatesExResponse
of the ReconfigVM_Task
request.
Let’s search ReconfigVM_Task
request in the SOAP messages from 1.16.0.
$ grep "ReconfigVM_Task" *
1-0052.req.xml:<Envelope>...
Let’s open the corresponding file.
Four deviceChange
are present, and the first deviceChange
(index 0) is for Hard disk 1
.
# 1–0052.req.xml
<ReconfigVM_Task ...>
<_this type="VirtualMachine">vm-xxxx</_this>
<spec>
...
<deviceChange ...>
<operation>edit</operation>
<device ...>
<key>2000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Hard disk 1</label>
<summary>20,480,000 KB</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>edit</operation>
<device XMLSchema-instance:type="VirtualVmxnet3">
<key>4000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Network adapter 1</label>
<summary>DVSwitch: ...</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>remove</operation>
<device ...>
<key>3000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>CD/DVD drive 1</label>
<summary>Remote device</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
...
</deviceChange>
....
</spec>
</ReconfigVM_Task>
Let’s search ReconfigVM_Task
request in the SOAP messages from 1.15.0 debug logs to compare with the request done for 1.16.0.
After checking 1-client.log (1.15.0)
, the request is 0048
.
# 1-client.log (1.15.0)
...
0048: 46ms (*methods.ReconfigVM_TaskBody)
...
Let’s open the corresponding request file.
Only three deviceChange
are present, and they’re the same as the last of the request done from 1.16.0.
#1-0048-req.xml
<ReconfigVM_Task ...>
<_this type="VirtualMachine">vm-xxxx</_this>
<spec>
...
<deviceChange ...>
<operation>edit</operation>
<device XMLSchema-instance:type="VirtualVmxnet3">
<key>4000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>Network adapter 1</label>
<summary>DVSwitch: ...</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
<operation>remove</operation>
<device ...>
<key>3000</key>
<deviceInfo XMLSchema-instance:type="Description">
<label>CD/DVD drive 1</label>
<summary>Remote device</summary>
</deviceInfo>
...
</device>
</deviceChange>
<deviceChange ...>
...
</deviceChange>
....
</spec>
</ReconfigVM_Task>
🤔 Suspect found: Hard disk 1 reconfiguration
deviceChange
parameter on Hard disk 1
is only present in 1.16.0, and it’s the first (index ‘0’) in the ReconfigVM_Task
request.
So, I understand better the error message.
Error: error reconfiguring virtual machine: error reconfiguring virtual machine: Invalid operation for device '0'.
The first device in the ReconfigVM_Task
request is invalid, and it’s the Hard disk 1
.
💡Solution
The changelog for the 1.16.0 version shows
New Data Source: storage_policy.
Let’s go to the Data Source: storage policy documentation.
The
vsphere_storage_policy
data source can be used to discover the UUID of a vSphere storage policy. This can then be used with resources or data sources that require a storage policy.
In virtual_machine documentation, storage_policy_id
appears at VM and disk level. Let’s add the required data source and storage_policy_id
in the Terraform configuration file.
data "vsphere_storage_policy" "vm" {
name = <VM storage policy name>
}
data "vsphere_storage_policy" "disk" {
name = <Disk storage policy name>
}
resource "vsphere_virtual_machine" "vm" {
name = "vm"
...
storage_policy_id = data.vsphere_storage_policy.vm.id
disk {
...
storage_policy_id = data.vsphere_storage_policy.disk.id
}
}
After running terraform apply
with this new configuration, the VM creation is successful. After checking the debug logs, the ReconfigVM_Task
request is the same as 1.15.0.
Conclusion
During this investigation, I helped myself and others by sharing 😃 the solution within the issue tracker. On top of this, I discovered and learned new things:
- By reading the changelog and the documentation, I discovered useful improvements and new features on the vSphere Provider such as OVA/OVF deployment, vApp properties support…
- govmomi project supported by VMware, including an interesting vSphere API mock framework called vcsim. I will investigate and check if it could be useful for testing our Terraform modules.
- How to debug Terraform plugins even without the need of coding a single line of code.