Introduction

Nutanix Repository on GitHub


My Personal Repository on GitHub

Self-healing infrastructure reduces downtime and manual intervention. With Ansible, Nutanix admins can detect VM or system issues and automatically trigger recovery actions, like powering on VMs, attaching NICs, or restoring from snapshots. This playbook outlines a closed-loop health monitor and repair engine.
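Closing the loop means confirming that a repair actually worked, not only triggering it. The following is a minimal sketch of such a verification step, not part of the original playbook. It assumes the guest is reachable over SSH from the control node; `vm_address` and its value are hypothetical placeholders.

```yaml
---
# Sketch only: vm_address and the SSH port check are illustrative assumptions.
- name: Verify a VM after a repair action
  hosts: localhost
  gather_facts: false
  vars:
    vm_address: "192.0.2.10"   # hypothetical guest IP (documentation range)
  tasks:
    - name: Wait until the guest answers on SSH again
      ansible.builtin.wait_for:
        host: "{{ vm_address }}"
        port: 22
        delay: 10        # give the guest time to start booting
        timeout: 300     # fail the run if still unreachable after 5 minutes
```

If the wait times out, the task fails, which gives a later task or an alert something concrete to react to.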


Diagram: Self-Healing Automation Flow


Use Case

  • Detect powered-off critical VMs
  • Reattach NICs to isolated guests
  • Roll back to last known good snapshot
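Before letting automation act on these cases, it can help to preview what it would do. The sketch below is a dry run built only from `ansible.builtin` modules. It reuses the `monitored_vms` shape from the playbook and adds a hard-coded `observed_power` field, which is a hypothetical stand-in for the value a Nutanix query would return, so it runs without a cluster.

```yaml
---
# Sketch only: observed_power is hard-coded here; in the real playbook the
# observed state comes from the cluster.
- name: Plan healing actions (dry run)
  hosts: localhost
  gather_facts: false
  vars:
    monitored_vms:
      - name: web01
        expected_power: "on"
        observed_power: "off"    # hypothetical observation
        heal_action: "reboot"
      - name: db01
        expected_power: "on"
        observed_power: "on"     # hypothetical observation
        heal_action: "restore_snapshot"
  tasks:
    - name: Report the action planned for each unhealthy VM
      ansible.builtin.debug:
        msg: "{{ item.name }} is {{ item.observed_power }}, expected {{ item.expected_power }}: would run {{ item.heal_action }}"
      loop: "{{ monitored_vms }}"
      loop_control:
        label: "{{ item.name }}"
      when: item.observed_power != item.expected_power
```

With the values above, only web01 is reported and db01 is skipped because its observed state matches the expected one. Detecting an isolated guest would need an additional observed field for its NICs, which this sketch does not model.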

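Playbook: self_heal_vm.yml

---
# Module and parameter names below are kept as the author wrote them. The
# nutanix.ncp collection publishes its VM modules as ntnx_vms and ntnx_vms_info,
# so check names and return fields against the installed collection before use.
- name: Self-Healing Nutanix VM Automation
  hosts: localhost
  gather_facts: false
  collections:
    - nutanix.ncp
  vars_files:
    - nutanix_credentials.yml
  vars:
    monitored_vms:
      - name: web01
        expected_power: "on"
        heal_action: "reboot"
      - name: db01
        expected_power: "on"
        heal_action: "restore_snapshot"
  tasks:
    - name: Fetch VM states
      nutanix.ncp.vms_info:
        name: "{{ vm.name }}"
      loop: "{{ monitored_vms }}"
      loop_control:
        loop_var: vm
      register: vm_status

    # Each registered result carries its loop item under the loop_var name
    # (item.vm here, not item.item). The loop sits on each task because
    # Ansible does not accept loop on a block.
    - name: Heal powered-off VMs (reboot)
      nutanix.ncp.vms:
        name: "{{ item.vm.name }}"
        state: present
        power_state: restart
        cluster_name: "prod-cluster"
      loop: "{{ vm_status.results }}"
      loop_control:
        label: "{{ item.vm.name }}"
      when:
        # Case-insensitive comparison of observed and expected power state
        - item.vms[0].power_state | lower != item.vm.expected_power
        - item.vm.heal_action == "reboot"

    - name: Restore from snapshot (fallback)
      ansible.builtin.debug:
        msg: "TODO: Restore snapshot for {{ item.vm.name }} - Add logic here."
      loop: "{{ vm_status.results }}"
      loop_control:
        label: "{{ item.vm.name }}"
      when:
        - item.vms[0].power_state | lower != item.vm.expected_power
        - item.vm.heal_action == "restore_snapshot"

To run the check every 10 minutes, schedule the playbook with cron. Cron has no terminal to answer the --ask-vault-pass prompt, so pass the vault password from a file instead (the path shown is an example):

*/10 * * * * ansible-playbook self_heal_vm.yml --vault-password-file ~/.vault_pass -i inventory.yml

Summary

This playbook gives your Nutanix cluster a resilience layer. Use it to automatically recover VMs from failures, reduce incident response time, and support 24×7 environments without manual oversight.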
