NSX-T Edge Deletion Failed

November 6, 2021

Manually Cleaning Up Orphaned Edge Nodes

What? Cleaning Up Stale Edge Nodes from NSX-T Manager

This article walks through two approaches to cleaning up the orphaned nodes: using the API, and manually cleaning up the Corfu database. It’s important to remember that cleaning up the database yourself should really be done with support; however, I detail the process in this article.

Unfortunately, deleting the node with API did not work either.

This issue should not really impact production environments, thanks to shared storage, vSphere HA, and any other mechanisms for VM restoration. This environment was a lab without those protections, and as a result the nodes were inaccessible from NSX-T Manager.

Next, whilst in this directory, issue the command in the snippet below, ensuring you change the hostname to suit your NSX-T Manager node IP.

Step 1: Using API to clean up the nodes with (DELETION FAILED)

There are other examples of utilizing corfu-browser, one example is here.

https://nsxtManagerFQDN/api/v1/transport-nodes/
https://nsxtManagerFQDN/api/v1/fabric/nodes

There are two API endpoints that can be leveraged to clean up the nodes: transport-nodes and fabric/nodes.

api transport-nodes not listing edge nodes

Repeat this process for all remaining stale entries.

Using the ID, issue the command java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 removeEntries -role nsx-manager -objectType EdgeNode -uuid "aea3d092-6f60-4c87-901b-1d0f74b0ea66"
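If you have several stale entries to remove, it can help to assemble the command programmatically so a mistyped UUID is caught before anything touches the database. The following is only a sketch: the helper name remove_entries_command is my own, the paths and flags are copied from the command in this article, and nothing is executed.

```python
import shlex
import uuid

# Classpath copied from the command used in this article.
CLASSPATH = ":".join([
    "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar",
    "/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*",
    "/usr/tomcat/lib/*",
])


def remove_entries_command(manager_ip, node_uuid, port=9000):
    """Build the removeEntries argument list (hypothetical helper)."""
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    canonical = str(uuid.UUID(node_uuid))
    return [
        "java",
        "-Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml",
        "-cp", CLASSPATH,
        "com.vmware.nsx.management.tools.corfu.CorfuEditorMain",
        "-hostname", manager_ip,
        "-port", str(port),
        "removeEntries",
        "-role", "nsx-manager",
        "-objectType", "EdgeNode",
        "-uuid", canonical,
    ]


cmd = remove_entries_command("192.168.63.55", "aea3d092-6f60-4c87-901b-1d0f74b0ea66")
# shlex.join quotes the classpath so the shell does not expand the wildcards.
print(shlex.join(cmd))
```

The printed line can be reviewed and pasted into the root shell on the NSX-T Manager. The tool still prompts before removing anything, so the confirmation step is preserved.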

The first option in my case did not show the orphaned Edge nodes, only a host transport node was displayed.

The second URI displayed the orphaned Edge nodes.
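To pick the Edge nodes out of the fabric/nodes response without scrolling through it by hand, something like the sketch below could be used. It assumes the response body is JSON with a results list whose entries carry resource_type, id and display_name fields; the host entry in the sample payload is made up for illustration, while the Edge entry uses the UUID and name from my lab.

```python
import json


def edge_nodes(fabric_nodes_response):
    """Return id and display_name for entries whose resource_type is EdgeNode."""
    body = json.loads(fabric_nodes_response)
    return [
        {"id": node.get("id"), "display_name": node.get("display_name")}
        for node in body.get("results", [])
        if node.get("resource_type") == "EdgeNode"
    ]


# Illustrative payload only; the HostNode entry is hypothetical.
sample = json.dumps({
    "results": [
        {
            "resource_type": "HostNode",
            "id": "11111111-2222-3333-4444-555555555555",
            "display_name": "esxi01",
        },
        {
            "resource_type": "EdgeNode",
            "id": "aea3d092-6f60-4c87-901b-1d0f74b0ea66",
            "display_name": "edge2",
        },
    ]
})

print(edge_nodes(sample))
```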

Step 2: Cleaning up stale Edge nodes using Corfu-browser

Now, using the ID highlighted in the image, you should be able to delete the node by issuing a DELETE request to https://nsxtManagerFQDN/api/v1/fabric/nodes/nodeUUID, as can be seen in the image below.
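For anyone who prefers a script to a REST client, the DELETE request could be built roughly as follows. This is a sketch using only the Python standard library; it assumes HTTP basic authentication is enabled on the manager, and the function name, username and password are placeholders of mine. The request is constructed but not sent.

```python
import base64
import urllib.request


def build_delete_request(manager_fqdn, node_uuid, username, password):
    """Build (but do not send) a DELETE for /api/v1/fabric/nodes/<uuid>."""
    url = f"https://{manager_fqdn}/api/v1/fabric/nodes/{node_uuid}"
    # Assumption: basic authentication; adjust if your manager uses another scheme.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="DELETE")
    request.add_header("Authorization", f"Basic {token}")
    return request


req = build_delete_request(
    "nsxtManagerFQDN",
    "aea3d092-6f60-4c87-901b-1d0f74b0ea66",
    "admin",             # placeholder username
    "example-password",  # placeholder password
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), with an ssl context that
# trusts the manager's certificate if it is self-signed.
```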

Step 2a: Log into NSX-T Manager as root

java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 printTable -role nsx-manager -objectType EdgeNode > /tmp/EdgeNodeIDs

root@nsxmgr:/opt/vmware/corfu-tools# java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 removeEntries -role nsx-manager -objectType EdgeNode -uuid "aea3d092-6f60-4c87-901b-1d0f74b0ea66"
SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Overriding NSX service type to: nsx-manager
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Table mapping mechanism is enabled.
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.management
Reflections took 2975 ms to scan 105 urls, producing 2333 keys and 18905 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.pace
Reflections took 139 ms to scan 2 urls, producing 52 keys and 123 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsxapi
Reflections took 415 ms to scan 3 urls, producing 166 keys and 9685 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.vmc
Reflections took 256 ms to scan 696 urls, producing 0 keys and 0 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.csm
Reflections took 254 ms to scan 696 urls, producing 0 keys and 0 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] ObjectTypeRegistry is initialized.
No registered metrics logger provided.
Corfu runtime version source(ee70bb3) initialized.
Bootstrap Layout Servers [192.168.63.55:9000]
setCacheDisabled: Deprecated, please set parameters instead
enableTls: Deprecated, please set parameters instead
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Trying to connect with TLS support
connect: runtime parameters CorfuRuntime.CorfuRuntimeParameters(maxWriteSize=2147483647, bulkReadSize=10, fastLoaderTimeout=PT30M, holeFillRetry=10, holeFillRetryThreshold=PT1S, holeFillTimeout=PT10S, cacheDisabled=true, maxCacheEntries=0, maxCacheWeight=0, cacheConcurrencyLevel=0, cacheExpiryTime=9223372036854775807, followBackpointersEnabled=false, holeFillingDisabled=false, writeRetry=5, trimRetry=2, checkpointRetries=5, streamBatchSize=10, checkpointReadBatchSize=5, runtimeGCPeriod=PT20M, clusterId=null, systemDownHandlerTriggerLimit=60, layoutServers=[], invalidateRetry=5, priorityLevel=HIGH, codecType=ZSTD, metricsEnabled=true)
Connecting to Corfu server instance, layout servers=[192.168.63.55:9000]
Construct ssl context based on the following information:
Key store file path: /config/cluster-manager/cluster-manager/private/keystore.jks.
Key store password file path: /config/cluster-manager/cluster-manager/private/keystore.password.
Trust store file path: /config/cluster-manager/cluster-manager/public/truststore.jks.
Trust store password file path: /config/cluster-manager/cluster-manager/public/truststore.password.
Connect Async 192.168.63.55:9000
channelActive: Outgoing connection established to: /192.168.63.55:9000 from id=/192.168.63.55:33962
userEventTriggered: unhandled event SslHandshakeCompletionEvent(SUCCESS)
channelRead: Handshake Response received. Removing readTimeoutHandler from pipeline.
channelRead: node id matching is not requested by client.
channelRead: Handshake succeeded. Server Corfu Version: [source(ee70bb3)]
channelRead: Removing handshake handler from pipeline.
Unavailable or unrecognised attach API : java.lang.ClassNotFoundException: com.sun.tools.attach.VirtualMachine
Detected JVM data model settings of: 64-Bit HotSpot JVM with Compressed OOPs
Connected to new cluster gTXV62MwQCic9iNWjY9fIQ
connect: client version source(ee70bb3), server version is source(ee70bb3)
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Successfully connected to Corfu server(s) '192.168.63.55:9000'.
ObjectBuilder: open Corfu stream nsx-manager Node 2db0 id f3cb5120-7734-3d61-bf5f-765d58f3e026
ObjectBuilder: open Corfu stream string-audit id 5a3b0d28-4435-3c1a-bd1b-189e1ae2066f
About to remove the following entries:
============================================================
com.vmware.nsx.management.common.IdentifierImpl@3da0180a[ objectType=Node, stringId=<null>, uuid=aea3d092-6f60-4c87-901b-1d0f74b0ea66 ]
============================================================
********************************************
********************************************
PRESS ANY KEY TO CONTINUE OR CTRL-C TO ABORT
********************************************
********************************************

Successfully removed 1 entries.

First you will need to use a terminal client and log into the NSX-T Manager as root. Once logged in, navigate to /opt/vmware/corfu-tools.

Once again, ensure you change the hostname and uuid to suit your environment. Once run, you should see output similar to below.

I should also note that the DELETION FAILED state only resulted after attempting to delete the Edge nodes in the UI; refer to the image below.

The next option I used to clean up the database was Corfu-browser. This is generally not recommended without support; however, as it was a lab, I pushed on.

Notice in the request, I got a 200 response, so in theory the node should have been deleted. However, when searching for the node in NSX-T Manager, the node still appears.

Conclusion

From time to time you may face stale Corfu database entries for Edge nodes in NSX-T Manager. It’s important that you attempt to clean them up using either the UI or API before jumping straight into the database. Hopefully this article has assisted you with cleaning up Edge nodes stuck in the DELETION FAILED state.

I recently ran into an issue where a host and its underlying storage failed in one of my environments. This host had NSX-T Edge nodes residing on it, and when the host and storage failed, NSX-T Manager lost access to the Edge nodes on it, and the Edge nodes were stuck in a DELETION FAILED state.

VALUE: com.vmware.nsx.management.edge.node.model.EdgeNode@5517e36b[ allocationList=<null>, pendingMsgBusRegistration=false, autoDeployed=true,
nodeUserSettings=com.vmware.nsx.management.edge.node.model.EdgeNodeUserSettings@157544e5[ cliPassword=password, rootPassword=password, cliUsername=admin, auditUsername=<null>, auditPassword=<null> ],
nodeConfigSettings=com.vmware.nsx.management.edge.node.model.EdgeNodeConfigSettings@1c0db15[ managementPortSubnets={ com.vmware.nsx.management.edge.lrouter.ports.model.SubnetModel@713cdd61[ prefixLength=24, ipAddresses={ 192.168.63.60 }, ipConfigs=java.util.ArrayList@494e9f73{ }, raPrefixTime=<null> ] }, hostname=en4-mgmt.shank.com, defaultGatewayAddresses={ 192.168.63.1 }, searchDomains={ shank.com }, ntpServers={ 192.168.63.101 }, dnsServers={ 192.168.63.101 }, formFactor=MEDIUM, enableSsh=true, allowSshRootLogin=true, syslogServers=<null> ],
vsphereConfig=com.vmware.nsx.management.edge.node.model.VsphereDeploymentConfig@7aff0c30[ vcId=bd0270b9-7ee3-4a3e-b3d3-9d79f5888204, managementNetworkId=dvportgroup-23005, computeId=domain-c19037, storageId=datastore-22030, dataNetworkIds={ dvportgroup-74021, dvportgroup-74021 }, hostId=host-22026, computeFolderId=<null>, advancedConfiguration=<null> ],
reservationInfo=com.vmware.nsx.management.edge.node.model.ReservationInfo@4af58e7b[ memoryReservation=com.vmware.nsx.management.edge.node.model.MemoryReservation@5d33742[ reservationPercentage=100 ], cpuReservation=com.vmware.nsx.management.edge.node.model.CPUReservation@7cf3544[ reservationInShares=HIGH_PRIORITY, reservationInMhz=0 ] ],
nodeType=com.vmware.nsx.management.fabricnode.common.FabricNodeTypeEnum@6818ccfe[ name=EdgeNode, name=EDGE_NODE, ordinal=1 ],
externalId=aea3d092-6f60-4c87-901b-1d0f74b0ea66, ipAddresses=java.util.ArrayList@4087fff9{ 192.168.63.60 }, tags=<null>, displayName=edge2, description=, createUser=admin, lastModifiedUser=admin, createTime=1628460474598, lastModifiedTime=1632872454818, systemResourceFlag=false, revision=6, touched=false,
id=com.vmware.nsx.management.common.IdentifierImpl@37539d92[ objectType=Node, stringId=<null>, uuid=aea3d092-6f60-4c87-901b-1d0f74b0ea66 ],
nonMonotonicRevision=6 ]

The field we require is the uuid inside the id entry. It can also be seen in the API calls run earlier; however, there it is just called id. You can also get the ID from the NSX-T Manager UI.
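The printTable dump is long, so rather than reading it by eye you could extract the name and UUID pairs with a small script. The sketch below assumes every record looks like the one shown in this article, with displayName appearing before the id entry that holds the uuid; the function name edge_ids is my own, and the sample text is an excerpt of the dump above.

```python
import re

UUID_PATTERN = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Match displayName=<name>, then lazily skip ahead to the next uuid=<value>.
ENTRY_RE = re.compile(
    r"displayName=(?P<name>[^,]*),.*?uuid=(?P<uuid>" + UUID_PATTERN + r")",
    re.DOTALL,
)


def edge_ids(dump_text):
    """Return (displayName, uuid) pairs found in a printTable dump."""
    return [
        (match.group("name").strip(), match.group("uuid"))
        for match in ENTRY_RE.finditer(dump_text)
    ]


# Excerpt of the record shown in this article.
sample = (
    "externalId=aea3d092-6f60-4c87-901b-1d0f74b0ea66, "
    "ipAddresses=java.util.ArrayList@4087fff9{ 192.168.63.60 }, tags=<null>, "
    "displayName=edge2, description=, createUser=admin, revision=6, touched=false, "
    "id=com.vmware.nsx.management.common.IdentifierImpl@37539d92[ objectType=Node, "
    "stringId=<null>, uuid=aea3d092-6f60-4c87-901b-1d0f74b0ea66 ], nonMonotonicRevision=6 ]"
)

print(edge_ids(sample))
# For the real file, pass the contents of /tmp/EdgeNodeIDs instead of sample.
```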
