TL;DR
- Fleet-level services: centralized operations, lifecycle for management components, automation, and SSO integration.
- Instance management planes: SDDC Manager, management vCenter, management NSX, plus the vCenter and NSX that belong to each workload domain.
Practical implication: if fleet services are impaired, governance and workflows degrade, but the instance-level control planes do not magically disappear.
That shared mental model is what lets you scale without scaling confusion.
Scope and code levels referenced (VCF 9.0 GA core):
- VCF Installer: 9.0.1.0 build 24962180 (required to deploy VCF 9.0.0.0 components)
- SDDC Manager: 9.0.0.0 build 24703748
- vCenter: 9.0.0.0 build 24755230
- ESXi: 9.0.0.0 build 24755229
- NSX: 9.0.0.0 build 24733065
- VCF Operations: 9.0.0.0 build 24695812
- VCF Operations fleet management: 9.0.0.0 build 24695816
- VCF Automation: 9.0.0.0 build 24701403
- VCF Identity Broker: 9.0.0.0 build 24695128
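If you want to confirm what an existing instance is actually running, the SDDC Manager API is a quick cross-check. A minimal sketch in PowerShell, assuming the /v1/tokens and /v1/sddc-managers endpoints behave in 9.0 as in the published VCF API (the FQDN and credentials are placeholders):
# Minimal sketch: check the SDDC Manager version over its public API (placeholder FQDN and credentials)
$sddcManager = "sddc-manager.example.com"
$cred = Get-Credential
$body = @{ username = $cred.UserName; password = $cred.GetNetworkCredential().Password } | ConvertTo-Json
$token = Invoke-RestMethod -Method Post -Uri "https://$sddcManager/v1/tokens" -Body $body -ContentType "application/json"
$headers = @{ Authorization = "Bearer $($token.accessToken)" }
# List SDDC Manager appliances and the version they report
(Invoke-RestMethod -Method Get -Uri "https://$sddcManager/v1/sddc-managers" -Headers $headers).elements |
    Select-Object fqdn, version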
Architecture Diagram

Your fastest path to org alignment is separating two things people constantly mix up:
- Fleet-level management components give you centralized governance, inventory, and services.
- Instance management planes are not shared. Each instance still owns its own SDDC Manager, vCenter, and NSX boundaries.
Table of Contents
- Scenario
- Assumptions
- Core vocabulary recap
- Core concept: separate fleet services from instance management planes
- What runs where in VCF 9.0 GA
- Who owns what
- Day-0, day-1, day-2 map
- Identity and SSO boundaries that actually matter
- Topology patterns for single site, two sites, and multi-region
- Failure domain analysis
- Operational runbook snapshot
- Anti-patterns
- Summary and takeaways
- Conclusion
Scenario
You need architects, operators, and leadership to agree on:
- What VCF 9.0 actually manages.
- What is centralized at fleet level vs isolated per instance or domain.
- Who owns which parts of lifecycle, identity, and day-2 operations.
This is the conversation leadership actually needs.
Assumptions
- You are deploying greenfield VCF 9.0 GA (core components at 9.0.0.0, deployed via the documented installer level).
- You deploy both VCF Operations and VCF Automation from day-1.
- You want patterns for:
- Single site
- Two sites in one region
- Multi-region
- You need guidance for both:
- Shared identity
- Separate identity and SSO boundaries for regulated isolation
Core vocabulary recap
Use these terms consistently in meetings, designs, and runbooks:
- VCF private cloud: the highest-level management and consumption boundary; can contain one or more fleets.
- VCF fleet: managed by one set of fleet-level management components (notably VCF Operations and VCF Automation); contains one or more instances.
- VCF instance: a discrete VCF deployment containing a management domain and optionally workload domains.
- VCF domain: a lifecycle and isolation boundary inside an instance (management domain and VI workload domains).
- vSphere cluster: where ESXi capacity lives; clusters exist inside domains.
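If it helps to make the containment explicit, here is that hierarchy sketched as a plain PowerShell structure (illustrative only; every name is made up):
# Illustrative only: the private cloud -> fleet -> instance -> domain -> cluster containment as data
$privateCloud = @{
    Name   = "example-private-cloud"
    Fleets = @(
        @{
            Name      = "fleet-01"
            Instances = @(
                @{
                    Name    = "instance-dc1"
                    Domains = @(
                        @{ Name = "mgmt-domain";   Clusters = @("mgmt-cluster-01") }
                        @{ Name = "wld-domain-01"; Clusters = @("wld-cluster-01", "wld-cluster-02") }
                    )
                }
            )
        }
    )
}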
Core concept: separate fleet services from instance management planes
You get clean operations when you stop trying to force everything into a single “management plane” blob.
Instead, run this mental separation:
Fleet services
These are the things you deploy once per fleet to provide centralized capabilities:
- VCF Operations: inventory, observability, and the console where centralized lifecycle and identity workflows surface.
- VCF Operations fleet management appliance: lifecycle management operations for the fleet management components.
- VCF Automation: self-service consumption, organization constructs, and automation.
- VCF Identity Broker + VCF Single Sign-On: centralized authentication configuration across components (with important exclusions).
Instance management planes
Every instance retains its own control plane boundaries:
- SDDC Manager
- Management domain vCenter
- Management domain NSX
Domain-level control planes
Each workload domain is its own lifecycle and isolation boundary, typically with:
- Its own vCenter
- Its own NSX Manager (dedicated per domain, or shared depending on design)
This matters because VCF 9.0 pushes more workflows into a centralized console, but it does not eliminate domain-level responsibilities.
What runs where in VCF 9.0 GA
In multi-instance environments:
- The management domain of the first instance hosts the fleet-level management components (VCF Operations and VCF Automation).
- Additional instances still have their own instance-level management components (SDDC Manager, vCenter, NSX), and may deploy collectors as needed.
Bring-up and initial enablement:
- VCF Operations fleet management is treated as a first-class appliance and should be protected with vSphere HA in the default management cluster.
- VCF Single Sign-On can provide one-login access for many components, but not SDDC Manager and not ESXi.
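A quick way to sanity-check that protection with PowerCLI; a minimal sketch, assuming PowerCLI is installed and the cluster and VM name patterns are placeholders:
# Minimal sketch: confirm vSphere HA on the default management cluster and the restart priority
# on the fleet management appliance VMs (cluster name and VM name pattern are placeholders)
Connect-VIServer -Server <management_vcenter_fqdn>
Get-Cluster -Name "mgmt-cluster-01" | Select-Object Name, HAEnabled, HARestartPriority
Get-VM -Name "*fleet*" | Select-Object Name, PowerState, HARestartPriority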
Who owns what
Key operational detail:
| Component or capability | Platform team (VCF) | VI admin (domains and clusters) | App and platform teams |
|---|---|---|---|
| Fleet bring-up (VCF Installer, fleet creation) | Own | Consult | Inform |
| Fleet-level management components (VCF Operations, fleet management appliance, VCF Automation) | Own | Consult | Inform |
| VCF Identity Broker and VCF Single Sign-On configuration | Own | Consult | Inform |
| SDDC Manager (per instance) | Own (platform governance) | Own day-2 execution | Inform |
| Management domain vCenter and NSX | Shared | Own | Inform |
| Workload domain lifecycle (create domain, add clusters, remediate hosts) | Shared | Own | Inform |
| Workload consumption (Org structure, projects, templates, quotas, policies) | Shared (guardrails) | Consult | Own |
| Backup and restore for fleet management components | Own | Consult | Inform |
| Backup and restore for instance components (SDDC Manager, vCenter, NSX) | Shared (standards) | Own | Inform |
| Day-2 password lifecycle (rotation, remediation) | Own (policy + tooling) | Shared | Inform |
| Certificates and trust (CA integration, renewal cadence) | Own | Shared | Inform |
| DR plans for management components and identity | Own | Consult | Inform |
| DR plans for workload domains and applications | Shared (platform) | Shared (infra) | Own |
This table is meant to stop “that’s not my job” loops during incidents and upgrades.
Ownership rule of thumb:
- Platform team owns the fleet services and guardrails.
- VI admins own domain lifecycle execution and capacity.
- App teams own how they consume resources and what SLAs they require.
Day-0, day-1, day-2 map
Day-0
Design-time decisions that are expensive to change later:
- How many fleets you need (governance and isolation boundary).
- How many instances you need (location and operational boundary).
- Identity design:
- VCF Identity Broker deployment mode (embedded vs appliance).
- SSO scope (single instance vs cross-instance vs fleet-wide).
- Shared vs separate IdPs and SSO boundaries.
- Network and IP plan:
- Subnet sizing for growth matters because changing subnet masks for infrastructure networks is not supported (a quick sizing check appears after this list).
- Decide whether fleet-level components share the management VM network or get a dedicated network or NSX-backed segment.
- Management domain sizing:
- Management domains must be sized to host the management components plus future workload domain growth.
- Lifecycle blast radius strategy:
- How you segment domains, instances, and fleets to control upgrade and incident scope.
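Because the subnet mask decision is effectively permanent, it is worth doing the arithmetic before bring-up; a minimal sketch (the prefix length and component count are example figures):
# Minimal sketch: sanity-check a planned management subnet against expected component count
# (the prefix length and component count are example figures)
$prefixLength = 26
$plannedComponents = 20
$usable = [math]::Pow(2, 32 - $prefixLength) - 2
"/{0} gives {1} usable addresses for {2} planned components" -f $prefixLength, $usable, $plannedComponents
if ($usable -lt ($plannedComponents * 2)) { "Consider a larger subnet before bring-up; the mask cannot be changed later." }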
Day-1
A clean greenfield deployment is intentionally opinionated:
- Deploy the VCF Installer appliance, download binaries, and start a new VCF fleet deployment.
- Bring up the first instance and its management domain.
- Deploy the fleet-level management components (VCF Operations, fleet management appliance, VCF Automation).
- Deploy VCF Identity Broker (often appliance mode for multi-instance SSO scenarios) and configure VCF Single Sign-On.
- Create initial workload domains, and connect them into VCF Automation as needed.
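Bring-up is unforgiving about DNS, so a small pre-flight check pays for itself; a minimal sketch, assuming Windows PowerShell with the DnsClient module, and placeholder names and addresses:
# Minimal pre-flight sketch: verify forward and reverse DNS for planned management components
# (all FQDNs and IPs are placeholders)
$components = @{
    "sddc-manager.example.com" = "10.0.0.4"
    "vcenter-mgmt.example.com" = "10.0.0.5"
    "nsx-mgmt.example.com"     = "10.0.0.6"
}
foreach ($fqdn in $components.Keys) {
    $forward = (Resolve-DnsName -Name $fqdn -Type A -ErrorAction SilentlyContinue).IPAddress
    $reverse = (Resolve-DnsName -Name $components[$fqdn] -Type PTR -ErrorAction SilentlyContinue).NameHost
    "{0}: forward={1} reverse={2}" -f $fqdn, $forward, $reverse
}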
Day-2
This is where most “core infrastructure lifecycle” actually executes.
- Lifecycle management:
- Management component lifecycle through VCF Operations fleet management.
- Cluster lifecycle through vSphere lifecycle tooling, with VCF coordinating.
- Identity operations:
- Adding components and instances into SSO scope.
- Re-assigning roles and permissions inside vCenter and NSX after SSO configuration changes.
- Security hygiene:
- Password rotation and remediation flows.
- Certificate replacement with CA-signed certs across both management components and instance components (see the expiry-check sketch after this list).
- Platform resilience:
- Backup scheduling to an SFTP target for management components and instance components.
- Shutdown and startup runbooks that preserve authentication and cluster integrity.
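Certificate renewal is easier to plan when expiry dates are visible; a minimal sketch that reads the TLS certificate presented by each management endpoint (the FQDNs are placeholders):
# Minimal sketch: report TLS certificate expiry for management endpoints (FQDNs are placeholders)
$endpoints = "sddc-manager.example.com", "vcenter-mgmt.example.com", "nsx-mgmt.example.com"
foreach ($e in $endpoints) {
    $client = [System.Net.Sockets.TcpClient]::new($e, 443)
    $ssl = [System.Net.Security.SslStream]::new($client.GetStream(), $false, { $true })
    $ssl.AuthenticateAsClient($e)
    $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)
    "{0}: expires {1:yyyy-MM-dd}" -f $e, $cert.NotAfter
    $ssl.Dispose(); $client.Dispose()
}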
Identity and SSO boundaries that actually matter
What VCF Single Sign-On does (and does not)
VCF Single Sign-On is designed to streamline access across multiple VCF components with one authentication source configured from the VCF Operations console.
- It supports SSO across components like vCenter, NSX, VCF Operations, VCF Automation, and other VCF management components.
- It explicitly excludes SDDC Manager and ESXi, which means you still need local access patterns and break-glass workflows for those systems.
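That exclusion is worth rehearsing. A minimal break-glass sketch with PowerCLI, connecting to an ESXi host with its local root account rather than SSO (the FQDN is a placeholder; SDDC Manager likewise needs its own appliance credentials):
# Minimal sketch: verify break-glass local access to an ESXi host outside of SSO (FQDN is a placeholder)
Connect-VIServer -Server esxi01.example.com -Credential (Get-Credential -Message "ESXi local root account")
Get-VMHost | Select-Object Name, ConnectionState, Version
Disconnect-VIServer -Confirm:$false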
Identity pillars in VCF
Your identity design is built on three pillars:
- External IdP (SAML/OIDC or directory)
- VCF Identity Broker (brokers authentication and maintains SSO tokens)
- VCF Single Sign-On (centralized authentication configuration and user management)
Two other details matter for design reviews:
- Each VCF Identity Broker is configured with a single identity provider.
- Changing the Identity Broker deployment mode later is disruptive; treat it as a day-0 decision (see the change management warning below).
VCF Identity Broker deployment modes
| Decision point | Embedded (vCenter service) | Appliance (3-node cluster) |
|---|---|---|
| Where it runs | Inside management domain vCenter | Stand-alone appliances deployed via VCF Operations fleet management |
| Multi-instance recommendation | One per instance | Up to five instances per Identity Broker appliance |
| Availability characteristics | Risk of being tied to mgmt vCenter availability | Designed for higher availability; handles node failure |
| Typical fit | Single instance, simpler environments | Multi-instance, larger environments, stronger availability targets |
Change management warning: moving from appliance to embedded mode requires resetting the VCF Single Sign-On configuration and re-adding users and groups. Treat the deployment mode decision as day-0.
Challenge: You need shared identity for convenience, but regulated isolation for some tenants
Here’s the practical decision point.
A) Shared enterprise IdP with fleet-wide SSO
B) Cross-instance SSO with multiple Identity Brokers in one fleet
C) Separate fleets for regulated isolation
Topology patterns for single site, two sites, and multi-region
A) Single site with minimal footprint
- Best when you need to start small and accept tighter fault domains.
- Typical posture:
- Single fleet, single instance.
- Management components and workloads can be co-located in one cluster for footprint reduction.
- Operational reality:
- You are trading physical failure-domain isolation for speed and cost.
- Plan early if you intend to adopt organization models in VCF Automation that require additional clusters.
B) Two sites in one region
- Region in VCF terms is multiple sites within synchronous replication latencies.
- Typical posture:
- Single fleet, single instance.
- Stretched clusters across the two sites for higher availability.
- A dedicated workload domain for workloads, with management components protected in the management domain cluster.
- Day-2 consequences:
- You are now dependent on stretched network and storage behaviors for management plane availability.
- You must design first-hop gateway resilience across availability zones for stretched segments.
C) Multi-region
- Typical posture:
- Single fleet, multiple instances (at least one per region or per major site).
- Fleet-level management components run in the management domain of the first instance.
- Additional instances bring their own management domain control planes.
- Practical design statement:
- Recovery between regions is a disaster recovery process. Do not confuse “multi-region” with “active-active without DR work”.
Quick comparison:
| Topology | Fleet count | Instance count | Typical SSO scope | Primary operational risk |
|---|---|---|---|---|
| Single site | 1 | 1 | Single instance or fleet-wide | Small fault domain, tight coupling |
| Two sites, one region | 1 | 1 | Fleet-wide (common) | Stretched dependencies for management availability |
| Multi-region | 1+ | 2+ | Cross-instance or fleet-wide | Governance dependency on where fleet services run |
Fast validation: confirm build levels in your environment
Use this PowerShell example with PowerCLI to validate vCenter and ESXi versions:
# Connect to vCenter
Connect-VIServer -Server <vcenter_fqdn>
# vCenter build and version
$about = (Get-View ServiceInstance).Content.About
[PSCustomObject]@{
Product = $about.FullName
Version = $about.Version
Build = $about.Build
}
# ESXi hosts build and version
Get-VMHost | Sort-Object Name | Select-Object Name, Version, Build
Failure domain analysis
If VCF Operations, fleet management, or VCF Automation are impaired:
- You lose or degrade centralized lifecycle workflows, automation workflows, and centralized observability.
- Instance control planes still exist, but day-2 operations may become more manual.
If VCF Identity Broker is down:
- Users from external identity providers cannot authenticate.
- You must fall back to local accounts for subsequent operations until Identity Broker is restored.
Instance management domain failure
If an instance management domain is down:
- That instance’s SDDC Manager, management vCenter, and management NSX are unavailable until they are recovered, so domain lifecycle and other instance-level workflows stall.
- Other instances keep operating, because instance management planes are not shared.
If a workload domain’s vCenter or NSX is degraded:
- Workloads in that domain take the blast radius.
- Other workload domains in other instances are unaffected.
Example RTO/RPO targets you can start with
- Fleet services (VCF Operations, fleet management, VCF Automation):
- RTO: 4 hours
- RPO: 24 hours (aligned to daily backups)
- Identity Broker:
- RTO: 1 to 2 hours
- RPO: 24 hours (align to backup cadence, plus local break-glass accounts)
- Instance management domain:
- RTO: 2 to 4 hours
- RPO: 24 hours
- Workload domain:
- Driven by application SLAs and data replication strategy
Operational runbook snapshot
Shutdown order matters
- Shut down instances that do not run VCF Operations and VCF Automation first.
- The instance running the fleet-level management components should be last.
Within the management domain that hosts fleet services, a typical shutdown sequence starts with:
- VCF Automation
- VCF Operations
- VCF Identity Broker
- Instance management components (NSX, vCenter, SDDC Manager)
- ESXi hosts
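If you script parts of that sequence, shut down guests gracefully and wait for each tier before moving on; a minimal PowerCLI sketch for a single VM (the VM name is a placeholder, and the product shutdown guidance remains the authoritative order):
# Minimal sketch: gracefully shut down one management VM and wait for power-off (VM name is a placeholder)
$vm = Get-VM -Name "vcf-automation-01"
if ($vm.PowerState -eq "PoweredOn") {
    Shutdown-VMGuest -VM $vm -Confirm:$false | Out-Null
    while ((Get-VM -Id $vm.Id).PowerState -ne "PoweredOff") { Start-Sleep -Seconds 15 }
}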
Operational gotcha:
- Taking the VCF Operations cluster offline can take significant time. Plan your maintenance windows accordingly.
Backups: get the SFTP target right early
You should treat SFTP backup targets as day-1 prerequisites, not an afterthought.
- Configure SFTP settings for VCF management components.
- Configure backup schedules for VCF Operations and VCF Automation.
- Configure backup schedules for SDDC Manager, NSX Manager, and vCenter at the instance level.
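Before relying on those schedules, confirm the SFTP target is actually reachable; a minimal sketch (the hostname is a placeholder):
# Minimal sketch: confirm the SFTP backup target answers on TCP 22 before scheduling backups (host is a placeholder)
Test-NetConnection -ComputerName sftp-backup.example.com -Port 22 | Select-Object ComputerName, TcpTestSucceeded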
Password lifecycle: know which system is authoritative
- You can change passwords for many local users through VCF Operations.
- You can retrieve default passwords from SDDC Manager using the lookup_passwords command on the appliance:
# On the SDDC Manager appliance
sudo lookup_passwords
Important constraint:
- Some password expiration and status information is updated on a schedule; real-time status often requires checking at the instance source (SDDC Manager and related APIs).
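When you need current status from the instance source, the SDDC Manager API exposes a credential inventory; a minimal sketch, assuming a bearer token from POST /v1/tokens and that the /v1/credentials endpoint and its fields behave in 9.0 as in the published VCF API:
# Minimal sketch: pull current credential inventory from SDDC Manager (FQDN is a placeholder;
# $accessToken is a bearer token previously obtained from POST /v1/tokens)
$headers = @{ Authorization = "Bearer $accessToken" }
(Invoke-RestMethod -Method Get -Uri "https://sddc-manager.example.com/v1/credentials" -Headers $headers).elements |
    Select-Object credentialType, accountType, username -First 10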
Anti-patterns
- Treating fleet and instance as synonyms
- Fleet is centralized governance and services.
- Instance is a discrete VI footprint with its own management domain.
- Designing SSO as if SDDC Manager participates
- It does not. Plan break-glass access and operational runbooks accordingly.
- Choosing embedded Identity Broker for multi-instance and then being surprised by availability coupling
- If multi-instance SSO matters, appliance mode is commonly the safer default.
- Using one fleet for regulated tenants without validating identity and governance blast radius
- Separate fleets remain the cleanest isolation boundary when governance separation is required.
- Under-sizing management domains
- Fleet services and management components are not free. You will scale them and patch them like any other production system.
Summary and takeaways
- Use the official construct hierarchy to keep conversations consistent: private cloud -> fleet -> instance -> domains -> clusters.
- Fleet-level management components centralize governance, but they do not collapse instance control planes into a single shared management plane.
- Identity design is a day-0 decision. Choose Identity Broker deployment mode and SSO scope intentionally.
- Align topology to operations:
- Single site is about speed and footprint.
- Two-site in one region is about availability with stretched dependencies.
- Multi-region is about DR posture and multiple instance management planes.
Conclusion
VCF 9.0 becomes dramatically easier to operate when everyone can point to the same boundaries:
- Fleet boundaries for centralized services and governance.
- Instance boundaries for discrete infrastructure footprints.
- Domain boundaries for lifecycle and workload isolation.
Sources
VMware Cloud Foundation 9.0 Documentation (VCF 9.0 and later): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0.html
