Structuring Azure for scale: Collaborative monitoring in complex tenants

8 Aug, 2025

Is your Azure environment a sprawling mystery? Do you truly know what's running, or who's watching it? For many large organisations, Azure grows organically, like a wild garden: subscriptions multiply, teams have perhaps too much autonomy, and critical projects often launch long before any real governance takes root.

We recently partnered with an enterprise client facing precisely this chaos: a large Azure tenant with significant blind spots, operations teams suddenly accountable, and one burning question: "How do we bring order without breaking everything?"

Instead of starting from scratch, we teamed up with their operations, engineering, and architecture teams to deliver something smarter: scalable, policy-driven monitoring built on Azure-native tooling, applied with surgical precision and with collaboration at every step.

This is the story of how it came together.

Need: visibility without disruption

Our engagement began with the operations team. They were newly accountable for supporting the tenant, yet had no baseline monitoring, alerting, or consistent governance to lean on. Their request was straightforward: “Give us visibility and make it sustainable.”

But the tenant was in a state of architectural drift:

  • 30+ subscriptions across multiple business units
  • Inconsistent tagging, RBAC, and policy enforcement
  • Non-standard management group hierarchy
  • Disconnected observability tooling, or none at all

This is typical of mature Azure footprints: tenants that grow organically through quick project spin-ups, dev/test experiments that end up in production, or siloed business units with free rein. Governance and observability often come after the fact, if at all.

Our client was not oblivious to this. The architecture team, responsible for tenant design, shared the goal of modernisation but had valid concerns. With a major, critical programme of work underway, the risk of disruptive change meant that tenant restructuring could not be done in a rush. Everyone agreed that improvement was needed, but timing and proper scoping mattered.

The cherry on the cake? A strict compliance requirement for full isolation between siloed environments. For these workloads, monitoring data, including logs, must not be commingled.

Successful delivery then required alignment with the broader technology group. What followed was a model example of collaborative design: senior leaders endorsed the direction, internal stakeholders helped shape it, and OSS Group brought the tools, patterns, and implementation expertise to get it over the line.

Approach: policy-driven observability, built on Microsoft’s frameworks

We couldn’t force a tenant-wide restructure, and we couldn’t interrupt existing projects. Instead, we took a targeted, modular approach focused on repeatability, visibility, and low-friction integration.

What we needed was a way to:

  • Introduce monitoring infrastructure that aligned to best practices
  • Ensure isolation of monitoring data per compliance "realm"
  • Wrap it all in Azure Policy so compliance could be enforced going forward
  • Use Infrastructure-as-Code and CI/CD so everything was repeatable and auditable

This is where Azure Verified Modules (AVMs), Microsoft’s Azure Monitor Baseline Alerts (AMBA), and a lot of smart Terraform came into play.

Monitoring architecture: centralised control, local visibility

There were five potential models for monitoring and alerting, ranging from fully centralised to entirely decentralised. After discussion across architecture, operations, and engineering, we landed on a hybrid approach:

  • Each silo received its own Log Analytics workspace and supporting infrastructure
  • A shared baseline for alerting was enforced through Azure Policy
  • Workbooks and Grafana dashboards were used to surface visibility where it mattered
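
The per-silo part of this model can be sketched in Terraform. This is a minimal illustration, not the configuration used in the engagement: the silo names, region, retention periods, and resource names are all hypothetical, and an already-configured azurerm provider is assumed.

```hcl
# Illustrative sketch only. Assumes the azurerm provider is configured elsewhere.
# Silo names, region, and retention values below are hypothetical.
locals {
  silos = {
    corporate = { location = "australiaeast", retention_days = 90 }
    regulated = { location = "australiaeast", retention_days = 365 }
  }
}

# One resource group per silo, so access to monitoring data can be scoped per realm.
resource "azurerm_resource_group" "monitoring" {
  for_each = local.silos

  name     = "rg-monitoring-${each.key}"
  location = each.value.location
}

# One Log Analytics workspace per silo, so logs from different realms never share a store.
resource "azurerm_log_analytics_workspace" "silo" {
  for_each = local.silos

  name                = "law-${each.key}"
  location            = azurerm_resource_group.monitoring[each.key].location
  resource_group_name = azurerm_resource_group.monitoring[each.key].name
  sku                 = "PerGB2018"
  retention_in_days   = each.value.retention_days
}

# Expose workspace IDs keyed by silo for downstream policy or dashboard configuration.
output "workspace_ids" {
  value = { for name, ws in azurerm_log_analytics_workspace.silo : name => ws.id }
}
```

Adding a silo then means adding one entry to the map rather than copying a block of resources, which keeps the isolation boundary explicit in code review.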

Azure Verified Modules (AVMs): standardised, scalable, supported

A key decision, jointly agreed with the client’s Lead Platform Engineer, was to adopt Microsoft’s AVMs. These modular, policy-backed Terraform modules enabled rapid, consistent deployment of:

  • Management group structure, governance policies and RBAC defaults (avm-ptn-alz)
  • Monitoring resources (avm-ptn-alz-management)
  • Alerting resources (avm-ptn-monitoring-amba-alz)

The use of AVMs meant we were building with Microsoft-supported, best-practice patterns from the start. It also made the solution highly repeatable across siloed environments.

Terraform and Azure DevOps: CI/CD from day one

All infrastructure was provisioned using Terraform via Azure DevOps pipelines. Shared state enabled collaboration within the team, and Terraform Workspaces were used to cleanly separate environments.
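
As a rough sketch of what shared state plus workspace-based separation can look like, the fragment below configures an Azure Storage backend and selects settings by the active Terraform workspace. The storage account, container, and state key names are placeholders, and the environment names and retention values are assumptions made for illustration.

```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }

  # Shared remote state in Azure Storage. All names here are placeholders.
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstateexample"
    container_name       = "tfstate"
    key                  = "monitoring.tfstate"
  }
}

locals {
  # Hypothetical per-environment settings, keyed by Terraform workspace name.
  environments = {
    nonprod = { retention_days = 30 }
    prod    = { retention_days = 90 }
  }

  # Indexing fails for any workspace not listed above (including "default"),
  # so a plan cannot silently run against an unknown environment.
  env = local.environments[terraform.workspace]
}
```

In a pipeline, each stage would run `terraform workspace select <name>` before `plan` and `apply`, so the same code produces a separate state per environment.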

We relied on for_each, locals, and variable-driven design to scale configuration without duplicating code, while keeping things human-readable (mostly!).
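
To make that pattern concrete, here is a hedged sketch of a nested variable being flattened in locals and then fanned out with for_each. It routes each subscription's activity log to its silo's workspace. The variable shape, setting name, and chosen log categories are assumptions for illustration; in an engagement like this one, Azure Policy could deploy the equivalent settings instead.

```hcl
# Illustrative sketch only. Assumes the azurerm provider is configured elsewhere.
variable "silos" {
  description = "Hypothetical input: each silo's workspace ID and the subscriptions it owns."
  type = map(object({
    workspace_id     = string
    subscription_ids = list(string)
  }))
}

locals {
  # Flatten silo -> subscriptions into one map with stable, readable keys.
  subscription_routes = {
    for route in flatten([
      for silo_name, silo in var.silos : [
        for sub_id in silo.subscription_ids : {
          key             = "${silo_name}/${sub_id}"
          subscription_id = sub_id
          workspace_id    = silo.workspace_id
        }
      ]
    ]) : route.key => route
  }
}

# One diagnostic setting per subscription, always pointing at that silo's own workspace.
resource "azurerm_monitor_diagnostic_setting" "activity_log" {
  for_each = local.subscription_routes

  name                       = "activity-to-silo-workspace"
  target_resource_id         = "/subscriptions/${each.value.subscription_id}"
  log_analytics_workspace_id = each.value.workspace_id

  enabled_log {
    category = "Administrative"
  }

  enabled_log {
    category = "Policy"
  }
}
```

Keying the map by silo and subscription means that moving a subscription between silos shows up in the plan as a destroy and a create, rather than an in-place change that is easy to miss.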

Enablement matters: beyond deployment

This wasn’t just a “deploy and disappear” engagement. In parallel with the infrastructure roll-out, we delivered guidance documentation and architectural rationale, and ran training with the operations and engineering teams.

The outcome was a capability, not just a build. Ops teams now understand how to extend and maintain their observability stack, and architects have a model for safe, modular tenant uplift that avoids large-scale restructure.

Key collaborators: the unsung strength of the engagement

This kind of work is never just about tools. The success of the project came down to the strength of the relationships:

  • The operations manager who brought us in and championed the work
  • The GM of technology who gave high-level endorsement and trust
  • The architecture team who ensured the solution aligned with long-term goals
  • The lead platform engineer, whose pragmatic technical leadership unblocked RBAC issues, set guardrails around tooling (Terraform, Azure DevOps), and helped prioritise which subscriptions to target

It was a case study in shared ownership: OSS Group brought the framework, but internal teams brought the insight that made it land.

The result: a solid foundation, now and later

The immediate outcome was clear:

  • Monitoring and alerting now exist for all critical workloads
  • Visibility is structured, enforceable, and consistent
  • Everything is managed as code, driven by policy, and easy to replicate

But there’s also a strategic win: the organisation now has a tested pattern for rolling out governance aligned with Microsoft’s Cloud Adoption Framework (CAF) across its Azure footprint, with minimal disruption.

It stabilises the present and enables the future.

Key Takeaways

If your Azure environment has grown organically and you're wondering how to introduce structure without pausing everything, consider this:

  • You don’t need to start from scratch. It’s possible to retrofit Azure Landing Zone (ALZ) patterns onto existing tenants safely.
  • Microsoft’s tooling (AVMs, AMBA, Azure Policy) can accelerate delivery if used strategically.
  • Collaborative engagement beats top-down mandates. Design alongside your stakeholders, not around them.
  • Don’t forget enablement. The best architecture is one people understand, own, and evolve.

Want to Do Something Similar?

Whether you're looking to stabilise your Azure tenant, roll out observability, or just start aligning to CAF in a practical way, OSS Group can help.

We specialise in:

  • Cloud governance uplift without disruption
  • Landing zone implementation and remediation
  • Monitoring and alerting frameworks tailored to your business
  • Infrastructure-as-Code with enablement built in

Reach out to us at OSS Group or call 0800-OSS-GRP. Let’s structure something together, without stopping the show.

 
