Structuring Azure for Observability & Monitoring within Complex Tenants

8 Aug, 2025

Is your Azure environment a sprawling mystery? Do you truly know what's running, or what your observability is missing? For many large organisations, Azure grows organically, like a wild garden: subscriptions multiply, teams have perhaps too much autonomy, and critical projects often launch long before any real monitoring or governance takes root.

We recently partnered with an enterprise client facing precisely this chaos: a large Azure tenant with significant blind spots, operations teams suddenly accountable, and one burning question. "How do we bring order without breaking everything?"

Instead of starting from scratch, we teamed up with their operations, engineering, and architecture teams to deliver something smarter: scalable, policy-driven monitoring built on Azure-native tooling, applied with surgical precision and with collaboration at every step.

This is the story of how it came together.

Need: visibility without disruption

Our engagement began with the operations team. They were newly accountable for supporting the tenant, yet had no baseline monitoring, alerting, or consistent governance to lean on. Their request was straightforward: “Give us visibility and make it sustainable.”

But the tenant was in a state of architectural drift:

30+ subscriptions across multiple business units
Inconsistent tagging, RBAC, and policy enforcement
Non-standard management group hierarchy
Disconnected observability tooling, or none at all

This is typical of mature Azure footprints: tenants that grow organically through quick project spin-ups, dev/test experiments that go into production, or siloed business units with free rein. Governance and observability often come after the fact, if at all.

Our client was not oblivious to this. The architecture team, responsible for tenant design, shared the goal of modernisation but had valid concerns. With a major, critical programme of work underway, the risk of disruptive change meant that tenant restructuring could not be done in a rush. Everyone agreed that improvement was needed, but timing and proper scoping mattered.

The cherry on the cake? A strict compliance requirement for full isolation between siloed environments. For these workloads, monitoring data, including logs, must not co-mingle.

Successful delivery then required alignment with the broader technology group. What followed was a model example of collaborative design: senior leaders endorsed the direction, internal stakeholders helped shape it, and OSS Group brought the tools, patterns, and implementation expertise to get it over the line.

Approach: policy-driven observability, built on Microsoft’s frameworks

We couldn’t force a tenant-wide restructure, and we couldn’t interrupt existing projects. This is the kind of challenge OSS Group's observability practice thrives on. We took a targeted, modular approach focused on repeatability, visibility, and low-friction integration.

What we needed was a way to:

Introduce monitoring infrastructure that aligned to best practices
Ensure isolation of monitoring data per compliance "realm"
Wrap it all in Azure Policy so compliance could be enforced going forward
Use Infrastructure-as-Code and CI/CD so everything was repeatable and auditable
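
The isolation requirement in that list can be expressed as a simple check: no Log Analytics workspace should receive data from more than one compliance realm. The sketch below is illustrative only. The subscription, realm, and workspace names are hypothetical, and the inventory is a hand-written dictionary rather than anything read from Azure.

```python
from collections import defaultdict

# Hypothetical inventory: subscription -> (compliance realm, Log Analytics workspace).
# All names are illustrative and not taken from the client tenant.
SUBSCRIPTIONS = {
    "sub-finance-prod": ("realm-a", "law-realm-a"),
    "sub-finance-dev": ("realm-a", "law-realm-a"),
    "sub-retail-prod": ("realm-b", "law-realm-b"),
}


def find_isolation_violations(subscriptions):
    """Return the workspaces that receive data from more than one realm."""
    realms_by_workspace = defaultdict(set)
    for realm, workspace in subscriptions.values():
        realms_by_workspace[workspace].add(realm)
    # Keep only workspaces shared across realms, with the realms sorted for stable output.
    return {
        workspace: sorted(realms)
        for workspace, realms in realms_by_workspace.items()
        if len(realms) > 1
    }
```

An empty result means every workspace serves exactly one realm. A check of this kind could plausibly run in a pipeline before any change is applied, though how the inventory is gathered is left open here.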

This is where Azure Verified Modules (AVMs), Microsoft’s Azure Monitor Baseline Alerts (AMBA), and a lot of smart Terraform came into play.

Monitoring architecture: centralised control, local visibility

There were five potential models for monitoring and alerting, ranging from fully centralised to entirely decentralised. After discussion across architecture, operations, and engineering, we landed on a hybrid approach:

Each silo received its own Log Analytics workspace and supporting infrastructure
A shared baseline for alerting was enforced through Azure Policy
Workbooks and Grafana dashboards were used to surface visibility where it mattered
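
As a rough model of that hybrid shape, the sketch below gives every silo its own workspace while all silos share one alert baseline. It is an assumption-laden illustration: the alert names, the `law-` naming prefix, and the `SiloMonitoringPlan` type are hypothetical, and in the real engagement the baseline was enforced through Azure Policy rather than application code.

```python
from dataclasses import dataclass

# Hypothetical placeholder names for a shared alert baseline.
SHARED_ALERT_BASELINE = ("vm-heartbeat", "service-health", "activity-log-admin")


@dataclass(frozen=True)
class SiloMonitoringPlan:
    silo: str
    workspace_name: str  # one dedicated workspace per silo (local visibility)
    alerts: tuple        # identical across silos (centralised control)


def build_plans(silos, baseline=SHARED_ALERT_BASELINE):
    """Build one plan per silo: a dedicated workspace plus the shared baseline."""
    return {
        silo: SiloMonitoringPlan(silo, f"law-{silo}", tuple(baseline))
        for silo in silos
    }
```

Calling `build_plans(["silo-a", "silo-b"])` yields two plans with distinct workspace names and the same alert tuple, which is the essence of "centralised control, local visibility".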

Azure Verified Modules (AVMs): standardised, scalable, supported

A key decision, jointly agreed with the client’s Lead Platform Engineer, was to adopt Microsoft’s AVMs. These composable, policy-backed Terraform modules enabled rapid, consistent deployment of:

Management group structure, governance policies and RBAC defaults (avm-ptn-alz)
Monitoring resources (avm-ptn-alz-management)
Alerting resources (avm-ptn-monitoring-amba-alz)

The use of AVMs meant we were building with Microsoft-supported, best-practice patterns from the start. It also made the solution highly repeatable across siloed environments.

Terraform and Azure DevOps: CI/CD from day one

All infrastructure was provisioned using Terraform via Azure DevOps pipelines. Shared state enabled collaboration within the team, and Terraform workspaces were used to cleanly separate environments.

We relied on for_each, locals, and variable-driven design to scale configuration without duplicating code, while keeping things human-readable (mostly!).
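
The variable-driven idea is language-agnostic: one map of environments plus a set of defaults is expanded into many resource definitions, so adding an environment means adding data rather than copying code. The sketch below uses Python as a stand-in for Terraform and is not the client's configuration. The environment names, the setting keys, and the `law-` prefix are all hypothetical.

```python
# Hypothetical defaults, playing the role of Terraform locals.
DEFAULTS = {"sku": "PerGB2018", "retention_days": 30}

# Hypothetical per-environment overrides, playing the role of a variable map
# that a for_each would iterate over.
ENVIRONMENTS = {
    "silo-a": {"retention_days": 90},
    "silo-b": {},
}


def expand_workspaces(environments, defaults=DEFAULTS):
    """Merge defaults with per-environment overrides, one definition per key."""
    return {
        name: {**defaults, **overrides, "name": f"law-{name}"}
        for name, overrides in environments.items()
    }
```

Here `silo-a` overrides the retention period while `silo-b` inherits every default, and neither needed its own block of code.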

Enablement matters: beyond deployment

This wasn’t just a “deploy and disappear” engagement. In parallel with infrastructure roll-out, we delivered guidance documentation and architectural rationale, and ran training with the operations and engineering teams.

The outcome was a capability, not just a build. Ops teams now understand how to extend and maintain their observability stack, and architects have a model for safe, modular tenant uplift that avoids large-scale restructure.

Key collaborators: the unsung strength of the engagement

This kind of work is never just about tools. The success of the project came down to the strength of the relationships:

The operations manager who brought us in and championed the work
The GM of technology who gave high-level endorsement and trust
The architecture team who ensured the solution aligned with long-term goals
The lead platform engineer, whose pragmatic technical leadership unblocked RBAC issues, set guardrails around tooling (Terraform, Azure DevOps), and helped prioritise which subscriptions to target

It was a case study in shared ownership: OSS Group brought the framework, but internal teams brought the insight that made it land.

The result: a solid foundation, now and later

The immediate outcome was clear:

Monitoring and alerting now exist for all critical workloads
Visibility is structured, enforceable, and consistent
Everything is managed as code, driven by policy, and easy to replicate

But there’s also a strategic win: the organisation now has a tested pattern for rolling out governance aligned with Microsoft’s Cloud Adoption Framework (CAF) across its Azure footprint, with minimal disruption.

It stabilises the present and enables the future.

Key Takeaways

If your Azure environment has grown organically and you're wondering how to introduce structure without pausing everything, consider this:

You don’t need to start from scratch. It’s possible to retrofit Azure Landing Zone (ALZ) patterns onto existing tenants safely.
Microsoft’s tooling (AVMs, AMBA, Azure Policy) can accelerate delivery if used strategically.
Collaborative engagement beats top-down mandates. Design alongside your stakeholders, not around them.
Don’t forget enablement. The best architecture is one people understand, own, and evolve.