Chapter 5: Identity and Access Management Architecture and Engineering

Learning Outcomes:

Introduction

If Chapter 4 was about where controls live, this chapter is about who the controls are protecting things from — and, just as importantly, who they are letting in. Identity has quietly become the most important security boundary in the modern enterprise. Firewalls still matter. Network segmentation still matters. But when a contractor logs in from a coffee shop, when a microservice calls a payment API, or when an administrator pushes a Terraform change to production, the question that decides whether the action is safe is not "where did this packet come from?" It is "who is this subject, and are they allowed to do this right now?"

That is why Identity and Access Management (IAM) has moved from a back-office HR-adjacent function to one of the most consequential disciplines in security architecture. A well-designed IAM program decides who exists, what they may do, how strongly they must prove themselves, when their privilege should be reviewed or revoked, and what evidence the organization keeps when something goes wrong. A poorly designed program produces dormant admin accounts that linger for years after an employee resigns, shared credentials buried in shell scripts, federation trusts that nobody fully understands, and audit logs that cannot answer the basic question "who did this?"

Identity must be engineered end to end. That means provisioning and deprovisioning workflows, access control models such as RBAC, ABAC, MAC, and DAC, federation and single sign-on protocols including SAML, OAuth 2.0, and OpenID Connect, conditional access policies based on context, biometric and multi-factor authentication, privileged access management, secrets handling, the identity-centric attack paths that abuse these systems, and the everyday troubleshooting that occurs when Kerberos tickets expire, OIDC redirect URIs are misspelled, or 802.1X supplicants refuse to talk to the RADIUS server.

The thread tying all of this together is the same thread that ran through Chapter 4: trust must be narrow, conditional, and continuously evaluated. The IAM platform is where Zero Trust either becomes real or becomes a marketing slide. In this chapter we walk through the identity lifecycle, compare access control models, explore how to lock down high-privilege accounts, examine how attackers actually abuse identity systems, and finish by troubleshooting the authentication failures that fill the average security analyst's ticket queue.


How Do We Manage the Identity Lifecycle?

Every identity in your environment has a beginning, a middle, and an end. The middle is usually long and uneventful. The beginning and the end are where many identity-related incidents originate. An account that was provisioned with too much privilege on day one will often keep that privilege for years. An account that was not deprovisioned on someone's last day may be used months later by an attacker who bought stolen credentials on a forum.

The identity lifecycle describes the full journey of an identity from creation to retirement: request, approval, credential issuance, role assignment, periodic review, modification, suspension, and eventual deletion. Architects design this lifecycle so that each stage has clear ownership, automated triggers where possible, and an audit trail that someone can actually read.

Credential Issuance and Self-Provisioning

The first practical question is: how does someone get a credential in the first place? In a small organization, this might be the IT lead handing out a temporary password over the phone. In an enterprise, it is a workflow that begins in HR, flows into an identity governance platform, and ends with an automated account creation in Active Directory, an identity provider, several SaaS applications, and possibly a hardware token.

Modern environments increasingly use self-provisioning patterns. A new hire receives an enrollment link, registers their device, sets up MFA, and proves identity through a one-time code sent to the manager or HR system. Identity proofing — verifying that the human behind the credential is actually the intended person — is sometimes layered in through document verification, video calls, or knowledge-based questions. These steps matter because every weak point in initial enrollment becomes a weak point that lingers for the entire lifecycle.

Key Point: An account is never more vulnerable to social engineering than during enrollment, password reset, or MFA re-registration. Attackers know this. Helpdesk procedures should treat these moments as high-risk transactions, not routine requests.

Provisioning and Deprovisioning

Provisioning is the process of granting a subject the access they need to do their job. Deprovisioning is the process of taking that access away when they no longer need it — whether because they changed roles, finished a project, left the company, or had their account compromised.

The mature pattern is automated provisioning driven by an authoritative source, usually an HR system or contractor management platform. When HR marks someone as hired, the identity governance system creates accounts in the directory and downstream applications. When HR marks someone as terminated, the same system disables accounts, revokes tokens, and removes group memberships — ideally in minutes, not days.

The opposite, and unfortunately common, pattern is manual provisioning by ticket. A new hire's manager files individual requests for each application. A leaving employee's access is removed only when somebody remembers. The result is privilege creep: subjects accumulate access over years that nobody can justify and nobody can fully see.

A well-designed lifecycle includes:

Example: Cinderella joins Glass Slipper Tech as a backend systems administrator. Her joiner workflow grants the baseline "Engineering" role: email, Slack, GitHub, the staging AWS account, and a development Jira project. Six months later she transfers to the Payments team. The mover workflow adds her to the "Payments-Engineer" role (PCI-scoped staging, payments code repo, the on-call rotation) and removes her access to the original team's repository and Jira project. Two years later she resigns. The leaver workflow, triggered by HR marking her terminated, disables her IdP account within minutes; SCIM propagates the disable to every federated SaaS app; her PAM vault entries are revoked; and after a 30-day legal-hold retention window, the account is deleted. At no point did anyone file a ticket asking for the old access to be removed: the lifecycle did it automatically.
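The joiner, mover, and leaver steps in this example can be sketched in a few lines of Python. This is a minimal illustration, not a real identity governance product: the `ROLE_ENTITLEMENTS` map, the `Identity` class, and `apply_hr_event` are hypothetical names, and the entitlement lists are assumptions loosely based on the example. The point it shows is that a mover event replaces the entitlement set rather than adding to it, which is what prevents privilege creep.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-entitlement map, loosely based on the example.
ROLE_ENTITLEMENTS = {
    "Engineering": {"email", "slack", "github", "aws-staging", "jira-dev"},
    "Payments-Engineer": {"email", "slack", "pci-staging", "payments-repo", "on-call"},
}


@dataclass
class Identity:
    user: str
    enabled: bool = False
    entitlements: set = field(default_factory=set)
    audit: list = field(default_factory=list)


def apply_hr_event(identity, event, role=None):
    """Apply a joiner, mover, or leaver event coming from the HR feed."""
    if event in ("joiner", "mover"):
        target = set(ROLE_ENTITLEMENTS[role])
        removed = identity.entitlements - target
        identity.entitlements = target  # replace the set, never accumulate
        identity.enabled = True
        identity.audit.append((event, role, sorted(removed)))
    elif event == "leaver":
        identity.audit.append((event, None, sorted(identity.entitlements)))
        identity.entitlements = set()
        identity.enabled = False
    else:
        raise ValueError(f"unknown HR event: {event}")
    return identity
```

Each call also appends an audit record listing what was removed, so a reviewer can later see when access was taken away and why.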

Subject Access Control: Users, Processes, Devices, and Services

It is tempting to think of "identity" as something that belongs only to humans. In a modern enterprise, that mental model is dangerous. Identities also belong to processes, devices, and services.

Each of these subject types needs an identity, a credential of some kind, and a defined relationship with the resources it accesses. Service identities in particular are an enormous attack surface because they often hold long-lived credentials, run with broad privilege, and are rarely reviewed. A compromised service account can do far more damage than a compromised user account because it operates silently and almost never triggers user-behavior alerts.

Warning: If your organization rotates user passwords every 90 days but cannot tell you when the service account behind the nightly database export last had its password changed, your IAM program has a blind spot the size of the data warehouse.

Federation, Identity Providers, Service Providers, and SSO

In a small environment, every application can hold its own user database. In an enterprise, that approach is unworkable. Hundreds of applications, each with its own credentials, would mean hundreds of password resets, hundreds of inconsistent MFA configurations, and hundreds of places to forget to deprovision.

Federation solves this by separating who you are from what you can use. An Identity Provider (IdP) — such as Microsoft Entra ID, Okta, Google Workspace, or Ping Identity — owns the authentication event. A Service Provider (SP) — the application the user actually wants to use — trusts the IdP's signed assertion that the user is who they say they are. The SP never sees the user's password.

Single Sign-On (SSO) is the user-facing experience that federation enables: log in once at the IdP, and subsequent applications recognize the session without prompting again. The two dominant federation protocol families are SAML and OAuth 2.0 with OpenID Connect (OIDC).

The architectural payoff of federation is enormous. Deprovisioning becomes a single act at the IdP rather than a scavenger hunt across dozens of apps. MFA policy is enforced once and applies everywhere. Conditional access — discussed later in this chapter — becomes possible because every application sees the same enriched identity context.

Figure 5.1: Federated Identity and SSO Workflow (SAML/OIDC). A federated SSO exchange: the user authenticates once at the Identity Provider, which issues a signed assertion or token that the Service Provider validates without ever seeing the user's password.
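The trust relationship in the figure can be illustrated with a toy exchange. This is a sketch under a large simplifying assumption: real SAML and OIDC deployments use asymmetric signatures (XML Signature or JWS) and should rely on a maintained library, whereas this stdlib-only example uses an HMAC with a shared demo key as a stand-in. The function names `idp_issue_assertion` and `sp_validate_assertion` and the claim layout are hypothetical. What it does show faithfully is that the SP checks a signature, an audience, and an expiry, and never handles a password.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a shared demo key stands in for the IdP's signing key.
IDP_KEY = b"demo-only-shared-key"


def idp_issue_assertion(subject, audience, ttl_seconds=300, now=None):
    """IdP side: sign a small set of claims after authenticating the user."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "aud": audience, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(IDP_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def sp_validate_assertion(token, expected_audience, now=None):
    """SP side: return the subject if the assertion is valid, else None."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(IDP_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None  # signature does not verify
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["aud"] != expected_audience or claims["exp"] < now:
        return None  # issued for another SP, or expired
    return claims["sub"]
```

Note that anyone holding the signing key can mint assertions the SP will accept, which is why that key deserves the strongest protection in the environment.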

Case Study: Emma Woodhouse and the Offboarding Crisis at Highbury Networks

Emma Woodhouse had spent her first six months as the new IAM Specialist at Highbury Networks tidying up what her predecessor had left behind. Highbury was a fast-growing analytics firm with about 1,400 employees, dozens of contractors, and roughly 180 SaaS applications acquired through a decade of departmental purchases. Provisioning was reasonably automated for Microsoft 365 and the core CRM, but the rest of the SaaS portfolio was a patchwork of manual requests and shared admin consoles.

The crisis began on a Friday afternoon. A senior data analyst named Moriarty had been terminated for cause that morning. HR processed the offboarding ticket immediately, and Emma's automation disabled Moriarty's account in Microsoft Entra ID within fifteen minutes. By all appearances, the offboarding was clean.

Then, at 7:42 PM, Highbury's SOC paged Emma. Moriarty's email account had been disabled, but two other systems were still showing active sessions associated with him: a marketing analytics SaaS that had been onboarded by the marketing team three years earlier without going through IT, and a third-party data vendor's portal where Moriarty had been a designated power user. Neither was federated to Entra ID. Both used local credentials. Both had been used to download customer data after Moriarty's termination.

The incident did not result in a major breach — Highbury's DLP caught the second download attempt and the legal team recovered the data — but it was a wake-up call. Emma spent the following quarter executing what she called the "federate or retire" project. Every SaaS application in the company had to either be brought under Entra ID's SSO umbrella, with deprovisioning tied to the HR feed, or be formally retired. Where an application could not support SAML or OIDC, Emma worked with vendors to enable SCIM-based provisioning so that account disable would happen automatically.

Eighteen months later, Highbury could disable a departing employee's access across the entire SaaS portfolio in under five minutes. The incident with Moriarty was the last time the company learned about an unfederated application from its SOC instead of from its inventory.


Which Access Control Model Is Right for the Job?

Authentication answers the question "who are you?" Authorization answers the question "what are you allowed to do?" Access control models are the formal frameworks an architect uses to answer the second question consistently across thousands of users and millions of resources.

There is no single best model. Each one trades off rigidity against flexibility, simplicity against expressiveness, and administrative effort against precision. Mature environments combine several of them.

RBAC, ABAC, MAC, and DAC

Role-Based Access Control (RBAC) assigns privileges to roles rather than individual users. Users inherit privileges by being assigned to one or more roles. RBAC is the workhorse of enterprise IAM because it scales reasonably well: when a new accountant joins, you assign them the "Accountant" role and they instantly receive the right set of permissions. The weakness of RBAC is role explosion — over time, organizations end up with thousands of overlapping roles ("Accountant — Western Region — Senior — Read-Only") because business reality is more nuanced than any role hierarchy can capture.

Attribute-Based Access Control (ABAC) evaluates access decisions based on attributes of the subject, the object, the action, and the environment. A policy might say: "A user with department=Finance and clearance=Confidential may read documents tagged sensitivity=Confidential during business hours from a managed device." ABAC is far more flexible than RBAC and can express conditional logic, but it requires high-quality attribute data and careful policy authoring. A wrong attribute on a single document can grant or deny access in surprising ways.

Mandatory Access Control (MAC) enforces access decisions based on system-assigned classifications and labels that ordinary users cannot override. Government and military systems are the classic examples: a document classified Top Secret may only be read by subjects cleared to Top Secret, no matter who created the document. MAC is rigid by design — that rigidity is its security value.

Discretionary Access Control (DAC) allows the owner of a resource to decide who else may access it. Unix file permissions and Windows NTFS access control lists are familiar examples. DAC is flexible and intuitive but produces inconsistent enforcement and is easy for users to misconfigure (everyone has shared a folder "with the whole company" at some point).

Example: Consider how each model handles the same request at Geneva Labs: "Allow Dr. Moriartyenstein to read patient record #4471."

  • DAC: Whoever created the record (perhaps the admitting nurse) added Dr. Moriartyenstein to its ACL. If the nurse forgot, he cannot read it — even though clinically he should.
  • RBAC: Dr. Moriartyenstein has the role "Attending Physician — Cardiology." Anyone with that role can read any cardiology patient record. Simple, but the nephrology consult he also needs is invisible to him.
  • ABAC: A policy says: "A physician may read a patient record if (treating_team contains physician.id) OR (consult_request exists for physician.id on patient.id)." The decision is made fresh on each request from current attribute data.
  • MAC: The record carries a system-assigned label "PHI — Cardiology Ward." Only subjects whose clearance includes that label can read it, and no doctor, not even the chief of medicine, can override the label themselves.

| Model | Decision Driver | Strengths | Weaknesses | Common Use |
| --- | --- | --- | --- | --- |
| RBAC | Role assigned to subject | Scales well, easy to audit | Role explosion, coarse granularity | Enterprise applications, ERP, HR systems |
| ABAC | Attributes of subject, object, action, context | Highly expressive, supports Zero Trust | Requires clean attribute data, harder to debug | Cloud IAM, conditional access, data tagging |
| MAC | System-enforced classifications | Strong, consistent enforcement | Inflexible, administrative overhead | Government, military, regulated environments |
| DAC | Resource owner's discretion | Simple, intuitive | Inconsistent, easy to misconfigure | File systems, shared drives, collaboration tools |

Table 5.1: Comparing the Four Major Access Control Models.
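The Geneva Labs example can be expressed as four small decision functions, one per model. This is an illustrative sketch only: the record and subject dictionaries, their field names, and the role map are assumptions made for the example rather than any product's schema.

```python
def dac_allows(record, subject):
    """DAC: the owner-maintained ACL on the record decides."""
    return subject["id"] in record["acl"]


def rbac_allows(record, subject, role_grants):
    """RBAC: any role held by the subject that covers the record's department."""
    return any(
        record["department"] in role_grants.get(role, set())
        for role in subject["roles"]
    )


def abac_allows(record, subject, consult_requests):
    """ABAC: evaluated fresh from current attributes on every request."""
    on_team = subject["id"] in record["treating_team"]
    consulted = (subject["id"], record["patient_id"]) in consult_requests
    return on_team or consulted


def mac_allows(record, subject):
    """MAC: the system-assigned label must be within the subject's clearances."""
    return record["label"] in subject["clearances"]


# Hypothetical data for the example request.
ROLE_GRANTS = {"attending-cardiology": {"cardiology"}}
physician = {
    "id": "dr-m",
    "roles": ["attending-cardiology"],
    "clearances": {"PHI-Cardiology"},
}
record_4471 = {
    "patient_id": 4471,
    "department": "cardiology",
    "acl": {"nurse-01"},          # the nurse forgot to add the physician
    "treating_team": {"dr-m"},
    "label": "PHI-Cardiology",
}
```

With this data, DAC denies the request because the ACL was never updated, while the other three models permit it. Swapping in a nephrology record shows the RBAC gap from the example: the role no longer matches, but ABAC still permits access when a consult request exists.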

Policy Decision and Enforcement Points

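Modern IAM architectures separate the decision about whether to grant access from the enforcement of that decision. Two terms describe this split: the Policy Decision Point (PDP), which evaluates a request against policy and returns a decision, and the Policy Enforcement Point (PEP), which sits in front of a resource and carries that decision out.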

The reason this matters architecturally is consistency. When dozens of applications each implement their own access logic, policy drifts. When they all consult a centralized PDP, policy changes apply everywhere at once. This is the foundation of Zero Trust enforcement: a single policy engine evaluating context-rich requests, and many enforcement points in front of resources.
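A minimal sketch of this split is shown here, assuming a rule list held in memory. The class names `PolicyDecisionPoint` and `PolicyEnforcementPoint` and the request dictionary are hypothetical; real deployments would typically put the decision engine behind a network API. The design point is that the enforcement code contains no policy logic of its own, so changing a rule in one place changes behavior at every enforcement point.

```python
class PolicyDecisionPoint:
    """Central engine: evaluates a request against ordered rules."""

    def __init__(self, rules):
        self.rules = rules  # list of (predicate, decision) pairs

    def decide(self, request):
        for predicate, decision in self.rules:
            if predicate(request):
                return decision
        return "deny"  # default deny when no rule matches


class PolicyEnforcementPoint:
    """Sits in front of one resource and enforces whatever the PDP returns."""

    def __init__(self, pdp, resource):
        self.pdp = pdp
        self.resource = resource

    def handle(self, subject, action):
        request = {"subject": subject, "action": action, "resource": self.resource}
        if self.pdp.decide(request) != "permit":
            raise PermissionError(f"{subject} may not {action} {self.resource}")
        return f"{action} on {self.resource} performed for {subject}"
```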

Conditional Access and Context-Based Controls

Conditional access is ABAC dressed for the cloud. Modern identity providers let architects write policies such as:

Example: A finance analyst tries to open the corporate ERP from a hotel Wi-Fi network on a Saturday evening. The conditional access engine evaluates the request: the user is in a permitted role (✓), but the device is unmanaged (✗), the location is outside the usual country (✗), and the time falls outside business hours (✗). Rather than allow or deny outright, the policy issues an obligation: grant access only after the user completes a step-up FIDO2 challenge AND restrict the session to read-only for two hours. The same user, on the same laptop, on Monday morning at the office, would have been signed in silently with no extra prompt. Same identity, same resource, completely different access decision, driven by context.

Conditional access policies typically combine several signals: user-to-device binding (this credential is tied to this enrolled device), geographic location (where the request originated), time-based controls (when access is permitted), and device trust (is the endpoint compliant with patching, encryption, EDR posture). These are the same signals we discussed in Chapter 4's Zero Trust section — IAM is simply where most of them are evaluated.
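The finance-analyst scenario can be sketched as a small evaluation function. The signal names, the 09:00 to 17:00 weekday definition of business hours, and the obligation strings are all assumptions chosen for illustration; a real identity provider exposes its own policy language and signal set.

```python
def evaluate_conditional_access(request):
    """Return a decision plus any obligations, based on context signals."""
    if not request["role_permitted"]:
        return {"decision": "deny", "obligations": []}

    in_business_hours = request["weekday"] < 5 and 9 <= request["hour"] < 17
    risk_signals = [
        not request["device_managed"],                    # device trust
        request["country"] != request["usual_country"],   # geographic location
        not in_business_hours,                            # time-based control
    ]
    if any(risk_signals):
        # Neither a flat allow nor a flat deny: permit with obligations.
        return {
            "decision": "permit",
            "obligations": ["step_up_fido2", "read_only_2h"],
        }
    return {"decision": "permit", "obligations": []}
```

The same subject produces two different outcomes depending on context, which is the property that distinguishes conditional access from a static role check.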

Thought Question: A contractor needs access to one application, from one specific laptop, only during business hours, only from inside the country where they were hired. How would you express this requirement in RBAC alone? How does ABAC or conditional access make it easier, and what new failure modes does it introduce?

Attestation, Biometrics, and Identity Proofing

Some access decisions require stronger evidence than a username and password. Attestation is evidence that a subject — or the device the subject is using — is in an expected and trustworthy state. A laptop may attest that it booted with Secure Boot enabled, has a healthy TPM, and is running the corporate EDR agent. A workload may attest that it was signed by an approved CI/CD pipeline. Attestation moves the access decision from "I trust your password" to "I trust the cryptographic evidence about your environment."
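As a rough sketch, a policy engine might compare attested device claims against a required baseline. The claim names here are hypothetical, and the example deliberately omits the hard part: in practice the claims arrive inside a signed statement (for example, one rooted in the TPM), and that signature must be verified before the claims are trusted at all.

```python
# Assumed baseline; claim names are illustrative, not a vendor schema.
REQUIRED_DEVICE_CLAIMS = {
    "secure_boot": True,
    "tpm_healthy": True,
    "edr_running": True,
}


def attestation_failures(claims, required=REQUIRED_DEVICE_CLAIMS):
    """Return the names of required claims that are missing or wrong."""
    return sorted(
        name for name, expected in required.items()
        if claims.get(name) != expected
    )
```

An empty result means the device meets the baseline; anything else can feed the access decision as a reason to deny or to require step-up authentication.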

Biometric authentication uses physical or behavioral characteristics — fingerprints, face geometry, iris patterns, voice, typing rhythm — to verify identity. Biometrics are convenient and resistant to phishing, but they have unique architectural concerns:

Figure 5.2: Biometric Authentication Process Flow. The biometric authentication pipeline: capture, feature extraction, template comparison, threshold scoring, and the final accept/reject decision.
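The last three stages of the pipeline (template comparison, threshold scoring, and the decision) can be sketched as follows. Treating feature vectors as plain lists of numbers and scoring them with cosine similarity are assumptions made for illustration; production matchers use modality-specific algorithms, and the 0.95 threshold is an arbitrary example value.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def biometric_decision(enrolled_template, sample_features, threshold=0.95):
    """Compare a fresh sample to the enrolled template and apply a threshold."""
    score = cosine_similarity(enrolled_template, sample_features)
    decision = "accept" if score >= threshold else "reject"
    return decision, score
```

Raising the threshold rejects more impostors but also rejects more legitimate users, so the threshold is a tunable trade-off rather than a fixed constant.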

Identity proofing is the process of confirming, at enrollment, that the human behind a credential is who they claim to be. NIST's digital identity guidelines define increasing levels of proofing — from self-asserted, through remote document verification, up to in-person verification with government-issued ID. Architects choose the proofing level based on the sensitivity of what the identity will eventually be allowed to do.

Physical and Logical Access Control

A complete IAM program covers both physical access control systems — badges, turnstiles, mantraps, biometric door readers — and logical access control systems — directories, identity providers, application authorization. Increasingly, the two are converging. The same identity that opens the data center door may also unlock the jump host that manages the servers inside it. When physical and logical IAM are unified, one revocation event removes both the badge and the network access. When they are separate, the badge office and the helpdesk often disagree about who still works at the company.


How Do We Secure High-Privilege Accounts?

Most accounts in an enterprise can do limited damage if compromised. A handful — domain administrators, cloud root accounts, database administrators, certificate authority operators, CI/CD pipeline service accounts — can cause catastrophic harm. These are the accounts attackers actually want, and they are the accounts that justify disproportionate investment in protection.

Privileged Access Management (PAM)

A Privileged Access Management (PAM) platform is the architectural answer to "how do we let humans use admin privilege without giving them standing admin privilege?" A well-designed PAM solution provides:

A common pattern in mature organizations is the tiered administration model: Tier 0 contains identity infrastructure (domain controllers, IdP, PAM, certificate authorities); Tier 1 contains business-critical servers; Tier 2 contains user workstations. Credentials and sessions never cross tiers downward. A Tier 0 administrator never logs into a Tier 2 workstation with their Tier 0 credentials, because doing so exposes those credentials to anything that may be running on the workstation.
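The rule that credentials never cross tiers downward lends itself to a simple log check. This sketch assumes you already have a mapping of accounts and hosts to tier numbers (0 is the most sensitive); the function name and data shapes are hypothetical. It checks only the downward direction described above.

```python
def tier_violations(logons, account_tier, host_tier):
    """Return (account, host) pairs where a credential was used on a
    lower-trust tier than the one it belongs to."""
    violations = []
    for account, host in logons:
        # A higher tier number means a less trusted host.
        if host_tier[host] > account_tier[account]:
            violations.append((account, host))
    return violations
```

For example, a Tier 0 administrator account appearing in the logon events of a Tier 2 workstation would be returned as a violation, while the same account logging on to a domain controller would not.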

Secrets Management and Credential Rotation

Beyond human privilege, machines need credentials too. API keys, database passwords, TLS certificates, signing keys, OAuth client secrets, and cloud access keys all live somewhere. Where they live determines how safe they are.

Secrets management platforms — HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, Google Secret Manager, CyberArk Conjur — provide:

Example: A payment-processing microservice needs to talk to a PostgreSQL database. The insecure pattern: a DATABASE_PASSWORD=hunter2 line in a .env file checked into git. The mature pattern: at startup, the service authenticates to HashiCorp Vault using its Kubernetes service-account token. Vault verifies the pod's identity with the cluster, confirms a policy that allows this workload to read secret/payments/db, and returns a dynamically generated database credential that is valid for 60 minutes. After 60 minutes the credential expires automatically; if the pod is still running it requests a fresh one. If an attacker dumps memory from one pod, they get a credential that dies within the hour, and Vault's audit log records exactly which pod, on which node, fetched which secret, when.
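The mechanics of short-lived, policy-gated credentials can be modeled with a toy in-memory class. To be clear, `ToyVault` is a hypothetical teaching aid and not the HashiCorp Vault API or its client library; it only illustrates the three properties from the example: a policy check, a time-limited lease, and an audit record for every request.

```python
import secrets
import time


class ToyVault:
    """In-memory stand-in for a secrets manager (illustration only)."""

    def __init__(self, policies, ttl_seconds=3600):
        self.policies = policies      # workload name -> set of allowed paths
        self.ttl_seconds = ttl_seconds
        self.leases = {}              # issued credential -> expiry timestamp
        self.audit_log = []           # (time, workload, path, outcome)

    def issue(self, workload, path, now=None):
        now = time.time() if now is None else now
        allowed = path in self.policies.get(workload, set())
        self.audit_log.append((now, workload, path, "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{workload} may not read {path}")
        credential = secrets.token_urlsafe(24)   # fresh credential per request
        self.leases[credential] = now + self.ttl_seconds
        return credential

    def is_valid(self, credential, now=None):
        now = time.time() if now is None else now
        return self.leases.get(credential, 0) > now
```

Because every credential is generated on demand and expires on its own, a stolen value has a bounded useful life, and denied requests are logged alongside granted ones.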

Rotation matters because the longer a secret lives, the more places it has been copied to and the more hands have seen it. Best practice is to rotate routinely and to rotate immediately after any suspected exposure. This applies equally to passwords, tokens, certificates, and keys. Crucially, the rotation process itself must be automated — a manual "rotate every 90 days" policy that depends on a human running a checklist will fail under operational pressure.

Warning: Hardcoded secrets in source code repositories are one of the most common root causes of cloud breaches. Once a secret has been committed to git, rotating it is the only safe response: even if you delete the file in the next commit, the history retains it. Secrets scanning in CI/CD pipelines should be standard.
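A very small secrets scanner might look like the following. The three patterns are illustrative assumptions and nowhere near exhaustive; dedicated scanning tools ship far larger rule sets and entropy checks. The sketch only shows the basic idea of matching each line of a change against known secret shapes before it is merged.

```python
import re

# Illustrative patterns only; real scanners use many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(
        r"(?i)\b\w*(password|secret|api_key)\w*\s*[=:]\s*\S+"
    ),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A CI job could fail the build whenever `scan_text` returns a non-empty list for the changed files, forcing the author to move the value into a secrets manager before the commit lands.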

Logging, Auditing, and Cloud IAM Trust Policies

Privileged access is meaningful only if it leaves an audit trail. Every elevation, every secret retrieval, every administrative action should land in centralized logging — and that logging must be on a path the privileged user themselves cannot tamper with. A common architectural mistake is to allow domain administrators write access to the SIEM that records their actions.

Cloud environments add another dimension: trust policies that define which identities — including identities in other accounts, other clouds, or other organizations — may assume privileged roles. A misconfigured trust policy that says "any AWS account may assume this role" is functionally a public administrator account. Architects should review trust policies the way they review firewall rules: with a presumption of skepticism and a written justification for every entry.
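A first-pass review of trust policies can be automated. The sketch below assumes the policy has already been loaded as a Python dictionary in the AWS JSON policy layout (`Statement`, `Effect`, `Principal`, `Condition`); the function name is hypothetical. It flags `Allow` statements whose principal is a wildcard and that carry no `Condition` at all. Statements with a condition are not flagged here, but they still deserve human review because the condition may be too weak.

```python
def overly_broad_statements(trust_policy):
    """Return indexes of Allow statements that trust any principal
    without any Condition to narrow them."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):   # a single statement may be a bare object
        statements = [statements]

    flagged = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        values = [principal] if isinstance(principal, str) else []
        if isinstance(principal, dict):
            for entry in principal.values():
                values.extend(entry if isinstance(entry, list) else [entry])
        if "*" in values and "Condition" not in stmt:
            flagged.append(index)
    return flagged
```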

Case Study: SolarWinds and the Golden SAML Attack

The 2020 SolarWinds compromise is one of the most-studied supply chain attacks in modern history, but for this chapter the most relevant detail is what attackers did after gaining a foothold. Once inside affected environments, the threat actor — later attributed to a Russian state-sponsored group — pivoted toward the identity layer.

The technique that became known as Golden SAML worked like this. In a federated environment, the on-premises Active Directory Federation Services (AD FS) server signs SAML assertions using a private token-signing certificate. If an attacker can read that signing certificate from the AD FS server's memory or configuration store, they can mint their own SAML assertions for any user, with any claims they like, including assertions that bypass MFA because the SAML response asserts that MFA was already performed. Service providers — including major cloud platforms — would accept those forged assertions as legitimate, because the cryptographic signature verified.

Attackers used this technique to access cloud email and document repositories belonging to government agencies and large enterprises. The breach was not detected by the cloud providers, because from their perspective every login looked normal. It was detected, eventually, by behavioral anomalies and threat intelligence sharing.

The architectural lessons are sharp. Federation servers and their signing certificates are Tier 0 assets and must be protected as fiercely as domain controllers. Token-signing keys should live in HSMs where the private material cannot be extracted. Cloud IAM trust policies should be reviewed for unexpected SAML issuers. And identity logs from both the IdP and the SP should be centralized so that a forged assertion at the IdP can be cross-checked against unusual access at the SP. Federation is enormously powerful — but it concentrates trust in a small number of cryptographic secrets, and those secrets must be defended accordingly.
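The cross-check between IdP and SP logs can be sketched as a simple correlation. A forged assertion never passes through the IdP, so an SP login with no matching IdP authentication shortly beforehand is worth investigating. The event shape (`user`, `time` in seconds) and the five-minute window are assumptions; long-lived SSO sessions mean the window needs tuning per environment, and unmatched logins are leads, not proof.

```python
def unmatched_sp_logins(idp_events, sp_events, max_skew_seconds=300):
    """Return SP logins with no IdP authentication for the same user
    in the preceding max_skew_seconds."""
    findings = []
    for sp in sp_events:
        matched = any(
            idp["user"] == sp["user"]
            and 0 <= sp["time"] - idp["time"] <= max_skew_seconds
            for idp in idp_events
        )
        if not matched:
            findings.append(sp)
    return findings
```

This only works if both log sources land in the same place, which is the practical argument for centralizing identity logs from the IdP and every SP.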


How Do Identity Attacks Actually Unfold?

IAM is not only a configuration problem. It is also an attack surface. When threat actors compromise an enterprise, they often do not begin and end with a broken login page. They abuse identity stores, federation relationships, tokens, enrollment workflows, administrative paths, and endpoint trust signals. That is why identity troubleshooting and threat-actor TTPs overlap so heavily in real environments.

The key architectural point is that many "identity incidents" are not pure authentication failures. They are attack paths that use identity as the mechanism of persistence, escalation, or lateral movement. If an analyst sees repeated lockouts, impossible-travel alerts, unexpected MFA registrations, or administrative actions from a legitimate service account, the question is not only "which configuration is wrong?" It may also be "which attack pattern is being expressed through the identity system?"

Privilege Escalation, Credential Dumping, and Token Abuse

The most obvious IAM-relevant TTP is privilege escalation. Attackers who land on a low-privilege workstation often move quickly to steal hashes, Kerberos tickets, browser tokens, cached credentials, or local administrator secrets. The goal is not simply to authenticate once. The goal is to become a more powerful subject in the environment.

Credential dumping is therefore an identity problem as much as an endpoint problem. LSASS memory scraping, browser-session theft, token export from cloud command-line tooling, theft of password vault contents, and replay of cached credentials all convert one foothold into wider access. In federated environments, the equivalent may be theft of refresh tokens, signing keys, or service-account secrets. In cloud environments, it may be abuse of instance metadata services or workload identities to obtain access tokens that were never meant for a human operator.

The defender's job is not to memorize every offensive tool. It is to recognize the architectural weak spots that make the abuse possible: excessive standing privilege, reusable secrets, weak session binding, insufficient token revocation, poor service-account hygiene, and missing audit trails around privileged credential use.

Key Point: Identity compromise often looks like a sequence rather than a single event: credential access, privilege escalation, token abuse, then persistence. The earlier the architecture forces reauthentication, attestation, or vault-mediated access, the harder it is for that sequence to complete.

Unauthorized Execution, Lateral Movement, and Defensive Evasion Against Identity Infrastructure

Many IAM incidents are really unauthorized execution problems expressed through legitimate administration paths. A threat actor who obtains a valid admin token may use PowerShell remoting, remote management APIs, scheduled tasks, CI/CD runners, or cloud automation roles to execute actions that appear operationally normal. This is why defenders must understand not just whether an action succeeded, but whether the subject, path, and context make sense.

Lateral movement is frequently identity-enabled. A captured Kerberos ticket enables access to another service. A reused local administrator password enables pass-the-hash. An over-permissioned SaaS integration lets an attacker pivot from collaboration tooling into document repositories or identity administration consoles. In hybrid environments, lateral movement may cross trust domains: from on-prem Active Directory to cloud federation, from a VPN-connected endpoint into privileged infrastructure, or from a compromised automation secret into multiple cloud subscriptions.

Defensive evasion in identity systems usually looks boring, which is why it works. Attackers disable logging on a connector, register their own MFA device, add a second OAuth consent, create a backdoor federation trust, or deliberately operate through existing admin tools so the activity blends into normal change traffic. These quiet forms of abuse are exactly what make identity-centric incidents difficult to diagnose.

Injections and Trust Boundary Manipulation in Identity Workflows

Injections in IAM contexts usually mean an attacker is manipulating a trust boundary rather than "just" attacking an application form field. Examples include LDAP injection against poorly constructed directory queries, SAML or OIDC parameter tampering, SCIM provisioning abuse, malicious claims insertion through a compromised identity broker, or redirect and consent manipulation that causes a legitimate identity flow to produce an illegitimate outcome.

These failures are dangerous because they exploit systems defenders are conditioned to trust. A malformed input into a helpdesk password-reset workflow may not look like malware. A poisoned claim in a federated assertion may still be cryptographically valid if it was generated by a compromised issuer. A provisioning connector that writes the wrong group membership may look like an ordinary synchronization event until the downstream privilege becomes visible.
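LDAP injection is the easiest of these to show concretely. The sketch below escapes the characters that RFC 4515 treats as special inside a search-filter value before the value is placed into a filter string. The helper names and the `uid` and `objectClass=person` filter are assumptions for illustration; where a directory library offers its own escaping helper, prefer that over hand-rolled code.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value):
    """Escape user-supplied text so it cannot change the filter's structure."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_user_filter(username):
    """Build a lookup filter for one user from untrusted input."""
    return "(&(objectClass=person)(uid=" + escape_ldap_filter_value(username) + "))"
```

Without escaping, an input such as `*)(uid=*` would rewrite the query to match every user; with escaping, it is treated as a literal (and nonexistent) username.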

| IAM-Centric TTP | Typical Objective | Common Symptom | High-Value Defensive Signal |
| --- | --- | --- | --- |
| Credential Dumping | Steal reusable credentials or tokens | Impossible travel, sudden token reuse, access from a nonstandard host | LSASS access alerts, refresh-token anomalies, vault checkout logs |
| Privilege Escalation | Move from ordinary to administrative identity | Unexpected group membership or role assignment | Privileged-role change logs, JIT approval bypass, PAM session anomalies |
| Unauthorized Execution | Run actions through valid admin paths | New automation job, remote task, or script from an unusual subject | PowerShell or runner telemetry tied to identity context |
| Lateral Movement | Reuse identity material to pivot between systems | Same account touching unrelated tiers or trust zones | Cross-tier authentication events, new service ticket paths, anomalous east-west access |
| Defensive Evasion | Reduce visibility or blend into trusted activity | Logging disabled, MFA device changed, new trusted app registration | Audit setting changes, MFA reset events, connector configuration drift |
| Injection / Trust Manipulation | Alter an identity workflow or query outcome | Odd provisioning result, claim mismatch, or redirect abuse | Directory-query anomalies, assertion validation failures, SCIM or IdP admin changes |

Table 5.3: Common IAM-centric TTPs, their usual objectives, and the signals defenders should prioritize.

The important distinction is that identity gives these TTPs their meaning. The same activity can later be revisited from different operational angles. Endpoint and infrastructure chapters can treat it as host abuse. Threat-hunting chapters can treat it as a detection pattern. Incident-response chapters can treat it as evidence to reconstruct. But the trust relationships, credential paths, and authorization failures that make the activity dangerous are fundamentally identity problems.

Case Study: Hamlet Spots an Identity Attack at Denmark Cyber Defense

Hamlet, a senior threat hunter at Denmark Cyber Defense, was reviewing the previous day's identity telemetry when one service account caught his attention. The account belonged to a synchronization connector that normally authenticated from one management host, requested one narrow set of directory permissions, and generated almost no interactive activity. Overnight, the same account had authenticated from a different server, requested new OAuth consent, and been followed minutes later by a burst of privileged group lookups and remote administrative activity.

None of the individual events looked spectacular. The login succeeded. The consent grant appeared valid. The remote commands were executed through approved tooling. But taken together, the sequence formed a recognizable pattern: compromised service credentials, privilege expansion, unauthorized execution through legitimate admin paths, and defensive evasion through trusted infrastructure.

Hamlet pivoted through the logs and found the missing detail. An attacker had abused a provisioning connector with broader rights than it needed, then used that foothold to request additional tokens and enumerate privileged roles. Because the activity traveled through expected systems, the first-line alerts treated it as routine administration. What exposed it was the TTP sequence, not any single spectacular indicator.

The response focused on identity first. The connector secret was rotated, its permissions were reduced, all recently issued tokens were revoked, and the team added detections for management-plane logins from nonstandard hosts, abnormal consent grants, and privileged directory reads by service principals. The lesson was not merely that attackers steal credentials. It was that identity telemetry becomes far more useful when defenders look for sequences of behavior instead of isolated events.
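The idea of hunting for a sequence rather than a single indicator can be sketched in a few lines. The example below assumes identity events have already been normalized into simple records; the event labels, the two-hour window, and the account names are all hypothetical and would need to be mapped to whatever the organization's log sources actually emit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IdentityEvent:
    when: datetime
    account: str
    kind: str  # hypothetical normalized label, e.g. "consent_grant"


# Ordered stages modeled on the case study; the labels are assumptions.
SUSPICIOUS_SEQUENCE = (
    "login_new_host",
    "consent_grant",
    "privileged_group_read",
    "remote_admin",
)


def _has_ordered_stages(events, sequence, window):
    """True if the stages occur in order within `window` of the first stage."""
    for i, start in enumerate(events):
        if start.kind != sequence[0]:
            continue
        stage = 1
        for ev in events[i + 1:]:
            if stage == len(sequence) or ev.when - start.when > window:
                break
            if ev.kind == sequence[stage]:
                stage += 1
        if stage == len(sequence):
            return True
    return False


def accounts_matching_sequence(events, sequence=SUSPICIOUS_SEQUENCE,
                               window=timedelta(hours=2)):
    """Return accounts whose events contain every stage, in order."""
    by_account = {}
    for ev in sorted(events, key=lambda e: e.when):
        by_account.setdefault(ev.account, []).append(ev)
    return [acct for acct, evs in by_account.items()
            if _has_ordered_stages(evs, sequence, window)]


t0 = datetime(2024, 1, 15, 2, 0)
sample = [
    IdentityEvent(t0, "svc-sync", "login_new_host"),
    IdentityEvent(t0 + timedelta(minutes=4), "svc-sync", "consent_grant"),
    IdentityEvent(t0 + timedelta(minutes=9), "svc-sync", "privileged_group_read"),
    IdentityEvent(t0 + timedelta(minutes=15), "svc-sync", "remote_admin"),
    IdentityEvent(t0 + timedelta(minutes=1), "alice", "login_new_host"),
]
print(accounts_matching_sequence(sample))
```

Each event in the sample would pass a single-event rule on its own; only the ordered combination for one account inside the window is flagged. A production detection would also need tuning for legitimate administrative workflows that follow a similar order.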


How Do We Troubleshoot Authentication Failures?

A surprising amount of an IAM engineer's life is spent diagnosing authentication failures. A user cannot log in. An automation account suddenly stops working. A new SaaS integration loops endlessly between the IdP and the SP. A wireless client refuses to associate. The protocols involved are interoperable in theory and finicky in practice, and effective troubleshooting depends on understanding what each protocol actually does.

SAML, OAuth, and OpenID Connect

These three protocols dominate modern federation. They are related but not interchangeable.

| Protocol | Type | Token Format | Typical Use Case | Common Failure Modes |
| --- | --- | --- | --- | --- |
| SAML 2.0 | Authentication + authorization assertion | Signed XML assertion | Enterprise SaaS SSO, legacy apps | Clock skew, certificate expiry, mismatched entity IDs, broken signature validation |
| OAuth 2.0 | Delegated authorization | Opaque or JWT access token | API access, "Sign in with..." delegation | Wrong scopes, expired tokens, redirect URI mismatch, refresh token revoked |
| OpenID Connect (OIDC) | Authentication on top of OAuth 2.0 | JWT ID token + access token | Modern web/mobile SSO | Misconfigured discovery endpoint, JWKS rotation, audience claim mismatch |
Table 5.2: SAML vs. OAuth vs. OpenID Connect — what each protocol is for and how each one tends to break.

SAML troubleshooting almost always comes down to one of four issues: (1) the IdP and SP disagree on the entity ID or the assertion consumer URL, (2) the SAML response signature does not validate because the SP has the wrong IdP certificate or the certificate has expired, (3) clock skew between IdP and SP causes the assertion to be considered "not yet valid" or "expired," or (4) the user's attributes in the assertion do not match what the SP expects (e.g., the SP keys on email but the IdP sends UPN). A SAML tracer browser extension that captures the actual assertion is usually the fastest diagnostic tool.

Example: A new SaaS app rolls out and every user sees "SAML response invalid" on first login. The IAM engineer captures the assertion with a SAML tracer and finds two things. First, the assertion's NotBefore is 14:02:11Z while the SP's clock reads 13:59:48Z, a skew of 2 minutes and 23 seconds that puts the assertion in the SP's "future." Second, the assertion's <Audience> is https://app.vendor.com/saml but the SP is configured to expect https://app.vendor.com/saml/. Two fixes resolve it: sync the SP host to NTP, and make the audience value identical on both sides, trailing slash included. With both in place, the entire user base can log in. Both errors produced the same generic "invalid response" message; only the captured assertion told the real story.
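The two checks in this example can be reproduced offline once the assertion has been captured. The sketch below is not a SAML library and performs no signature validation; it only compares the validity window and audience the way an SP would. The function name, the calendar date, and the NotOnOrAfter value are assumptions added for illustration; the clock readings and URLs come from the example.

```python
from datetime import datetime, timedelta, timezone


def check_saml_conditions(not_before, not_on_or_after, audience,
                          expected_audience, sp_now,
                          allowed_skew=timedelta(0)):
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    if sp_now + allowed_skew < not_before:
        problems.append(
            f"not yet valid: SP clock is {not_before - sp_now} behind NotBefore")
    if sp_now - allowed_skew >= not_on_or_after:
        problems.append("expired: SP clock is at or past NotOnOrAfter")
    if audience != expected_audience:  # exact string comparison
        problems.append(
            f"audience mismatch: got {audience!r}, expected {expected_audience!r}")
    return problems


day = dict(year=2024, month=1, day=15, tzinfo=timezone.utc)  # hypothetical date
problems = check_saml_conditions(
    not_before=datetime(hour=14, minute=2, second=11, **day),
    not_on_or_after=datetime(hour=14, minute=7, second=11, **day),  # assumed
    audience="https://app.vendor.com/saml",
    expected_audience="https://app.vendor.com/saml/",
    sp_now=datetime(hour=13, minute=59, second=48, **day),
)
for line in problems:
    print(line)
```

Reporting each condition separately is the point of the exercise: the SP's generic error hides which check failed, while a per-condition report separates the clock problem from the audience problem. The `allowed_skew` parameter is included because many SPs accept a small tolerance, but widening it is a workaround, not a substitute for NTP.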

OAuth and OIDC troubleshooting has its own characteristic problems. A redirect URI mismatch, even a single trailing slash, will block the flow entirely. Scopes that the client requests but the authorization server has not granted produce subtle authorization failures, typically at the API, rather than authentication errors. The JWKS endpoint that the relying party or resource server uses to fetch the IdP's public signing keys must be reachable, and key rotation must be graceful: a stale JWKS cache after a rotation causes every token signed with the new key to fail validation at once.
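These three failure modes can be illustrated with a short sketch. Nothing here calls a real OAuth or OIDC library: the helper names are hypothetical, the key material is represented by placeholder strings, and `fetch_keys` stands in for whatever code actually downloads the JWKS document. The cache policy shown (refetch when an unknown key ID appears, but no more often than a minimum interval) is one common approach, presented as an assumption, not as a standard.

```python
import time


def redirect_uri_matches(registered: set, presented: str) -> bool:
    """Exact string match; a trailing slash makes it a different URI."""
    return presented in registered


def missing_scopes(requested: str, granted: str) -> set:
    """Scopes the client asked for that the token does not carry."""
    return set(requested.split()) - set(granted.split())


class JwksCache:
    """Cache signing keys by key ID and refetch when an unknown ID appears."""

    def __init__(self, fetch_keys, min_refresh_seconds=60.0,
                 clock=time.monotonic):
        self._fetch_keys = fetch_keys      # callable returning {kid: key}
        self._min_refresh = min_refresh_seconds
        self._clock = clock
        self._keys = {}
        self._last_fetch = None

    def _may_refresh(self) -> bool:
        # Rate-limit refetches so unknown key IDs cannot flood the IdP.
        return (self._last_fetch is None
                or self._clock() - self._last_fetch >= self._min_refresh)

    def get(self, kid):
        """Return the key for kid, or None if it is still unknown."""
        if kid not in self._keys and self._may_refresh():
            self._keys = dict(self._fetch_keys())
            self._last_fetch = self._clock()
        return self._keys.get(kid)


print(redirect_uri_matches({"https://app.example.com/callback"},
                           "https://app.example.com/callback/"))
print(missing_scopes("openid profile payments.write", "openid profile"))
```

A cache that only refreshes on a fixed timer is what produces the "every token fails at once" symptom after a rotation. Refetching on an unknown key ID shortens that outage, and the minimum interval keeps a stream of tokens with bogus key IDs from turning the validator into a load generator against the IdP.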

Kerberos, EAP, 802.1X, and SAE

Older but still ubiquitous protocols handle authentication on internal networks and Wi-Fi.

Kerberos authenticates users and services in Windows domains and Unix realms via ticket-granting tickets and service tickets. It depends critically on time synchronization (typically a five-minute skew tolerance), correct DNS resolution, and properly registered Service Principal Names (SPNs). Common failures: clock drift after a virtual machine resumes from snapshot; missing or duplicate SPNs after a service account is reused; trust relationships that work in one direction but not the other.
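Two of these failure modes lend themselves to quick scripted checks. The sketch below assumes the SPN assignments have already been exported from the directory into a plain mapping and that both clock readings are available; the account names, SPN values, and function names are hypothetical. SPNs are compared case-insensitively here, which is an assumption about the directory in use.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # the typical tolerance mentioned above


def skew_exceeds_tolerance(client_time, kdc_time, tolerance=MAX_SKEW):
    """True if the two clocks differ by more than the tolerance."""
    return abs(client_time - kdc_time) > tolerance


def find_duplicate_spns(spns_by_account):
    """Return {spn: [accounts]} for SPNs registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accts)
            for spn, accts in owners.items() if len(accts) > 1}


kdc = datetime(2024, 1, 15, 9, 0, 0, tzinfo=timezone.utc)
resumed_vm = kdc - timedelta(minutes=42)  # clock frozen at snapshot time
print(skew_exceeds_tolerance(resumed_vm, kdc))

exported = {
    "svc-web-old": ["HTTP/portal.corp.example"],
    "svc-web-new": ["http/portal.corp.example", "HTTP/portal"],
}
print(find_duplicate_spns(exported))
```

A duplicate SPN means the KDC cannot tell which account's key should encrypt the service ticket, so the duplicate report points directly at the accounts that need cleanup.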

Extensible Authentication Protocol (EAP) is a framework for transporting authentication exchanges, most commonly over 802.1X for wired and wireless network access. The supplicant on the endpoint, the authenticator on the switch or AP, and the authentication server (usually RADIUS) must all agree on the EAP method (EAP-TLS, PEAP, EAP-TTLS) and the certificates involved. EAP-TLS failures are typically certificate problems: the client's cert is expired, was issued by an untrusted CA, or its CRL/OCSP check fails.
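The certificate causes listed for EAP-TLS failures can be turned into a simple triage routine. The sketch below does not parse real certificates or contact a CRL or OCSP responder; it assumes those facts have already been extracted into a small record, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ClientCert:
    issuer: str
    not_before: datetime
    not_after: datetime
    revoked: Optional[bool]  # None means the CRL/OCSP lookup itself failed


def triage_client_cert(cert, trusted_issuers, now):
    """Return the list of reasons an EAP-TLS server might reject this cert."""
    findings = []
    if now < cert.not_before:
        findings.append("not yet valid")
    if now > cert.not_after:
        findings.append("expired")
    if cert.issuer not in trusted_issuers:
        findings.append("issuer not trusted")
    if cert.revoked is None:
        findings.append("revocation status unavailable")
    elif cert.revoked:
        findings.append("revoked")
    return findings


laptop_cert = ClientCert(
    issuer="CN=Old Corp Issuing CA",
    not_before=datetime(2022, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
    revoked=None,
)
print(triage_client_cert(
    laptop_cert,
    trusted_issuers={"CN=Corp Issuing CA G2"},
    now=datetime(2024, 3, 1, tzinfo=timezone.utc),
))
```

Keeping "revocation status unavailable" separate from "revoked" matters in practice: the first usually indicates that the RADIUS server cannot reach the CRL distribution point or OCSP responder, which is a network or PKI availability problem and not a problem with the client.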

Simultaneous Authentication of Equals (SAE) is the password-authenticated key exchange used in WPA3-Personal networks. It replaces WPA2-Personal's pre-shared key (PSK) authentication; the four-way handshake still runs afterward, but it is keyed by a secret negotiated through SAE rather than one derived directly from the passphrase. SAE resists offline dictionary attacks and provides forward secrecy. Troubleshooting SAE issues usually involves client compatibility: older devices may not support WPA3 at all, and some "WPA3 transition mode" configurations have shipped with bugs that cause intermittent association failures.

Biometric and MFA Anomalies

When MFA fails or becomes a nuisance, the temptation is to disable it for the user "just for now." Attackers exploit the same impatience through MFA fatigue attacks, in which they trigger repeated push notifications until the user finally taps Approve to make them stop. The mature response is not to disable MFA but to use number matching, push approvals that display location and application context, or step-up to a phishing-resistant factor such as a FIDO2 security key.
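The fatigue pattern is also detectable after the fact: a run of ignored or denied prompts followed by an approval. The following is a minimal sketch assuming push outcomes are available as timestamped records for a single user; the outcome labels, the threshold of three, and the ten-minute window are illustrative assumptions and should be tuned against real prompt volumes.

```python
from datetime import datetime, timedelta


def looks_like_push_fatigue(prompts, min_unapproved=3,
                            window=timedelta(minutes=10)):
    """prompts: iterable of (timestamp, outcome) for one user.

    Returns True if an approval was preceded, within `window`, by at
    least `min_unapproved` prompts that were denied or timed out.
    """
    ordered = sorted(prompts)
    for i, (approved_at, outcome) in enumerate(ordered):
        if outcome != "approved":
            continue
        recent_unapproved = [
            t for t, o in ordered[:i]
            if o != "approved" and approved_at - t <= window
        ]
        if len(recent_unapproved) >= min_unapproved:
            return True
    return False


t0 = datetime(2024, 1, 15, 23, 40)
burst = [
    (t0, "denied"),
    (t0 + timedelta(minutes=1), "timeout"),
    (t0 + timedelta(minutes=2), "denied"),
    (t0 + timedelta(minutes=3), "timeout"),
    (t0 + timedelta(minutes=4), "approved"),
]
print(looks_like_push_fatigue(burst))
```

A match is a reason to revoke the session and contact the user, since the approval that ends the burst is the one the attacker was waiting for.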

Biometric anomalies are usually environmental: a fingerprint sensor fails after a screen replacement; face recognition refuses to enroll under unusual lighting; voice authentication degrades after the user catches a cold. A robust IAM program offers a fallback authentication path that is itself strongly authenticated — typically a hardware token combined with a brief identity verification — rather than letting the helpdesk reset MFA on a phone call.

Warning: One of the most common authentication failure investigations in modern enterprises is "user clicked an Approve push notification they should not have approved." MFA push approval without context offers little more protection than no MFA at all. Push prompts that show the location, the app, and a number-matching code are an architectural fix; they do not depend on user training alone.


Chapter Review and Conclusion

In this chapter we examined identity as the connective tissue of modern security architecture. We started with the identity lifecycle — the journey from joiner through mover to leaver — and saw why automation, authoritative HR feeds, and federation are the difference between an IAM program that scales and one that quietly accumulates risk. We compared the four major access control models — RBAC, ABAC, MAC, and DAC — and looked at how policy decision points and policy enforcement points let architects centralize policy without centralizing every application.

We walked through conditional access, attestation, biometrics, and identity proofing as the modern signals that turn authentication from a one-time event into a context-aware decision. We then dedicated focused attention to high-privilege accounts: PAM platforms, just-in-time elevation, session brokering, secrets management, automated rotation, and the painful lesson of Golden SAML — that federation concentrates trust in a small number of cryptographic secrets that must be defended like the crown jewels they are. We also examined the common attack paths that abuse IAM directly: credential dumping, privilege escalation, unauthorized execution through legitimate admin paths, lateral movement across trust boundaries, defensive evasion inside identity systems, and injection against identity workflows. Finally, we worked through the practical reality of authentication troubleshooting across SAML, OAuth, OIDC, Kerberos, EAP, 802.1X, SAE, and the everyday MFA and biometric failures that fill an analyst's queue.

The thread connecting all of this is precision. Identity is where Zero Trust either becomes operational or remains a slogan. The architect's job is to make sure that every subject, human or machine, has exactly the access it needs, no more, only when it needs it, only when the context justifies it, and only with strong enough evidence of its identity.

Key Terms Review

Review Questions

  1. Federation lets the Service Provider read the user's password directly from the Identity Provider so it can verify it independently.
  2. Deprovisioning a departing employee's access only in the HR system is sufficient if the HR system is the authoritative source.
  3. RBAC can suffer from "role explosion" as organizations create increasingly specific roles to capture business reality.
  4. ABAC decisions can incorporate attributes about the subject, the object, the action, and the environment.
  5. Mandatory Access Control (MAC) lets the resource owner override classification decisions when convenient.
  6. A Policy Decision Point evaluates a request and returns permit or deny; a Policy Enforcement Point implements that decision.
  7. Conditional access typically combines signals such as device posture, geographic location, and time of day.
  8. A leaked biometric template is no different from a leaked password — you simply enroll a new biometric.
  9. Just-in-time access reduces standing privilege by elevating a user only for a defined window of time.
  10. Hardcoded secrets in a git repository are safe to remove with a single delete commit because the history is inaccessible.
  11. The Golden SAML attack works by stealing the SAML token-signing key from the federation server.
  12. SAML troubleshooting frequently involves clock skew, certificate expiry, and entity ID mismatches.
  13. OAuth 2.0 and OpenID Connect are the same protocol with two different names.
  14. Kerberos depends critically on time synchronization between the client, server, and key distribution center.
  15. 802.1X with EAP-TLS authenticates clients using certificates rather than passwords.
  16. WPA3's SAE handshake is designed to resist offline dictionary attacks compared with WPA2's four-way handshake.
  17. MFA fatigue attacks are best mitigated by disabling MFA for users who report frequent push prompts.
  18. A PAM platform's value comes primarily from storing privileged passwords; session recording is unnecessary overhead.
  19. Service accounts and workload identities should be subject to the same lifecycle discipline as human accounts.
  20. Federation concentrates trust in cryptographic secrets such as token-signing certificates, which must be protected as Tier 0 assets.
  21. Credential dumping is purely an endpoint malware topic and has little relevance to IAM architecture.
  22. Lateral movement in enterprise environments often depends on the reuse of compromised identity material across trust zones.
  23. Registering a new MFA device, adding a new OAuth consent grant, or disabling audit collection can all be examples of defensive evasion in identity systems.
  24. LDAP injection, claim manipulation, and provisioning abuse are examples of trust-boundary failures that can affect identity workflows.

Answer Key

  1. False: The SP never sees the user's password. The IdP authenticates the user and issues a signed assertion that the SP validates.
  2. False: Deprovisioning must propagate to every downstream application; unfederated apps require SCIM, manual offboarding, or retirement.
  3. True: Without disciplined role design, RBAC environments accumulate thousands of overlapping roles.
  4. True: That is precisely the four-tuple ABAC evaluates.
  5. False: MAC is enforced by the system regardless of owner preference — that rigidity is its security value.
  6. True: The PDP/PEP separation is the standard architecture for centralized policy with distributed enforcement.
  7. True: These context signals are the foundation of conditional access policies.
  8. False: Biometrics cannot be reissued. A leaked template compromises that biometric forever.
  9. True: JIT access is the cornerstone of modern PAM design.
  10. False: Git history retains the secret. Rotation is the only safe response after exposure.
  11. True: That is the core mechanism — once the signing key is stolen, the attacker can forge arbitrary assertions.
  12. True: These are among the most common SAML failure modes, alongside attribute mismatches between the IdP and the SP.
  13. False: OAuth 2.0 is an authorization framework. OIDC is an authentication layer built on top of it.
  14. True: Kerberos typically requires time synchronization within five minutes.
  15. True: EAP-TLS uses mutual certificate authentication.
  16. True: SAE provides forward secrecy and resists offline dictionary attacks.
  17. False: The right response is number matching, context-rich prompts, or phishing-resistant factors — not disabling MFA.
  18. False: Session recording, approval workflows, and JIT elevation are all core PAM capabilities, not optional extras.
  19. True: Service accounts often have broader privilege than humans and require equally rigorous lifecycle management.
  20. True: The Golden SAML attack and many similar federation compromises hinge on the protection of these signing keys.
  21. False: Credential dumping is an endpoint technique, but in SecurityX it is highly relevant to IAM because the stolen material is used to impersonate subjects, escalate privilege, and pivot across identity boundaries.
  22. True: Reused Kerberos tickets, hashes, tokens, and over-permissioned service accounts are common mechanisms for identity-enabled lateral movement.
  23. True: Quiet changes to MFA enrollment, OAuth grants, and logging are classic ways attackers evade detection while preserving identity-based access.
  24. True: These attacks manipulate the trust logic of directory, federation, or provisioning systems rather than simply breaking a generic application field.