AWS: Digital Transformation at Santander — How Platform Engineering is Revolutionizing Cloud Infrastructure

Summary

Joint AWS × Santander architecture-blog post on Catalyst, Santander's internal platform built in partnership with the AWS Platform Strategy Program (PSP). Santander is a global bank (>10 countries, 160M+ customers, 200+ critical systems, billions of daily transactions) whose central problem was that provisioning new infrastructure took up to 90 days and deviated routinely from architectural standards. Catalyst is an internal developer platform (IDP) built on an EKS-hosted control plane cluster that uses Crossplane as a universal multi-cloud resource provisioner, ArgoCD for GitOps application delivery, and Open Policy Agent Gatekeeper as a central policies catalog enforcing compliance and security on every provisioning request. A stacks catalog of Crossplane Composite Resource Definitions + Compositions packages complex environments as one-click golden paths. The reported outcome: full provisioning cycle 90 days → hours or minutes; standard provisioning 30 days → 2 days; proof-of-concept 90 days → 1 hour; 100+ pipelines consolidated into one control plane; AI-agent workload implementation 105 days → 24 hours; ~3,000 monthly data-experimentation tickets eliminated on the modern data platform built through Catalyst. Marketing-leaning AWS Architecture Blog format; the architecture shape is stated in full (EKS control plane + Crossplane + ArgoCD + OPA + developer-portal frontend + XRD stacks catalog) with strong quantified outcomes but no SLOs, no latency, no cluster sizing, no incident retrospective, no Crossplane Composition examples, no OPA policy examples.

Key takeaways

  1. Platform engineering as digital-transformation enabler, not as DevOps tooling. Catalyst's explicit job is to "abstract infrastructure provisioning complexity, standardize architectural compliance, and create a framework that enables new technologies" — the wiki's patterns/platform-engineering-investment pattern pinned to a 160M-customer bank running billions of daily transactions on 200+ critical systems. Second canonical production instance after ProGlove at opposite ends of the regulatory-complexity axis (bank vs SaaS; in-house platform team vs 3-person team).

  2. Control plane cluster is the architectural nucleus. A single Amazon EKS cluster acts as "the brain of the operation, orchestrating all components and workflows" — a textbook control-plane / data-plane split: the EKS cluster decides (policies, stacks, routing) while the provisioned AWS resources are the data plane that actually runs tenant workloads. Three load-bearing sub-components live inside the cluster: data-plane claims (ArgoCD-driven GitOps deploys), policies catalog (OPA Gatekeeper), stacks catalog (Crossplane XRDs + Compositions) (Source: sources/2026-02-26-aws-santander-catalyst-platform-engineering).

  3. Crossplane as universal multi-cloud provisioner. Santander explicitly chose Crossplane "as a universal resource provisioner" to "manage resources across multiple cloud providers consistently and declaratively". The wiki previously had no Crossplane page; this source introduces concepts/universal-resource-provisioning as a distinct concept from Terraform-class IaC (Crossplane models every resource as a K8s custom resource + controller, so the same K8s API + RBAC + GitOps stack that reconciles Deployments also reconciles RDS instances / S3 buckets / Snowflake schemas).

  4. Composite Resource Definitions + Compositions = the stacks catalog. Catalyst uses Crossplane's XRDs + Compositions to package "complex environments" (patterns/crossplane-composition). A Composition bundles N cloud primitives behind one high-level CRD; the stacks catalog is "a library of composite resource definitions and Compositions enabling quick and standardized creation of complex environments". This is Santander's realization of the golden path idea on top of multi-cloud infrastructure rather than K8s service definitions (Source: sources/2026-02-26-aws-santander-catalyst-platform-engineering).

  5. OPA Gatekeeper is the central policies catalog. Compliance and security constraints live in one place, "a central repository of policies ensuring compliance and security across all operations using Open Policy Agent", and are enforced as admission-time gates on every provisioning request (patterns/policy-gate-on-provisioning). OPA Gatekeeper is the K8s admission-controller integration of OPA, with policies written in Rego; this is the regulated-industry analogue of SCPs + IAM in ProGlove's account-per-tenant shape: same guardrail intent, different enforcement layer (Kubernetes admission vs AWS Organizations).

  6. GitOps as the application-delivery contract. "Data plane claims ... managed by ArgoCD ... responsible for continuous synchronization and deployment of application stacks ... exploring the GitOps concept." Introduces ArgoCD + concepts/gitops to the wiki; the Git repository becomes the authoritative declaration of the data plane, and the controller continuously reconciles observed state to declared state. This is the same property family (declarative + continuously reconciled) that Crossplane uses for infrastructure, so Catalyst is a uniform Kubernetes-API-shaped control surface for both infrastructure and applications (Source: sources/2026-02-26-aws-santander-catalyst-platform-engineering).

  7. In-house developer portal as the platform's interface. "The platform's in-house frontend was developed as an intuitive developer portal, offering a unified interface for all provisioning and resource management needs." (patterns/developer-portal-as-interface). All underlying machinery (EKS / Crossplane / ArgoCD / OPA) is hidden behind the one surface that application teams see; the platform team owns the surface and evolves it as an internal product. Matches AGENTS.md's "Platform APIs become the internal product" framing in the existing patterns/platform-engineering-investment page.

  8. Quantified provisioning-time collapse. Headline numbers from the post: full provisioning cycle 90 days → hours / minutes in the best case; standard provisioning 30 days → 2 days; proof-of-concept preparation 90 days → 1 hour; 100+ existing pipelines in scope to consolidate "into a single control plane"; ~3,000 monthly tickets for data-experimentation environment provisioning eliminated via the modern data platform workload; generative AI agent stack implementation 105 days → 24 hours. Numbers are vendor-marketing shape (ratios, not distributions): no p50/p99, no failure rates, no latency of the control plane itself (Source: sources/2026-02-26-aws-santander-catalyst-platform-engineering).

  9. Workload-variety claim. Three representative workloads shipped on Catalyst: generative AI agents, modern data platform (with Databricks integration + data lakes + automated ETL + centralized catalog + segregated experimentation environments), and cloud process orchestration (legacy workflow migration to Step Functions with retry / error-handling / centralized monitoring). Claim: "Catalyst has the potential to be a universal platform, capable of supporting everything from traditional use cases to the most innovative ones involving AI and legacy system modernization." The breadth is the signal: one XRD catalog + one policy catalog + one portal covers all three (Source: sources/2026-02-26-aws-santander-catalyst-platform-engineering).

  10. Cultural outcome framed as co-equal with technical outcome. "Catalyst also catalyzed a cultural change within Santander, promoting an automation and self-service mindset among development teams." This is the same load-bearing claim as in patterns/platform-engineering-investment: the pattern is platform and culture, not just tooling. Self-service mindset + automation mindset are what make the constant-team-size scaling property hold.
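The stacks-catalog takeaway above (one high-level definition standing in for N cloud primitives) can be pictured with a small model. The sketch below is plain Python, not Crossplane's API and not Santander's code: `Claim`, `COMPOSITIONS`, `expand`, the `DataSandbox` stack, and every resource kind in it are hypothetical names chosen for illustration, since the post shows no real Composition.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A developer-facing request for one high-level stack (hypothetical shape)."""
    kind: str
    name: str
    params: dict = field(default_factory=dict)


# Hypothetical stacks catalog: each stack kind lists the primitives it bundles.
# Placeholders in braces are filled from the claim's name and parameters.
COMPOSITIONS = {
    "DataSandbox": {
        "defaults": {"size": "small"},
        "resources": [
            {"kind": "Bucket", "name": "{name}-lake"},
            {"kind": "Database", "name": "{name}-catalog", "size": "{size}"},
            {"kind": "Role", "name": "{name}-analyst"},
        ],
    },
}


def expand(claim):
    """Turn one claim into the list of concrete primitives it stands for."""
    composition = COMPOSITIONS.get(claim.kind)
    if composition is None:
        raise KeyError(f"no composition registered for kind {claim.kind!r}")
    # Claim parameters override catalog defaults; the claim name always wins.
    values = {**composition["defaults"], **claim.params, "name": claim.name}
    return [
        {key: template.format(**values) for key, template in resource.items()}
        for resource in composition["resources"]
    ]
```

Calling `expand(Claim("DataSandbox", "team-a"))` yields three resource descriptions from a single request, which is the unit-of-reuse idea: the requester names the stack, and the catalog decides which primitives it contains.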

Systems extracted

System | Wiki page | Role in Catalyst
Amazon EKS | systems/aws-eks | Control plane cluster: "the brain of the operation"
Crossplane | systems/crossplane (new) | Universal multi-cloud resource provisioner; XRDs + Compositions power the stacks catalog
ArgoCD | systems/argocd (new) | GitOps continuous-sync engine for data-plane claims
Open Policy Agent (Gatekeeper) | systems/open-policy-agent (new) | Central policies catalog enforcing compliance + security on every provisioning request
AWS Step Functions | systems/aws-step-functions | Destination for migrated legacy process-orchestration workflows in the cloud-process-orchestration workload
Databricks | systems/databricks (stub) | Named integration target in the modern-data-platform workload (built-in integration)
Santander Catalyst | systems/santander-catalyst (new) | The platform itself; the canonical production reference for this architecture shape

Concepts extracted

  • concepts/universal-resource-provisioning — Crossplane-class abstraction: every cloud resource modeled as a Kubernetes custom resource reconciled by a controller; uniform API + RBAC + event model across AWS / GCP / Azure / SaaS.
  • concepts/gitops — Git repository as declarative source of truth for system state; controllers continuously reconcile observed state with declared state; "deploy via pull request" replaces "deploy via CLI."
  • concepts/control-plane-data-plane-separation — Catalyst's EKS cluster is an explicit control plane; provisioned AWS resources are the data plane; extends the existing concept page with its first infrastructure-provisioning instance.
  • concepts/policy-as-data — OPA Gatekeeper policies kept in a central repository separate from code; extends existing concept from Convera (Cedar in DynamoDB) to Rego in OPA Gatekeeper; same shape, different substrate.
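The "continuously reconcile observed state with declared state" idea shared by the provisioning and GitOps concepts above can be sketched as a single reconcile pass. This is a minimal illustration in plain Python, assuming state is a dictionary of resource name to spec; `diff` and `reconcile` are hypothetical helpers, not functions from Crossplane or ArgoCD.

```python
def diff(declared, observed):
    """List the actions needed to move observed state to declared state."""
    actions = []
    for name, spec in declared.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in declared:
            actions.append(("delete", name))
    return actions


def reconcile(declared, observed):
    """Apply one reconcile pass to `observed` in place; return the actions taken."""
    actions = diff(declared, observed)
    for verb, name in actions:
        if verb == "delete":
            del observed[name]
        else:
            # Create and update both converge the resource to its declared spec.
            observed[name] = dict(declared[name])
    return actions


# Declared state would come from Git in a GitOps setup; here it is a literal.
declared = {"bucket-a": {"versioning": True}, "db-b": {"size": "small"}}
observed = {"bucket-a": {"versioning": False}, "old-queue": {}}
first_pass = reconcile(declared, observed)
second_pass = reconcile(declared, observed)
```

After the first pass `observed` equals `declared`, and the second pass returns an empty list. That idempotence is what lets a controller run the loop continuously: drift produces actions, a converged system produces none.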

Patterns extracted

  • patterns/platform-engineering-investment — second canonical production instance after ProGlove; extends the existing page with a "large-enterprise regulated industry" counterpart to ProGlove's "small-team SaaS multi-tenant" instance.
  • patterns/developer-portal-as-interface (new) — single intuitive frontend hiding EKS / Crossplane / ArgoCD / OPA behind one self-service surface; canonical realization of "Platform APIs become the internal product" (patterns/platform-engineering-investment section).
  • patterns/crossplane-composition (new) — XRDs + Compositions as the composability primitive; bundle N cloud primitives behind one high-level CRD; the stacks-catalog unit-of-reuse.
  • patterns/policy-gate-on-provisioning (new) — OPA Gatekeeper as a K8s admission controller rejecting non-compliant provisioning requests at the point of manifest submission, not after the resources exist; shift-left compliance.
  • patterns/golden-path-with-escapes — Catalyst's stacks catalog is the multi-cloud-infrastructure realization of the same pattern the wiki already has from Figma's K8s-service-def instance; extends via a second Seen-in.
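The policy-gate-on-provisioning pattern above (reject a non-compliant request before any resource exists) can be sketched as follows. This is plain Python rather than Rego, and both example policies are invented for illustration; the post discloses no actual Santander policy. The sketch also assumes a blocking failure mode, which the source does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    """A named rule; `check` returns a violation message, or None if compliant."""
    name: str
    check: Callable[[dict], Optional[str]]


def require_encryption(request):
    # Hypothetical rule: storage buckets must be encrypted.
    if request.get("kind") == "Bucket" and not request.get("encrypted", False):
        return "buckets must be encrypted"
    return None


def require_owner_label(request):
    # Hypothetical rule: every request must carry an owner label.
    if "owner" not in request.get("labels", {}):
        return "missing 'owner' label"
    return None


POLICIES = [
    Policy("encryption", require_encryption),
    Policy("owner-label", require_owner_label),
]


def admit(request, policies=POLICIES):
    """Evaluate every policy; return (allowed, list of violation messages)."""
    violations = []
    for policy in policies:
        message = policy.check(request)
        if message:
            violations.append(f"{policy.name}: {message}")
    return (not violations, violations)


def provision(request, created):
    """Create the resource only if admission passes (blocking mode assumed)."""
    allowed, violations = admit(request)
    if not allowed:
        raise PermissionError("; ".join(violations))
    created.append(request["name"])
```

The design point is the ordering inside `provision`: the admission check runs before the side effect, so a rejected request leaves nothing behind to clean up. An audit-only mode would log the violations and proceed instead of raising.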

Operational / architectural numbers

Metric | Before | After
Full provisioning cycle | Up to 90 days | Hours (best case: minutes)
Standard provisioning | 30 days | 2 days
Proof-of-concept preparation | 90 days | 1 hour
Pipelines to consolidate | 100+ | 1 (single control plane)
Monthly data-experimentation tickets | ~3,000 | ~0
Generative AI agent workload implementation | 105 days | 24 hours
Tenant data-experimentation env provisioning tickets (per env) | dozens | eliminated
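For the three rows with exact before and after durations, the implied speedup factors can be worked out directly. The short calculation below uses only the figures reported above and assumes calendar days of 24 hours; the "hours (best case: minutes)" row is left out because it has no single after-value.

```python
HOURS_PER_DAY = 24

# (label, before in hours, after in hours), taken from the reported figures.
ROWS = [
    ("Standard provisioning", 30 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),
    ("Proof-of-concept preparation", 90 * HOURS_PER_DAY, 1),
    ("Generative AI agent workload implementation", 105 * HOURS_PER_DAY, 24),
]


def speedup(before_hours, after_hours):
    """Ratio of the old duration to the new duration."""
    return before_hours / after_hours


SPEEDUPS = {label: speedup(before, after) for label, before, after in ROWS}

for label, factor in SPEEDUPS.items():
    print(f"{label}: {factor:.0f}x")
```

This gives 15x, 2160x, and 105x respectively. These are ratios of single reported values, so they inherit the same limitation as the source numbers: no distribution and no sample size.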

Workload-level architectural details (all shipped through Catalyst):

  • Generative AI agents stack: the "first success case", a complete stack for AI agents (the post lists the bullet label but elides the specific AWS AI services it integrates).
  • Modern data platform: built-in Databricks integration + data lakes + automated ETL workflows + centralized data catalog + segregated experimentation environments; cited as "one of the most complex workloads implemented through Catalyst."
  • Cloud process orchestration: migrating legacy workflows to AWS Step Functions + retry patterns + error handling + centralized process monitoring.
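The "retry patterns" mentioned for the orchestration workload are not detailed in the post. As a generic illustration, the sketch below shows retry with exponential backoff in plain Python. The parameter names loosely echo the interval, maximum-attempts, and backoff-rate knobs that Step Functions retriers expose, but the helpers `backoff_delays` and `run_with_retry` are hypothetical and nothing here reflects Santander's actual configuration.

```python
import time


def backoff_delays(interval_seconds=1.0, max_attempts=3, backoff_rate=2.0):
    """Delay before each retry: interval * rate**0, interval * rate**1, ..."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


def run_with_retry(task, delays, sleep=time.sleep):
    """Call task(); on failure wait the next delay and retry.

    Re-raises the last error once the delays are exhausted.
    """
    remaining = list(delays)
    while True:
        try:
            return task()
        except Exception:
            if not remaining:
                raise
            sleep(remaining.pop(0))


def make_flaky(failures):
    """Build a task that fails `failures` times and then succeeds (demo only)."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient failure")
        return "done"

    return task
```

Passing a recording function as `sleep` (for example `slept.append`) makes the schedule observable without waiting: a task that fails twice under `backoff_delays(2, 3, 2.0)` sleeps for 2.0 and then 4.0 seconds before succeeding on the third call.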

Caveats and gaps

  • No distribution shape for the headline numbers. "Hours" and "minutes" are not p50/p99; pre-/post- ratios are stated without sample size. "Billions of daily transactions" is business throughput, not Catalyst-tier throughput.
  • No Composition examples. The post names XRDs + Compositions but shows none; composition shape (which AWS primitives, how nested, how parametrized) is not disclosed.
  • No OPA policy examples. Policies catalog is described at capability level only; no Rego policy snippet, no enforcement-failure-mode behavior (block / audit / warn).
  • No cluster sizing / latency / SLO. EKS cluster size, HA topology, RBAC model, Crossplane provider list, ArgoCD app-of-apps depth — none disclosed.
  • No failure-mode or incident retrospective. "Virtuous cycle of continuous improvement" is the only retrospective framing; no Crossplane-controller outage, no ArgoCD sync conflict, no OPA policy-update gone wrong.
  • No 3-person-team-style staffing number. Catalyst's operating team is not quantified (how many engineers build + run the platform at Santander's scale).
  • No cost figures. Savings claim is time-to-provision only, not cost-to-provision or TCO.
  • Partnership framing. AWS Professional Services' role is named but the long-term operational handoff (who owns the platform post-engagement) is not stated.
  • Tier-1 AWS Architecture Blog signal, marketing-leaning. Co-authored with AWS SAs; bias toward "Catalyst solved it" framing is visible.

Relationship to prior AWS sources

  • Complements ProGlove 6,000-accounts — both are canonical production instances of patterns/platform-engineering-investment at opposite ends of the regulatory-industry axis. ProGlove = SaaS multi-tenant, AWS-native guardrails (SCPs / StackSets / Step Functions), 3-person team, account boundary = isolation. Santander = regulated bank, Kubernetes-native guardrails (OPA Gatekeeper / Crossplane / ArgoCD), in-house-team size not disclosed, EKS control plane = substrate. Same pattern, different substrate, different regulatory pressure.
  • Complements Figma EKS migration — Figma's migration shipped the same golden path pattern at K8s service-def level (per-service Bazel config → CI-generated YAML); Catalyst runs the same pattern at cloud-infrastructure level (stacks catalog of XRDs). Different layer, same posture.
  • Contrasts with Convera AVP: both use concepts/policy-as-data but at different layers. Convera uses Cedar in AVP for application-request-time authz; Santander uses OPA Gatekeeper Rego for infrastructure-provisioning-time compliance. Same discipline (policies live in a dedicated store, separate from code), two enforcement layers.

Raw file

raw/aws/2026-02-26-digital-transformation-at-santander-how-platform-engineering-f64c5568.md (AWS Architecture Blog, 2026-02-26; cowritten with Julio Bando, Santander F1RST senior expert technology architect; AWS side Jaime Nagase, Robert da Costa, Edgar Costa Filho, Guilherme Greco, Joao Melo, Jacob Mevorach — PSP co-creator, Michael Silva).

Source
