
PATTERN Cited by 1 source

Secure tunnel to managed compute

Give a developer's local workstation an interactive / port-forward connection into an otherwise-isolated managed-compute environment without:

  • opening inbound ports on the target,
  • running a bastion / jump host,
  • requiring customer-managed VPN or per-user network ACLs,
  • or exposing long-lived SSH credentials.

Authentication and authorization go through IAM. The transport is an outbound WebSocket that the target opens to a relay service, brokered by the control plane; the client streams an SSH or raw-TCP session through it.

Structure

  1. Client calls a provider API with an IAM identity (e.g. the StartSession API, or aws ssm start-session from the CLI).
  2. Control plane validates the IAM call, picks the target, creates a session record, returns a session token.
  3. Target runs an always-on agent that holds an outbound WebSocket to the relay; the agent accepts the session and terminates the SSH stream inside the target's normal security boundary.
  4. Data flows end-to-end between client and target through the relay. The relay forwards bytes but cannot read them, because the SSH session stays encrypted between client and target; the control plane sees session metadata (who, when, what target) for audit but not the content.
  5. Disconnection (network drop, agent restart) is handled without user action: the agent re-establishes its outbound connection and the client reattaches using the session token, so a dropped network does not cost the developer their shell.

This is concepts/control-plane-data-plane-separation applied to remote access: the IAM/session layer decides, the SSH payload delivers.
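The five steps can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the provider's implementation: the class names `ControlPlane` and `Relay`, the policy table, and the `echo_agent` stand-in are all hypothetical, and the "outbound WebSocket" is modelled as the agent registering a callback with the relay.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Decides who may connect; never touches session payload."""
    # Hypothetical policy table: principal -> set of allowed targets.
    policies: dict
    sessions: dict = field(default_factory=dict)   # token -> (principal, target)
    audit_log: list = field(default_factory=list)  # metadata only, no content

    def start_session(self, principal: str, target: str) -> str:
        if target not in self.policies.get(principal, set()):
            self.audit_log.append(("deny", principal, target))
            raise PermissionError(f"{principal} may not connect to {target}")
        token = secrets.token_urlsafe(16)
        self.sessions[token] = (principal, target)
        self.audit_log.append(("allow", principal, target))
        return token


@dataclass
class Relay:
    """Forwards opaque bytes between a client and an agent that dialed out."""
    control_plane: ControlPlane
    agents: dict = field(default_factory=dict)  # target -> handler callable

    def register_agent(self, target, handler):
        # Models the agent's outbound connection: the target contacts the relay,
        # so the target itself never needs a reachable listening port.
        self.agents[target] = handler

    def send(self, token: str, payload: bytes) -> bytes:
        _, target = self.control_plane.sessions[token]  # KeyError if token unknown
        return self.agents[target](payload)             # payload is not inspected


def echo_agent(payload: bytes) -> bytes:
    # Stand-in for the SSH server that terminates the stream inside the target.
    return b"target-says:" + payload


cp = ControlPlane(policies={"alice": {"space-1"}})
relay = Relay(cp)
relay.register_agent("space-1", echo_agent)

token = cp.start_session("alice", "space-1")        # steps 1-2
reply = relay.send(token, b"\x00opaque-ssh-bytes")  # steps 3-4
```

The control plane's audit log records only the decision and the principal/target pair, while the relay handles bytes it never interprets, which is the separation the pattern relies on.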

Benefits

  • No inbound attack surface. The target exposes no listening port to any external network; the agent hands session traffic to a service bound to local loopback.
  • IAM-scoped. Per-user, per-target, per-action policies in one place; revoking a permission blocks new sessions as soon as the policy change takes effect.
  • Full audit trail. Control plane logs who connected where, when, and for how long.
  • Works across NATs / private subnets / account boundaries that classic bastions can't traverse without VPN stitching.
  • Resilient to transient network faults because reconnection is a protocol primitive, not a user-space concern.
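The source does not describe how reconnection is implemented. One common way to make reconnection a built-in primitive is a retry loop with capped exponential backoff and jitter, sketched below. The function names, the default delays, and the `flaky_dial` stand-in are assumptions for illustration only.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield capped exponential delays, scaled by a jitter factor in [0, 1)."""
    for attempt in range(attempts):
        yield rng() * min(cap, base * 2 ** attempt)


def connect_with_retry(dial, sleep, delays):
    """Call dial() until it succeeds, sleeping between failed attempts."""
    for delay in delays:
        try:
            return dial()
        except ConnectionError:
            sleep(delay)
    return dial()  # final attempt; any error propagates to the caller


# Simulated relay that drops the first two connection attempts.
calls = {"n": 0}


def flaky_dial():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network drop")
    return "session-reattached"


slept = []  # records delays instead of really sleeping
result = connect_with_retry(
    flaky_dial, slept.append, backoff_delays(rng=lambda: 1.0)
)
```

Passing `sleep` and `rng` as parameters keeps the loop deterministic under test; a real agent would pass `time.sleep` and leave the default random jitter in place.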

Trade-offs

  • Vendor lock-in on the relay service (SSM, GCP IAP, Azure Bastion). Mitigated by the fact that your code runs inside the target — only the transport changes.
  • Agent must be trusted + kept alive on every target. Not viable for unmanaged / customer-admin-only boxes.
  • Policy complexity can replace network complexity — bad IAM can be as permissive as a flat network.
  • Not free of latency. The relay adds a hop; fine for interactive shells and IDE traffic, worse for bulk transfer unless the relay is co-located.
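To make the policy-complexity trade-off concrete, the sketch below contrasts a narrowly scoped policy with a permissive one, using the generic Session Manager action `ssm:StartSession` and the `AWS-StartSSHSession` document. The region, account ID, and instance ID are placeholders, and `wildcard_statements` is a hypothetical lint helper, not an AWS API. A SageMaker-specific policy would name different actions and resources.

```python
import json

# Placeholder identifiers (assumptions); substitute real values.
REGION, ACCOUNT, INSTANCE = "us-east-1", "111122223333", "i-0123456789abcdef0"

# Scoped: one principal action, one target, one session document.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ssm:StartSession",
            "Resource": [
                f"arn:aws:ec2:{REGION}:{ACCOUNT}:instance/{INSTANCE}",
                "arn:aws:ssm:*:*:document/AWS-StartSSHSession",
            ],
        }
    ],
}

# Permissive: the IAM equivalent of a flat network.
flat_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "ssm:*", "Resource": "*"}],
}


def wildcard_statements(policy):
    """Return Allow statements whose Resource is, or includes, a bare '*'."""
    found = []
    for stmt in policy["Statement"]:
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if stmt["Effect"] == "Allow" and "*" in resources:
            found.append(stmt)
    return found


print(json.dumps(scoped_policy, indent=2))
```

A check like `wildcard_statements` catches only the bluntest mistake; it does not replace a proper policy review.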

Canonical example: SageMaker AI StartSession

We built a new StartSession API that creates secure connections specifically for SageMaker AI spaces, establishing SSH-over-SSM tunnels through AWS Systems Manager that maintain all of SageMaker AI's security boundaries while providing seamless access. For VS Code users coming from Studio, the authentication context carries over automatically. For those who want their local IDE as the primary entry point, administrators can provide local credentials that work through the AWS Toolkit VS Code plug-in. And most importantly, the system handles network interruptions gracefully and automatically reconnects.

(Source: sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development)

The tunnel is an SSH session riding inside an SSM Session Manager transport (see systems/aws-systems-manager). The client never opens a socket to a SageMaker Studio IP; instead, the space's agent dials out. Studio spaces keep all their managed-environment security properties, and the developer gets a VS Code Remote-SSH workflow end-to-end. Predecessor: the community SageMaker SSH Helper project, which manually wired up what StartSession now productizes.
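For orientation, the generic SSH-over-SSM wiring for an EC2 instance looks roughly like the sketch below, which builds an `ssh` command line whose ProxyCommand invokes `aws ssm start-session` with the `AWS-StartSSHSession` document. This is the general Session Manager pattern, not a confirmed description of what StartSession does for SageMaker spaces. The helper name, the default user `ec2-user`, and the instance ID are assumptions.

```python
def ssh_over_ssm_argv(target, user="ec2-user", port=22):
    """Build an ssh argv that tunnels through Session Manager.

    ssh expands %h and %p to the host and port before running the
    ProxyCommand; the AWS CLI then opens the session from the client side.
    """
    proxy = (
        "aws ssm start-session --target %h "
        "--document-name AWS-StartSSHSession --parameters portNumber=%p"
    )
    return ["ssh", "-o", f"ProxyCommand={proxy}", "-p", str(port), f"{user}@{target}"]


# Hypothetical instance ID; nothing is executed here, the argv is only built.
argv = ssh_over_ssm_argv("i-0123456789abcdef0")
print(" ".join(argv))
```

Running the resulting command requires the AWS CLI, the Session Manager plugin, and an IAM identity allowed to start the session; the "host" passed to ssh is an instance ID rather than a routable address.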

Seen in
