Full-Stack Architect

Brisbane, Australia

February 2026

20 min read | Networking | Part 6 of 9

Networking Fundamentals for Platform Engineers

Every request that hits your application travels through layers of networking. If you do not understand how TCP/IP works, how DNS resolves, or what happens inside a VPC — you are debugging blind when things break.

Why Platform Engineers Need Networking Knowledge

When a compliance officer submits a customer due diligence case and gets a timeout error, the problem could be anywhere in the network stack: DNS resolution failed, a security group is blocking traffic, a NAT gateway is saturated, the load balancer health check is misconfigured, or a TLS certificate expired. If you only understand application code, you will stare at perfectly working code while the real problem is three layers below.

You do not need to become a network engineer. But you need to understand enough to diagnose problems, design infrastructure, and have productive conversations with networking teams.

The Network Stack — Layers That Matter

The OSI model, simplified for platform engineering

Networking is organised in layers. Each layer has a specific job and talks to the layer above and below it. You will hear references to "Layer 4" or "Layer 7" constantly in platform engineering — here is what each one means and why you care about it.

Layer	Name	Key Protocols	What It Does	Why Platform Engineers Care
7	Application	HTTP, HTTPS, DNS, FTP, SMTP, XMPP, SIP	Where your application communicates	API calls, REST endpoints, WebSocket connections, DNS resolution: this is where most debugging starts
6	Presentation	TLS/SSL, JSON, XML	Data formatting and encryption	TLS certificate issues, HTTPS handshake failures, certificate expiry alerts
5	Session	TLS sessions, RPC	Opens, maintains, and closes sessions between applications	Connection pooling, session timeouts, keep-alive settings
4	Transport	TCP, UDP	Reliable, ordered (TCP) or lightweight, best-effort (UDP) delivery between ports	Port numbers, connection limits, load balancer types (L4 vs L7), health checks
3	Network	IPv4, IPv6, ICMP, IGMP	Routing packets between networks	VPC design, subnets, route tables, NAT gateways, security groups, CIDR blocks
2	Data Link	Ethernet, ARP, VLAN	Frame delivery within a local network	Rarely touched directly in cloud, but ARP resolution matters for on-premises and hybrid networks
1	Physical	Ethernet cables, Wi-Fi, Bluetooth	Raw electrical/optical signals	Cloud abstracts this away — but matters for edge computing and IoT

The Practical Truth

As a platform engineer working in the cloud (AWS, Azure, GCP), you will spend most of your networking time on Layers 3, 4, and 7. Layer 3 is VPC, subnet, and routing design. Layer 4 is TCP connections, port management, and the NLB (Network Load Balancer). Layer 7 is HTTP routing, the ALB (Application Load Balancer), API Gateway, and DNS.
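A toy Python sketch of the difference, using hypothetical target names and routing rules (this is not how AWS implements either load balancer): a Layer 4 balancer such as an NLB can only choose a target from addresses and ports, while a Layer 7 balancer such as an ALB parses the HTTP request and can route on host and path.

```python
from dataclasses import dataclass

@dataclass
class Request:
    dst_port: int  # visible at Layer 4 (TCP/UDP header)
    host: str      # visible only after parsing HTTP (Layer 7)
    path: str      # visible only after parsing HTTP (Layer 7)

# Hypothetical listeners and rules, for illustration only.
L4_LISTENERS = {443: "tls-passthrough-targets", 5432: "postgres-targets"}
L7_RULES = [
    ("api.example.com", "/cdd/", "cdd-service"),
    ("api.example.com", "/screening/", "screening-service"),
]

def route_l4(req: Request) -> str:
    """Layer 4: choose a target group from the destination port alone."""
    return L4_LISTENERS.get(req.dst_port, "no-listener")

def route_l7(req: Request) -> str:
    """Layer 7: inspect host and path; the first matching rule wins."""
    for host, prefix, target in L7_RULES:
        if req.host == host and req.path.startswith(prefix):
            return target
    return "default-target"

req = Request(dst_port=443, host="api.example.com", path="/screening/check")
print(route_l4(req))  # tls-passthrough-targets
print(route_l7(req))  # screening-service
```

Both functions receive the same request, but only the Layer 7 function can tell the CDD and screening services apart.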

TCP vs UDP — The Two Ways Data Travels

Understanding when reliability matters and when speed matters

TCP (Transmission Control Protocol)

Reliable, ordered delivery. The receiver acknowledges the data it gets, and any segment that is not acknowledged in time is retransmitted, so the application sees a complete, in-order byte stream.

Three-way handshake: SYN → SYN-ACK → ACK. This establishes a connection before any data is sent.

Used by: HTTP/HTTPS (all web traffic), database connections (PostgreSQL, MongoDB), SSH, SMTP (email), FTP

Platform context: Every API call between your microservices uses TCP. Database connection pooling is about managing TCP connections efficiently.
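A minimal Python sketch of a TCP exchange on localhost. The kernel performs the SYN → SYN-ACK → ACK handshake inside connect() (which socket.create_connection calls) before any application bytes move; the upper-casing echo server is purely illustrative.

```python
import socket
import threading

def echo_once(server: socket.socket) -> None:
    # accept() hands back a connection whose handshake the kernel has already completed.
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Listen on an ephemeral localhost port (port 0 lets the OS choose one).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()
worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# create_connection() returns only after the three-way handshake succeeds.
with socket.create_connection((host, port), timeout=5) as client:
    client.sendall(b"hello over tcp")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)  # b'HELLO OVER TCP'
```

A connection pool simply keeps sockets like `client` open and reuses them, so each new query skips the handshake.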

UDP (User Datagram Protocol)

Fast, no guarantees. Packets are sent without acknowledgement. If a packet is lost, it is gone.

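No handshake: just send the data. With no connection-setup round trip and no waiting for retransmissions, UDP has lower latency and less overhead than TCP.

Used by: DNS queries, STUN/TURN (WebRTC), real-time video, VoIP (RTP media, and often SIP signalling), SNMP (monitoring), game servers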

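Platform context: DNS resolution uses UDP port 53 by default and falls back to TCP for large responses and zone transfers. Traditional syslog log shipping often uses UDP port 514. Note that ping is not UDP: it uses ICMP, and load balancer health checks are usually TCP or HTTP(S).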
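For contrast with the TCP sketch, here is a minimal UDP exchange on localhost in Python. There is no connect or accept step: sendto() puts a datagram on the wire immediately. The message is a syslog-style line chosen purely for illustration. On loopback it arrives; across a real network, nothing would tell the sender if it had been dropped.

```python
import socket

# Receiver bound to an ephemeral localhost port; no listen() or accept() for UDP.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# One datagram, sent with no handshake and no acknowledgement.
sender.sendto(b"<14>app: disk usage 91%", addr)

data, src = receiver.recvfrom(2048)
print(data)  # b'<14>app: disk usage 91%'

sender.close()
receiver.close()
```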

Protocols You Will Actually Encounter

A practical reference for the protocols that appear in production

Data Communication

ARP

Address Resolution Protocol: maps IPv4 addresses to MAC addresses on a local network (IPv6 uses Neighbor Discovery for the same job). If ARP resolution fails, devices on the same subnet cannot talk to each other.
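To show what an ARP request actually asks, here is a Python sketch that builds the 28-byte ARP payload defined in RFC 826 for IPv4 over Ethernet. The sender MAC and both IP addresses are hypothetical. On a real network this payload is wrapped in an Ethernet frame sent to the broadcast address ff:ff:ff:ff:ff:ff; sending it needs raw sockets and root, so the sketch only builds the bytes.

```python
import socket
import struct

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the ARP payload asking 'who has target_ip? tell sender_ip'."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC = 6 bytes)
        4,                            # protocol address length (IPv4 = 4 bytes)
        1,                            # operation: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC unknown: that is the question
        socket.inet_aton(target_ip),
    )

# Hypothetical addresses on the same /24 subnet.
pkt = build_arp_request(bytes.fromhex("0a1b2c3d4e5f"), "10.0.1.10", "10.0.1.50")
print(len(pkt))  # 28
```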

IPv4 / IPv6

The addressing system of the internet. IPv4 uses 32-bit addresses (e.g., 10.0.1.50); IPv6 uses 128-bit addresses. VPC subnet design is usually done in IPv4 CIDR blocks (e.g., 10.0.0.0/16), although AWS VPCs can also be assigned IPv6 ranges.
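Python's standard ipaddress module makes the address sizes and CIDR arithmetic easy to check. In this sketch, 2001:db8::50 comes from the IPv6 documentation prefix and is used only as an example.

```python
import ipaddress

v4 = ipaddress.ip_address("10.0.1.50")
v6 = ipaddress.ip_address("2001:db8::50")
print(v4.max_prefixlen, v6.max_prefixlen)  # 32 128 (address size in bits)
print(int(v4))                             # 167772466: an IPv4 address is one 32-bit integer

vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536, i.e. 2 ** (32 - 16)
print(v4 in vpc)          # True
print(v4.is_private)      # True: 10.0.0.0/8 is RFC 1918 private space
```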

ICMP

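Internet Control Message Protocol: the protocol behind ping (ICMP echo request and reply) and the "time exceeded" messages that traceroute relies on to discover each hop (Linux traceroute sends UDP probes by default; Windows tracert sends ICMP echo requests). Used to test whether a host is reachable and to diagnose routing issues.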
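To make "the protocol behind ping" concrete, this Python sketch builds an ICMP echo request (type 8, code 0) with the RFC 1071 Internet checksum. The identifier, sequence number, and payload are arbitrary example values. Actually sending it requires a raw socket and elevated privileges, so the sketch stops at building and verifying the bytes.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum, as used by ICMP, IPv4, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"                             # pad to an even length
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                              # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """ICMP echo request, the message ping sends."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed first
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1, payload=b"platform")
print(packet[0], len(packet))     # 8 16
print(internet_checksum(packet))  # 0: a correctly checksummed packet sums to zero
```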

IGMP

Internet Group Management Protocol — manages multicast group memberships. Less common in cloud, but relevant for streaming and media applications.

Signalling & Real-Time

SIP / SDP

Session Initiation Protocol / Session Description Protocol — used in VoIP and video conferencing to set up, manage, and tear down real-time communication sessions.

XMPP

Extensible Messaging and Presence Protocol — used for real-time messaging, presence information, and contact lists. Think chat applications and IoT device communication.

Application Protocols

DNS

Domain Name System — translates human-readable names (api.example.com) to IP addresses. If DNS fails, everything fails. Route 53 is AWS's DNS service.

HTTP / HTTPS

The foundation of web communication. HTTPS adds TLS encryption. Every API call, every web page, every webhook uses HTTP(S). Status codes (200, 404, 500, 503) are your diagnostic language.
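A small Python sketch of reading that diagnostic language: it splits an HTTP/1.x status line and maps the code to its class, which is usually the first clue to whether the problem is on the client side (4xx) or the server side (5xx). The function names are illustrative, not from any library.

```python
def parse_status_line(line: str) -> tuple[int, str]:
    """Split an HTTP/1.x status line such as 'HTTP/1.1 503 Service Unavailable'."""
    version, code, *reason = line.split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {line!r}")
    return int(code), reason[0] if reason else ""

def status_class(code: int) -> str:
    """Map a status code to its class: where to start looking."""
    classes = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "informational or unknown")

code, reason = parse_status_line("HTTP/1.1 503 Service Unavailable")
print(code, reason, status_class(code))  # 503 Service Unavailable server error
```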

FTP / TFTP

File Transfer Protocol. FTP requires a login but sends credentials and data in plaintext; TFTP (Trivial FTP) has no authentication at all and runs over UDP. SFTP (file transfer over SSH) is the modern secure alternative.

STUN / TURN

Used in WebRTC for NAT traversal — helping peer-to-peer connections work when devices are behind firewalls and NAT gateways.

Network Management

SNMP

Simple Network Management Protocol — used to monitor and manage network devices (routers, switches, servers). SNMP traps are how network hardware reports problems. CloudWatch and Datadog often replace SNMP in cloud environments.

Wireless (Layers 1-2)

Bluetooth / BLE

Short-range wireless for IoT devices, sensors, peripherals. Bluetooth Low Energy (BLE) is critical for IoT platforms and edge computing.

Cloud Networking — VPC Design in Practice

How all these protocols come together in a real AWS environment

A Virtual Private Cloud (VPC) is your own isolated network inside AWS. Most production workloads (EC2 instances, ECS tasks, RDS databases) run inside a VPC, so understanding VPC design is where networking theory meets platform engineering practice.

# Typical VPC architecture for a compliance platform

VPC: 10.0.0.0/16 (65,536 IP addresses)
│
├── Public Subnets (internet-facing)
│   ├── AZ-a: 10.0.1.0/24  → ALB, NAT Gateway, Bastion Host
│   └── AZ-b: 10.0.2.0/24  → ALB (redundant), NAT Gateway
│
├── Private Subnets (application layer — NO direct internet access)
│   ├── AZ-a: 10.0.10.0/24 → ECS Fargate tasks (CDD, Screening, Identity)
│   └── AZ-b: 10.0.11.0/24 → ECS Fargate tasks (redundant)
│
├── Data Subnets (database layer — most restricted)
│   ├── AZ-a: 10.0.20.0/24 → RDS Primary, ElastiCache, OpenSearch
│   └── AZ-b: 10.0.21.0/24 → RDS Standby (Multi-AZ failover)
│
Traffic flow:
  Internet → CloudFront (CDN) → ALB (public subnet), with AWS WAF rules evaluated at CloudFront and/or the ALB
  → ECS tasks (private subnet) → RDS/Redis (data subnet)
  
  ECS tasks → internet (for external APIs like DVS):
  Private subnet → NAT Gateway (public subnet) → Internet
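A quick way to sanity-check a subnet plan like the one above is to let Python's ipaddress module confirm that every subnet sits inside the VPC and that none overlap. The subnet labels are hypothetical names for the diagram's entries; the five reserved addresses per subnet reflect AWS's documented reservation of the first four and the last address in each subnet.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {  # hypothetical labels for the subnets in the diagram
    "public-a": "10.0.1.0/24", "public-b": "10.0.2.0/24",
    "private-a": "10.0.10.0/24", "private-b": "10.0.11.0/24",
    "data-a": "10.0.20.0/24", "data-b": "10.0.21.0/24",
}
AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one

nets = {name: ipaddress.ip_network(cidr) for name, cidr in SUBNETS.items()}

# Every subnet must fall inside the VPC CIDR.
for name, net in nets.items():
    assert net.subnet_of(VPC), f"{name} is outside the VPC"

# No two subnets may overlap.
for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
    assert not net_a.overlaps(net_b), f"{a} overlaps {b}"

usable = {name: net.num_addresses - AWS_RESERVED_PER_SUBNET for name, net in nets.items()}
print(usable["private-a"])  # 251 usable addresses in a /24
```

Running a check like this in CI before applying infrastructure changes catches overlapping CIDRs, which are painful to fix once resources exist.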

Security Groups

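Virtual firewalls attached to network interfaces, and therefore to EC2 instances, ECS tasks, load balancers, and databases. Stateful: if you allow inbound traffic, the response is automatically allowed. Define rules like "Allow TCP port 5432 from private subnets only" for database access; instead of a CIDR range, a rule can also name the application tier's security group as the source.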

NACLs

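Network Access Control Lists: a subnet-level firewall. Stateless: you must explicitly allow both inbound and outbound traffic, including return traffic to the client's ephemeral ports (often the range 1024-65535). NACLs are evaluated at the subnet boundary, adding a second layer of defence alongside security groups.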
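The stateful/stateless difference is easiest to see in a toy model. This Python sketch is a deliberate simplification (hypothetical classes, no rule numbers or deny rules, not AWS's implementation): the security-group-like firewall remembers allowed inbound flows and lets their replies out, while the NACL-like firewall judges every packet on its own, so the database's reply is dropped unless outbound ephemeral ports are allowed.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int

@dataclass
class StatefulFirewall:
    """Security-group-like: tracks allowed flows so their replies pass automatically."""
    allowed_inbound_ports: set[int]
    flows: set[tuple] = field(default_factory=set)

    def inbound(self, p: Packet) -> bool:
        if p.dst_port in self.allowed_inbound_ports:
            self.flows.add((p.src, p.src_port, p.dst, p.dst_port))
            return True
        return False

    def outbound_reply(self, p: Packet) -> bool:
        # A reply is a tracked flow with source and destination swapped.
        return (p.dst, p.dst_port, p.src, p.src_port) in self.flows

@dataclass
class StatelessFirewall:
    """NACL-like: every packet is checked against rules, with no memory."""
    allowed_inbound_ports: set[int]
    allowed_outbound_ports: range

    def inbound(self, p: Packet) -> bool:
        return p.dst_port in self.allowed_inbound_ports

    def outbound_reply(self, p: Packet) -> bool:
        return p.dst_port in self.allowed_outbound_ports

request = Packet("10.0.10.25", 49152, "10.0.20.10", 5432)  # app task -> database
reply = Packet("10.0.20.10", 5432, "10.0.10.25", 49152)    # database -> app task

sg = StatefulFirewall(allowed_inbound_ports={5432})
print(sg.inbound(request), sg.outbound_reply(reply))  # True True

nacl_missing = StatelessFirewall({5432}, allowed_outbound_ports=range(0))
print(nacl_missing.inbound(request), nacl_missing.outbound_reply(reply))  # True False

nacl_fixed = StatelessFirewall({5432}, allowed_outbound_ports=range(1024, 65536))
print(nacl_fixed.outbound_reply(reply))  # True
```

The middle case is the classic NACL outage: the request gets in, the reply is silently dropped, and the client sees a timeout.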

Route Tables

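Define where traffic goes. Every route table contains a local route for the VPC CIDR (10.0.0.0/16 → local), and the most specific matching route wins. Public subnet route: 0.0.0.0/0 → Internet Gateway. Private subnet route: 0.0.0.0/0 → NAT Gateway. Data subnet: no route to the internet at all.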
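A Python sketch of how a route table picks a next hop by longest-prefix match. The route tables mirror the text above; "igw" and "nat-gateway" are hypothetical placeholders for real gateway IDs.

```python
import ipaddress
from typing import Optional

# Hypothetical route tables mirroring the text; "local" is the VPC's own route.
ROUTE_TABLES = {
    "public": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw"},
    "private": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"},
    "data": {"10.0.0.0/16": "local"},
}

def next_hop(table: dict, destination: str) -> Optional[str]:
    """Return the target of the most specific (longest-prefix) matching route."""
    ip = ipaddress.ip_address(destination)
    best = None
    for cidr, target in table.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None  # None: no route, so the packet is dropped

print(next_hop(ROUTE_TABLES["private"], "10.0.20.10"))   # local (the /16 beats the /0)
print(next_hop(ROUTE_TABLES["private"], "203.0.113.7"))  # nat-gateway
print(next_hop(ROUTE_TABLES["data"], "203.0.113.7"))     # None
```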

Diagnostic Tools — Your Networking Toolkit

Commands you will use when debugging connectivity problems

Tool	Command Example	What It Tests	When to Use It
ping	ping 10.0.10.50	ICMP reachability: does the host answer?	"Is this server even reachable from here?" (Security groups block ICMP unless you allow it, so no reply does not prove the host is down.)
traceroute	traceroute api.example.com	Shows every network hop between you and the destination	"Where is the traffic being dropped?"
nslookup / dig	dig api.example.com	DNS resolution — does the name resolve to the right IP?	"Is DNS returning the correct address?"
curl	curl -v https://api.example.com/health	HTTP connectivity including TLS handshake and response	"Can I reach the API? What status code do I get?"
telnet / nc	nc -zv 10.0.20.10 5432	TCP port connectivity — is the port open and accepting connections?	"Can my app reach the database port?"
ss / netstat	ss -tlnp	Lists listening TCP sockets (-l) with numeric ports (-n) and their owning processes (-p)	"What ports is this container listening on?"
tcpdump	tcpdump -i eth0 port 443	Captures raw network packets for deep inspection	"I need to see exactly what traffic is flowing"
mtr	mtr api.example.com	Combines ping + traceroute with continuous monitoring	"Is there intermittent packet loss on a specific hop?"
openssl	openssl s_client -connect api.example.com:443	Tests TLS/SSL connection and certificate details	"Is the certificate valid? Is the TLS version correct?"
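When nc is not installed in a slim container image, a few lines of Python give a similar TCP port check. This is a minimal sketch (probe_tcp is a hypothetical helper, not a standard tool) that keeps the distinction that matters for diagnosis: "refused" means the host answered but nothing is listening, while "timeout" usually means packets are being silently dropped, which is how security groups and NACLs behave.

```python
import socket

def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Roughly what `nc -zv host port` checks: can a TCP handshake complete?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"       # RST received: host reachable, no listener
    except socket.timeout:
        return "timeout"       # no answer at all: often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}" # e.g. DNS failure or no route to host
```

For example, running probe_tcp("10.0.20.10", 5432) from inside an ECS task answers "Can my app reach the database port?" without installing extra tools.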

DNS — The Internet's Phone Book (And Why It Breaks Everything)

The most underestimated piece of infrastructure

When you type "app.example.com" in a browser, DNS translates that name into an IP address (like 13.55.123.45) so your computer knows where to send the request. DNS failures are among the most common causes of "everything is down" incidents that turn out not to be application failures at all.

DNS Record Types Platform Engineers Manage

A Record: Maps a name to an IPv4 address (api.example.com → 13.55.123.45)

AAAA Record: Maps a name to an IPv6 address

CNAME: Alias — maps one name to another (www.example.com → example.com)

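Alias (Route 53): an A or AAAA record flagged as an alias that points to an AWS resource such as an ALB or CloudFront distribution. Unlike a CNAME, it can be used at the zone apex (example.com).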

MX Record: Mail server routing (who handles email for this domain)

TXT Record: Verification strings — SPF, DKIM, domain ownership proof

NS Record: Nameserver delegation — which DNS servers are authoritative

TTL: Time To Live, a field on every record rather than a record type; it sets how long resolvers cache the answer (300 = 5 minutes)
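To see where record types live on the wire, this Python sketch builds a DNS query message following the RFC 1035 layout: a 12-byte header followed by one question made of the length-prefixed name, the record type code, and class IN. The query ID is an arbitrary example. The resulting bytes are what a client sends over UDP port 53 (for instance with socket.sendto to a resolver); the sketch does not send anything.

```python
import struct

RECORD_TYPES = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15, "TXT": 16, "AAAA": 28}

def encode_name(name: str) -> bytes:
    """Encode 'api.example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not 0 < len(label) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(name: str, rtype: str, query_id: int = 0x1A2B) -> bytes:
    """Build a standard query with Recursion Desired set."""
    header = struct.pack(
        "!HHHHHH",
        query_id,    # echoed in the response so answers match questions
        0x0100,      # flags: standard query, Recursion Desired
        1, 0, 0, 0,  # one question; no answer, authority, or additional records
    )
    question = encode_name(name) + struct.pack("!HH", RECORD_TYPES[rtype], 1)  # class 1 = IN
    return header + question

q = build_query("api.example.com", "A")
print(len(q))  # 33: 12-byte header + 17-byte name + 4 bytes of type and class
```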

The TTL Trap

If you change a DNS record but its TTL was 86400 (24 hours), resolvers that cached the old answer can keep returning the old IP address for up to 24 hours. Before any DNS migration, lower the TTL to 300 seconds (5 minutes) at least one full old-TTL period (here, 24 hours) in advance, so every cached copy of the long-TTL answer has expired before you make the change. This is one of the most common mistakes in production DNS changes.
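The timing rule above is simple arithmetic, sketched here in Python with hypothetical function names and an example timestamp. It assumes resolvers honour TTLs; some clamp or ignore them, so treat the result as a lower bound on waiting time.

```python
from datetime import datetime, timedelta, timezone

def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Caches may hold the old long-TTL answer until one full old TTL has passed."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)

def worst_case_staleness(new_ttl_seconds: int) -> timedelta:
    """After the cutover, a resolver can serve the old IP for at most the new, short TTL."""
    return timedelta(seconds=new_ttl_seconds)

lowered = datetime(2026, 2, 2, 9, 0, tzinfo=timezone.utc)  # example: TTL lowered at 09:00 UTC
print(earliest_safe_cutover(lowered, 86400))  # 2026-02-03 09:00:00+00:00
print(worst_case_staleness(300))              # 0:05:00
```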

Platform Engineering Series

This article is Part 6 of a 9-part series.

Part 1: What Platform Engineering Really Means
Part 2: Monitoring, Observability & Why You Cannot Fix What You Cannot See
Part 3: Fault Tolerance and Incident Management
Part 4: DevOps, Automation, and Production Discipline
Part 5: What a Platform Engineer, SRE, or Cloud Engineer Actually Knows
Part 6: Networking Fundamentals for Platform Engineers ← You are here
Part 7: Cybersecurity Fundamentals — What It Means and Why It Matters
Part 8: Cybersecurity in Practice — How Production Platforms Are Protected
Part 9: Cybersecurity Careers — What the Industry Actually Does

Note: The architecture examples in this series reference LexAML, a real-world AML/CTF compliance platform. The diagrams shown are high-level representations shared for educational purposes.

This content is compiled from various industry sources, official documentation, and practical experience gained across production environments. Your experience may differ based on your organisation, tech stack, and industry context.

We are continuously developing and fine-tuning this content. If something differs from your understanding, or if you have suggestions for improvement, we would genuinely appreciate hearing from you.

Reach out: sumit@getpostlabs.io