how does serverless computing work? | Linagora

How Does Serverless Computing Work?

When an HTTP request reaches an endpoint, what happens if no server is waiting for it? 
That is exactly the question posed by serverless computing, an execution model where the infrastructure materialises on demand, processes the request, and then disappears. Understanding how serverless computing works requires moving beyond the marketing buzzword to examine the actual mechanisms: resource allocation, function lifecycle, event‑driven triggers, and millisecond‑level billing. 
In 2026 the function platforms of the three major cloud providers (AWS Lambda, Azure Functions, Google Cloud Functions) collectively handle billions of invocations each day, and players such as Cloudflare Workers or Vercel have broadened the model to include edge functions. This paradigm does not eliminate servers; it makes them invisible to development teams. Serverless has become a mainstream approach in modern cloud computing for building scalable applications. Below is what actually happens.


Definition and Core Principles of Serverless

Serverless computing denotes an execution model where the cloud provider dynamically provisions, sizes, and manages the underlying infrastructure. Developers deploy code as functions or event‑driven containers without ever configuring a virtual machine, operating system, or server runtime. The term “serverless” is misleading: physical servers still exist in the provider’s data centres. The fundamental difference lies in operational responsibility: patching, scaling, high availability, and capacity management are fully delegated to the provider. This answers the question “What is serverless computing?”: it is a model where infrastructure is completely abstracted away from development teams. In this approach, using an open‑source cloud can also give organisations greater control over their technological environment.

Full Abstraction of Infrastructure Management

In a traditional deployment on EC2 or a Kubernetes cluster, the operations team selects the instance type, configures auto‑scaling groups, handles Linux kernel updates, and monitors CPU usage. With serverless, those layers disappear from the team’s scope. The developer provides a code artifact (a ZIP file, an OCI container image, which Lambda has supported since 2020, or an edge bundle), sets the memory allocation (128 MiB to 10 GiB on AWS Lambda in 2026), and specifies an entry‑point handler. The provider takes care of the rest: placement on a physical host, isolation via micro‑VMs (Firecracker on AWS) or V8 isolates (Cloudflare), and recycling the environment after inactivity. This workflow mirrors the evolution of open‑source service models offered by many tech players and illustrates how serverless technology operates.

Pay‑for‑Actual‑Use Billing Model

Serverless billing relies on two metrics: the number of invocations and the execution duration multiplied by allocated memory. Since 2020 AWS Lambda bills in 1‑ms increments, at a rate of $0.0000166667 per GB‑second in 2026. Concretely, a function configured with 512 MiB that runs for 200 ms consumes 0.5 GB × 0.2 s = 0.1 GB‑second, which costs roughly $0.0000017 in compute per invocation, before the per‑request fee. This model eliminates the cost of idle servers: an app receiving 100 requests per day pays only for those 100 executions, unlike a t3.micro EC2 instance billed 24 h a day even at 0 % load. The downside: an app with constant, predictable load can become more expensive on serverless than on reserved instances. This comparison often raises the question “Serverless computing vs. cloud: what’s the difference?” The cloud supplies the underlying infrastructure and services; serverless is a specific execution model built on top of that infrastructure.
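
The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: the GB‑second rate is the one quoted in the text, while the per‑request price ($0.20 per million) and the function names are assumptions added for illustration, and the free tier is ignored.

```python
# Rate quoted in the article for AWS Lambda compute.
PRICE_PER_GB_SECOND = 0.0000166667
# Assumption (not stated in the article): $0.20 per million requests.
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mib: int, duration_ms: float) -> float:
    """Return the estimated cost in USD of one invocation (compute + request)."""
    gb = memory_mib / 1024          # allocated memory in GB
    seconds = duration_ms / 1000    # billed duration in seconds
    compute = gb * seconds * PRICE_PER_GB_SECOND
    return compute + PRICE_PER_REQUEST


def monthly_cost(requests_per_day: int, memory_mib: int, duration_ms: float,
                 days: int = 30) -> float:
    """Return the estimated cost in USD of a month of identical invocations."""
    return requests_per_day * days * invocation_cost(memory_mib, duration_ms)
```

For the 100‑requests‑per‑day application mentioned above, monthly_cost(100, 512, 200) comes to well under one cent, which is the point of the pay‑per‑use model; the same function run with millions of daily requests is where a comparison with reserved instances becomes worthwhile.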

 

Technical Workflow: From Event to Execution

Serverless is fundamentally event‑driven. No function runs continuously awaiting requests. Each execution is triggered by an external event, processed by the provider’s runtime, then terminated. This event→execution→termination flow is the heart of the technical operation.

Triggers and Function‑as‑a‑Service (FaaS)

A trigger is an event that causes a function to be invoked. Sources include:

  • An HTTP request via an API gateway (API Gateway, Cloud Endpoints)
  • A message placed on a queue or event bus (SQS, Pub/Sub, EventBridge)
  • A change in a storage bucket (file upload to S3)
  • A database event (DynamoDB Streams, Firestore triggers)
  • A scheduled cron timer (e.g., every 5 minutes)

The FaaS component receives the event as a JSON payload, spins up an execution environment if none is available, and calls the handler with the invocation context. Developers never see the host server: they receive the event, return a response, and the environment is either released or idled.
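
As an illustration of the handler contract described above, here is a minimal sketch in the style of an AWS Lambda Python handler behind an HTTP trigger. The "name" field in the request body is hypothetical, and the event is reduced to the single "body" key this example reads; real events carry many more fields.

```python
import json


def handler(event, context):
    """Entry point called by the FaaS runtime once per event.

    `event` is the JSON payload already parsed into a dict; `context`
    carries invocation metadata and is unused in this sketch.
    """
    # Hypothetical payload: a JSON body with an optional "name" key.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is an ordinary function, it can be exercised locally without any cloud account, for example with handler({"body": '{"name": "Ada"}'}, None).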

Automatic Horizontal Scaling

One of the most distinctive serverless mechanisms is granular, automatic horizontal scaling. Each invocation can theoretically run in its own isolated execution environment. If 1 000 requests arrive simultaneously, the provider provisions up to 1 000 execution environments in parallel. AWS Lambda has a default concurrency limit of 1 000 per account per region, which can be raised on request to tens of thousands. Google Cloud Functions v2, built on Cloud Run, allows up to 1 000 concurrent requests per instance, reducing the total number of instances needed. This scaling happens without any configuration: no auto‑scaling policies, no CPU thresholds. When traffic drops to zero, instances automatically scale down to none.
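
To relate traffic to these limits, a common back‑of‑the‑envelope estimate multiplies the request rate by the average duration (Little's law). The sketch below assumes steady traffic; the function names are hypothetical and no provider API is involved.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Estimate simultaneous in-flight requests: rate x average duration."""
    # round() guards against floating-point noise just above an integer.
    return math.ceil(round(requests_per_second * avg_duration_s, 9))


def instances_needed(concurrency: int, requests_per_instance: int = 1) -> int:
    """Execution environments needed for a given per-instance concurrency."""
    return math.ceil(concurrency / requests_per_instance)
```

With 2 000 requests per second at 0.5 s each, required_concurrency gives 1 000 in‑flight requests. At one request per environment that means 1 000 environments, exactly the default account limit mentioned above, whereas a platform accepting 1 000 concurrent requests per instance could in principle serve the same load from a single instance.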

Lifecycle of an Ephemeral Function

A serverless function goes through three phases:

  • Initialisation (cold start): code download, runtime launch (Node.js, Python, Java, .NET, Rust), and execution of global initialisation code. On AWS Lambda a typical cold start takes from about 100 ms (Python/Node.js) up to 1‑2 s (Java with Spring).
  • Invocation: the actual handler execution.
  • Freeze: after execution, the environment remains in memory for a few minutes, ready for a warm start that skips the cold‑start steps and reduces latency to a few milliseconds.

After a variable idle period (observed between 5 and 45 minutes depending on the provider), the environment is destroyed.
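
The difference between global initialisation and per‑invocation code can be sketched as follows. This is a simplified model rather than provider code: module‑level statements stand in for the cold‑start work, and the counter and config names are hypothetical.

```python
# Global initialisation: executed once, when the environment is created.
_CONFIG = {"greeting": "hello"}  # stand-in for loading config or SDK clients
_invocation_count = 0            # survives between warm invocations


def handler(event, context):
    """Executed on every invocation; reuses whatever global init left in memory."""
    global _invocation_count
    _invocation_count += 1
    return {
        # True only for the first call served by this environment.
        "cold_start": _invocation_count == 1,
        "greeting": _CONFIG["greeting"],
        "invocation": _invocation_count,
    }
```

Calling the handler twice in the same process returns cold_start as True and then False, which mirrors what happens while an environment stays frozen in memory. Once the provider destroys the environment, the module is loaded again and the counter restarts.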

 

Core Components of the Serverless Ecosystem

A serverless architecture is more than just FaaS functions; it relies on a suite of managed services that replace each traditional infrastructure component: storage, databases, networking, and orchestration.

Managed Storage and Databases

Because functions are stateless, any persistence must be externalised. The most common serverless‑friendly databases are DynamoDB (AWS), Firestore (Google Cloud), and Cosmos DB (Azure), which bill per request and auto‑scale. Object storage (S3, Cloud Storage, Azure Blob) serves both as a file repository and as a source of trigger events. Relational databases pose a specific challenge: each concurrent execution environment may open its own connection, quickly exhausting the connection limit of a classic PostgreSQL or MySQL instance. AWS introduced RDS Proxy to mitigate this, and in 2026 services like PlanetScale or Neon provide native serverless relational databases with built‑in pooling. This can be complemented with open‑source software support to improve interoperability.
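
One common way to limit connection churn on the function side is to open the connection lazily at module scope and reuse it across warm invocations. The sketch below uses an in‑memory sqlite3 database purely as a stand‑in for a PostgreSQL or MySQL client; the table and function names are hypothetical.

```python
import sqlite3

_connection = None  # lives as long as the execution environment stays warm


def get_connection():
    """Open the connection on first use, then reuse it on later invocations."""
    global _connection
    if _connection is None:
        # Stand-in for connecting to a real relational database.
        _connection = sqlite3.connect(":memory:")
        _connection.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY)")
    return _connection


def handler(event, context):
    """Record one hit and return the running total for this environment."""
    conn = get_connection()
    conn.execute("INSERT INTO hits DEFAULT VALUES")
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM hits").fetchone()
    return {"hits": count}
```

This keeps one connection per environment instead of one per invocation. It does not cap the total, since a thousand concurrent environments still mean a thousand connections, which is the gap that proxies and built‑in pooling are meant to close.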

API Gateways and Request Routing

An API gateway is the primary entry point for HTTP‑triggered functions. Amazon API Gateway, Google API Gateway, and Azure API Management receive requests, handle authentication (JWT, API keys, OAuth 2.0), enforce rate limiting, and route to the correct function. Amazon API Gateway offers two modes: REST API (feature‑rich, higher cost) and HTTP API (lower latency, roughly one‑third the price). The gateway transforms the incoming HTTP request into a JSON event passed to the Lambda function, then converts the function’s response back into a standard HTTP reply. This component typically adds 5‑15 ms of latency, an acceptable overhead for most use cases.
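
The request‑to‑event and response‑to‑HTTP translation can be illustrated with a toy gateway. The event shape below is deliberately simplified and is not the full format of any real gateway; the route table, echo_handler, and dispatch are hypothetical names, and authentication and rate limiting are left out.

```python
import json


def to_event(method, path, query=None, body=None):
    """Translate an HTTP request into a JSON-serialisable event (simplified)."""
    return {
        "httpMethod": method,
        "path": path,
        "queryStringParameters": query or {},
        "body": body,
    }


def to_http_response(result):
    """Translate the function's return value into (status, headers, body)."""
    return (
        result.get("statusCode", 200),
        result.get("headers", {}),
        result.get("body", ""),
    )


def echo_handler(event, context):
    """Hypothetical function sitting behind the gateway."""
    payload = {"method": event["httpMethod"], "path": event["path"]}
    return {"statusCode": 200, "body": json.dumps(payload)}


# Hypothetical route table: (method, path) -> function.
ROUTES = {("GET", "/status"): echo_handler}


def dispatch(method, path, query=None, body=None):
    """Route a request to its function, or answer 404 if no route matches."""
    target = ROUTES.get((method, path))
    if target is None:
        return (404, {}, json.dumps({"error": "not found"}))
    return to_http_response(target(to_event(method, path, query, body), None))
```

Calling dispatch("GET", "/status") returns a 200 status with a JSON body, while an unknown route yields 404 without any function being invoked.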

 

Benefits and Challenges of a Serverless Architecture

Adopting serverless fundamentally changes development and operations practices. The advantages are real, but technical constraints must be acknowledged.

Development Agility and Faster Time‑to‑Market

Without infrastructure to provision, a developer can spin up a functional API in a few hours. Frameworks such as Serverless Framework, AWS SAM, or SST (originally Serverless Stack), very popular in 2026, allow infrastructure‑as‑code (IaC) definitions and one‑command deployments of functions, databases, and API gateways. Feedback cycles shrink: a bug fix can be live in production in under 60 seconds. Small teams benefit because they do not need a dedicated DevOps engineer to manage clusters. Operational costs drop for variable or unpredictable traffic: a side project, a webhook processor, or an on‑demand data pipeline pays only for actual executions.

Technical Constraints: Cold Starts and Vendor Lock‑In

Cold start latency remains the main irritant. For latency‑sensitive workloads (real‑time APIs, UI back‑ends), a 1‑2 second first‑request delay can be unacceptable. AWS offers Provisioned Concurrency to keep environments warm, but this essentially reserves resources and erodes the cost advantage of pure serverless. Vendor lock‑in is the other major challenge. A Lambda function that uses DynamoDB, SQS, and API Gateway is tightly coupled to the AWS ecosystem; migrating to Google Cloud would require substantial rewrites. Projects like Knative or OpenFaaS provide a portable abstraction layer, but their production adoption still lags behind native hyperscaler services. Nonetheless, open‑source community initiatives are gradually reducing this dependency by delivering more open and interoperable solutions.
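
Independently of platforms such as Knative or OpenFaaS, coupling can also be reduced at the code level by keeping business logic free of provider SDKs and confining provider specifics to thin adapters. The sketch below illustrates that general pattern and is not a technique prescribed by the text above; every name in it is hypothetical, and the in‑memory store stands in for a DynamoDB or Firestore adapter.

```python
import json
from typing import Protocol


class OrderStore(Protocol):
    """Port: the only persistence operation the business logic relies on."""

    def save(self, order_id: str, total: float) -> None: ...


class InMemoryStore:
    """Stand-in adapter; a cloud database adapter would expose the same method."""

    def __init__(self):
        self.rows = {}

    def save(self, order_id, total):
        self.rows[order_id] = total


def place_order(store: OrderStore, order_id: str, items: list) -> float:
    """Provider-agnostic core: no cloud SDK imports here."""
    total = round(sum(items), 2)
    store.save(order_id, total)
    return total


STORE = InMemoryStore()


def lambda_style_handler(event, context):
    """Thin provider-specific adapter: unpack the event, call the core, reply."""
    body = json.loads(event["body"])
    total = place_order(STORE, body["order_id"], body["items"])
    return {"statusCode": 201, "body": json.dumps({"total": total})}
```

Moving to another provider then means rewriting the handler and the store adapter while place_order stays untouched. This narrows the rewrite but does not remove it, since triggers, permissions, and deployment configuration remain provider‑specific.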

 

Concrete Use‑Cases and Future Outlook

Serverless shines in specific scenarios:

  • Image processing on upload (auto‑resize via S3 + Lambda)
  • Mobile‑app back‑ends with sporadic traffic
  • Event‑driven ETL pipelines
  • Chatbots and webhook integrations

In 2026 the rise of edge functions (Cloudflare Workers, Vercel Edge Functions, Deno Deploy) extends the serverless model to geographic edge locations, cutting end‑user latency to under 50 ms. “Serverless‑first” architectures are becoming the norm for startups and product teams that prioritize rapid iteration.

Limits remain for long‑running workloads (video transcoding, machine‑learning model training) and applications needing fine‑grained network or filesystem control. AWS Lambda now allows a maximum execution time of 15 minutes, which is still insufficient for some use cases. The future points toward a convergence of containers and serverless: AWS Fargate, Google Cloud Run, and Azure Container Apps already charge per usage with zero‑scale capability, blurring the line between the two approaches.

Practical recommendation: start with an isolated event‑driven use case (e.g., a webhook or asynchronous processor), measure real costs and latency, then gradually expand the scope based on observed results.