Avoiding Serverless Production Horrors

Emrah Samdan

Serverless architecture is often portrayed as a silver bullet, a complete solution for all cloud applications’ pain points. Affordable, fast, scalable, no server maintenance, just business logic—a real promised land of application development. What could the horror be?

In its simplest form, serverless enables agile development of applications, allowing developers to shift their attention to building and scaling products instead of spending their time managing and operating servers or runtimes in the cloud or on-premises.

The reality, however, is not so perfect. Serverless has powerful capabilities, but like every other technology, it has some flaws.

Real-life serverless horror stories

There are several real-life serverless horror stories. By understanding what caused the problems in each case, we can avoid the same mistakes in our own applications.

In 2017, Kevin Vandenborne wrote “Serverless: A lesson learned. The hard way,” in which he describes how a simple, trivial bug in the code caused a massive increase in AWS costs. This story teaches us the importance of carefully monitoring Lambda function workloads for unexpected traffic volumes that drive up costs and make your cloud bill highly unpredictable.

In 2018, Einar Egilsson wrote about his horror at discovering the hard way that, for consistently heavy workloads, serverless architecture can be slower and more expensive than provisioning a server or cluster of servers to handle the load. He describes how a POC migration of his company’s API layer from Linux-based servers to an AWS Lambda/API Gateway architecture resulted in 15 percent slower performance and eight times the cost.

Another example comes from Segment, which did some impressive work reducing its AWS bill by over one million dollars annually. One of the reasons for the enormous infrastructure costs was a subtle bug in how the company was partitioning data in its DynamoDB table. This sharding issue seriously harmed scalability and caused performance issues that cost the company 300,000 dollars a year.

The common denominator for the cost issues above is that, in contrast to a standard server-based architecture, serverless apps can silently scale to handle a heavy workload (even if a buggy implementation is causing it). The application remains operational, but it hits your wallet hard. Another important factor that engineers often overlook is the cost of supporting services, like API Gateway, that are needed for the Lambda application to work correctly. The cost of data transfer is also a significant part of the bill and is often not considered when designing an application architecture.
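To make the point about supporting services concrete, here is a minimal Python sketch of a monthly cost breakdown. Everything in it is an assumption for illustration: the `Prices` values are placeholder figures rather than quoted rates, and the names `Prices` and `monthly_cost` are hypothetical. Check your provider's current pricing before relying on any number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Placeholder unit prices in USD (assumptions, not quoted rates)."""
    per_million_invocations: float = 0.20
    per_gb_second: float = 0.0000167
    per_million_gateway_requests: float = 3.50
    per_gb_transferred: float = 0.09


def monthly_cost(invocations, avg_duration_s, memory_gb, response_kb,
                 prices=Prices()):
    """Return a per-component cost breakdown for one month of traffic."""
    millions = invocations / 1_000_000
    breakdown = {
        # Function compute: billed by memory size times execution time.
        "compute": invocations * avg_duration_s * memory_gb * prices.per_gb_second,
        # Per-invocation request charge for the function itself.
        "requests": millions * prices.per_million_invocations,
        # Supporting service in front of the function (e.g. an API gateway).
        "gateway": millions * prices.per_million_gateway_requests,
        # Outbound data transfer; KB converted to GB with decimal units.
        "transfer": invocations * response_kb / 1_000_000 * prices.per_gb_transferred,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # 10 million requests, 200 ms each, 512 MB of memory, 50 KB responses.
    for name, usd in monthly_cost(10_000_000, 0.2, 0.5, 50).items():
        print(f"{name:>9}: ${usd:,.2f}")
```

With these placeholder prices, the gateway and transfer lines together come to more than the function's own compute and request charges, which is exactly the kind of cost that is easy to miss at design time.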

Learning from others

How can we learn from other people’s mistakes and avoid serverless issues in our applications? As always, there is no silver bullet, but you can take some preparatory steps to reduce the risk and spot issues as early as possible. Here are some ways of avoiding such problems with serverless architecture:

Knowledge and Documentation

Serverless does simplify a lot in terms of infrastructure management, but it’s not entirely plug-and-play. You’ll still need a decent knowledge of the services you use, their configuration details, and even how they work under the hood.

This will allow you to architect your solutions properly, identify problematic use cases, and identify the cause of issues when they do occur.

Be aware of the specifics of the pay-per-use model and how to control your infrastructure costs.
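One way to reason about the pay-per-use model is to compute the request volume at which a fixed-price server becomes cheaper. The sketch below is a simplification under stated assumptions: it treats serverless cost as strictly proportional to request count, treats the server as a flat monthly fee, and uses hypothetical function names and illustrative prices.

```python
def break_even_requests(server_monthly_usd, cost_per_request_usd):
    """Monthly request count above which a fixed-price server is cheaper."""
    if cost_per_request_usd <= 0:
        raise ValueError("cost_per_request_usd must be positive")
    return server_monthly_usd / cost_per_request_usd


def cheaper_option(monthly_requests, server_monthly_usd, cost_per_request_usd):
    """Return 'serverless' or 'server' for the given monthly volume."""
    serverless_usd = monthly_requests * cost_per_request_usd
    return "serverless" if serverless_usd < server_monthly_usd else "server"


if __name__ == "__main__":
    # Assumed figures: a 100 USD/month server versus 0.00001 USD per request.
    print(round(break_even_requests(100.0, 0.00001)))
    print(cheaper_option(1_000_000, 100.0, 0.00001))
    print(cheaper_option(50_000_000, 100.0, 0.00001))
```

A real comparison would also need to account for operations effort, idle capacity, and the supporting services mentioned earlier, so treat the result as a first estimate only.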

Monitoring and Alerting

Monitoring is a crucial aspect of dealing with serverless issues. Serverless is a relatively new paradigm, with many asynchronous events and parallel executions. Since it works on a high abstraction layer and does not expose all the details of what is happening beneath the surface, you have to make sure you know how to track the state of your services, how to identify performance issues, and how to spot execution problems. Also, proper monitoring and alerting can prevent you from receiving a huge and unexpected infrastructure bill.
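As a rough illustration of alerting on unexpected traffic, the following Python sketch flags intervals whose invocation count far exceeds the recent median. It is a minimal sketch, not a production detector: the function names, the threshold factor, and the window size are all assumptions, and in practice the counts would come from your monitoring platform rather than a hard-coded list.

```python
from statistics import median


def is_spike(history, latest, factor=3.0, min_baseline=1.0):
    """Return True when `latest` exceeds `factor` times the median of `history`."""
    if not history:
        return False  # No baseline yet, so nothing to compare against.
    baseline = max(median(history), min_baseline)
    return latest > factor * baseline


def find_spikes(counts, factor=3.0, window=6):
    """Return (index, count) for each interval that looks like a spike.

    Each count is compared with the median of up to `window` preceding counts.
    """
    alerts = []
    for i, count in enumerate(counts):
        history = counts[max(0, i - window):i]
        if is_spike(history, count, factor):
            alerts.append((i, count))
    return alerts


if __name__ == "__main__":
    # Hypothetical per-interval invocation counts; a bug kicks in at index 5.
    invocations = [100, 110, 95, 105, 100, 2000, 2100]
    for index, count in find_spikes(invocations):
        print(f"ALERT: interval {index} had {count} invocations")
```

The median is used instead of the mean so that a single runaway interval does not immediately inflate the baseline and hide the intervals that follow it.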

Proper Tooling Can Help

Proper monitoring is not an easy task. A specialized platform like Thundra, which offers advanced monitoring, debugging, and troubleshooting tools for serverless applications, can be a huge help. You can visualize the architecture of your distributed serverless applications, trace their execution in real time, and spot performance bottlenecks and execution issues.

Working with a set of professional tools dedicated to serverless applications allows you to avoid the majority of critical mistakes and issues in production. With proper monitoring and alerting, most of the serverless horror stories described in this article could have been avoided and turned into serverless success stories.