What It Takes to Monitor Serverless Applications in AWS Lambda



    For developers, it has never been easier to build software and deploy it to production. Enabled by cloud platform providers, modern development frameworks, container and orchestration technologies like Docker and Kubernetes, and the emergence of managed services like autoscaled databases, experienced software teams can now build production systems at scale in hours or days instead of months.

    The rise of serverless capabilities

    Some of the most advanced capabilities available from the major cloud computing providers are now completely “serverless,” meaning that they require little to no management overhead to operate. These capabilities aren’t just appropriate for startups; large enterprises are also rapidly adopting these services as the standards within their organizations. These serverless tools range from databases like Amazon DynamoDB and object storage solutions like Google Cloud Storage to managed container hosting services like Amazon Elastic Container Service (ECS) and Functions-as-a-Service offerings such as AWS Lambda. These and other serverless tools abstract away the challenges of provisioning and supporting infrastructure; that becomes the responsibility of the cloud provider, which leaves only application architecture and logic for software development teams to handle. These serverless offerings also scale on demand, and you pay only for what you use.

    The impact of serverless on innovation cycles and software engineering team productivity is clear. The reduction in effort and friction involved in building new applications allows businesses to innovate faster and react to changing demand more quickly. Rapid iteration allows for user experiences to evolve in response to customer feedback; inevitable mistakes can be quickly fixed. Independent development teams can extend existing system components without the need for centralized control. And dynamic scaling means that high-demand (read: high-value) parts of the system can flex to meet growing demand without requiring big infrastructure or architecture changes.

    The evolution from microservices to nanoservices

    This infrastructure flexibility has led to an evolution from microservices to what are essentially nanoservice architectures. While organizations may maintain a variety of microservices with internal or public-facing APIs, each of those microservices may itself be composed of many other service components, all of which can be modified and scaled independently. Consider this simple but common web API example: an HTTP request is sent to an Amazon API Gateway, which invokes a synchronous Lambda function that does some processing and puts data into a DynamoDB table, returning success or the proper error code after the operation has been completed.
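
    To make that flow concrete, here is a minimal sketch of what the synchronous function in the middle of it might look like in Python. The table name, field names, and response shapes are hypothetical and the error handling is reduced to the essentials; it illustrates the pattern rather than any particular implementation.

    ```python
    import json
    import os
    import uuid

    import boto3  # AWS SDK for Python, bundled with the Lambda runtime

    # Hypothetical table name; in practice this would come from configuration.
    TABLE_NAME = os.environ.get("TABLE_NAME", "orders")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)


    def handler(event, context):
        """Synchronous handler invoked by API Gateway (proxy integration)."""
        try:
            raw_body = event.get("body") or "{}"
            json.loads(raw_body)  # validate that the request body is JSON
            item = {"id": str(uuid.uuid4()), "payload": raw_body}
            table.put_item(Item=item)  # persist the processed request
            return {"statusCode": 201, "body": json.dumps({"id": item["id"]})}
        except json.JSONDecodeError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        except Exception:
            # Unexpected failure: return the proper error code instead of crashing.
            return {"statusCode": 500, "body": json.dumps({"error": "internal error"})}
    ```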

    From there, you can evolve this architecture to do more complex things.

    Here’s an example of a common, modern data-ingest-and-aggregation process: Similar to the web request above, the HTTP call is routed to a function that does some processing, but this time it publishes the data to an Amazon Kinesis stream. A different function has the job of consuming from that stream, doing some data aggregation work, and storing the results in an Amazon S3 bucket for later use.
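
    As a rough sketch of that pipeline, the Python below shows the two functions involved: an ingest handler that publishes each request to a Kinesis stream, and an aggregation handler triggered by that stream, which tallies a batch and writes the result to S3. The stream name, bucket name, and event_type field are all hypothetical, and the aggregation is deliberately trivial.

    ```python
    import base64
    import json
    import os
    from collections import Counter
    from datetime import datetime, timezone

    import boto3

    kinesis = boto3.client("kinesis")
    s3 = boto3.client("s3")

    STREAM_NAME = os.environ.get("STREAM_NAME", "ingest-stream")      # hypothetical
    BUCKET_NAME = os.environ.get("BUCKET_NAME", "aggregates-bucket")  # hypothetical


    def ingest_handler(event, context):
        """Invoked via API Gateway: publish the request body to the Kinesis stream."""
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=(event.get("body") or "{}").encode("utf-8"),
            PartitionKey=context.aws_request_id,  # spread records across shards
        )
        return {"statusCode": 202, "body": json.dumps({"accepted": True})}


    def aggregate_handler(event, context):
        """Invoked by the Kinesis event source: aggregate a batch and store it in S3."""
        counts = Counter()
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            counts[payload.get("event_type", "unknown")] += 1  # hypothetical field

        key = f"aggregates/{datetime.now(timezone.utc).isoformat()}.json"
        s3.put_object(Bucket=BUCKET_NAME, Key=key, Body=json.dumps(counts).encode("utf-8"))
    ```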

    From there, it’s easy to introduce more complexity and deliver more value with even small changes to the system. Those system components can be combined into many different workflows. Development teams can reduce dependencies on each other by building their own extensions on the system. They are free to choose the tools, languages, and services they need to meet their objectives, modifying only the components required to compose the larger application rather than changing large chunks of code.

    Building software has never been easier—but monitoring it has never been harder

    This approach comes with a cost, however, and that cost is complexity: multiple languages, many services, and myriad paths through the system. So while it has never been easier to build software, understanding and monitoring your systems has never been more difficult. The benefits of architectures built this way are clear, yet the tools used to build, monitor, and troubleshoot these composite applications haven’t kept up.

    Modern software developers need complete observability into their serverless applications, with the same depth and breadth they’ve come to expect in their legacy, monolithic apps. Key data for this observability includes:

    1. Metrics to understand system health: Aggregate metrics are essential to understand what’s happening at the highest level. There are many examples of critical metrics, but the 4 Golden Signals from the Google SRE book are a good place to start. Metrics at every level help you identify that there is a problem, but you also need the details of the problem itself.
    2. Traces of end-to-end paths: When you discover a problem, whether from an alert, a support ticket, or an observation you make, you need to understand the end-to-end path through your distributed system. Looking at these traces helps you better understand the data flows and identify offending components, even if they are far downstream.
    3. Detailed data for root cause analysis: Once you’ve found the source, you need to find the cause. Rich, code-level detail about HTTP requests and errors, presented in the context of your metrics and traces, enables you to determine what is wrong. You shouldn’t have to download a bunch of logs across multiple functions and services, aggregate them, and start running custom queries to get the answer.
    4. Relationships for context: Serverless, highly iterative systems experience lots of change, and that change frequently comes in unplanned ways. There could be unexpected or unknown upstream or downstream systems causing an issue. When a problem occurs, you need to see the full context around the system.

    Developers building and operating their software need all of this data in one place—using one tool on one platform. They need to see not just their serverless applications in isolation but also how these interact with legacy systems. And the monitoring data needs to scale dynamically, just like their applications—even on their biggest days.

    New Relic AWS Lambda monitoring—now in private beta

    That’s why we are so proud to announce a private beta of our AWS Lambda monitoring solution. Paired with New Relic’s ever-growing list of integrations that provide insights into managed service data (see the documentation here), it gives our customers a more complete picture of their serverless applications. Our solution is designed to let you monitor, visualize, troubleshoot, and alert on your Lambda functions, all in the New Relic platform you are already using. You can monitor Lambda performance in aggregate, and then drill down into individual Lambda invocations to see end-to-end traces, errors, and other troubleshooting information.

    Starting with Node.js and Python, our solution includes automatic framework instrumentation designed specifically to run in the Lambda environment, along with new data collection tooling built to gather data and send it to New Relic with negligible overhead. The instrumentation captures Lambda-specific data, including invocation sources, errors and tracebacks, and the services each function calls. That data is written to Amazon CloudWatch Logs, where it is extracted via a log stream and sent to the New Relic platform.
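
    To give a feel for the general mechanism described above (and only that; this is not New Relic’s agent code), here is a minimal Python sketch of the pattern: wrap the handler, record timing and error details, and print them as a structured log line. In Lambda, anything written to stdout lands in CloudWatch Logs, where a subscription on the log stream can forward it to a monitoring backend.

    ```python
    import functools
    import json
    import time
    import traceback


    def with_telemetry(handler):
        """Illustrative wrapper: time each invocation and emit a structured log line.

        Generic sketch of instrumentation-via-logs, not New Relic's actual agent.
        """

        @functools.wraps(handler)
        def wrapper(event, context):
            start = time.time()
            error = None
            try:
                return handler(event, context)
            except Exception as exc:
                error = {"type": type(exc).__name__, "traceback": traceback.format_exc()}
                raise
            finally:
                # Printed JSON ends up in CloudWatch Logs for downstream extraction.
                print(json.dumps({
                    "telemetry": True,
                    "function": context.function_name,
                    "request_id": context.aws_request_id,
                    "duration_ms": round((time.time() - start) * 1000, 2),
                    "error": error,
                }))

        return wrapper


    @with_telemetry
    def handler(event, context):
        # Normal business logic goes here.
        return {"statusCode": 200}
    ```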

    Once collected, we add important metadata and tags for use in later queries, and we define relationships to the rest of the entities in your system. With the data in the New Relic platform, developers have access to the full power of our robust features: alerts, applied intelligence capabilities like anomaly detection, curated UI experiences, custom dashboards, API automation, and ad hoc analytics.

    Early-access customers are already using our solution to monitor their production Lambda systems. As always, our product development process includes validating a new solution with select customers before we scale it to our entire customer base. If you are interested in getting early access and want to contribute feedback, sign up here. Unfortunately, not everyone who signs up will be invited to the private beta, but everyone will receive updates as we hit critical milestones on the way to general availability. We welcome your feedback—in fact, we thrive on it!




