Optimizing AWS Lambda With MongoDB Atlas & NodeJS

Raphael Londner
April 10, 2017 | Updated: February 22, 2021
#Technical

I attended an AWS user group meeting some time ago, and many of the questions from the audience concerned caching and performance. In this post, I review the performance implications of using Lambda functions with any database-as-a-service (DBaaS) platform (such as MongoDB Atlas). Based on internal investigations, I offer a specific workaround available for Node.js Lambda functions. Note that other supported languages (such as Python) may only require implementing some parts of the workaround, as the underlying AWS containers may differ in their resource disposal requirements. I will specifically call out below which parts are required for any language and which ones are Node.js-specific.

AWS Lambda is serverless, which means that it is essentially stateless. Well, almost. As stated in its developer documentation, AWS Lambda relies on a container technology to execute its functions. This has several implications:

The first time your application invokes a Lambda function it will incur a penalty hit in latency – time that is necessary to bootstrap a new container that will run your Lambda code. The definition of "first time" is fuzzy, but word on the street is that you should expect a new container (i.e. a “first-time” event) each time your Lambda function hasn’t been invoked for more than 5 minutes.
If your application makes subsequent calls to your Lambda function within 5 minutes, you can expect that the same container will be reused, thus saving some precious initialization time. Note that AWS makes no guarantee it will reuse the container (i.e. you might just get a new one), but experience shows that in many cases, it does manage to reuse existing containers.
As mentioned in the How It Works page, any Node.js variable that is declared outside the handler method remains initialized across calls, as long as the same container is reused.
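To make the last point concrete, here is a minimal sketch of module-scope state surviving across invocations. The names `invocationCount` and `countingHandler` are illustrative only and do not come from the AWS documentation. Calling the handler twice in the same Node.js process mimics two invocations landing on the same warm container.

```javascript
'use strict';

// Module scope: evaluated once, when the container loads this file (cold start).
let invocationCount = 0;

// Handler scope: evaluated on every invocation.
const countingHandler = (event, context, callback) => {
    invocationCount += 1;
    callback(null, {
        invocationCount,
        // true only when an earlier invocation already ran in this container
        reusedContainer: invocationCount > 1
    });
};

exports.handler = countingHandler;
```

A new container starts again from `invocationCount = 0`, which is why module-scope values are useful as a cache but should never be assumed to be present.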

Understanding Container Reuse in AWS Lambda, written in 2014, dives a bit deeper into the whole lifecycle of a Lambda function and is an interesting read, though it may not reflect more recent architectural changes to the service. Note that AWS makes no guarantee that containers are kept alive (even in a "frozen" state) for 5 minutes, so don’t rely on that specific duration in your code.

In our very first attempt to build Lambda functions that would run queries against MongoDB Atlas, our database as a service offering, we noticed the performance impact of repeatedly calling the same Lambda function without trying to reuse the MongoDB database connection. The wait time for the Lambda function to complete was around 4-5 seconds, even with the simplest query, which is unacceptable for any real-world operational application.

In our subsequent attempts to declare the database connection outside the handler code, we ran into another issue: we had to call db.close() to effectively release the database handle, lest the Lambda function time out without returning to the caller. The AWS Lambda documentation doesn’t explicitly mention this caveat, which seems to be language-dependent since we couldn’t reproduce it with a Lambda function written in Python.

Fortunately, we found out that Lambda’s context object exposes a callbackWaitsForEmptyEventLoop property that allows a Lambda function to return its result to the caller without requiring that the MongoDB database connection be closed (you can find more information about callbackWaitsForEmptyEventLoop in the Lambda developer documentation). This allows the Lambda function to reuse a MongoDB Atlas connection across calls and reduces the execution time to a few milliseconds (instead of a few seconds).

In summary, here are the specific steps you should take to optimize the performance of your Lambda function:

Declare the MongoDB database connection object outside the handler method, as shown below in Node.js syntax (this step is required for any language, not just Node.js):

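'use strict';

const MongoClient = require('mongodb').MongoClient;

// Declared at module scope so the value survives across invocations
// for as long as Lambda reuses the same container.
let cachedDb = null;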

In the handler method, set context.callbackWaitsForEmptyEventLoop to false before attempting to use the MongoDB database connection object (this step is only required for Node.js Lambda functions):

exports.handler = (event, context, callback) => {

    context.callbackWaitsForEmptyEventLoop = false;

Reuse the cached database connection object if it is not null and db.serverConfig.isConnected() returns true; otherwise, create a new connection with the MongoClient.connect(uri) method and cache it (this step is required for any language, not just Node.js):

function connectToDatabase(uri) {
    // Reuse the cached handle if a previous invocation left a live connection.
    if (cachedDb && cachedDb.serverConfig.isConnected()) {
        console.log('=> using cached database instance');
        return Promise.resolve(cachedDb);
    }
    // Otherwise connect once and cache the database handle for later invocations.
    const dbName = 'YOUR_DATABASE_NAME';
    return MongoClient.connect(uri)
        .then(client => { cachedDb = client.db(dbName); return cachedDb; });
}

Do NOT close the database connection, so that it can be reused by subsequent calls.
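Putting the four steps together, a handler could look roughly like the sketch below. To keep it runnable without a cluster, the database calls are injected: `makeHandler`, `connect`, `isConnected`, and `runQuery` are hypothetical names introduced for this example and are not part of the MongoDB driver or of Lambda. Treat it as an outline of the pattern under those assumptions rather than a drop-in implementation.

```javascript
'use strict';

// Step 1: module scope, so the handle survives while the container is reused.
let cachedDb = null;

// Hypothetical factory. The three injected functions stand in for the
// real driver calls so the pattern can be exercised without a database.
function makeHandler({ connect, isConnected, runQuery }) {
    return (event, context, callback) => {
        // Step 2 (Node.js only): let the callback return even though the
        // open connection keeps the event loop busy.
        context.callbackWaitsForEmptyEventLoop = false;

        // Step 3: reuse the cached handle when it is still connected,
        // otherwise connect once and cache the result.
        const dbPromise = (cachedDb && isConnected(cachedDb))
            ? Promise.resolve(cachedDb)
            : connect().then(db => { cachedDb = db; return db; });

        dbPromise
            .then(db => runQuery(db, event))
            .then(
                // Step 4: return the result without closing the connection.
                result => callback(null, result),
                err => callback(err)
            );
    };
}
```

In a real function, `connect` would wrap `MongoClient.connect(uri)` and resolve to `client.db(dbName)`, and `isConnected` would call `db.serverConfig.isConnected()`, as in connectToDatabase above. Injecting them also makes it easy to check in a unit test that a second invocation does not reconnect.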

The Serverless development with Node.js, AWS Lambda and MongoDB Atlas tutorial post makes use of all these best practices so I recommend that you take the time to read it. The more experienced developers can also find optimized Lambda Node.js functions (with relevant comments) in:

I’d love to hear from you, so if you have any questions or feedback, don’t hesitate to leave them below.

Additionally, if you’d like to learn more about building serverless applications with MongoDB Atlas, I highly recommend our webinar below where we have an interactive tutorial on serverless architectures with AWS Lambda.

Watch Serverless Architectures with AWS Lambda and MongoDB Atlas

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Learn more about using MongoDB with AWS, either self-managed or with our fully-managed database as a service, MongoDB Atlas. You can also check out information about the estimated cost of running MongoDB on AWS with MongoDB Atlas.
