MongoDB Days 2014 is a seven-date, four-country tour to highlight existing capabilities, upcoming features, and early adopter case studies for this supplier's database technology. ARC Advisory Group recently attended the Boston event. MongoDB is an example of an emerging class of databases broadly known as NoSQL ("Not only SQL") databases. NoSQL databases use storage structures that differ from the row-column schema of relational databases, and they are typically designed to be distributed, horizontally scalable, and highly available.
MongoDB, Inc. has raised over $230 million in venture funding and has several strategic partnerships. One of these is with Bosch Software Innovations, which sees MongoDB as central to its Industrial Internet of Things (IIoT) strategy. Roughly 300 people attended the event in Boston, mostly systems architects and developers. As a result, many of the sessions throughout the day were quite technical and engineering-focused. However, several customers also presented, including some mature, established companies.
Key takeaways from the event include:
- MongoDB is experiencing strong adoption. MetLife, number 42 on the Fortune 500, demonstrates that adoption isn't limited to smaller organizations, startups, or non-mainstream use cases.
- Two reasons for adopting MongoDB came up repeatedly: to gain a more flexible approach to data storage, and to achieve a relatively easy, low-cost path to high availability for mission-critical applications.
What Is MongoDB, and Where Did It Come From?
Relational database management systems (RDBMS) have come to dominate the storage market for enterprise applications over the last 30 years. But relational databases were designed in the era of the enterprise server: a self-contained computer system with CPU, RAM, and disk in a single box. As such, they could really only scale performance vertically, by adding more or better hardware within that single box. The emergence of internet-based business models rapidly exposed the limitations of this approach. Internet pioneers also discovered that data structures consisting of tables with rows and columns were a poor fit for many emerging applications. For example, data representing website traffic is semi-structured and so does not fit neatly into rows and columns. In addition, the rigidity of a relational schema, once defined, hampered agility in the application development cycle.
At that time, there was a distinct lack of cost-effective, commercially available solutions, so companies like Yahoo, Amazon, Google, and Facebook built their own scalable data infrastructure technologies. Some, such as Hadoop and Cassandra, later became publicly available. In a similar way, MongoDB started life as an internal project at 10gen to meet the need for a database to underpin a commercial cloud infrastructure service. However, 10gen's founders quickly realized the potential of the database. A strategic pivot followed, and the company was renamed MongoDB, Inc. in August 2013.
MongoDB is open source, available either as a free download or as a more capable, commercially licensed product. The database scales horizontally via "sharding" (similar to partitioning in relational databases) and provides high availability through replication. Equally important, MongoDB is a type of NoSQL database known as a document store: it does not have tables organized by rows and columns, as a more traditional relational database does. (The term document store is unfortunate. It doesn't mean that MongoDB stores traditional office documents, such as presentations and spreadsheets. It simply means that data is stored in a more flexible way that can accommodate a variety of data structures.)
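To make sharding and the flexible document model more concrete, here is a minimal Python sketch; it is not MongoDB's actual router or API. It assumes a hypothetical collection sharded on a hypothetical `customer` field, with the key space split into ranges ("chunks") and each range owned by one shard. The shard names, range boundaries, and documents are invented for illustration. Note that the documents stored in the same collection do not all have the same fields.

```python
import bisect
from typing import Any

# Hypothetical chunk layout (illustrative only): chunk i covers shard-key values
# from CHUNK_LOWER_BOUNDS[i] up to, but not including, the next bound.
CHUNK_LOWER_BOUNDS = ["", "h", "p"]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-c"]


def route(doc: dict[str, Any], shard_key: str = "customer") -> str:
    """Return the shard owning the key range that contains doc[shard_key]."""
    key = doc[shard_key]
    # bisect_right counts the bounds <= key; subtract 1 to get the chunk index.
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    return CHUNK_OWNERS[idx]


# Each shard holds only its own subset of the collection.
shards: dict[str, list[dict[str, Any]]] = {name: [] for name in CHUNK_OWNERS}

# Documents in one collection need not share the same fields.
docs = [
    {"customer": "alice", "policy": "auto"},
    {"customer": "mallory", "policy": "home", "riders": ["flood"]},
    {"customer": "zoe", "phone": "555-0100"},
]
for d in docs:
    shards[route(d)].append(d)

for name, contents in shards.items():
    print(name, [d["customer"] for d in contents])
```

Because each server holds only the documents whose keys fall in its ranges, capacity grows by adding shards rather than by buying a bigger box. In MongoDB itself, the database manages the chunk ranges and their placement; the sketch only shows the lookup idea.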
Who Uses MongoDB, and Why?
At this point, the company claims that over 1,000 organizations use the commercial MongoDB products, including 30 of the Fortune 100. MongoDB Days 2014 in Boston featured case studies (and speakers) from the following organizations:
The Broad Institute
The Broad Institute is a biomedical and genomic research center located in Cambridge, Massachusetts. Corey Flynn, a bioinformatics scientist, presented the organization's story (the squeamish might want to skip slide 2). Part of the institute's research aims to advance personalized medicine by matching specific drugs more closely to an individual's genetics. MongoDB supports this work by storing the results of 1.4 million experiments and over 12,000 compounds. MongoDB's flexibility has been critical, because the institute's database is frequently refactored as the application is enhanced. Overall, the solution has improved researchers' ability to predict drug function, find novel drug targets, and repurpose drugs that were unsuccessful in other treatment scenarios.
CARFAX
CARFAX is a web-based provider of vehicle history information used by millions of consumers and car dealers every year. Before adopting MongoDB, CARFAX used an in-house key-value store first written in 1984. The company currently has 13.6 billion records on vehicles and receives information from more than 34,000 different data sources. Notably, the presenter, Jai Hirsch, described this as a medium-size but complex data problem. MongoDB's ability to manage this complex data led to its selection during the project's technology evaluation phase: the legacy document structure mapped well to MongoDB's document structure. In addition, the product was suitable for working with sparse data, that is, records that may have hundreds of fields of which only a few are populated. And MongoDB's built-in high availability was critical for a "bet-the-business" application. The production environment consists of 108 servers and a 10.6 TB database, and it serves queries in 200 ms.
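The sparse-data point can be illustrated with a small Python sketch. It contrasts the shape of a fixed-schema row, which needs a slot for every column, with a document that records only the populated fields. The 300 generic column names and the record values are hypothetical, and the sketch says nothing about how MongoDB or a relational engine physically stores missing values.

```python
# Hypothetical wide schema: 300 generic columns, standing in for "hundreds of fields".
COLUMNS = [f"field_{i}" for i in range(300)]


def as_row(values: dict) -> list:
    """Relational-style row: one slot per column, None wherever no value exists."""
    return [values.get(col) for col in COLUMNS]


def as_document(values: dict) -> dict:
    """Document-style record: keep only the fields that actually have values."""
    return {k: v for k, v in values.items() if v is not None}


# An illustrative sparse record with only three populated fields.
record = {"field_3": "a", "field_42": "b", "field_250": 2009}

row = as_row(record)
doc = as_document(record)
print(len(row), sum(v is not None for v in row))  # 300 3
print(len(doc))  # 3
```

The row must carry the full column list even though almost every slot is empty, while the document's shape simply follows the data it holds.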
Jai Hirsch offered a number of suggestions which he felt were critical to the success of the project, including:
- Make sure you understand the characteristics of your data first, as this should guide your choice of database.
- Make sure you architect at the outset for large data volumes.
- Automate as much as you can.
- Hire the best: the project was completed by four people in 18 months.
MetLife
MetLife, a global provider of insurance, annuities and employee benefits, has approximately 100 million customers and 65,000 employees. MongoDB now lies at the heart of a mission-critical application known as "The Wall." This application provides a 360-degree view of the customer for customer service representatives spanning 45 million agreements with 140 million transactions.
Greg Novikov, a database specialist at MetLife, presented an operational perspective on day-to-day life with MongoDB supporting this key application. The company's clearly defined standards for applications are important for operating the business. For example, the loss of a single data center for an indeterminate time should not compromise the application. In addition, all data centers had to be on premises for compliance reasons. MetLife also anticipated growing data volumes. These requirements drove a number of systems architecture decisions:
- The application had to be geographically dispersed
- Six nodes per data center
- Data was sharded from the start
- Reads are mostly "secondary preferred." MongoDB uses replica sets, a primary-secondary (master-slave) replication setup that supports availability. Writes always go to the primary in a replica set, but the database can be configured to route read requests to a secondary, falling back to the primary if no secondary is available. This can improve read throughput, but at the risk of reading stale data, because secondaries apply the primary's writes asynchronously and may lag behind it. However, in an application of this nature, where data changes slowly over time, that risk is small and considered a worthwhile performance tradeoff.
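The read-routing tradeoff can be sketched with a toy, in-process Python model. It is not MongoDB's driver API or replication protocol: the `ToyReplicaSet` class, its methods, and the sample address values are hypothetical, and replication is simulated by an explicit `replicate()` call. Only the mode name `secondaryPreferred` mirrors a real MongoDB read preference. The point is the window between a write reaching the primary and reaching the secondaries, during which a secondary read returns the older value.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    name: str
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyReplicaSet:
    """Toy model: one primary plus secondaries that receive writes asynchronously."""
    primary: Node
    secondaries: list[Node]
    pending: list[tuple[str, Any]] = field(default_factory=list)  # writes not yet replicated

    def write(self, key: str, value: Any) -> None:
        # Writes always go to the primary.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Simulate the secondaries catching up by applying pending writes in order.
        for key, value in self.pending:
            for node in self.secondaries:
                node.data[key] = value
        self.pending.clear()

    def read(self, key: str, preference: str = "primary") -> Any:
        # "secondaryPreferred": use a secondary when one exists, else the primary.
        if preference == "secondaryPreferred" and self.secondaries:
            return self.secondaries[0].data.get(key)
        return self.primary.data.get(key)


rs = ToyReplicaSet(Node("primary"), [Node("secondary-1"), Node("secondary-2")])
rs.write("address", "12 Oak St")
rs.replicate()
rs.write("address", "98 Elm Ave")                # not yet replicated
print(rs.read("address"))                        # 98 Elm Ave (primary)
print(rs.read("address", "secondaryPreferred"))  # 12 Oak St (stale)
rs.replicate()
print(rs.read("address", "secondaryPreferred"))  # 98 Elm Ave
```

For slowly changing customer data, a briefly stale read like the second one is usually harmless, which is why offloading reads to secondaries can be a reasonable trade.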
While The Wall is live and performing well, there are some areas in which MongoDB is less mature than databases that have been around for many years. For example, MetLife found the solution somewhat lacking in security features, citing weak password protection and a reliance on third-party products for data encryption and some auditing functions. Similarly, Greg felt that the automation of administrative functions, as well as workload management, could be improved.
What's Up-and-Coming in MongoDB?
Eliot Horowitz, CTO at MongoDB, provided some highlights from the MongoDB roadmap. To start, Eliot talked about three design principles that direct everything the research and development team does:
- Don't compromise on developer productivity.
- Every feature that gets added to the release must scale horizontally.
- MongoDB must be easy to operate and maintain at scale while running.
The upcoming release (2.8) adds capabilities in three main areas over the current release (2.6):
- Improving concurrency. Currently, MongoDB uses database-level locking. While this allows many concurrent readers of a database, it allows only one process to write to that database at a time. This is a potential show-stopper for write-intensive applications. The 2.8 release will introduce document-level (i.e., record-level) locking to allow much higher levels of write activity to the database.
- Pluggable storage engine. Currently, MongoDB uses a hard-wired, general-purpose storage engine that works well for a broad range of applications. This is common in the proprietary relational database world. Logically, though, database software can be split into two halves: the query language for users and interfaces to applications, and the management of data storage and retrieval. By creating an Application Programming Interface (API) that defines how the two halves connect, it's possible to make the two parts interchangeable. Other open source database products (e.g., MySQL) have taken this approach, and MongoDB is following later this year. In this way, the company aims to provide the flexibility to optimize the storage engine for a wide range of use cases.
- Automation. MongoDB clusters can scale to hundreds or thousands of nodes, and managing clusters manually at this scale is daunting at best. The MongoDB Management Service (MMS) is a cloud-based tool designed to automate this as much as possible. For example, it provides wizards to guide administrators through configuration steps. Automation can then be used to replicate those steps across multiple nodes. Using MMS, administrators will be able to provision, scale, and upgrade clusters, as well as perform backup and restore operations, and receive performance alerts.
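The effect of lock granularity described in the concurrency item of the roadmap can be shown with a small Python sketch that models only which resource each write must lock exclusively. It illustrates the general technique, not MongoDB's lock manager: the `Write` type, the function names, and the `crm`/`customers` names are hypothetical, and details such as shared locks for readers are omitted.

```python
from typing import NamedTuple


class Write(NamedTuple):
    database: str
    collection: str
    doc_id: int


def lock_resource(w: Write, granularity: str) -> tuple:
    """Return the resource a writer must hold exclusively at this lock granularity."""
    if granularity == "database":
        return (w.database,)
    if granularity == "document":
        return (w.database, w.collection, w.doc_id)
    raise ValueError(f"unknown granularity: {granularity}")


def conflicts(a: Write, b: Write, granularity: str) -> bool:
    """Two writes must be serialized if they need the same exclusive lock."""
    return lock_resource(a, granularity) == lock_resource(b, granularity)


def max_concurrent_writes(writes: list[Write], granularity: str) -> int:
    """At most one writer per distinct locked resource can proceed at once."""
    return len({lock_resource(w, granularity) for w in writes})


# Four writes to different documents in the same (hypothetical) database.
writes = [Write("crm", "customers", i) for i in range(4)]
print(max_concurrent_writes(writes, "database"))  # 1
print(max_concurrent_writes(writes, "document"))  # 4
```

With a database-level lock, writes to unrelated documents still queue behind one another; with document-level locking, only writes that touch the same document have to wait.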
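The two-halves idea behind the pluggable storage engine item can be sketched in Python as an abstract interface that the query layer depends on, with two interchangeable engines behind it. This is a hypothetical illustration of the pluggable pattern, not MongoDB's actual storage engine API; all class and method names are invented.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageEngine(ABC):
    """Hypothetical lower half: how documents are stored and retrieved."""

    @abstractmethod
    def put(self, key: str, doc: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryEngine(StorageEngine):
    """Keeps only the latest version of each document in a dict."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[dict]:
        return self._docs.get(key)


class AppendOnlyEngine(StorageEngine):
    """Appends every write to a log; reads scan backwards for the newest version."""

    def __init__(self) -> None:
        self._log: list[tuple[str, dict]] = []

    def put(self, key: str, doc: dict) -> None:
        self._log.append((key, dict(doc)))

    def get(self, key: str) -> Optional[dict]:
        for k, doc in reversed(self._log):
            if k == key:
                return doc
        return None


class Database:
    """Hypothetical upper half: the query-facing layer, unaware of which engine it uses."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def insert(self, key: str, doc: dict) -> None:
        self.engine.put(key, doc)

    def find_field(self, key: str, field: str):
        doc = self.engine.get(key)
        return None if doc is None else doc.get(field)


for engine in (InMemoryEngine(), AppendOnlyEngine()):
    db = Database(engine)
    db.insert("c1", {"name": "Ann"})
    db.insert("c1", {"name": "Bo", "tier": "gold"})
    print(type(engine).__name__, db.find_field("c1", "name"))  # ... Bo
```

The query layer calls only `put` and `get`, so swapping engines needs no change above the interface. Each engine can then make its own tradeoff; here, the append-only engine keeps writes simple and sequential at the cost of scanning on reads.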
Relational database systems aren't going away any time soon. They underpin many of the business world's mission-critical applications reliably and invisibly. However, the emergence of the internet-based business model ushered in a new class of application: applications that need to ingest large volumes of data from new data sources, 24x7x365.
As is always the case, changing business needs present opportunities for new and emerging technologies. Today, enterprises must be agile and always "open for business." Consequently, low-cost, easy-to-administer high availability is becoming more important. Likewise, the flexibility to handle different data types and adapt to different data structures is growing in importance too.
At this point, MongoDB is a bit of a rough diamond. The product enables cost-effective high availability and data flexibility. Organizations that need those characteristics might want to evaluate MongoDB, which is available as a free download. Since some desirable enterprise-class capabilities, such as security and easy administration for large-scale deployments, are less mature, any evaluation must consider compliance with corporate policies and standards.
Keywords: MongoDB, Database, Big Data, NoSQL, IIoT.