Building Fast, Scalable Database Apps With Ariel Weisberg

On Tuesday, November 20, 2012, I attended an event held at AppNexus: Building Fast Scalable Database Apps the Easy Way with Ariel Weisberg, Software Engineer for VoltDB. At the event, Weisberg explained and demonstrated the features of VoltDB as well as the upcoming changes in VoltDB 3.0.

VoltDB is a “high velocity open source ACID (atomicity, consistency, isolation, durability) compliant database that is used in finance, advertising, gaming and network monitoring to process massive transactional workloads and drive real-time analytics,” Ariel Weisberg began. “VoltDB to this day is working on building high velocity database apps to make it faster and more reliable.”

Weisberg asked whether the audience had ever faced I/O bottlenecks, and a majority raised their hands. “Random I/O reads are inconsistent,” he said. “Flash SSDs are expensive and also inconsistent.” He then walked through the usual workarounds. On the scaling side, teams add caching app tiers such as memcached, split read/write workloads and offload reads to replicas, but the approach reaches a point of diminishing returns because the system keeps needing more memory (RAM) and drives. The costs pile up too: bigger hard drives, more IT operators and more database licenses, all of which are expensive. On the data model side, the system would have to switch to a document store, a key/value (K/V) store or some other form, giving up the relational model and analytics/reporting. For transactions, the system would have to give up ACID and batch commits. App-controlled sharding would increase app complexity and add maintenance cost.

According to Weisberg, VoltDB helps solve I/O bottlenecks. “Lose the bath water, but keep the baby, or as I always say, ‘Lose the gun, but keep the cannoli,’” Weisberg said. VoltDB runs in memory to eliminate I/O bottlenecks and executes transactions serially to avoid latching, locking and deadlocks. It scales out linearly on commodity hardware, lowers the cost per transaction and is open source (commercial licensing is also provided). VoltDB’s data model is relational and uses SQL as its declarative language; this way, “it allows us to pick the most relational program.” On transactions, VoltDB provides ACID guarantees with serializable isolation. It is fully consistent and durable, manages sharding itself and can handle cross-partition queries with ease.

High velocity databases handle lots of independent events happening at a very high frequency. That requires updating state, deciding transactions, staying up and recovering automatically in the face of failures, supporting complex manipulations of state per event, real-time analytics and easy integration with high volume analytic data stores, with raw, enriched or sampled data migrated to companion stores.

“VoltDB’s sweet spot,” Weisberg said, “is its ability to support many complex transactions per second. It can maintain materialized views. It isn’t a reformulation of disk-based concepts and it’s more than the sum of its parts.” Weisberg then gave a short background of how VoltDB evolved to its current state. “Its origins are in H-Store, developed at MIT, Brown and Yale to eliminate the overhead of traditional RDBMSs. In-memory operation eliminates disk I/O waits, and serialized execution avoids latching or locking. It was built to scale across commodity servers and has high throughput on a small number of nodes and linear scalability as the cluster grows.”

Weisberg then discussed partitioning in VoltDB, starting with the pieces involved: partitioned tables, which hold the large data sets and spread their rows across the cluster; replicated tables, which have a small footprint and are copied to all nodes for locality of data; and the client apps. A VoltDB partition contains data and an execution engine. “The execution engine contains input queues for transaction requests,” Weisberg said. “The requests run to completion, serially at each partition.”
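The partition model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not VoltDB's implementation: the `Partition` and `Cluster` classes, the `add_votes` procedure and the CRC32-based routing are all assumptions made for the example.

```python
from collections import deque
from zlib import crc32


class Partition:
    """One partition: a slice of the data plus a serial execution queue."""

    def __init__(self):
        self.rows = {}        # partition-local data
        self.queue = deque()  # input queue of pending transaction requests

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run_pending(self):
        # Requests run one at a time, each to completion, so no locks are needed.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.rows, *args))
        return results


class Cluster:
    """Hypothetical router that sends each request to the partition owning its key."""

    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def partition_for(self, key):
        # Assumed routing scheme: a deterministic hash of the partitioning key.
        index = crc32(str(key).encode()) % len(self.partitions)
        return self.partitions[index]

    def submit(self, key, procedure, *args):
        self.partition_for(key).submit(procedure, key, *args)

    def run_all(self):
        return [partition.run_pending() for partition in self.partitions]


def add_votes(rows, contestant, count):
    """Example procedure: touches only rows that live in one partition."""
    rows[contestant] = rows.get(contestant, 0) + count
    return rows[contestant]
```

Because every request for a given key lands in the same queue, two updates to the same row can never interleave, which is the property that lets serial execution replace latching and locking.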

VoltDB transactions are simple SQL statements or stored procedure invocations. VoltDB also uses Java stored procedures: Java statements with embedded parameterized SQL, which move code to data, not the other way around.

The transaction execution can be “a single partition transaction and all data is in one partition, but all operates autonomously,” Weisberg said. “There are also multi-partition transactions where one partition distributes and coordinates work.” He listed the SQL support in VoltDB: SELECT, INSERT, UPDATE and DELETE; aggregate SQL; SQL LIKE; materialized views using COUNT and SUM; SQL column functions; indexes on function expressions (JSON); and a focus on transactional SQL.
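Materialized views built on COUNT and SUM are cheap to keep current because both aggregates can be adjusted by simple addition and subtraction as rows change. The sketch below shows that idea in plain Python; the `CountSumView` class and its method names are hypothetical, and it is not a description of how VoltDB maintains its views internally.

```python
from collections import defaultdict


class CountSumView:
    """Keeps COUNT(*) and SUM(amount) per group, adjusted on every write."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def on_insert(self, group, amount):
        self.count[group] += 1
        self.total[group] += amount

    def on_delete(self, group, amount):
        self.count[group] -= 1
        self.total[group] -= amount
        if self.count[group] == 0:
            # Drop empty groups so the view mirrors a GROUP BY result.
            del self.count[group]
            del self.total[group]

    def read(self, group):
        # Reading is a lookup; the base table is never rescanned.
        return self.count.get(group, 0), self.total.get(group, 0)
```

An aggregate such as MIN or MAX would not fit this pattern as neatly, since deleting the current extreme value forces a rescan of the group.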

Weisberg outlined the durability of VoltDB, stating that synchronous logging provides the highest durability possible at reduced performance. “Asynchronous logging,” he said, “gives the best performance at reduced durability.” On VoltDB, “transactions are shared on multiple threads,” he said.

He then moved on to VoltDB 3.0, which features low latency, ease-of-use programming and a reduced need for NTP, and is more cloud friendly. VoltDB 3.0 also features fast ad hoc SQL, more SQL (notably indexing on column functions, with JSON), on-line schema changes, high velocity export and developer reach.
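The logging trade-off Weisberg described can be illustrated with a toy command log. This is a sketch under stated assumptions, not VoltDB's logging code: the `CommandLog` class is hypothetical, the `durable` list stands in for data safely on disk, and the count-based flush threshold is chosen only to keep the example short.

```python
class CommandLog:
    """Toy command log contrasting synchronous and asynchronous flushing."""

    def __init__(self, synchronous, flush_every=3):
        self.synchronous = synchronous
        self.flush_every = flush_every  # assumed threshold, used in async mode
        self.buffer = []   # entries still in memory, lost if the process dies
        self.durable = []  # stand-in for entries already written to disk

    def append(self, entry):
        self.buffer.append(entry)
        # Synchronous: flush before acknowledging every transaction.
        # Asynchronous: acknowledge at once, flush only when the buffer fills.
        if self.synchronous or len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash_and_recover(self):
        # A crash discards whatever was still buffered in memory.
        self.buffer.clear()
        return list(self.durable)
```

In synchronous mode every acknowledged transaction survives a crash, at the price of a flush per transaction. In asynchronous mode flushes are batched, so throughput is higher but the most recent entries can be lost.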

To conclude the session, Weisberg tied it all back to scalability. “VoltDB has in-memory transactions. It has a low cost per transaction and uses a relational data model with ACID transactions.”
