Columnar Database FAQs
What is a Columnar Database?
Columnar Databases organize and store data by columns rather than rows. They optimize data for aggregate functions and operations on columns of data, by storing data values contiguously, with the same data type and semantic meaning.
Is columnar database just another name for a NoSQL database?
Columnar databases are one of several types of NoSQL databases. NoSQL databases include several variations from traditional row-based relational databases, including Key-Value Stores, Document Databases, Graph Databases, and Columnar or Column-Oriented Databases.
What are the advantages of a Columnar Database?
The manner in how a columnar database stores data together is functionally similar to defining an index on every column in a row-based database. For example, calculating the average price or sum of values is fast, because there are no page scans, no traversing all rows on the disk to find the prices to average. Because the data in each column is contiguous, sorting can also be performed efficiently. Columnar databases are also highly scalable and compress well.
What are the drawbacks of a Columnar Database?
Because the column data is stored together in a columnar database, a row of data is split across multiple sections. When a row is written, each column is written separately, causing slower writes and lower consistency during the write, but the data will become consistent eventually as all the writes complete. Therefore, one trade-off with columnar databases is slower write times in exchange for faster aggregate function times. In short, row operations are slower, but column operations are faster.
Use Cases for Columnar Databases
Columnar databases are well suited for big data processing, business intelligence (BI), and analytics. IoT (Internet of Things) devices feed high volumes of data into data stores. IoT records may have limited data elements with new records arriving in a steady stream, so a big data columnar database would be ideal for analyzing the data. Cellular telecommunications, for example, relies on constant communication between the handset and the towers. Telecom providers need to analyze that data in near real time in order to detect problems with a tower.
Columnar Database Examples
Apache Cassandra led the way with columnar databases. Many other vendors have followed the Cassandra model with their own unique variations, while also implementing CQL (Cassandra Query Language). Columnar databases that use CQL include Apache Cassandra, DataStax, Microsoft Azure Cosmos DB, and ScyllaDB, which is a native C++ rewrite of Cassandra.
Other databases, such as Apache HBase, use their own query language. Apache Kudu was built to run on Hadoop, while MariaDB was forked from the MySQL RDBMS and supports a subset of SQL commands.
Does ScyllaDB Offer Solutions for Operational Databases?
NoSQL-based operational database systems can truly harness the power of big data by using technologies such as ScyllaDB, which is a drop-in replacement for Apache Cassandra that provides built-in schedulers, a native memory allocator, automatic configuration capabilities, high scalability, and support of global and local indexes. These features enable orders-of- magnitude increase in database performance, increased throughput and storage capacity, reduced hardware costs, and simplified usability and maintainability.