Analytics Show Time: Presto Powered by ScyllaDB

By Tzach Livyatan

September 28, 2016

Presto is a popular, open source, distributed ANSI SQL query engine. It is used to run ad-hoc, interactive analytic queries on many data sources including HDFS, S3, Cassandra, MySQL, Kafka, PostgreSQL, Redis and ScyllaDB.

Unlike Apache Hive, Presto is not a layer on top of MapReduce (Hadoop). It was designed from scratch to execute SQL queries on data sizes ranging from gigabytes to petabytes.

In particular, Presto is attractive for organizations that have multiple databases and want to run queries, including SQL JOINs, across more than one of them at a time. It is simple to install, which makes it easy to get started (no ZooKeeper, thank you).

Presto connects to ScyllaDB using the same connector as Cassandra (hooray for driver compatibility), allowing you to take advantage of ScyllaDB’s superior throughput and latency.

I recently gave a short introduction to ScyllaDB and Presto at ScyllaDB Summit 2016. Slides and video are below.
(I’m the guy in the ScyllaDB t-shirt.)

Slides: ScyllaDB Summit 2016: Analytics Show Time, Presto (from ScyllaDB)

Video: ScyllaDB Summit 2016: Analytics Show Time – Spark and Presto Powered by ScyllaDB (from ScyllaDB Summit 2016)

Try it yourself

Want to give Presto and ScyllaDB a spin? There is a Docker image just for that:

Run

$ sudo docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image

Provision ScyllaDB with CQLSh

$ sudo docker exec -it some-scylla-presto cqlsh

cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE mykeyspace;
CREATE TABLE air_quality_data (
    sensor_id text,
    time timestamp,
    co_ppm int,
    PRIMARY KEY (sensor_id, time)
);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:00', 17);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:01', 18);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:02', 19);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:03', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 30);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 31);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:10', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:00', 200);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:01', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:02', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:03', 401);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:04', 402);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:10', 1000);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:11', 2000);
exit;

Run Presto CLI

$ sudo docker exec -it some-scylla-presto ./presto --server localhost:8080 --catalog cassandra --schema default

presto:default> select sensor_id, avg(co_ppm) as AVG from cassandra.mykeyspace.air_quality_data group by sensor_id;

 sensor_id |       avg
-----------+--------------------
 your_home |  629.2857142857143 
 my_home   | 20.833333333333332 
(2 rows)
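
The my_home average may look surprising at first: seven INSERT statements were issued for that sensor, but two of them share the same primary key (sensor_id, time). In CQL, an INSERT on an existing primary key overwrites the earlier row, so only six my_home rows remain. The following is a minimal Python sketch of that arithmetic, assuming a plain dict as a stand-in for the table (it does not talk to ScyllaDB or Presto, and the helper names are made up for illustration):

```python
# Rows in the order they were inserted above: (sensor_id, time, co_ppm).
inserts = [
    ("my_home", "2016-08-30 07:01:00", 17),
    ("my_home", "2016-08-30 07:01:01", 18),
    ("my_home", "2016-08-30 07:01:02", 19),
    ("my_home", "2016-08-30 07:01:03", 20),
    ("my_home", "2016-08-30 07:01:04", 30),
    ("my_home", "2016-08-30 07:01:04", 31),  # same key as the row above
    ("my_home", "2016-08-30 07:01:10", 20),
    ("your_home", "2016-08-30 07:01:00", 200),
    ("your_home", "2016-08-30 07:01:01", 201),
    ("your_home", "2016-08-30 07:01:02", 201),
    ("your_home", "2016-08-30 07:01:03", 401),
    ("your_home", "2016-08-30 07:01:04", 402),
    ("your_home", "2016-08-30 07:01:10", 1000),
    ("your_home", "2016-08-30 07:01:11", 2000),
]


def upsert_all(rows):
    """Apply each insert keyed by (sensor_id, time); later writes win."""
    table = {}
    for sensor_id, time, co_ppm in rows:
        table[(sensor_id, time)] = co_ppm
    return table


def avg_by_sensor(table):
    """Group the surviving rows by sensor_id and average co_ppm."""
    groups = {}
    for (sensor_id, _time), co_ppm in table.items():
        groups.setdefault(sensor_id, []).append(co_ppm)
    return {sensor: sum(vals) / len(vals) for sensor, vals in groups.items()}


table = upsert_all(inserts)
averages = avg_by_sensor(table)
for sensor_id, avg in averages.items():
    print(sensor_id, avg)
```

With the overwritten row counted once, my_home averages 125 / 6 and your_home averages 4405 / 7, which agrees with the Presto result above.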

Any questions about ScyllaDB, Presto or ScyllaDB with Presto? Join the discussion at the ScyllaDB user group, get our blog RSS feed, or follow @ScyllaDB on Twitter.

Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Amazon DynamoDB® and Dynamo Accelerator® are trademarks of Amazon.com, Inc. No endorsements by The Apache Software Foundation or Amazon.com, Inc. are implied by the use of these marks.
