Presto is a popular, open source, distributed ANSI SQL query engine. It is used to run ad-hoc, interactive analytic queries on many data sources including HDFS, S3, Cassandra, MySQL, Kafka, PostgreSQL, Redis and ScyllaDB.
Unlike Apache Hive, Presto is not a layer on top of Map Reduce (Hadoop). It was designed from scratch to execute SQL queries, on data sizes ranging from gigabytes to petabytes.
In particular, Presto is attractive for organizations who have multiple databases and are interested in running queries, including SQL JOIN, across more than one of them at a time. It has a simple install, which makes it easy to get started with (no ZooKeeper, thank you)
Presto connects to ScyllaDB using the same connector as Cassandra (hooray for driver compatibility) allowing you to take advantage of ScyllaDB’s superior throughput and latency.
I recently gave a short introduction to ScyllaDB and Presto at ScyllaDB Summit 2016. slides and video below.
(I’m the guy with the ScyllaDB t-shirt)
ScyllaDB Summit 2016: Analytics Show Time – Spark and Presto Powered by ScyllaDB
from
ScyllaDB Summit 2016
Try it yourself
Want to give Presto and ScyllaDB a spin? There is a Docker image just for that:
Run
sudo docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image
Provision ScyllaDB with CQLSh
$ sudo docker exec -it some-scylla-presto cqlsh
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use mykeyspace ;
CREATE TABLE air_quality_data (
sensor_id text,
time timestamp,
co_ppm int,
PRIMARY KEY (sensor_id, time)
);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:00', 17);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:01', 18);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:02', 19);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:03', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 30);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 31);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:10', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:00', 200);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:01', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:02', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:03', 401);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:04', 402);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:10', 1000);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:11', 2000);
exit;
Run Presto CLI
$ sudo docker exec -it some-scylla-presto ./presto --server localhost:8080 --catalog cassandra --schema default
presto:default> select sensor_id, avg(co_ppm) as AVG from cassandra.mykeyspace.air_quality_data group by sensor_id;
sensor_id | avg
-----------+--------------------
your_home | 629.2857142857143
my_home | 20.833333333333332
(2 rows)
Any questions about ScyllaDB, Presto or ScyllaDB with Presto? Join the discussion at the ScyllaDB user group, get our blog RSS feed, or follow @ScyllaDB on Twitter.