Tracing is a ScyllaDB tool meant to help developers and administrators analyze internal data flows in a ScyllaDB cluster. One example of such a flow is CQL request processing. Tracing in ScyllaDB works much as it does in Apache Cassandra, so any Cassandra tracing advice you find online should be helpful for ScyllaDB, too.
When to use CQL tracing to resolve a problem
If your replication factor is too low, or a query requires data transfers from a remote data center, a trace will show it. Tracing shows exactly what kind of problem is present; the next step is deciding how to fix it.
Another example is network issues: if the network between two nodes is slow, the timestamps in the trace will show where the delay is happening.
Example of using tracing to identify a problem
Let’s trace a query on a cluster with 6 nodes, 3 per data center.
In this example, 127.0.0.1, .2, and .3 are in one data center, and .4, .5, and .6 are in the other. The trace shows that an entire query can be processed locally, without time-consuming cross-data-center traffic.
./bin/nodetool -p 7100 status
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  127.0.0.4  164.73 KB  256     ?     95ff89a3-ea99-4266-bdf4-4d0a417b5232  RAC1
UN  127.0.0.5  151.16 KB  256     ?     654df685-f308-4b8c-802b-311971023dad  RAC1
UN  127.0.0.6  164.53 KB  256     ?     e6a4c46d-f744-4ec3-a6b9-c4852fa59030  RAC1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  178.73 KB  256     ?     05eb380a-a73e-487f-bce2-88c791d57c2c  RAC1
UN  127.0.0.2  151.15 KB  256     ?     a7945022-e1c6-47e2-95fd-e2c7142c649a  RAC1
UN  127.0.0.3  178.73 KB  256     ?     f0be2b5c-99c7-4b89-a0c5-7412bb361739  RAC1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
cqlsh> create keyspace my_ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2 };
cqlsh> CREATE TABLE my_ks.my_cf1 (c1 text PRIMARY KEY, c2 text);
cqlsh> INSERT into my_ks.my_cf1 ("c1", "c2") VALUES ('key1', 'value1');
cqlsh> TRACING ON
Now Tracing is enabled
cqlsh> CONSISTENCY LOCAL_QUORUM ;
Consistency level set to LOCAL_QUORUM.
cqlsh> SELECT * from my_ks.my_cf1 where c1='key1';
c1 | c2
------+--------
key1 | value1
(1 rows)
Tracing session: e8adc0b0-58a8-11e6-a3d6-000000000000
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------+----------------------------+----------+----------------
Execute CQL3 query | 2016-08-02 15:01:56.283000 | 127.0.0.1 | 0
Parsing a statement [shard 0] | 2016-08-02 15:01:56.283494 | 127.0.0.1 | --
Processing a statement [shard 0] | 2016-08-02 15:01:56.283540 | 127.0.0.1 | 47
Creating read executor ... repair decision: NONE [shard 0] | 2016-08-02 15:01:56.283591 | 127.0.0.1 | 97
sending a message to /127.0.0.3 [shard 0] | 2016-08-02 15:01:56.283605 | 127.0.0.1 | 112
read_data: querying locally [shard 0] | 2016-08-02 15:01:56.283650 | 127.0.0.1 | 156
read_digest: message received from /127.0.0.1 [shard 0] | 2016-08-02 15:01:56.283768 | 127.0.0.3 | 10
read_digest handling is done, sending a response to /127.0.0.1 [shard 0] | 2016-08-02 15:01:56.283829 | 127.0.0.3 | 70
read_digest: got response from /127.0.0.3 [shard 0] | 2016-08-02 15:01:56.283987 | 127.0.0.1 | 494
Done processing - preparing a result [shard 0] | 2016-08-02 15:01:56.284036 | 127.0.0.1 | 542
Request complete | 2016-08-02 15:01:56.283557 | 127.0.0.1 | 557
(output trimmed)
For reads, we use LOCAL_QUORUM where possible, to avoid going to the remote data center. Now try the same query with the consistency level set to QUORUM:
cqlsh> CONSISTENCY QUORUM ;
Consistency level set to QUORUM.
cqlsh> SELECT * from my_ks.my_cf1 where c1='key1';
c1 | c2
------+--------
key1 | value1
(1 rows)
Tracing session: ef573a40-58a8-11e6-a3d6-000000000000
activity | timestamp | source | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2016-08-02 15:02:07.460000 | 127.0.0.1 | 0
Parsing a statement [shard 0] | 2016-08-02 15:02:07.460326 | 127.0.0.1 | --
Processing a statement [shard 0] | 2016-08-02 15:02:07.460373 | 127.0.0.1 | 47
Creating read executor ... repair decision: NONE [shard 0] | 2016-08-02 15:02:07.460427 | 127.0.0.1 | 101
read_digest: sending a message to /127.0.0.3 [shard 0] | 2016-08-02 15:02:07.460433 | 127.0.0.1 | 107
read_digest: sending a message to /127.0.0.5 [shard 0] | 2016-08-02 15:02:07.460452 | 127.0.0.1 | 126
read_data: querying locally [shard 0] | 2016-08-02 15:02:07.460455 | 127.0.0.1 | 130
read_digest: message received from /127.0.0.1 [shard 0] | 2016-08-02 15:02:07.460579 | 127.0.0.3 | 11
read_digest: message received from /127.0.0.1 [shard 0] | 2016-08-02 15:02:07.460595 | 127.0.0.5 | 8
read_digest handling is done, sending a response to /127.0.0.1 [shard 0] | 2016-08-02 15:02:07.460644 | 127.0.0.3 | 75
read_digest handling is done, sending a response to /127.0.0.1 [shard 0] | 2016-08-02 15:02:07.460647 | 127.0.0.5 | 60
read_digest: got response from /127.0.0.5 [shard 0] | 2016-08-02 15:02:07.460813 | 127.0.0.1 | 487
read_digest: got response from /127.0.0.3 [shard 0] | 2016-08-02 15:02:07.460840 | 127.0.0.1 | 515
Done processing - preparing a result [shard 0] | 2016-08-02 15:02:07.460874 | 127.0.0.1 | 548
Request complete | 2016-08-02 15:02:07.460573 | 127.0.0.1 | 573
(output trimmed)
The trace information now shows traffic to the other data center. This was an artificial example constructed to show off tracing, but real-world ScyllaDB users run into similar problems. If a client driver is not configured properly, traces will show excessive node-to-node (or, worse, DC-to-DC) messaging. When we saw this client-side problem in production, the solution was to fix the client.
Probabilistic tracing
Tracing imposes a significant performance penalty on a cluster while it's enabled. Therefore, if we want to trace some ongoing workload (with many similar queries), we don't want to trace every request, but rather a small portion of them. This can be achieved with "probabilistic tracing," which randomly chooses requests to be traced with some defined probability.
For instance, to trace 0.01% of all queries in the cluster, set the tracing probability to 0.0001:
nodetool settraceprobability 0.0001
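Note that nodetool sets the probability on the node it connects to, so to sample the whole cluster you would run the command against each node. As a sketch, here is how to check the current value and how to turn sampling off when you're done:
nodetool gettraceprobability
nodetool settraceprobability 0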
Cleaning up trace data
Trace data persists for 24 hours by default. If you want to clean it up sooner, you can truncate the system_traces tables with the TRUNCATE command:
TRUNCATE system_traces.sessions;
and
TRUNCATE system_traces.events;
Where tracing data goes
CQL tracing can generate large quantities of data, because every event on every node of the cluster can be logged. The good news is that ScyllaDB can handle huge inserts of data. Trace data is inserted into the system_traces keyspace on your ScyllaDB cluster.
If you enable tracing and run a query, there will be at least one tracepoint at the beginning and one at the end, and most likely more events in between. Queries that require communication with other nodes will hit tracepoints for the messages being sent, generating further events. Each node generates trace events for whatever happens on it in the context of a query; no trace information is sent from node to node. It is simply inserted into the system_traces keyspace.
The system_traces keyspace has two tables, and a default replication factor of 2.
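You can inspect the schema on your own cluster with cqlsh's DESCRIBE command; trimmed to the column definitions, the two tables look like this:
cqlsh> DESCRIBE KEYSPACE system_traces;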
CREATE TABLE system_traces.events (
session_id uuid,
event_id timeuuid,
activity text,
source inet,
source_elapsed int,
thread text,
PRIMARY KEY (session_id, event_id)
)
CREATE TABLE system_traces.sessions (
session_id uuid PRIMARY KEY,
client inet,
command text,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp
)
Traces are created in the context of a tracing session. For instance, if we trace an INSERT CQL command, a tracing session with a unique ID (the session_id column in the tables above) is created, and all trace points hit during execution are stored in the context of that session. This defines the format in which tracing data is stored:
- The sessions table contains a single row for each tracing session.
- The events table contains a single row for each trace point.
A tracing session always corresponds to a single CQL query.
The session ID of the query can be used to query the keyspace to get all tracepoints for that query, created by all replica nodes. The format of those tables is the same as in Apache Cassandra. With new versions of ScyllaDB, more traceable events are being added.
The events table has multiple rows for a session, and event_id is the clustering key within a session. Sorting by event_id gives you all events for a query, from all nodes, in chronological order.
Each row of events has a column called thread, which contains the shard number, so you can see on which shard an event happened. (ScyllaDB nodes are sharded, and each shard can handle its portion of a complex query independently while other shards are working.)
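Putting this together, here is a sketch of a query that pulls every event for a single traced query, using the session ID from the LOCAL_QUORUM trace above (substitute your own session ID). Because event_id is the clustering key, the rows come back in chronological order:
cqlsh> SELECT event_id, source, source_elapsed, thread, activity
   ... FROM system_traces.events
   ... WHERE session_id = e8adc0b0-58a8-11e6-a3d6-000000000000;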
Data stored for each event
The events table's columns are quite straightforward. The most interesting among them are:
- session_id: the ID of the session this trace entry belongs to
- event_id: the ID of this specific trace entry
- activity: a trace message
- source: the address of the ScyllaDB node where the trace entry was created
- source_elapsed: the number of microseconds elapsed since the beginning of the tracing session on a specific node (see the examples above)
- thread: currently, the number of the shard on which this trace point was taken
Data stored for each session
The columns of sessions are:
- session_id: the ID of this tracing session
- command: currently this may only have the value "QUERY"
- client: the address of the client that sent this query
- coordinator: the address of the coordinator that received this query from the client
- duration: the total duration of this tracing session
- parameters: a map of string pairs describing the query, which may include the query string and the consistency level
- request: a short string describing the current query, e.g. "Execute CQL3 query"
- started_at: a timestamp taken when the tracing session began
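If you don't have a session ID at hand, you can list some sessions first and pick one to drill into; this is just a sketch (the column list and LIMIT are only for readability):
cqlsh> SELECT session_id, client, coordinator, duration, request, started_at
   ... FROM system_traces.sessions
   ... LIMIT 10;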
Tracing at ScyllaDB Summit
Come to ScyllaDB Summit in San Jose on September 6th, to learn more about CQL tracing and other valuable info for ScyllaDB developers and administrators—along with success stories from companies like IBM, Outbrain, Samsung SDS, Appnexus, Hulu, and Mogujie, all using ScyllaDB for better performance and faster development. Meet ScyllaDB developers and devops users who will cover ScyllaDB design, best practices, advanced tooling and future roadmap items.
Going to Cassandra Summit? Add another day of NoSQL, right next door! ScyllaDB Summit takes place the day before Cassandra Summit begins, at the Hilton San Jose, adjacent to the San Jose Convention Center. Lunch and refreshments are provided.
Photo of a bloodhound puppy by Asilverstein (Own work) CC BY-SA 3.0, via Wikimedia Commons.