We’ve talked in the past about how ScyllaDB helps you find large partitions. But sometimes you need to get even more granular to get to the heart of what might be causing a hiccup in your database performance. Below we describe how to detect large rows and large cells in ScyllaDB.
While it tries its best to handle them, ScyllaDB is not optimized for very large rows or large cells. They require allocation of large, contiguous memory areas and therefore may increase latency. Rows may also grow over time. For example, many insert operations may add elements to the same collection, or a large blob can be inserted in a single operation.
Similar to the large partitions table, the large rows and large cells tables are updated when SSTables are written or deleted, for example, on memtable flush or during compaction. We added the means to search for large rows and large cells when we added the SSTable “mc” format with ScyllaDB 3.0 and ScyllaDB Enterprise 2019.1. This SSTable format is enabled by default on ScyllaDB Open Source 3.1 and above.
Find Large Rows
To start, look at the system.large_rows table:
Let’s break down what each of its columns represents:
Parameter | Description |
keyspace_name | The name of the keyspace that holds the large row |
table_name | The name of the table that holds the large row |
sstable_name | The name of the SSTable that holds the large row |
row_size | The size of the row |
clustering_key | The clustering key of the large row |
compaction_time | The time when the compaction occurred |
Next, let’s look at a simple CQL query that narrows the large rows down to a specific keyspace and table. For example, if we were looking for the keyspace demodb and table tmcr:
SELECT * FROM system.large_rows WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
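If you run this lookup from application code rather than from cqlsh, it can help to build the statement with bind parameters instead of pasting names into the string. The following is only a minimal sketch: `build_lookup` and `ALLOWED_TABLES` are hypothetical names made up for illustration, and the `%s` placeholder style assumes the Python driver's convention for non-prepared statements.

```python
# Hypothetical helper: builds the CQL text and bind parameters for a lookup
# in one of the two system tables discussed in this post.
ALLOWED_TABLES = ("large_rows", "large_cells")


def build_lookup(system_table, keyspace, table):
    """Return (cql, params) filtering system.<system_table> by keyspace and table."""
    if system_table not in ALLOWED_TABLES:
        raise ValueError(f"unsupported system table: {system_table}")
    cql = (
        f"SELECT * FROM system.{system_table} "
        "WHERE keyspace_name = %s AND table_name = %s"
    )
    return cql, (keyspace, table)


cql, params = build_lookup("large_rows", "demodb", "tmcr")
print(cql)
print(params)
# With the Python driver, this pair could be passed along as
# session.execute(cql, params), given an already connected session.
```

Restricting the table name to a fixed list keeps the only interpolated part of the statement under your control, while the keyspace and table values travel as parameters.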
Find Large Cells
To find large cells, let’s look at the system.large_cells table:
And similarly, let’s understand the specific parameters of this table:
Parameter | Description |
keyspace_name | The name of the keyspace that holds the large cell |
table_name | The name of the table that holds the large cell |
sstable_name | The name of the SSTable that holds the large cell |
cell_size | The size of the cell |
clustering_key | The clustering key of the row that holds the large cell |
column_name | The column of the large cell |
compaction_time | The time when the compaction occurred |
The main difference between the large row and large cell tables is the addition of the column_name column in the latter.
For example, to look for large cells in the keyspace demodb and table tmcr, use this CQL query:
SELECT * FROM system.large_cells WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
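Because each entry in system.large_cells names the offending column, a quick tally per column can show whether one column keeps producing large cells. Here is a rough sketch of that idea; the sample rows are made up, and only the field names keyspace_name, table_name, and column_name come from the table description above.

```python
from collections import Counter

# Made-up sample of what rows read from system.large_cells might look like;
# only the field names are taken from the table description in this post.
sample_rows = [
    {"keyspace_name": "demodb", "table_name": "tmcr", "column_name": "payload"},
    {"keyspace_name": "demodb", "table_name": "tmcr", "column_name": "payload"},
    {"keyspace_name": "demodb", "table_name": "tmcr", "column_name": "notes"},
]


def count_by_column(rows):
    """Count large-cell entries per (keyspace, table, column)."""
    return Counter(
        (r["keyspace_name"], r["table_name"], r["column_name"]) for r in rows
    )


counts = count_by_column(sample_rows)
for (ks, tbl, col), n in counts.most_common():
    print(f"{ks}.{tbl}.{col}: {n} large cell(s)")
```

A column that dominates this tally is a reasonable first place to look when revisiting the data model.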
Configure
The large row and large cell detection thresholds are configured in the scylla.yaml file with the following parameters:
compaction_large_row_warning_threshold_mb (default: 10MB)
compaction_large_cell_warning_threshold_mb (default: 1MB)
Once a threshold is reached, the relevant information is captured in the system.large_rows or system.large_cells table, respectively. In addition, a warning message is logged in the ScyllaDB log (refer to our documentation on logging).
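To get a feel for what these defaults mean in bytes, the check can be sketched as follows. This is an illustration only, not ScyllaDB's implementation: it assumes 1 MB means 1024 * 1024 bytes and that a size exactly at the threshold counts as having reached it, and `reaches_threshold` is a hypothetical name.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

# Defaults quoted in this post, in MB
DEFAULT_ROW_THRESHOLD_MB = 10
DEFAULT_CELL_THRESHOLD_MB = 1


def reaches_threshold(size_bytes, threshold_mb):
    """True when size_bytes is at or above the threshold (assumed inclusive)."""
    return size_bytes >= threshold_mb * BYTES_PER_MB


# A 12 MB row against the default row threshold
print(reaches_threshold(12 * BYTES_PER_MB, DEFAULT_ROW_THRESHOLD_MB))
# A 512 KB cell against the default cell threshold
print(reaches_threshold(512 * 1024, DEFAULT_CELL_THRESHOLD_MB))
```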
Storing
Large rows and large cells are stored in the system.large_rows and system.large_cells tables, respectively. Their columns are the ones described in the sections above.
Expiring Data
To prevent stale data from appearing, all rows in the system.large_rows and system.large_cells tables are inserted with a Time To Live (TTL) of 30 days.
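In practice, this means an entry drops out of these tables 30 days after it was written, whether or not the underlying large row or cell still exists. A small sketch of the arithmetic, using a made-up insertion time and the hypothetical helper names `expires_at` and `is_expired`:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=30)  # the TTL stated in this post


def expires_at(inserted_at):
    """Moment at which an entry inserted at inserted_at would expire."""
    return inserted_at + TTL


def is_expired(inserted_at, now):
    """True once the 30-day TTL has elapsed at time now."""
    return now >= expires_at(inserted_at)


inserted = datetime(2024, 1, 1, tzinfo=timezone.utc)  # made-up insertion time
print(expires_at(inserted))
print(is_expired(inserted, datetime(2024, 1, 15, tzinfo=timezone.utc)))
```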
Conclusion
Large rows and large cells are unfortunate but frequently found artifacts when users are first beginning data modeling with ScyllaDB. Often users don’t anticipate how their early data modeling decisions will impact performance until they go into production. That’s why we feel it is vital to put tools in the hands of users to be able to easily detect and quickly troubleshoot these data phenomena.
Have you run into problems with large partitions, rows or cells that caused you some worried hours or sleepless nights? How did you solve them? We’d love to hear your war stories. Write to us, or join our Slack channel to tell us all about it.
In the meantime, if you want to improve your skills with ScyllaDB, make sure you take our ScyllaDB University course on data modeling, with sections for both beginners and advanced users. It’s completely free!