In my previous blog post, I wrote about the top students for 2020, the ScyllaDB Summit Training Day, getting course completion certificates, and other news. In this blog post I’ll talk about new lessons added to ScyllaDB University since our June 2020 update.
New Features
- New CDC lesson: Change Data Capture, or CDC, is a feature that allows you to query recent changes made to data in specified tables. With CDC, users can build streaming data pipelines that process and analyze data in real time and react immediately to modifications occurring in the database. Some of the topics covered in this lesson are:
  - An overview of Change Data Capture: what it is, what it does, some common use cases, and how it works.
  - How can that data be consumed? Different options for consuming the data changes, including normal CQL, a layered approach, and integrators.
  - How does CDC work under the hood? Covers an example of what happens in the DB on different operations to allow CDC.
  - A summary of CDC: It’s easy to integrate and consume, it uses plain CQL tables, it’s robust, it’s replicated in the same way as regular data, it has a reasonable overhead, and it does not overflow if the consumer fails to act, because the data is TTL’ed. The summary also includes a comparison with other NoSQL CDC implementations in Cassandra, DynamoDB, and MongoDB.
- Workload Prioritization: Using Service Level CQL commands, database administrators working on ScyllaDB Enterprise can set a different workload prioritization (level of service) for each workload without sacrificing latency or throughput. Each service level can also be attached to your organization’s various roles, ensuring that each role is granted the level of service it requires.
- Materialized Views + Secondary Indexes: Includes nine topics, three quizzes, an MV lab, and an SI lab. In ScyllaDB (and Apache Cassandra), data is divided into partitions, rows, and values, which can be identified by a partition key. Sometimes the application needs to find a value by the value of another column. Doing this efficiently without scanning all of the partitions requires indexing, the focus of this lesson. There are three indexing options available in ScyllaDB: Materialized Views, Global Secondary Indexes, and Local Secondary Indexes. The lesson covers all three, compares them, and gives examples of when to use each, along with quizzes and hands-on labs.
- New LWT lesson: There are cases when it is necessary to modify data based on its current state: that is, to perform an update that is executed only if a row does not exist or contains a particular value. Lightweight Transactions (LWTs) provide this functionality by only allowing data changes to occur if the condition provided evaluates as true. The conditional statements provide linearizable semantics, thus allowing data to remain consistent. A basic rule of thumb is that any statement with an IF clause is a conditional statement. A batch that has at least one conditional statement is a conditional batch. Conditional statements and conditional batches are executed atomically as an LWT. This lesson provides an overview of LWT, an example of how it’s used, and a comparison with Apache Cassandra.
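To make the "apply only if the condition holds" idea from the LWT lesson concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not ScyllaDB's implementation: the CQL strings use a hypothetical `shop.users` table, and `ConditionalTable` is a hypothetical single-process toy in which a lock stands in for the ordering that a real cluster has to agree on across replicas.

```python
import threading

# Illustrative CQL only; keyspace, table, and column names are hypothetical.
INSERT_IF_NOT_EXISTS = (
    "INSERT INTO shop.users (user_id, email) VALUES (?, ?) IF NOT EXISTS"
)
UPDATE_IF_EQUALS = (
    "UPDATE shop.users SET email = ? WHERE user_id = ? IF email = ?"
)


class ConditionalTable:
    """Toy in-memory model of conditional writes (not ScyllaDB code)."""

    def __init__(self):
        self._rows = {}
        # The lock makes each check-then-write step atomic in this process.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Apply the insert only if no row exists; return whether it applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if_equals(self, key, expected, new_value):
        """Apply the update only if the row exists and holds `expected`."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


table = ConditionalTable()
first = table.insert_if_not_exists(1, "a@example.com")    # row is created
second = table.insert_if_not_exists(1, "b@example.com")   # rejected: row exists
swapped = table.update_if_equals(1, "a@example.com", "c@example.com")
stale = table.update_if_equals(1, "a@example.com", "d@example.com")  # rejected
```

The second insert and the stale update are both rejected because their conditions no longer hold, which is the behavior the IF clause expresses in the two CQL strings.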
Language Lessons
- CPP Part 2: Using Prepared Statements with the CPP (C++) Driver to connect to a ScyllaDB cluster and perform queries.
- Scala lesson part one and part two: How to use the Phantom Scala driver to create a sample Scala application that executes a few basic CQL operations against a ScyllaDB cluster.
Operations and Management
- Security: This lesson covers security features and the way that ScyllaDB handles security. By the end of this lesson, you’ll understand why security is essential in ScyllaDB, the different security features, and how it works. Some of the topics covered in this lesson are:
  - Why is it important to secure your data? Business value is increasingly tied to data. Covers security properties such as Identity, Authentication, Confidentiality, Availability, Integrity, and Non-repudiation.
  - How to manage identities with users? Identity, Authentication, Users and passwords, and Availability.
  - What is Authentication, and how does it limit access to the cluster to identified clients? Authentication is the process where login accounts and their passwords are verified, and the user is allowed access to the database.
  - Users and passwords are created with roles using a GRANT statement. This procedure enables Authentication on the ScyllaDB servers. However, once complete, all clients (applications using ScyllaDB/Apache Cassandra drivers) will stop working until they are updated to work with Authentication as well.
  - The concepts of roles and permissions, Confidentiality, and Non-repudiation.
  - What is Authorization? How are users granted permissions which entitle them to access or change data on specific keyspaces, tables, or an entire data center? Role-Based Access Control reduces lists of authorized users to a few roles assigned to multiple users. The lesson also includes an example.
  - Encryption in Transit (Client to Node and Node to Node), and an overview of Encryption at Rest, which covers data stored in Tables, System, and Providers.
  - Encryption at Rest: how to encrypt user data as stored on disk. This is invisible to the client and available on ScyllaDB Enterprise. It uses disk block encryption and has a minimal impact on performance.
  - Auditing lets us know who did, looked at, or changed what, and when, by logging the activities a user performs on the ScyllaDB cluster.
  - The importance of ensuring that ScyllaDB runs in a trusted network environment: limit access to IP / Port by role, follow the minimal privileges principle, and avoid Public IPs and use a VPC where possible. Security is an ongoing process. Routinely upgrade to the latest ScyllaDB and OS versions, check for network exposure, replace keys and passwords, use 2FA (ScyllaDB Cloud: NoSQL DBaaS), and apply the available security features.
- Using Spark with ScyllaDB: By using Spark together with ScyllaDB, users can deploy analytics workloads on the information stored in the transactional system. This lesson gives an overview of ScyllaDB, Spark, and how they can work together.
- Configuration and Where to Run ScyllaDB: Covers ScyllaDB configuration, setup, and best practices. By the end of this lesson, you’ll better understand the practical aspects of ScyllaDB installation and configuration.
- How to Write Better Apps: In this lesson, you’ll learn how to write better applications. This is an intermediate to advanced level lesson. By the end of this lesson, you’ll have a better understanding of application development, caveats, performance, and what you should or shouldn’t do.
- ScyllaDB Drivers: Three new lessons were added to the course. They cover an overview of the ScyllaDB Token Ring architecture, ScyllaDB-specific (shard-aware) drivers and why it is important to use them, and what paging and shard awareness are.
- Alternator: The course is updated with new lessons and quizzes, covering Alternator, ScyllaDB’s DynamoDB-compatible API, in action and how it works, implementation details, and a hands-on lab to help you get started.
- Manager Repair Tombstones: This lesson covers what repair is and why it is needed, what tombstones are and why they are important, and ScyllaDB Manager. ScyllaDB Manager is a centralized cluster administration and recurrent tasks automation tool. It can schedule tasks such as repairs and backups. ScyllaDB Repair is a process that runs in the background and synchronizes the data between nodes so that eventually, all the replicas hold the same data. Data stored on nodes can become inconsistent with other replicas over time, which is why repairs are a necessary part of database maintenance. Using ScyllaDB repair makes data on the node consistent with the other nodes in the cluster. The best use of ScyllaDB repair is to have ScyllaDB Manager schedule and run the repairs for you.
- Admin Procedures and Tools: In this lesson, you’ll learn how to administer a ScyllaDB cluster. It covers essential tools and procedures, best practices, common pitfalls, and tips for successfully running a cluster.
- New ScyllaDB Monitoring lesson: This extensive lesson includes nine topics, one lab, and two quizzes. It covers ScyllaDB Monitoring, a full stack for monitoring a ScyllaDB cluster and for alerting. The stack contains open source tools, including Prometheus and Grafana, and custom ScyllaDB dashboards and tooling.
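As a rough picture of the repair and tombstone ideas from the Manager Repair Tombstones lesson listed above, the following Python sketch models three replicas that have drifted apart and then brings them back in sync. Everything here is an assumption made for illustration: replicas are plain dictionaries, each cell is a hypothetical `(timestamp, value)` pair, the newest timestamp wins, and a tombstone is represented by `None`. It is a toy model, not ScyllaDB's repair algorithm.

```python
TOMBSTONE = None  # hypothetical marker meaning "this key was deleted"


def repair(replicas):
    """Make every replica hold the newest (timestamp, value) cell per key."""
    newest = {}
    for replica in replicas:
        for key, cell in replica.items():
            # Keep the cell with the highest timestamp seen so far.
            if key not in newest or cell[0] > newest[key][0]:
                newest[key] = cell
    for replica in replicas:
        replica.clear()
        replica.update(newest)
    return newest


def read(replica, key):
    """Return the live value for a key, or None if missing or deleted."""
    cell = replica.get(key)
    if cell is None or cell[1] is TOMBSTONE:
        return None
    return cell[1]


# Three replicas that have become inconsistent over time.
node_a = {"k1": (1, "old"), "k2": (1, "x")}        # missed the k2 delete
node_b = {"k1": (2, "new"), "k2": (3, TOMBSTONE)}  # saw every write
node_c = {"k1": (1, "old")}                        # missed k2 entirely

repair([node_a, node_b, node_c])
```

After the call, all three dictionaries are identical. The tombstone for `k2` is carried to the replicas that missed the delete, so a read of `k2` returns nothing on any node instead of the stale value that `node_a` was still holding.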
Next Steps
That’s a lot of new content for you to check out. ScyllaDB University now covers most ScyllaDB-related topics for Developers, DBAs, and Architects.
Visit ScyllaDB University and start learning. It’s free, online, and self-paced!