Time Series Data FAQs
What is Time Series Data?
Time series data is a set of observations collected through repeated measurements over time. When the data is plotted on a graph, one of the axes is always time.
Time series metrics are specific data tracked at set time increments. For example, a time series metric might track how much inventory a store sells each day. A user might plot this data for a month to see when the busiest sales days were.
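As a minimal sketch of the store example above, the snippet below keeps one week of daily unit sales keyed by date and picks out the busiest days. The figures, the start date, and the `busiest_days` helper are all hypothetical and exist only for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily unit sales for one week, keyed by calendar day.
start = date(2024, 3, 4)  # arbitrary start date
units_sold = [120, 95, 130, 110, 210, 340, 280]
daily_sales = {start + timedelta(days=i): units for i, units in enumerate(units_sold)}


def busiest_days(series, top_n=3):
    """Return the top_n (day, value) pairs, highest value first."""
    return sorted(series.items(), key=lambda item: item[1], reverse=True)[:top_n]


for day, units in busiest_days(daily_sales):
    print(day.isoformat(), units)
```

The same idea extends to a full month of data: the time axis stays fixed, and only the number of points grows.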
Because time is always an observable factor, time series data is everywhere. Sensors and systems emit time series data constantly as the world around us becomes more instrumented, and that data can be applied across numerous industries in many ways.
Time Series Data Examples
Weather records, patient health metrics, and economic indicators are all time series data. Time series databases can hold data from sensors, analytics, or the network; application performance monitoring or server metrics; and many other sources.
As well as being captured at regular time intervals, time series data can be captured as it happens in logs—regardless of time interval. Logs are a registry of processes, events, and communication between the operating system and software applications.
To place these ideas in context, consider these examples of time series data:
Financial time series data. In the context of investing, a time series tracks the way data points, such as stock prices, move over a set period of time. The tracking can be short term, such as watching the price of a security on the hour throughout the business day, or long term, such as recording that price at the closing bell on the last day of each month for ten years. Sometimes it’s ideal to collect both short-term and long-term data continuously to see how markets are changing and to optimize returns, as autonomous trading algorithms do.
Healthcare time series data. Time series data from patient health monitoring might include data from an electrocardiogram (ECG); the total number of COVID-19 hospitalizations and deaths per day in a county; and patient data over time, such as days admitted or discharged, days on medications, and days to recovery. Such data helps providers analyze trends and provide better care.
Self-driving cars. These vehicles continuously collect data about changes in their environment, adjusting based on obstacles, weather and road conditions, and numerous other variables.
Forensic patterns. A time-series database can reveal patterns that simple snapshot numbers can’t. For example, two investigative targets might claim that they don’t know each other well or have a relationship, but time series data might establish that they regularly send each other money in similar amounts at the same time each month. This forensic pattern suggests there is a relationship or agreement in place.
Weather data. Mean daily temperature (MDT), the average of a location’s low and high temperatures recorded for consecutive days, has been used for years to calculate energy efficiency for buildings. However, over the same period, the contributing environmental factors could be changing drastically even as MDT varies only slightly from day to day. More detailed time series data on hourly temperature change, cloud cover, precipitation, wind speed, and more enhances the ability to optimize energy efficiency.
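To illustrate why a single daily value can hide what is happening, here is a small sketch that assumes MDT is computed as the midpoint of a day's low and high. The two hourly profiles are invented; they were chosen so that both days share the same MDT while behaving very differently hour to hour.

```python
def mean_daily_temperature(hourly_temps):
    """Midpoint of the day's low and high (the MDT definition assumed here)."""
    return (min(hourly_temps) + max(hourly_temps)) / 2


def daily_swing(hourly_temps):
    """Difference between the day's high and low."""
    return max(hourly_temps) - min(hourly_temps)


# Hypothetical hourly readings in degrees Celsius, 24 values per day.
calm_day = [14.0] * 12 + [16.0] * 12
volatile_day = [5.0] * 12 + [25.0] * 12

for name, day in (("calm", calm_day), ("volatile", volatile_day)):
    print(name, mean_daily_temperature(day), daily_swing(day))
```

Both days report the same MDT, but only the hourly series shows that the second day needs far more heating and cooling.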
Choosing a Database for Time Series Data
Which is the best database for time series data? This depends in large part on the specifics of the project and your goals. Most time-series data use cases are write heavy, so it’s important to consider databases that are optimized for fast writes: for example, NoSQL databases that use LSM-tree architectures.
Fortunately, there are multiple options available.
Apache Cassandra is a wide-column store with capacity for hundreds of thousands of records per second for write-heavy apps. The Cassandra time series data model works well with sequenced data that might vary in size, because each row can have a dynamic number of columns. Because it is also very efficient with writes, Cassandra is commonly selected as a NoSQL database for time series data.
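The sketch below imitates, in plain Python, one common way a wide-row time series layout is organized: readings are grouped into a partition per source per day and kept ordered by timestamp inside that partition. It does not use Cassandra or any driver; the `partitions` structure, the daily bucket, and the function names are assumptions made for illustration only.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone

# (sensor_id, day) -> list of (timestamp, value), kept sorted by timestamp.
# The tuple key plays the role of a partition key; the timestamp orders rows within it.
partitions = defaultdict(list)


def partition_key(sensor_id, ts):
    """Bucket readings by sensor and calendar day (the bucket size is a design choice)."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def write(sensor_id, ts, value):
    """Add a reading; the partition grows by one entry per write."""
    insort(partitions[partition_key(sensor_id, ts)], (ts, value))


def read_range(sensor_id, day, start, end):
    """Return readings for one sensor and day with start <= timestamp < end."""
    rows = partitions.get((sensor_id, day), [])
    return [(ts, value) for ts, value in rows if start <= ts < end]


utc = timezone.utc
write("sensor-1", datetime(2024, 3, 4, 12, 0, tzinfo=utc), 21.5)
write("sensor-1", datetime(2024, 3, 4, 9, 0, tzinfo=utc), 19.8)  # arrives out of order
write("sensor-1", datetime(2024, 3, 5, 9, 0, tzinfo=utc), 20.1)

morning = read_range(
    "sensor-1",
    "2024-03-04",
    datetime(2024, 3, 4, 0, 0, tzinfo=utc),
    datetime(2024, 3, 4, 12, 0, tzinfo=utc),
)
print(morning)
```

Bucketing by day keeps any single partition from growing without bound, and a time-range query touches only the partitions for the days it covers.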
It is possible to optimize Amazon DynamoDB as a time series data platform. The platform is capable of rapid querying and other functions. Find a description of how to store DynamoDB time series data here.
ScyllaDB – API-compatible with both Apache Cassandra and DynamoDB – offers a database-as-a-service, enterprise NoSQL database, and open source NoSQL database that are all well-suited to time series data. ScyllaDB matches the time series support offered by Apache Cassandra and DynamoDB – with the addition of enhancements that offer better performance at lower costs. Read about ScyllaDB time series data use cases here.
MongoDB is a document-based general purpose database with a rich query language and flexible schema design. As of version 5.0, there is native support for time series data in MongoDB. MongoDB is a popular “starter” option due to its gentle learning curve, but teams often migrate to the options mentioned above when they need to achieve greater performance at scale.
Another option is to use a dedicated time-series database (for example, InfluxDB or QuestDB). The decision on whether to adopt a dedicated time-series database is often influenced by factors such as the need for flexibility, cost, scalability, and high availability.
How to Handle Time Series Data
Time series data is collected, stored, visualized and analyzed across domains for various purposes. Analysis of time series data is central to data mining, pattern recognition, and machine learning. It is used for time series data clustering, classification, query by content, anomaly detection, and time series forecasting. Time series analysis is also used for forecasting in econometrics, geophysics, meteorology, quantitative finance, seismology, and statistics. Time series data is used in signal processing, communication engineering, and control engineering for signal detection and estimation.
Time series data preparation readies the data for processing so that data scientists and the organization can glean the most insight from it. Time series data cleaning involves applying smoothing filters to identify and correct inaccurate records. This helps to standardize time series data and ensures that poor data does not skew results. For time series data visualization, data scientists use different types of charts that facilitate trend analysis, insight extraction, and anomaly detection.
Time series data analysis is a specific method for analyzing a sequence of data points collected over time. In time series analysis, analysts record data points at consistent intervals over a set period of time, rather than collecting data randomly or intermittently. What distinguishes it from other kinds of data analysis is that it reveals how the data changes over time.
Time series analysis generally demands an extensive data set with many data points to ensure consistency, reliability, and a representative sample size. Time series data can also be used to predict future trends and values based on historical data.
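One simple way to picture the cleaning step is a rolling median filter: each reading is compared with the median of its neighbors and replaced when it sits too far away. This is only a sketch of one possible filter, not a prescribed method; the window size, the threshold, and the sample readings are hypothetical.

```python
from statistics import median


def rolling_median(values, window=3):
    """Centered rolling median; the window shrinks at the edges of the series."""
    half = window // 2
    return [
        median(values[max(0, i - half): min(len(values), i + half + 1)])
        for i in range(len(values))
    ]


def clean(values, window=3, threshold=10.0):
    """Replace readings that differ from the local median by more than threshold."""
    smoothed = rolling_median(values, window)
    return [s if abs(v - s) > threshold else v for v, s in zip(values, smoothed)]


# Hypothetical sensor readings with one obviously bad record.
readings = [20.1, 20.3, 98.6, 20.4, 20.2]
cleaned = clean(readings)
print(cleaned)
```

A median is used rather than a mean so that the spike itself does not drag the reference value toward it.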
How to Work with Time Series Data: Time Series Analysis Methods
The ability to analyze time series data and extract meaningful insights is key to working with this kind of data. Here we introduce some basic time series forecasting methods:
Time series forecasting uses historical values and associated patterns—more broadly, trends in historical data—to predict future activity. That historical data becomes a model for future data, predicting likely scenarios that might happen along the way.
Time series forecasting methods include:
- Trend analysis. Analyzes consistent movement in a series. Trends are of two types: deterministic, with an identifiable underlying cause, and stochastic, with an unexplained, random cause.
- Cyclical fluctuation analysis/Functional analysis. This identifies notable events by picking out the patterns and relationships in data.
- Seasonal pattern analysis/Seasonal variation. Describes events that occur at regular, specific intervals throughout the year. When data points that are close in time tend to be related, it is called serial dependence.
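The ideas in the list above can be sketched with two small helpers: a moving average that smooths out a repeating pattern to expose the trend, and a lag autocorrelation that measures serial dependence. The series is synthetic (a pattern that repeats every four points plus a steady rise), and the function names are hypothetical.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; with window equal to the season length it cancels the seasonal pattern."""
    return [mean(values[i - window + 1: i + 1]) for i in range(window - 1, len(values))]


def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag (a measure of serial dependence)."""
    m = mean(values)
    numerator = sum((values[i] - m) * (values[i - lag] - m) for i in range(lag, len(values)))
    denominator = sum((v - m) ** 2 for v in values)
    return numerator / denominator


# Synthetic series: a pattern that repeats every 4 points plus a rise of 1 per step.
pattern = [10, 20, 30, 20]
values = [pattern[i % 4] + i for i in range(12)]

trend = moving_average(values, window=4)
print(trend)
print(autocorrelation(values, lag=4), autocorrelation(values, lag=2))
```

In this series the autocorrelation is positive at the seasonal lag of 4 and negative at lag 2, where peaks line up with troughs.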
Time series modeling, a forecasting technique that uses time series data, drives decision-making with insights derived from time-based data. For example, businesses use time series data to analyze competitive positioning, website traffic, sales projections, and more.
Here are three common methods for studying data:
- Box-Jenkins ARIMA models. Data scientists deploy these univariate models to understand a single time-dependent variable and to predict future data points. They assume that the data is stationary or can be made stationary, typically by differencing.
- Box-Jenkins multivariate models. Data scientists use multivariate models to analyze multiple time-dependent variables, such as temperature and humidity, over time.
- Holt-Winters method. An exponential smoothing technique for forecasting data that includes seasonality.
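As a rough illustration of the last method, here is a compact additive Holt-Winters sketch in plain Python. It follows one common textbook formulation with level, trend, and seasonal components; the initialization from the first two seasons, the smoothing parameters, and the synthetic series are assumptions chosen for clarity rather than tuned values.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive trend and seasonality."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons (one of several options).
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    trend = (second_mean - first_mean) / m
    mid = (m - 1) / 2  # first_mean estimates the level at the middle of season one
    seasonals = [series[i] - (first_mean + trend * (i - mid)) for i in range(m)]
    level = first_mean + trend * (m - 1 - mid)  # level at the end of season one

    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonals[t % m]
        prev_level = level
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s_prev

    n = len(series)
    return [level + h * trend + seasonals[(n - 1 + h) % m] for h in range(1, horizon + 1)]


# Synthetic series: a pattern that repeats every 4 points plus a rise of 2 per step.
pattern = [10, 20, 30, 20]
series = [pattern[i % 4] + 2 * i for i in range(12)]
forecast = holt_winters_additive(series, season_length=4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4)
print(forecast)
```

On this noise-free series the forecast simply continues the pattern and the upward drift. Real data needs fitted smoothing parameters, and a maintained statistics library is usually a better choice than hand-rolled code.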
Features of Time Series Data
The order in which time series data was observed is critical to its meaning; this natural time order renders time series data unique. Irrespective of the use case, all time series data models share three other time series data characteristics:
- Data is nearly always recorded as a new entry as it arrives
- Data typically arrives in time order
- Time intervals can be regular or irregular, but time is a primary axis
This means that time-series data workloads are typically append-only, with corrections to erroneous data happening after the fact only as exceptions, not as the usual course of business.
Time series data components can be classified into metrics and events:
- Metrics are collected at regular time intervals
- Events are collected at irregular time intervals
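The distinction can be checked mechanically by looking at the gaps between consecutive timestamps. The sketch below assumes timestamps are given as seconds since an arbitrary origin; the function names and the tolerance parameter are hypothetical.

```python
def is_regular(timestamps, tolerance=0.0):
    """True when every gap between consecutive timestamps matches the first gap."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)


def classify_series(timestamps, tolerance=0.0):
    """Label a series as 'metric' (regular intervals) or 'event' (irregular intervals)."""
    return "metric" if is_regular(timestamps, tolerance) else "event"


cpu_samples = [0, 60, 120, 180, 240]  # one reading per minute
login_attempts = [3, 47, 49, 310]     # recorded whenever they happen

print(classify_series(cpu_samples))
print(classify_series(login_attempts))
```

A small nonzero tolerance is useful in practice, because collectors rarely fire at exactly the same offset every time.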
Time series data can also be classified as stock or flow data:
- Stock time series data measures attributes at specific points in time and collects that data, like static snapshots of the information as it was at particular moments.
- Flow time series data measures the activity of an attribute over a specific time period, such as the total number of units sold during a day.
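One practical consequence of this classification shows up when data is rolled up to a coarser interval: flow values are added together, while stock values are sampled, for example by taking the last snapshot. The inventory figures below are invented, and the two helper names are hypothetical.

```python
# Hypothetical week of data for one product.
daily_units_sold = [12, 9, 15, 11, 20, 31, 27]              # flow: activity during each day
end_of_day_inventory = [188, 179, 164, 153, 133, 102, 75]   # stock: snapshot at each day's close


def weekly_flow(daily_values):
    """A flow rolls up by summing the activity in the period."""
    return sum(daily_values)


def weekly_stock(daily_values):
    """A stock rolls up by taking the snapshot at the end of the period."""
    return daily_values[-1]


print(weekly_flow(daily_units_sold))
print(weekly_stock(end_of_day_inventory))
```

Summing the inventory snapshots would produce a number with no meaning, which is why it helps to know which kind of series is being aggregated.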
Cross Sectional Data vs Time Series Data
Cross-sectional data is a group of behaviors or observations for multiple entities or subjects at a single point in time.
Panel data, sometimes referred to as cross-sectional time series data, is a combination of time series and cross-sectional data: a collection of behaviors or observations for multiple entities or subjects at multiple time periods.
To classify what you have as cross-sectional data, panel data, or time series data, assess what determines a unique record in the data set.
Differences between time series data, cross-sectional data, and panel data:
- If a timestamp is all that is required, it is probably time series data. For example, the closing prices for a single stock for five years (different points in time, equally spaced), or a single patient’s ECG data for a one-hour procedure.
- If something other than a timestamp, such as an ID, is required, it is probably cross-sectional data. To expand the above examples, the closing prices for every stock on the exchange today, or the heart rates from the ECG data of 100 patients as the same procedure begins.
- If a timestamp plus something else such as an ID is required, it is probably panel data. To complete the example, you might have data on the closing prices for each of the stocks for each day for a whole year.
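The rule of thumb above can be written as a short check on which fields vary across records. This is a sketch under the assumption that each record is a dictionary with a timestamp and an entity identifier; the field names `timestamp` and `entity_id`, the ticker symbols, and the sample prices are hypothetical.

```python
def classify_dataset(records, time_field="timestamp", id_field="entity_id"):
    """Guess the data set type from what is needed to identify a record uniquely."""
    times = {record[time_field] for record in records}
    entities = {record[id_field] for record in records}
    if len(entities) == 1 and len(times) > 1:
        return "time series"      # the timestamp alone identifies a record
    if len(times) == 1 and len(entities) > 1:
        return "cross-sectional"  # the entity alone identifies a record
    if len(times) > 1 and len(entities) > 1:
        return "panel"            # timestamp plus entity is required
    return "undetermined"


one_stock = [
    {"timestamp": "2024-03-04", "entity_id": "AAA", "close": 10.0},
    {"timestamp": "2024-03-05", "entity_id": "AAA", "close": 10.4},
]
one_day = [
    {"timestamp": "2024-03-04", "entity_id": "AAA", "close": 10.0},
    {"timestamp": "2024-03-04", "entity_id": "BBB", "close": 55.2},
]

print(classify_dataset(one_stock))
print(classify_dataset(one_day))
print(classify_dataset(one_stock + one_day))
```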
Time Series Data Resources
Here are a few popular time series data resources:
- Time Series Databases: New Ways to Store and Access Data: “Time series data is of growing importance, especially with the rapid expansion of the Internet of Things. This concise guide shows you effective ways to collect, persist, and access large-scale time series data for analysis. You’ll explore the theory behind time series databases and learn practical methods for implementing them. Authors Ted Dunning and Ellen Friedman provide a detailed examination of open source tools such as OpenTSDB and new modifications that greatly speed up data ingestion.”
- Using Spring Boot, ScyllaDB and Time Series Data: Learn how to use Spring Boot apps with ScyllaDB for time series data, taking advantage of shard-aware drivers and prepared statements.
- KairosDB and ScyllaDB: A Time Series Solution for Performance and Scalability: How to build a highly available time-series solution with an efficiently tailored front-end framework and a backend database with a fast ingestion rate.
- ScyllaDB Time Series Use Case Library: Engineers share how their teams are tackling time series data challenges.
Does ScyllaDB Offer Solutions for Time Series Data?
Yes. ScyllaDB is commonly used as a time-series database, especially for IoT, activity tracking, cybersecurity, and fraud detection use cases. ScyllaDB supports native time-series data at massive scale, with a highly reliable and performant backend database that scales as your data grows. Learn more about ScyllaDB and how it is used for time series data use cases.