Graph Database FAQs
What is a Graph Database?
Graph databases store data using topological data models. Nodes in graph databases can represent companies, customers, or any other entity or piece of data. Graph database administrators can keep data models usable even as data volumes scale up.
How do graph databases work to query information?
Different graph databases share several similarities:
- Data storage
- Data is recorded and represented in a topological schema
- Users retrieve data using a query language
The structure of graph databases varies. Some businesses use RDF databases, a type of NoSQL graph database sometimes referred to as a triple store. A triple store retrieves “triples,” data organized as a subject-predicate-object relationship.
For example:
Joe > is friends with > Jane
If an analyst wants to use graph analytics to learn about Joe and the database holds data on Joe’s friends, they only need to search the graph database for the matching triple pattern ( :Joe :is friends with ? ), where ? stands for the unknown value. There is probably other data on Joe, too, like where he shops ( :Joe :shops at ? ) or what he listens to ( :Joe :listens to ? ).
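The triple lookup described above can be sketched in a few lines of Python. This is only an illustration of pattern matching over triples, not how any particular RDF store is implemented; the `TRIPLES` data and the `match` helper are made up for the example, with `None` standing in for the `?` wildcard.

```python
# Hypothetical sample data mirroring the Joe/Jane example.
# Each triple is (subject, predicate, object).
TRIPLES = [
    ("Joe", "is friends with", "Jane"),
    ("Joe", "shops at", "Corner Market"),
    ("Joe", "listens to", "Jazz FM"),
    ("Jane", "is friends with", "Joe"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as the '?' wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# ( :Joe :is friends with ? )
joes_friends = [o for (_, _, o) in match(TRIPLES, "Joe", "is friends with")]

# ( :Joe ? ? ) returns everything known about Joe
about_joe = match(TRIPLES, subject="Joe")
```

A real triple store would answer the same patterns through indexes rather than a linear scan; the scan here just keeps the idea visible.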
Types of Graph Databases
There are two basic types of graph database data models: property graphs, which consist of nodes and edges that can carry properties, and knowledge graphs, which place more emphasis on relationships and analysis. Knowledge graphs include RDF (Resource Description Framework) graphs such as the one above, which emphasize data integration, focus on the semantic aspects of data, and store information in triples. RDF graphs conform to a set of standards promulgated by the World Wide Web Consortium (W3C) for representing statements. They are best suited to providing rich semantics and inferences from data and to representing complex metadata and master data.
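To make the difference between the two models concrete, here is a minimal sketch that holds the same friendship first as a property graph and then as triples. The `Node` and `Edge` classes, the labels, and the `since` property are all assumptions invented for this example; they do not correspond to the API of any specific graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# Property graph form: nodes and edges both carry key/value properties.
joe = Node("joe", "Person", {"name": "Joe"})
jane = Node("jane", "Person", {"name": "Jane"})
friendship = Edge("joe", "jane", "FRIENDS_WITH", {"since": 2019})


def node_to_triples(node):
    """Flatten a node into subject-predicate-object statements."""
    triples = [(node.id, "type", node.label)]
    triples += [(node.id, key, value) for key, value in node.properties.items()]
    return triples


def edge_to_triple(edge):
    """Flatten an edge into one triple.

    Edge properties such as 'since' are dropped here: a plain triple has
    no slot for them, so expressing them takes additional statements.
    """
    return (edge.source, edge.label, edge.target)


triples = node_to_triples(joe) + node_to_triples(jane) + [edge_to_triple(friendship)]
```

The conversion is lossy on purpose, to show where the two models differ: properties on a relationship fit naturally on a property graph edge, while the triple form needs extra statements to say the same thing.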
Indexing strategies for both types of graphs are generally similar although differences remain. Over time, the architectural distinctions between knowledge graphs and property graphs are likely to become less important.
Advantages and Disadvantages of Graph Databases
Among the main advantages of graph databases over relational databases is a more flexible, high-performance graph format for identifying and analyzing distant connections between data based on factors such as the quality or strength of relationships. Speed is another important benefit. Because graph databases store relationships directly, relationship-heavy queries run much more rapidly and users need not execute long chains of join operations.
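As a rough illustration of finding distant connections by following stored relationships, the sketch below runs a breadth-first search over an in-memory adjacency list. The `FRIENDS` data and the `within_hops` function are hypothetical, and a production graph database would do this through its own storage engine and query language.

```python
from collections import deque

# Hypothetical friendship graph stored as an adjacency list.
FRIENDS = {
    "Joe": ["Jane", "Sam"],
    "Jane": ["Joe", "Ava"],
    "Sam": ["Joe"],
    "Ava": ["Jane", "Lee"],
    "Lee": ["Ava"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance,
    stopping once max_hops is reached. The start node is excluded."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


connections = within_hops(FRIENDS, "Joe", 2)
```

Each hop is a lookup of a node's neighbors, so widening the search from one hop to two changes a parameter rather than the shape of the query.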
The main disadvantages of graph databases are the lack of a single query language shared across products and a poorer fit for transaction-oriented systems.
Graph Database Use Cases
Fraud Detection
Real-time fraud detection systems are among the most advanced graph database applications. Graph databases highlight relationships, and queries can show when flagged credit card numbers or email addresses are being used, or when multiple people in different physical locations are associated with the same IP address or personal email address. Graph analytics helps establish patterns between nodes; here, it shows anomalous behavior among cardholders, purchase categories, purchase locations, terminals, transactions, etc.
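The shared-IP pattern mentioned above can be sketched as a simple grouping over account-to-IP edges. Everything here is assumed for illustration: the `LOGINS` data uses documentation-range IP addresses, and the `min_accounts` threshold of 3 is an arbitrary choice, not a recommended fraud rule.

```python
from collections import defaultdict

# Hypothetical (account, ip_address) edges.
LOGINS = [
    ("alice", "203.0.113.7"),
    ("bob", "203.0.113.7"),
    ("carol", "203.0.113.7"),
    ("dave", "198.51.100.2"),
    ("alice", "198.51.100.9"),
]


def shared_ips(logins, min_accounts=3):
    """Return IP addresses linked to at least min_accounts distinct accounts."""
    accounts_by_ip = defaultdict(set)
    for account, ip in logins:
        accounts_by_ip[ip].add(account)
    return {
        ip: sorted(accounts)
        for ip, accounts in accounts_by_ip.items()
        if len(accounts) >= min_accounts
    }


suspicious = shared_ips(LOGINS)
```

In graph terms, each flagged IP is a node with an unusually high number of incoming account edges, which is the kind of pattern a graph query can surface directly.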
Recommendation Engines
Ecommerce recommendation engines are another example of when to use graph databases. Graph databases for big data allow users to store, as a graph, the relationships between categories of data such as friends, interests, and purchase history. For example, you can see what trusted friends buy, or what people who follow the same hobbies use to pursue them.
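A "what do my friends buy" recommendation can be sketched as a two-step walk: from a user to their friends, then from each friend to their purchases. The `FRIENDS` and `PURCHASES` data and the `recommend` function below are hypothetical, and ranking by simple purchase count is just one possible scoring choice.

```python
from collections import Counter

# Hypothetical relationship data.
FRIENDS = {"Joe": {"Jane", "Sam"}}
PURCHASES = {
    "Joe": {"tent"},
    "Jane": {"tent", "stove"},
    "Sam": {"stove", "lantern"},
}


def recommend(user, friends, purchases):
    """Rank items bought by the user's friends that the user does not own yet."""
    owned = purchases.get(user, set())
    counts = Counter()
    for friend in friends.get(user, set()):
        for item in purchases.get(friend, set()):
            if item not in owned:
                counts[item] += 1
    # Most-purchased first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda item: (-counts[item], item))


suggestions = recommend("Joe", FRIENDS, PURCHASES)
```

Swapping the friendship edges for "follows the same hobby" edges would give the hobby-based variant without changing the traversal.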
Social Network Analysis
No introduction to graph databases would be complete without a discussion of social networks and social media analysis. Social networks are the perfect use case for graph databases because they can manage multi-dimensional connections and engagements between many nodes. A social network graph analysis can determine:
- Number of nodes/User activity
- Connection density/User influence
- Two-way engagement/Connection density and direction
Graph analytics makes it possible, for example, to identify complex patterns rapidly and filter out bot accounts.
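A few of these measures can be computed directly from a list of directed "follows" edges. The sketch below is a minimal illustration with made-up data; the metric definitions used (density as edges divided by possible directed edges, reciprocity as the share of edges that are returned) are common conventions but are assumptions here rather than definitions taken from this article.

```python
# Hypothetical directed edges: (follower, followed).
FOLLOWS = {
    ("joe", "jane"),
    ("jane", "joe"),
    ("sam", "joe"),
    ("ava", "joe"),
}


def network_metrics(edges):
    """Compute basic measures for a directed social graph."""
    nodes = {node for edge in edges for node in edge}
    node_count = len(nodes)
    possible_edges = node_count * (node_count - 1)
    density = len(edges) / possible_edges if possible_edges else 0.0
    # An edge counts as mutual when the reverse edge also exists.
    mutual = sum(1 for (a, b) in edges if (b, a) in edges)
    reciprocity = mutual / len(edges) if edges else 0.0
    in_degree = {node: 0 for node in nodes}
    for _, followed in edges:
        in_degree[followed] += 1
    return {
        "node_count": node_count,
        "density": density,
        "reciprocity": reciprocity,
        "in_degree": in_degree,
    }


metrics = network_metrics(FOLLOWS)
```

Here in-degree acts as a crude stand-in for user influence, and reciprocity for two-way engagement.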
Graph Database vs Relational Database
A NoSQL graph database stores data as a network graph and prioritizes relationships between data.
Relational databases store data in relational tables defined by rows and columns. Each table has a primary key that uniquely identifies its rows, and a row can be linked to rows in other tables by referencing those keys.
Graph databases are made up of nodes and the edges that relate them. Nodes represent particular entities, and edges represent connections between nodes. Graph databases are designed to be scalable and flexible, and they store the data relationships themselves as data. This emphasis on data relationships helps users explore complex data sets and make connections between data points.
Relational databases model relationships between columns of data tables rather than between individual data points. It is easy to add data to either kind of database. However, because relational databases require complex joins across tables to answer complex relationship queries, those queries are typically faster in graph databases.
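The difference in query shape can be seen by asking the same three-hop question both ways. The sketch below uses Python's built-in `sqlite3` module for the relational side and a plain dictionary for the graph side; the `friendship` table, its columns, and the `hop` helper are invented for the example. It compares how the queries are written and says nothing about real-world performance.

```python
import sqlite3

# Hypothetical friendship chain: Joe -> Jane -> Ava -> Lee.
ROWS = [("Joe", "Jane"), ("Jane", "Ava"), ("Ava", "Lee")]

# Relational form: one table, and one extra self-join per hop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (person TEXT, friend TEXT)")
conn.executemany("INSERT INTO friendship VALUES (?, ?)", ROWS)

THREE_HOPS_SQL = """
SELECT f3.friend
FROM friendship AS f1
JOIN friendship AS f2 ON f2.person = f1.friend
JOIN friendship AS f3 ON f3.person = f2.friend
WHERE f1.person = ?
"""
via_joins = [row[0] for row in conn.execute(THREE_HOPS_SQL, ("Joe",))]
conn.close()

# Graph form: relationships are stored as edges and followed directly.
adjacency = {}
for person, friend in ROWS:
    adjacency.setdefault(person, []).append(friend)


def hop(graph, start, hops):
    """Follow outgoing edges exactly `hops` times from start."""
    frontier = [start]
    for _ in range(hops):
        frontier = [nxt for node in frontier for nxt in graph.get(node, [])]
    return frontier


via_traversal = hop(adjacency, "Joe", 3)
```

Going from three hops to four means adding another join to the SQL, while the traversal only needs a different `hops` argument.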
Does ScyllaDB Offer Solutions for Graph Databases?
Yes. ScyllaDB is an ideal data storage layer for graph databases like JanusGraph, which can plug into NoSQL databases such as Apache HBase, Google Cloud Bigtable, Oracle Berkeley DB Java Edition, Apache Cassandra, and ScyllaDB for its data storage layer. With ScyllaDB, users get low and consistent latency, high availability, up to 10x throughput, ease of use, and a highly scalable system.
A group at IBM compared using ScyllaDB as the JanusGraph storage backend vs. Apache Cassandra and HBase. They found that ScyllaDB displayed nearly 35% higher throughput when inserting vertices than HBase and almost 3X Cassandra’s throughput. ScyllaDB’s throughput was 160% better than HBase and more than 4X that of Cassandra when inserting edges. ScyllaDB performed 72% better than Cassandra in a query performance test and nearly 150% better than HBase.
Learn more about using ScyllaDB as the data storage layer for open-source graph databases.