Skip to main content
Version: 10.9.x

Architecture

In this page, you will find:

  • An overview of the services and technologies used in a Fast Data system;
  • A discussion over the architecture.

Overview

Fast Data is based on a group of services that communicate with each other via events sent on Kafka or MongoDB, and persisting Projections and Single Views on MongoDB. This architectural setup allows for great horizontal scalability.

Fast Data overview

Services

Real Time Updater (RTU)

The Real Time Updater is responsible for listening to the events concerning changes to the SoR and updating the Projections accordingly. Optionally, the RTU can perform other activities:

  • Emit Projection Update events, notifying the listener that a certain Projection has been updated;
  • Check which Single Views should be updated, and emit the relevant Projection Change event.

It is up to you to decide whether to make it execute any of these, based on the architecture you prefer to build. For example, if having your Projections up to date in a very short period of time is crucial for your business, you might want to skip the part where you check which Single Views should be updated, delegating this job to the Single View Trigger and decreasing the overhead of the RTU.

Single View Trigger Generator (SVTG)

The Single View Trigger Generator listens to Projection Update events, performing the logic needed to get the identifiers of all the Single Views that need to be updated, and emits the relevant Projection Change(sv-trigger) events. This activity can be performed by the RTU as well. Using the SVTG is recommended for a faster update of Projections, and a greater fault tolerance for the RTU.

Single View Creator (SVC)

The Single View Creator reacts to Projection Change events, updating the Single Views by aggregating the relevant data. It can optionally emit Single View Event and Single View Before After events. If the service is configured to read the Projection Change events on Kafka, then it will use the usual publish/subscribe pattern, granting support to horizontal scalability and fault tolerance with ease. If the service is configured to read the Projection Change events on MongoDB, then it will poll MongoDB to check if there are any events that need to be handled. To avoid concurrency problems, it reads atomically and marks the Projection Change, so that other SVCs won't start processing the same data.

Other Services

Connectors

The Connectors are the components responsible of transmitting the changes that occur within your systems (e.g. create, update or delete operations) to the event streaming platform in near real-time.

Depending on how changes are produced, where your source data is stored and how they can be accessed there can be different ways to produce the corresponding change event on the Fast Data ingestion topics. When data is stored on a database and the changes to be monitored are the action on the database itself, exploiting a Change Data Capture (CDC) system is usually recommended. This system works by detecting changes in the database and emitting the corresponding events in near real-time on the configured event streaming platform so that subsequent components can process the events.

Bucket Storage Support

The Bucket Storage Support is a feature that enables the storing of messages flowing through a channel into an external object storage system, such as a Google Cloud Storage bucket or any bucket compatible with Amazon AWS S3 interface. Such an external archiving system can be configured to retain saved messages for a very long time and it enables a large array of different applications and processing over the stored data, promoting different types of access to the messages. The main feature is the ability to replay onto a channel messages that were previously read and stored in the bucket.

Technologies

Fast Data revolves around three major technologies: Kubernetes, Kafka, and MongoDB. The Kubernetes part is pretty much standard, and thanks to the Console you won't need to directly deal with it, but will still enjoy all the advantages it has to offer.

On the other hand, Kafka and MongoDB can be used in different steps of the process, and are thus discussed in the following subsections.

Kafka

Kafka is a distributed event streaming platform, and it is used for most Fast Data events, starting from the ones produced by the CDC, up to the ones produced by the SVC. For some intermediate data, however, it is possible to choose whether you want it on Kafka or MongoDB, that is the case for Projection Change events.

Kafka is very reliable, and has a great support for topic partitioning. This enables horizontal scalability, since it is possible to have multiple replicas of the same service: each replica will read from its own partition, granting all the required properties and speeding up the process. This way, it is as close to being real-time as possible.

MongoDB

MongoDB is a non-relational data store, used for storing Projections, Single Views, and some intermediate data, namely Projection Change and Single View Error collections. In the case of Fast Data, the Projection and Single View collections are saved in a format that is compatible with the CRUD service, which opens a ton of possibilities for exposing new services and consuming data in innovative ways.

MongoDB is great for horizontal scalability.

CRUD Service

Since the MongoDB collections are compatible with the CRUD Service and are automatically registered by the Console, they are ready to be exposed as APIs and are easily customizable (e.g. adding indexes just requires a few clicks in the Console). This allows you to easily connect them to the CMS, your applications, analytics, and third party software.

Putting it all together

Fast Data architecture is rather streamlined, with just a couple of pivoting points. A regular flow of information follows this path:

  1. The CDC emits an event stating that some data in the SoR has changed;
  2. The RTU performs the normalization of the messages received by the CDC to select the ones of interest and make them adhere to a standard of interest, and then stores the Projections on MongoDB;
    1. The RTU emits a Projection Update event, if it is configured to do so;
  3. The RTU or the SVT (depending on which one you chose for your architecture) compute and emit a Projection Change event, saving it either on Kafka or on MongoDB
  4. The SVC reads the Projection Change message, either polling MongoDB or reacting to the Kafka message. Then, it aggregates the Single View using the new data, and stores it to MongoDB;
    1. The SVC emits a Single View Event and/or a Single View Before After event, if it is configured to do so.

Fast data architecture

There are no hard constraints, but generally all the services belong to the same Kubernetes namespace, while the MongoDB and Kafka instances can be on managed hosts, on the same cluster, on premise, etc.

The SoRs usually belong to a different network portion, far from all the processing logics, since they must be kept independent of all the Fast Data flow, with the only exception of the CDC being connected to them.