
Fast Data Set up

Mia-Platform Fast Data is built upon Kafka; to configure Fast Data within the Console, we assume you already have a Kafka cluster set up.

Kafka

Kafka is an event streaming platform used to write messages containing data received from disparate source systems and to make them available to target systems in near real time.

info

To correctly configure your Kafka cluster, you can visit this site.

Snappy compression

Snappy is a compression and decompression library that aims to offer high-speed data flow while maintaining a reasonable compression ratio; it is one of the compression types Kafka supports for its messages.

The main advantages of Snappy are:

  • Fast compression speed (around 250 MB/sec)
  • Moderate CPU usage
  • Stability and robustness to prevent crashing while still maintaining the same bitstream format among different versions
  • Free and open source

Note: For further information about Snappy, check the official GitHub page of the library.

Provided that the client's CDC (Change Data Capture) supports Snappy compression, the Console is already set up to handle it.
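
If you want to check that your cluster accepts Snappy-compressed messages, you can send a test message with the standard Kafka console producer. This is only a quick sketch: the broker address and topic name below are placeholders to replace with your own values, and the --compression-codec option makes the producer compress the messages with Snappy.

kafka-console-producer.sh --bootstrap-server my-broker:9092 --topic 'my-ingestion-topic' --compression-codec snappy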

caution

Snappy, like every other compression and decompression algorithm, increases the delay between the production and the consumption of a message; therefore, it is not advised for applications with strict real-time requirements. On the other hand, it is recommended for initial loads, which tend to be much heavier.

Topic naming convention

The Fast Data infrastructure needs multiple Kafka topics to work properly, so following a naming convention is useful. Below is the naming convention we recommend for working with Fast Data and its tools.

Ingestion topic from CDC to MongoDB

producer: the system producing its own events

<tenant>.<environment>.<source-system>.<projection>.ingestion

An example:

test-tenant.PROD.system-name.test-projection.ingestion

Topic for verified projection update

producer: Real Time Updater

<tenant>.<environment>.<mongo-database>.<collection>.pr-update

An example:

test-tenant.PROD.restaurants-db.reviews-collection.pr-update

Topic for Single View Creator trigger

producer: Single View Trigger

<tenant>.<environment>.<mongo-database>.<single-view-name>.sv-trigger

An example:

test-tenant.PROD.restaurants-db.reviews-sv.sv-trigger

Topic for verified update of the Single View

producer: Single View Creator

<tenant>.<environment>.<mongo-database>.<single-view-name>.sv-update

An example:

test-tenant.PROD.restaurants-db.reviews-sv.sv-update

Create topics

You can create a topic using the Kafka CLI or, if you use Confluent Cloud, its user interface.
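
For example, with the Kafka CLI you could create the ingestion topic shown above as follows; the bootstrap server address, partition count, and replication factor are placeholders to adapt to your cluster.

kafka-topics.sh --create --bootstrap-server my-broker:9092 --partitions 3 --replication-factor 3 --topic 'test-tenant.PROD.system-name.test-projection.ingestion'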

Confluent Cloud

Note: This documentation about Confluent Cloud was last checked on February 5th, 2021. Some information could be outdated. Check out the official Confluent documentation here

If you use a cluster in Confluent Cloud, you can create topics either from the UI or from the CLI.

Use Confluent Cloud UI

First, log in to Confluent Cloud, then click on the environment and the cluster where you want to create the topic.

If you don't have a cluster, create one by following this documentation.

Create topic

On the left menu, click on Topics and then on the Add a topic button. Insert the topic name and the required number of partitions. Here you can create the topic with default settings or customize them.

info

We suggest using a topic name like tenant.environment.system.projection.ingestion

Note: if this documentation seems outdated, follow the official one

Create service account

On the left menu, click on API access and add a key (if one does not already exist) for Granular access. Here you can select an existing service account or create a new one.

info

We suggest creating a service account for each project and environment.

Note: if this documentation seems outdated, follow the official one

Create ACL rules

Once the service account is created, you can set the following from the user interface:

  • type: set the Topic type.
  • topic name: a new or an existing one.
  • pattern type: literal or prefixed. If you want to declare an ACL for each topic, you should use literal.
  • operation: for each topic, you should set the READ and WRITE operations.
  • permission: can be ALLOW or DENY. You should set ALLOW. Once created, all other operations are denied by default.

Use Confluent Cloud CLI

First, you should install the Confluent CLI.

Once installed, to create a new topic (with some custom config) run:

ccloud kafka topic create --partitions 3 --cluster CLUSTER_ID --config cleanup.policy=compact --config retention.ms=2592000000 'tenant.environment.system.projection.ingestion';

If you have not already created a service account, you should create one by following this guide.
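
For reference, a service account could also be created directly from the CLI; the account name and description below are just examples:

ccloud service-account create "my-project-production" --description "Fast Data service account for the production environment";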

After the creation of the topic, you can associate the ACL to a service account:

ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation WRITE --topic 'tenant.environment.system.projection.ingestion' --cluster CLUSTER_ID;
ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation READ --topic 'tenant.environment.system.projection.ingestion' --cluster CLUSTER_ID;

Set up a Kafka consumer

To set up the consumer, you should create an ACL that grants the READ operation on the consumer group id configured in the environment variables.

If you do not have a service account, you can create one by following this guide.

Consumer group ACL from UI

To set up the ACL for the consumer group, from the Confluent Cloud UI you should set:

  • type: set the Consumer group type
  • consumer group ID: the consumer group ID configured in the environment variables
  • pattern type: literal or prefixed. If you want to declare an ACL for each topic, you should use literal;
  • operation: you should set the READ operation;
  • permission: can be ALLOW or DENY. You should set ALLOW. Once created, all other operations are denied by default.

Consumer group ACL from Confluent CLI

For example, if you set my-consumer-group.development as the consumer group id, you can configure the ACL from the CLI:

ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation READ --consumer-group "my-consumer-group.development" --cluster CLUSTER_ID;

Note: if this documentation seems outdated, follow the official one

Add a CRUD Service to your project

Projections and single views created in the Console are handled by the CRUD Service. Therefore, if your project does not already have a CRUD Service you should add one. Follow this link to learn how to correctly create and configure your CRUD Service.

Set up environment variables

When you start using Fast Data, you should set some environment variables in the Envs section of the Console to correctly deploy your project. Click here to view how this section works and how to differentiate the environment variables across environments.

These environment variables are shared across all the Systems of Record, and all of them must be added once a System of Record has been created.

Here is the list:

  • LOG_LEVEL: it should already be set as an environment variable

  • FAST_DATA_PROJECTIONS_DATABASE_NAME: the name of the database where projections are saved

  • KAFKA_BROKERS: the host of your Kafka cluster (with the port)

  • MONGODB_URL: the URL to MongoDB. It is the same one used, for example, by the CRUD Service
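
As a reference, this is a minimal sketch of how these variables could look; every value below (database name, broker addresses, credentials, hosts) is a placeholder to replace with your own infrastructure details.

LOG_LEVEL=info
FAST_DATA_PROJECTIONS_DATABASE_NAME=fast-data-projections
KAFKA_BROKERS=my-broker-1:9092,my-broker-2:9092
MONGODB_URL=mongodb://username:password@mongodb-host:27017/fast-data-projections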