Fast Data Setup
Mia-Platform Fast Data is built upon Kafka; to configure Fast Data within the Console, we assume you already have a Kafka cluster set up.
Kafka
Kafka is an event streaming platform used to write messages containing data received from disparate source systems and make them available to target systems in near real time.
info
To correctly configure your Kafka cluster, you can visit this site.
Snappy compression
Snappy is a compression and decompression library whose aim is to offer high-speed data flow while still maintaining a reasonable compression ratio. Snappy is among the compression types that Kafka supports for its messages.
The main advantages of Snappy are:
- Fast compression speed (around 250 MB/sec)
- Moderate CPU usage
- Stability and robustness: it is designed not to crash on corrupted input, and the bitstream format is kept stable across versions
- Free and open source
Note: For further information about Snappy, check the official GitHub page of the library.
Provided that the client's CDC (Change Data Capture) system supports Snappy compression, the Console is already set up to handle it.
caution
Snappy, like any other compression and decompression algorithm, increases the delay between the production and the consumption of a message. It is therefore not advised for applications with strict real-time requirements; on the other hand, it is recommended for initial loads, which tend to be much heavier.
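If the producing client exposes the standard Kafka producer configuration, enabling Snappy typically comes down to a single property. The snippet below is a minimal sketch using the standard Kafka producer setting; check your CDC's documentation for the equivalent option on its side:
# producer.properties (sketch): enable Snappy compression for produced messages
compression.type=snappy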
Topic naming convention
The Fast Data infrastructure needs multiple Kafka topics to work properly, so following a naming convention is useful. Below is the convention we recommend for working with Fast Data and its tools.
Ingestion topic from CDC to MongoDB
producer: the system producing its own events
<tenant>.<environment>.<source-system>.<projection>.ingestion
An example:
test-tenant.PROD.system-name.test-projection.ingestion
Topic for verified projection update
producer: Real Time Updater
<tenant>.<environment>.<mongo-database>.<collection>.pr-update
An example:
test-tenant.PROD.restaurants-db.reviews-collection.pr-update
Topic for Single View Creator trigger
producer: Single View Trigger
<tenant>.<environment>.<mongo-database>.<single-view-name>.sv-trigger
An example:
test-tenant.PROD.restaurants-db.reviews-sv.sv-trigger
Topic for verified update of the Single View
producer: Single View Creator
<tenant>.<environment>.<mongo-database>.<single-view-name>.sv-update
An example:
test-tenant.PROD.restaurants-db.reviews-sv.sv-update
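Putting the conventions together, a hypothetical reviews flow in the test-tenant PROD environment (reusing the illustrative names from the examples above) would use topics such as:
test-tenant.PROD.system-name.reviews.ingestion
test-tenant.PROD.restaurants-db.reviews-collection.pr-update
test-tenant.PROD.restaurants-db.reviews-sv.sv-trigger
test-tenant.PROD.restaurants-db.reviews-sv.sv-update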
Create topics
You can create a topic using the Kafka CLI or, if you use Confluent Cloud, through its user interface or the Confluent CLI.
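As an example with the Kafka CLI, a topic following the naming convention above could be created as follows (a sketch: the broker address, partition count, and replication factor are placeholders to adapt to your cluster; on Kafka versions older than 2.2 the --zookeeper option is used instead of --bootstrap-server):
kafka-topics.sh --create --bootstrap-server my-broker:9092 --partitions 3 --replication-factor 3 --topic 'test-tenant.PROD.system-name.test-projection.ingestion'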
Confluent Cloud
Note: This documentation about Confluent Cloud was last checked on 5th February 2021. Some information could be outdated. Check out the official documentation of Confluent here.
If you use a cluster in Confluent Cloud, you can create topics both from the UI and from the CLI.
Use Confluent Cloud UI
First, you need to log in to Confluent Cloud, then click on the environment and the cluster where you want to create the topic.
If you don't have a cluster, create one by following this documentation.
Create topic
On the left menu, click on Topics and then on the Add a topic button. Insert the topic name and the number of partitions required. Here you can create the topic with default settings or customize them.
info
We suggest using a topic name like tenant.environment.system.projection.ingestion
Note: if this documentation seems outdated, follow the official one
Create service account
On the left menu, click on API access and add a key (if one does not already exist) for Granular access.
Here you can select an existing service account or create a new one.
info
We suggest creating a service account for each project and environment.
Note: if this documentation seems outdated, follow the official one
Create ACL rules
Once the service account has been created, you can set the following from the user interface:
- type: set the Topic type.
- topic name: a new or an existing one.
- pattern type: literal or prefixed. If you want to declare an ACL for each topic, you should use literal.
- operation: for each topic, you should set the READ and WRITE operations.
- permission: can be ALLOW or DENY. You should set ALLOW. Once created, all other operations are denied by default.
Use Confluent Cloud CLI
First, you should install the Confluent CLI.
Once installed, to create a new topic (with some custom config) run:
ccloud kafka topic create --partitions 3 --cluster CLUSTER_ID --config cleanup.policy=compact --config retention.ms=2592000000 'tenant.environment.system.projection.ingestion';
If you have not already created a service account, create one following this guide.
After the creation of the topic, you can associate the ACL to a service account:
ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation WRITE --topic 'tenant.environment.system.projection.ingestion' --cluster CLUSTER_ID;
ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation READ --topic 'tenant.environment.system.projection.ingestion' --cluster CLUSTER_ID;
Set up a Kafka consumer
To set up the consumer, you should create an ACL with the READ operation set for the consumer group ID configured in the environment variables.
If you do not have a service account, you can create one following this guide.
Consumer group ACL from UI
To set up the ACL for the consumer group, from the Confluent Cloud UI you should set:
- type: set the Consumer group type.
- consumer group ID: write the consumer group ID configured in the environment variables.
- pattern type: literal or prefixed. If you want to declare an ACL for each topic, you should use literal.
- operation: you should set the READ operation.
- permission: can be ALLOW or DENY. You should set ALLOW. Once created, all other operations are denied by default.
Consumer group ACL from Confluent CLI
If you set my-consumer-group.development as the consumer group ID, you can configure it from the CLI:
ccloud kafka acl create --allow --service-account SERVICE_ACCOUNT --operation READ --consumer-group "my-consumer-group.development" --cluster CLUSTER_ID;
Note: if this documentation seems outdated, follow the official one
Add a CRUD Service to your project
Projections and single views created in the Console are handled by the CRUD Service. Therefore, if your project does not already have a CRUD Service, you should add one. Follow this link to learn how to correctly create and configure your CRUD Service.
Set up environment variables
When you start using Fast Data, you should set some environment variables in the Envs section of the Console to correctly deploy your project. Click here to see how this section works and how to differentiate the environment variables across environments.
The environment variables to set are shared by all the source systems, and all of them must be added once a System of Records has been created.
Here is the list:
- LOG_LEVEL: it should already be set as an environment variable
- FAST_DATA_PROJECTIONS_DATABASE_NAME: the name of the database where projections are saved
- KAFKA_BROKERS: the host of your Kafka cluster (with the port)
- MONGODB_URL: the URL to MongoDB. It is the same one used, for example, by the CRUD Service
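As a sketch, the variables might look like the following; all values below are placeholders chosen for illustration (broker hosts, database name, and credentials are assumptions, not defaults):
# example values only; replace the placeholders with your own infrastructure details
LOG_LEVEL=info
FAST_DATA_PROJECTIONS_DATABASE_NAME=restaurants-db
KAFKA_BROKERS=broker-1.example.com:9092,broker-2.example.com:9092
MONGODB_URL=mongodb://<username>:<password>@mongodb.example.com:27017/restaurants-db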