Skip to main content
Version: 13.x (Current)

Database Preparation

info

Currently only MongoDB is supported as underlying database for storing the application data.

caution

It is highly recommended to employ a dedicated database / namespace to store Data Catalog data. This allows restricting which applications have access to the management collections and permits to keep the necessary collections all under the same location without additional clutter from other projects.

Data Catalog solution relies on a set of collections to carry out most of its tasks, that is storing data assets definitions and their associated metadata. In order to provide a performant and reliable system, the database must be configured accordingly to support application execution.

This means that required collections must be created, necessary index added and additional schema validation set on them, so that only valid data can be read and written to those collections.

To reach this objective in a straightforward manner we provide an administration command-line interface, which takes care of creating all the necessary resources on selected database. This CLI is offered within as a Docker image, which can be run within your namespace every time a new application version is released.

CLI Configuration

Running the CLI inside your project can be done exploiting a K8s cronjob. In this documentation page it is described how it can be configured, while here are highlighted which specific configurations of the K8s manifest:

  • <URL_TO_CONTAINER_REGISTRY> is the url to your Container Registry of reference;
  • data-fabric/fabric-admin image is employed, which should be found within your Container Registry;
  • suspend the cronjob must be released as suspended, since it should be launched manually only once;
  • <NAME_CONTAINER_REGISTRY_SECRET> name of the secret where are contained the credentials for connecting to your Container Registry;
  • MONGODB_ADMIN_URL connection string to your MongoDB database.
caution

Please, beware that the credentials assigned to MongoDB connection string must have the admin permission to modify collections (collMod), so that the CLI can add a schema validation on the collection. This connection string might be different from the one employed within Data Catalog application, where only reading and writing is sufficient;

Here, is provided an example of cronjob definition.

apiVersion: batch/v1
kind: CronJob
metadata:
name: fabric-data-catalog-admin
labels:
app: fabric-data-catalog-admin
# ...
# ...
spec:
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
schedule: '* * * * *'
# this job MUST be deployed as suspended, since it just run once
suspend: true
jobTemplate:
spec:
backoffLimit: 1
template:
metadata:
name: fabric-data-catalog-admin
labels:
app: fabric-data-catalog-admin
# ...
# ...
spec:
imagePullSecrets:
- name: <NAME_CONTAINER_REGISTRY_SECRET>
containers:
- name: fabric-data-catalog-admin
# current latest stable version of the image is 0.1.0
image: <URL_TO_CONTAINER_REGISTRY>/data-fabric/fabric-admin:0.1.0
args: [
"all",
"init",
"--url",
"{{MONGODB_ADMIN_URL}}",
]
restartPolicy: Never

This cronjob can then be manually triggered as described here every time a new version of data-fabric/fabric-admin image is available, which it should roughly correspond with a new release of Data Catalog application.

tip

In case in your Console Project it is possible to enable Custom Resources, then instead of creating a cronjob manifest it would be possible to define a job one, which does start as soon as released and does not need to be manually launched.

Cronjob is an alternative better suited in case manually launching the job for administering your database is preferred or enforced by any internal policy (e.g. you need to be granted the permission to edit your database).