Version: 12.x (Current)

CSV Fast Data Connector Usage

The CSV Connector can run either as a Kubernetes CronJob or as a regular microservice in polling mode. In both cases, it asks the Files Service for the list of all files in the input bucket and processes them, producing one Kafka message for Fast Data per line. Once a file has been completely processed, it is moved to the output bucket.
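The flow described above can be sketched as follows. This is only an illustration, not the connector's implementation: the buckets are modeled as plain dictionaries, `process_bucket` and `produce` are hypothetical names, the first CSV row is assumed to be a header, and the JSON payload is a placeholder rather than the actual message format.

```python
import csv
import io
import json


def process_bucket(input_bucket, output_bucket, produce):
    """Emit one message per CSV line, then move each finished file.

    input_bucket / output_bucket: dicts mapping file name -> CSV text
    (stand-ins for the real buckets). produce: callable receiving each
    message (stand-in for a Kafka producer).
    """
    # sorted() returns a new list, so the dict can be mutated while looping.
    for name in sorted(input_bucket):
        reader = csv.DictReader(io.StringIO(input_bucket[name]))
        for row in reader:
            # Placeholder payload: the real service formats messages differently.
            produce(json.dumps(row))
        # The file is moved only after every line has been handled.
        output_bucket[name] = input_bucket.pop(name)


inbox = {"users.csv": "id,name\n1,Ada\n2,Linus\n"}
outbox = {}
sent = []
process_bucket(inbox, outbox, sent.append)
print(len(sent), sorted(outbox))
```

Passing the producer as a callable keeps the sketch independent of any specific Kafka client.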

Kafka messages will be produced following the message adapter DB2 standard.

To switch between the two execution modes, set the LAUNCH_MECHANISM environment variable to one of the following values: cronjob or polling.

Polling mode

If LAUNCH_MECHANISM is set to polling, then the service starts and never stops its execution.

In this case, the SCHEDULING variable is mandatory and its value must be a cron expression in Quartz format. On each scheduled run, the service gets the list of new files added to the bucket and generates the corresponding Kafka messages.
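As an illustration of the rules above, here is a minimal sketch of how the two variables could be validated at startup. The function name `read_launch_config` and the `ConfigError` class are hypothetical, and the check on the cron expression is deliberately shallow: it only verifies that the expression has the 6 fields (seconds, minutes, hours, day of month, month, day of week) or 7 fields (with the optional year) that Quartz expects.

```python
VALID_MECHANISMS = {"cronjob", "polling"}


class ConfigError(ValueError):
    """Raised when the launch configuration is missing or invalid."""


def read_launch_config(env):
    """Return (mechanism, scheduling) from a mapping of environment variables."""
    mechanism = env.get("LAUNCH_MECHANISM")
    if mechanism is None:
        raise ConfigError("missing environment variable: LAUNCH_MECHANISM")
    if mechanism not in VALID_MECHANISMS:
        raise ConfigError(f"invalid parameter value: LAUNCH_MECHANISM={mechanism!r}")

    if mechanism == "cronjob":
        # SCHEDULING is only mandatory in polling mode.
        return mechanism, None

    scheduling = env.get("SCHEDULING")
    if not scheduling:
        raise ConfigError("missing environment variable: SCHEDULING")
    # Quartz cron expressions have 6 fields plus an optional 7th (year).
    if len(scheduling.split()) not in (6, 7):
        raise ConfigError(f"invalid parameter value: SCHEDULING={scheduling!r}")
    return mechanism, scheduling


# "0 0/5 * * * ?" is a Quartz expression that fires every 5 minutes.
print(read_launch_config({"LAUNCH_MECHANISM": "polling", "SCHEDULING": "0 0/5 * * * ?"}))
```

In a real deployment the mapping would be `os.environ`; a plain dictionary is used here so the sketch can be run in isolation.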

Cronjob mode

If LAUNCH_MECHANISM is set to cronjob, the service starts, checks for new files in the bucket and processes them. Once execution has finished, it stops.

That's why this mechanism is more suitable for a Kubernetes CronJob. To learn more about configuring a CronJob in the Console, read the documentation page.

Prometheus metrics management

When in polling mode, the service will expose its metrics on the route /-/metrics. In this way, it will be reachable as long as it's up and running.

In cronjob mode, however, the service is short-lived, so exposing metrics on a route is not viable. For this reason, when in cronjob mode, the CSV Connector pushes all its metrics to a Prometheus push-gateway service (for more info, check the official Prometheus documentation). In this case, a push-gateway should also be configured in the cluster.

Note


The PUSH_GATEWAY_SERVICE variable is optional, even when the service runs as a cronjob. If it is not set, the push step is simply skipped and no metrics are pushed.
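The optional behavior can be sketched like this. It assumes that PUSH_GATEWAY_SERVICE holds the base URL of the push-gateway (for example `http://pushgateway:9091`), which this page does not state, and `build_push_request` and the `csv-connector` job name are hypothetical. The `/metrics/job/<job>` path is the one the Prometheus Pushgateway accepts pushes on. The request is only built, not sent.

```python
import urllib.request


def build_push_request(env, body, job="csv-connector"):
    """Build a Pushgateway request, or return None when pushing is disabled.

    env: mapping of environment variables.
    body: metrics in Prometheus text format (must end with a newline).
    """
    base = env.get("PUSH_GATEWAY_SERVICE")
    if not base:
        # Variable not set: nothing is pushed.
        return None
    url = f"{base.rstrip('/')}/metrics/job/{job}"
    # PUT replaces all metrics previously pushed for this job grouping.
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


request = build_push_request(
    {"PUSH_GATEWAY_SERVICE": "http://pushgateway:9091"},
    "cycle_runs 1\n",  # illustrative metric, not one exposed by the connector
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)`, left out here so the sketch has no network dependency.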

The metrics exposed (no matter the launch mechanism) are shown in the table below:

| Metric name | Type | Labels | Description |
|---|---|---|---|
| imported_files | counter | file=fileName, processed=[OK/KO] | Number of imported files, labeled by filename and processing success. |
| kafka_message | counter | topic=topicName, sent=[OK/KO] | Number of Kafka messages sent, labeled by topic name and success. |
| csv_line | gauge | file=fileName, valid=[OK/KO] | Number of CSV lines read, labeled by filename and validation success. |
| cycle_execution_time | timer | - | Execution time in milliseconds of a single import cycle. |
| file_processing_time | timer | file=fileName | Processing time for a single file, labeled by filename. |
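To make the labels concrete, the sketch below renders a single sample in the Prometheus text exposition format. `render_sample` and `escape_label` are hypothetical helpers, and the series names the service actually exports may differ (client libraries often add suffixes), so treat the output as an approximation of what a scrape could contain.

```python
def escape_label(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line: name{label="value",...} value."""
    if not labels:
        return f"{name} {value}"
    inner = ",".join(
        f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{inner}}} {value}"


print(render_sample("imported_files", {"file": "users.csv", "processed": "OK"}, 1))
# prints: imported_files{file="users.csv",processed="OK"} 1
```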

Error Codes

The following table lists all the errors the service can raise during execution.

| Error code | Cause | Action | Actor |
|---|---|---|---|
| FD_CSVC_E0001 | invalid parameter value | Check environment variables for any illegal configuration. | csv-connector |
| FD_CSVC_E0002 | missing environment variable | Check environment variables for any missing configuration. | csv-connector |
| FD_CSVC_E0003 | invalid filename format | Check csv filename extensions passed and be sure they have a .csv extension. | csv-connector |
| FD_CSVC_E2001 | error during csv line validation | Search for the csv line specified inside the csv file being parsed. Check any discrepancies with the json schema passed. | csv-connector |
| FD_CSVC_E2002 | jsonSchema type not managed | Be sure that the parameter described in the json schema is one of string, number, integer or boolean. | csv-connector |
| FD_CSVC_E2003 | error when creating csv/entity mapper utility object | Check that the json schema passed is well formed (look at the description of the error message for better hints). | csv-connector |
| FD_CSVC_E2004 | unexpected null parameter | Contact the maintainers of this plugin and show them this error. | csv-connector |
| FD_CSVC_E2005 | error while scheduling the job | Contact the maintainers of this plugin and show them this error. | csv-connector |
| FD_CSVC_E2006 | error while building kafka message | Contact the maintainers of this plugin and show them this error. | csv-connector |
| FD_CSVC_E2007 | communication error with files-service | Check the network health. Check that the service address is correct. | csv-connector |
| FD_CSVC_E2008 | communication error with push-gateway-service | Check the network health. Check that the service address is correct. | csv-connector |
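To show when FD_CSVC_E2001 and FD_CSVC_E2002 might come up, here is a rough sketch of casting a raw CSV value to one of the four supported JSON schema types. It is not the connector's validation logic: `cast_value` and `CsvConnectorError` are hypothetical, and the accepted boolean spellings are an assumption.

```python
class CsvConnectorError(Exception):
    """Error carrying one of the codes from the table above."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def cast_value(raw, schema_type):
    """Convert a raw CSV string to the given JSON schema type."""
    if schema_type == "string":
        return raw
    if schema_type == "boolean":
        # Assumption: only "true"/"false" (any case) are accepted.
        if raw.lower() in ("true", "false"):
            return raw.lower() == "true"
        raise CsvConnectorError("FD_CSVC_E2001", f"{raw!r} is not a boolean")
    if schema_type in ("integer", "number"):
        convert = int if schema_type == "integer" else float
        try:
            return convert(raw)
        except ValueError:
            raise CsvConnectorError(
                "FD_CSVC_E2001", f"{raw!r} is not a valid {schema_type}"
            ) from None
    # Any other type is outside the supported set.
    raise CsvConnectorError("FD_CSVC_E2002", f"type {schema_type!r} not managed")


print(cast_value("42", "integer"), cast_value("TRUE", "boolean"))
```

With this sketch, a value such as `"4.0"` in an integer column fails with FD_CSVC_E2001, while a schema declaring a type like `array` fails with FD_CSVC_E2002.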

Dependencies and Optimal Performance

As previously mentioned, for the CSV Connector to function properly, it requires an instance of both the Files Service and the Crud Service running in the same cluster. Specifically, the Files Service must support multi-bucket mode.

For optimal performance, we recommend setting the CPU limit for both the CSV Connector and the Files Service to 1.5. During tests with 1 GB files, we observed peaks of 4300 messages per second (msg/s) when sending Kafka messages to the topics, and a maximum I/O rate of 3.07 MB/s.

It's worth noting that these results were achieved under controlled conditions and may vary depending on the specific use case. However, we believe that these recommendations can serve as a useful starting point for optimizing system performance.