Version: 14.x

CSV Fast Data Connector Usage

The CSV Connector can run either as a Kubernetes CronJob or as a regular microservice in polling mode. In both cases, it asks the Files Service for the list of all files in the input bucket and processes them, producing one Kafka message for Fast Data per line. Once a file has been completely processed, it is moved to the output bucket.
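The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `list_files`, `read_file`, `move_file` and `send` callables, the bucket names, and the use of the first CSV row as a header are hypothetical placeholders, not the connector's actual API or message format.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def process_bucket(
    list_files: Callable[[str], Iterable[str]],
    read_file: Callable[[str, str], str],
    move_file: Callable[[str, str, str], None],
    send: Callable[[Dict[str, str]], None],
    input_bucket: str = "input",    # hypothetical bucket name
    output_bucket: str = "output",  # hypothetical bucket name
) -> int:
    """Emit one message per CSV line, then move each finished file."""
    sent = 0
    for name in list_files(input_bucket):
        text = read_file(input_bucket, name)
        # Assumption: the first row is a header, so each data row becomes a dict.
        for row in csv.DictReader(io.StringIO(text)):
            send(dict(row))
            sent += 1
        # The file is moved only after every line has been handled.
        move_file(input_bucket, output_bucket, name)
    return sent


# In-memory stand-ins for the Files Service and the Kafka producer.
buckets = {"input": {"users.csv": "id,name\n1,Ada\n2,Linus\n"}, "output": {}}
messages: List[Dict[str, str]] = []


def _move(src: str, dst: str, name: str) -> None:
    buckets[dst][name] = buckets[src].pop(name)


count = process_bucket(
    list_files=lambda bucket: list(buckets[bucket]),
    read_file=lambda bucket, name: buckets[bucket][name],
    move_file=_move,
    send=messages.append,
)
```

Passing the collaborators in as callables keeps the sketch independent of any real client library; in the actual service these roles are played by the Files Service and a Kafka producer.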

Kafka messages are produced following the DB2 message adapter standard.

To switch between the two execution modes, set the LAUNCH_MECHANISM environment variable to one of the following values: cronjob or polling.

Polling mode

If LAUNCH_MECHANISM is set to polling, the service starts and keeps running indefinitely.

In this case, the SCHEDULING variable is mandatory and its value must be a cron expression in Quartz format. On each scheduled run, the service fetches the list of new files added to the bucket and generates the corresponding Kafka messages.
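As a reminder of what a Quartz expression looks like: unlike classic Unix cron, it starts with a seconds field and has six mandatory fields plus an optional year. The sketch below only splits an expression into named fields; it is a structural check, not a full Quartz parser, and the `split_quartz` helper is a hypothetical name used for illustration. Whether the connector accepts the optional year field is not stated here.

```python
from typing import Dict

# Field order used by Quartz cron expressions; "year" is optional.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",
)


def split_quartz(expression: str) -> Dict[str, str]:
    """Map each whitespace-separated field of the expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


# "0 0/5 * * * ?" fires at second 0 of every fifth minute.
every_five_minutes = split_quartz("0 0/5 * * * ?")
```

A value such as `0 0/5 * * * ?` for SCHEDULING would therefore trigger an import cycle every five minutes.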

Cronjob mode

If LAUNCH_MECHANISM is set to cronjob, the service starts, checks for new files in the bucket, and processes them. Once execution has finished, it stops.

This makes the mechanism better suited to a Kubernetes CronJob. To learn more about configuring a cronjob in the console, read the documentation page.
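The rules for the two launch mechanisms can be summarised with a small validation sketch. It is not the connector's own startup code: the `ConfigError` class and `read_launch_config` function are hypothetical, and pairing each failure with the FD_CSVC_E0001 and FD_CSVC_E0002 codes is an assumption based on their descriptions in the Error Codes table. It also assumes LAUNCH_MECHANISM has no default value.

```python
from typing import Dict, Mapping

VALID_MECHANISMS = ("cronjob", "polling")


class ConfigError(Exception):
    """Configuration problem tagged with an error code (hypothetical class)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def read_launch_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Validate the launch-related environment variables."""
    mechanism = env.get("LAUNCH_MECHANISM", "")
    if not mechanism:
        # Assumption: a missing variable maps to FD_CSVC_E0002.
        raise ConfigError("FD_CSVC_E0002", "LAUNCH_MECHANISM is not set")
    if mechanism not in VALID_MECHANISMS:
        # Assumption: an unsupported value maps to FD_CSVC_E0001.
        raise ConfigError(
            "FD_CSVC_E0001", f"unsupported LAUNCH_MECHANISM: {mechanism}"
        )
    config = {"mechanism": mechanism}
    if mechanism == "polling":
        scheduling = env.get("SCHEDULING", "")
        if not scheduling:
            raise ConfigError(
                "FD_CSVC_E0002", "SCHEDULING is mandatory in polling mode"
            )
        config["scheduling"] = scheduling
    return config
```

In a real deployment `env` would be `os.environ`; taking a mapping as a parameter simply makes the rules easy to exercise in isolation.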

Prometheus metrics management

When in polling mode, the service exposes its metrics on the /-/metrics route, which stays reachable as long as the service is up and running.

In cronjob mode, however, the service is ephemeral, so exposing metrics on a route is not viable. For this reason, in cronjob mode the CSV Connector pushes all metrics to a Prometheus push-gateway service (for more information, check the official Prometheus documentation). In this case, a push-gateway should also be configured in the cluster.

Note

The PUSH_GATEWAY_SERVICE variable is optional even when the service runs as a cronjob. If it is not set, the push step is simply skipped and no metrics are pushed.
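The decision between exposing and pushing metrics can be sketched as below. The sketch assumes that PUSH_GATEWAY_SERVICE holds a `host:port` value and uses a hypothetical job name, `csv-connector`; neither is confirmed by this page. The `/metrics/job/<job>` path is the standard Pushgateway push endpoint.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def metrics_push_url(
    env: Mapping[str, str], job: str = "csv-connector"
) -> Optional[str]:
    """Return the Pushgateway URL to push to, or None when nothing is pushed."""
    if env.get("LAUNCH_MECHANISM") != "cronjob":
        # Polling mode: metrics are scraped from /-/metrics instead.
        return None
    gateway = env.get("PUSH_GATEWAY_SERVICE")
    if not gateway:
        # Optional variable not set: the push step is skipped.
        return None
    # Assumption: the variable holds "host:port" without a scheme.
    return f"http://{gateway}/metrics/job/{quote(job, safe='')}"


push_url = metrics_push_url(
    {"LAUNCH_MECHANISM": "cronjob", "PUSH_GATEWAY_SERVICE": "pushgateway:9091"}
)
```

Returning `None` in both the polling case and the unset-variable case lets the caller treat "nothing to push" uniformly.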

The metrics exposed (no matter the launch mechanism) are shown in the table below:

Metric name	Type	Labels	Description
imported_files	counter	*file*=fileName *processed*=[`OK`/`KO`]	Number of imported files, labeled by filename and processing success.
kafka_message	counter	*topic*=topicName *sent*=[`OK`/`KO`]	Number of Kafka messages sent, labeled by topic name and success.
csv_line	gauge	*file*=fileName *valid*=[`OK`/`KO`]	Number of CSV lines read, labeled by filename and validation success.
cycle_execution_time	timer	-	Execution time in milliseconds of a single import cycle.
file_processing_time	timer	*file*=fileName	Processing time for a single file, labeled by filename.
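To make the label columns concrete, the sketch below renders a few samples in the Prometheus text exposition format. The `render_sample` helper, the file name and the topic name are hypothetical, and the exact metric names and suffixes seen on /-/metrics may differ from the table depending on the metrics library in use.

```python
from typing import Mapping, Union


def _escape(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(
    name: str, labels: Mapping[str, str], value: Union[int, float]
) -> str:
    """Render one sample line in the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{key}="{_escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


# Hypothetical file and topic names, used only for illustration.
imported = render_sample("imported_files", {"file": "users.csv", "processed": "OK"}, 1)
sent = render_sample("kafka_message", {"topic": "users-topic", "sent": "KO"}, 3)
```

With these inputs, `imported` is `imported_files{file="users.csv",processed="OK"} 1`, which shows how each label pair from the table narrows the count to a given file and outcome.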

Error Codes

The following table lists all the errors that the service can raise during its execution.

Error code	Cause	Action	Actor
FD_CSVC_E0001	invalid parameter value	Check environment variables for any illegal configuration.	csv-connector
FD_CSVC_E0002	missing environment variable	Check environment variables for any missing configuration.	csv-connector
FD_CSVC_E0003	invalid filename format	Check the names of the files passed and make sure they have a .csv extension.	csv-connector
FD_CSVC_E2001	error during csv line validation	- Search for the csv line specified inside the csv file being parsed. - Check any discrepancies with the json schema passed.	csv-connector
FD_CSVC_E2002	jsonSchema type not managed	Be sure that the parameter described in the json schema is one of `string`, `number`, `integer` or `boolean`.	csv-connector
FD_CSVC_E2003	error when creating csv/entity mapper utility object	Check that the json schema passed is well formed (look at the description of the error message for better hints).	csv-connector
FD_CSVC_E2004	unexpected null parameter	Contact the maintainers of this plugin and show them this error.	csv-connector
FD_CSVC_E2005	error while scheduling the job	Contact the maintainers of this plugin and show them this error.	csv-connector
FD_CSVC_E2006	error while building kafka message	Contact the maintainers of this plugin and show them this error.	csv-connector
FD_CSVC_E2007	communication error with files-service	- Check the network health. - Check that the service address is correct.	csv-connector
FD_CSVC_E2008	communication error with push-gateway-service	- Check the network health. - Check that the service address is correct.	csv-connector
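The two validation-related errors, FD_CSVC_E2001 and FD_CSVC_E2002, can be illustrated with a sketch of casting CSV cells to the four managed JSON schema types. This is not the connector's implementation: the `cast_cell` and `cast_line` helpers are hypothetical, and the accepted boolean spellings (`true`/`false`) are an assumption.

```python
from typing import Any, Dict, Mapping


def cast_cell(raw: str, json_type: str) -> Any:
    """Convert a raw CSV cell to the Python value for a JSON schema type."""
    if json_type == "string":
        return raw
    if json_type == "integer":
        return int(raw)  # raises ValueError on invalid input
    if json_type == "number":
        return float(raw)  # raises ValueError on invalid input
    if json_type == "boolean":
        lowered = raw.strip().lower()
        # Assumption: only "true" and "false" are accepted spellings.
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"invalid boolean: {raw!r}")
    # Comparable to FD_CSVC_E2002: the type is not one of the managed ones.
    raise TypeError(f"jsonSchema type not managed: {json_type}")


def cast_line(
    row: Mapping[str, str], properties: Mapping[str, Mapping[str, str]]
) -> Dict[str, Any]:
    """Cast every field listed in the schema's properties."""
    return {
        field: cast_cell(row[field], spec["type"])
        for field, spec in properties.items()
    }


schema_properties = {
    "id": {"type": "integer"},
    "price": {"type": "number"},
    "active": {"type": "boolean"},
    "name": {"type": "string"},
}
typed = cast_line(
    {"id": "7", "price": "9.5", "active": "true", "name": "pen"},
    schema_properties,
)
```

A cell that cannot be cast, such as `abc` for an `integer` field, fails with a `ValueError` here, which corresponds to the kind of discrepancy the FD_CSVC_E2001 action asks you to look for.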

Dependencies and Optimal Performance

As previously mentioned, for the CSV Connector to function properly, it requires an instance of both the Files Service and the Crud Service running in the same cluster. Specifically, the Files Service must support multi-bucket mode.

For optimal performance, we recommend setting the CPU limit of both the CSV Connector and the Files Service to 1.5. In tests with 1GB files, we observed peaks of 4300 messages per second (msg/s) when sending Kafka messages to the topics, and a maximum IO rate of 3.07 MB/s.

These results were obtained under controlled conditions and may vary depending on the specific use case, but the recommendations can serve as a useful starting point for tuning system performance.
