Version: 14.x

Metrics

The Farm-Data service exposes various Prometheus metrics to monitor the health and performance of the Kafka producer, consumer, and message processing operations.

All metrics are exposed via OpenTelemetry and are compatible with Prometheus scraping. They are available at the /-/metrics endpoint.
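As an illustration of what a scrape of /-/metrics returns, the following minimal sketch parses the Prometheus text exposition format into (name, labels, value) tuples. The base URL passed to fetch_metrics, the sample payload, and the label values in it are assumptions made for the example; the names actually exposed by the service may also carry exporter suffixes that are not shown in this page.

```python
import re
from typing import Dict, List, Tuple
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, value (trailing timestamp ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse Prometheus text exposition lines, skipping comments and blanks."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines carry no sample
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a well-formed sample
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url: str) -> List[Sample]:
    """Fetch and parse the metrics page; base_url is deployment specific."""
    with urlopen(f"{base_url}/-/metrics", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up payload, used only to show the shape of the parsed result.
SAMPLE_PAYLOAD = """\
# HELP farm_data_processed_msg Number of messages that have been processed
# TYPE farm_data_processed_msg counter
farm_data_processed_msg{topic="orders",partition="0"} 42
kafka_producer_queue_msgs_count 7
"""

if __name__ == "__main__":
    for sample in parse_metrics(SAMPLE_PAYLOAD):
        print(sample)
```

In practice a Prometheus server performs the scrape itself; a parser like this is mostly useful for quick manual checks or smoke tests.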

Kafka values are periodically updated by an internal process of librdkafka. The update period defaults to 10 seconds and can be customized by setting the statistics.interval.ms property of librdkafka.
However, we recommend leaving this property unset so that the default value applies.
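Because Kafka values only change once per update period, a rate derived from two readings of a cumulative value (for example kafka_producer_tx_msgs_total) is only meaningful over a window at least as long as that period. The sketch below illustrates this under the assumption that the default 10-second period is in use; the function name and its arguments are hypothetical and not part of the service.

```python
from typing import Optional

# Default update period stated above, in seconds (assumes the default is kept).
STATS_INTERVAL_S = 10.0


def per_second_rate(
    prev_value: float,
    prev_ts: float,
    curr_value: float,
    curr_ts: float,
    min_window_s: float = STATS_INTERVAL_S,
) -> Optional[float]:
    """Return the per-second increase between two readings of a cumulative value.

    Returns None when the window is shorter than one update period (the value
    may simply not have been refreshed yet) or when the value went backwards
    (for instance after a service restart).
    """
    window = curr_ts - prev_ts
    if window < min_window_s:
        return None
    if curr_value < prev_value:
        return None
    return (curr_value - prev_value) / window


if __name__ == "__main__":
    # Two readings taken 30 seconds apart: 600 more messages in between.
    print(per_second_rate(1000, 0.0, 1600, 30.0))
```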

Service Metrics

These metrics are specific to the service functionality and provide insights into data aggregation and stream operations.

| Metric name | Metric type | Description | Labels |
|---|---|---|---|
| farm_data_aggregation_unit_duration | Histogram | Number of milliseconds taken to process a message with a unit | node_id=<node-id> |
| farm_data_aggregation_single_views | Counter | Number of single views generated over time | node_id=<node-id> |
| farm_data_msg_compression_rate | Histogram | Percentage distribution of incoming messages that were retained for processing | |
| farm_data_processed_msg | Counter | Number of messages that have been processed | topic=<topic-name> partition=<partition-number> |
| farm_data_msg_payload_size | Histogram | Percentage distribution of the payload size of messages that were produced onto the output topic | |
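Histogram metrics such as farm_data_aggregation_unit_duration are exposed as cumulative buckets, so percentiles have to be estimated from bucket counts. The sketch below shows one common way to do that, linear interpolation inside the bucket that contains the target rank, similar in spirit to the histogram_quantile function of Prometheus. The bucket boundaries and counts in the example are invented; the actual bucket layout used by the service is not documented on this page.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs; the last upper bound
    is expected to be +Inf, as in the Prometheus exposition format.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            if count == prev_count:
                return prev_bound  # empty bucket, only reachable when rank is 0
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented example: upper bounds in milliseconds with cumulative counts.
EXAMPLE_BUCKETS = [(10.0, 2.0), (50.0, 8.0), (100.0, 10.0), (math.inf, 10.0)]

if __name__ == "__main__":
    print(histogram_quantile(0.5, EXAMPLE_BUCKETS))
    print(histogram_quantile(0.95, EXAMPLE_BUCKETS))
```

The estimate is only as precise as the bucket widths allow, so it should be read as an approximation rather than an exact percentile.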

Kafka Producer Metrics

| Metric name | Type | Help | Labels |
|---|---|---|---|
| kafka_producer_replyq | Gauge | Number of ops (callbacks, events, etc) waiting in queue for application to serve with rd_kafka_poll() | |
| kafka_producer_tx_total | Gauge | Total number of requests sent to Kafka brokers | |
| kafka_producer_tx_bytes_total | Gauge | Total number of bytes transmitted to Kafka brokers | |
| kafka_producer_rx_total | Gauge | Total number of responses received from Kafka brokers | |
| kafka_producer_rx_bytes_total | Gauge | Total number of bytes received from Kafka brokers | |
| kafka_producer_tx_msgs_total | Gauge | The total number of messages transmitted (produced) to brokers | |
| kafka_producer_tx_msgs_bytes_total | Gauge | The total number of bytes transmitted (produced) to brokers | |
| kafka_producer_queue_msgs_count | Gauge | The current number of messages in producer queues | |
| kafka_producer_queue_msgs_size | Gauge | The current total size of messages in producer queues | |
| kafka_producer_queue_msgs_max_count | Gauge | The maximum number of messages allowed in the producer queues | |
| kafka_producer_queue_msgs_max_size | Gauge | The maximum total size of messages allowed in the producer queues | |
| kafka_producer_broker_rtt_avg | Gauge | Broker latency / round-trip time (average) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_std | Gauge | Broker latency / round-trip time (standard deviation) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_min | Gauge | Broker latency / round-trip time (minimum) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_max | Gauge | Broker latency / round-trip time (maximum) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_p50 | Gauge | Broker latency / round-trip time (50th percentile) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_p90 | Gauge | Broker latency / round-trip time (90th percentile) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_p95 | Gauge | Broker latency / round-trip time (95th percentile) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rtt_p99 | Gauge | Broker latency / round-trip time (99th percentile) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_throttle_avg | Gauge | Broker throttling time (average) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_throttle_std | Gauge | Broker throttling time (standard deviation) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_throttle_min | Gauge | Broker throttling time (minimum) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_throttle_max | Gauge | Broker throttling time (maximum) | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_tx_errs | Gauge | Total number of transmission errors | node_id=<node-id> node_name=<node-name> |
| kafka_producer_broker_rx_errs | Gauge | Total number of receive errors | node_id=<node-id> node_name=<node-name> |
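The four kafka_producer_queue_msgs_* gauges can be combined into a fill ratio that shows how close the producer queues are to their limits. The following sketch assumes the latest gauge values have already been collected into a dictionary keyed by metric name; that dictionary shape and the sample numbers are hypothetical.

```python
from typing import Dict


def utilization(current: float, maximum: float) -> float:
    """Return current/maximum as a ratio, or 0.0 when no limit is reported."""
    if maximum <= 0:
        return 0.0
    return current / maximum


def producer_queue_utilization(gauges: Dict[str, float]) -> Dict[str, float]:
    """Compute fill ratios of the producer queues by message count and by size."""
    return {
        "messages": utilization(
            gauges["kafka_producer_queue_msgs_count"],
            gauges["kafka_producer_queue_msgs_max_count"],
        ),
        "size": utilization(
            gauges["kafka_producer_queue_msgs_size"],
            gauges["kafka_producer_queue_msgs_max_size"],
        ),
    }


# Invented readings, for illustration only.
EXAMPLE_GAUGES = {
    "kafka_producer_queue_msgs_count": 25000.0,
    "kafka_producer_queue_msgs_max_count": 100000.0,
    "kafka_producer_queue_msgs_size": 512.0,
    "kafka_producer_queue_msgs_max_size": 1024.0,
}

if __name__ == "__main__":
    print(producer_queue_utilization(EXAMPLE_GAUGES))
```

A ratio that stays close to 1.0 suggests the producer is generating messages faster than they can be delivered to the brokers.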

Kafka Consumer Metrics

| Metric name | Type | Help | Labels |
|---|---|---|---|
| kafka_consumer_replyq | Gauge | Number of ops (callbacks, events, etc) waiting in queue for application to serve with rd_kafka_poll() | consumer_group=<consumer-group> |
| kafka_consumer_tx_total | Gauge | Total number of requests sent to Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_tx_bytes_total | Gauge | Total number of bytes transmitted to Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_rx_total | Gauge | Total number of responses received from Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_rx_bytes_total | Gauge | Total number of bytes received from Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_rx_msgs_total | Gauge | Total number of messages consumed, not including ignored messages (due to offset, etc), from Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_rx_msgs_bytes_total | Gauge | Total number of message bytes (including framing) received from Kafka brokers | consumer_group=<consumer-group> |
| kafka_consumer_group_state | Gauge | Local consumer group handler's state | consumer_group=<consumer-group> |
| kafka_consumer_group_state_age | Gauge | Time elapsed since last state change | consumer_group=<consumer-group> |
| kafka_consumer_group_rebalance_age | Gauge | Time elapsed since last rebalance (assign or revoke) | consumer_group=<consumer-group> |
| kafka_consumer_group_rebalance_count_total | Gauge | Total number of rebalances (assign or revoke) | consumer_group=<consumer-group> |
| kafka_consumer_group_assigned_partition_count | Gauge | Current assignment's partition count | consumer_group=<consumer-group> |
| kafka_consumer_broker_rtt_avg | Gauge | Broker latency / round-trip time (average) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_std | Gauge | Broker latency / round-trip time (standard deviation) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_min | Gauge | Broker latency / round-trip time (minimum) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_max | Gauge | Broker latency / round-trip time (maximum) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_p50 | Gauge | Broker latency / round-trip time (50th percentile) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_p90 | Gauge | Broker latency / round-trip time (90th percentile) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_p95 | Gauge | Broker latency / round-trip time (95th percentile) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rtt_p99 | Gauge | Broker latency / round-trip time (99th percentile) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_throttle_avg | Gauge | Broker throttling time (average) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_throttle_std | Gauge | Broker throttling time (standard deviation) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_throttle_min | Gauge | Broker throttling time (minimum) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_throttle_max | Gauge | Broker throttling time (maximum) | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_tx_errs | Gauge | Total number of transmission errors | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_broker_rx_errs | Gauge | Total number of receive errors | consumer_group=<consumer-group> node_id=<node-id> node_name=<node-name> |
| kafka_consumer_group_lag | Gauge | Number of messages that the consumer needs to read | topic=<topic-name> partition=<partition-id> |
| kafka_consumer_fetch_queue_count | Gauge | Number of pre-fetched messages in consumer fetch queue | topic=<topic-name> partition=<partition-id> |
| kafka_consumer_fetch_queue_size | Gauge | Bytes in consumer fetch queue | topic=<topic-name> partition=<partition-id> |
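Since kafka_consumer_group_lag is reported per topic and partition, a per-topic total is often easier to alert on. The sketch below sums lag samples by their topic label. The input shape (a list of label/value pairs) and the topic names are hypothetical, and skipping negative values is an assumption that such values indicate an unknown lag rather than a real backlog.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LagSample = Tuple[Dict[str, str], float]


def lag_by_topic(samples: List[LagSample]) -> Dict[str, float]:
    """Sum kafka_consumer_group_lag values across partitions, grouped by topic."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        topic = labels["topic"]
        totals[topic] += 0.0  # make sure the topic appears even if all values are skipped
        if value < 0:
            continue  # assumption: a negative lag means "unknown", not a backlog
        totals[topic] += value
    return dict(totals)


# Invented samples, for illustration only.
EXAMPLE_LAG = [
    ({"topic": "orders", "partition": "0"}, 120.0),
    ({"topic": "orders", "partition": "1"}, 30.0),
    ({"topic": "customers", "partition": "0"}, 0.0),
    ({"topic": "customers", "partition": "1"}, -1.0),
]

if __name__ == "__main__":
    print(lag_by_topic(EXAMPLE_LAG))
```

A total that keeps growing between scrapes indicates that the service is consuming more slowly than messages arrive on that topic.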

Usage Notes

  • All metrics are collected via OpenTelemetry and can be scraped by Prometheus
  • Producer and consumer metrics are automatically collected from librdkafka statistics
  • Stream processing metrics track message processing success/failure rates
  • Queue metrics help monitor internal buffer utilization
  • Consumer lag metrics are essential for monitoring processing delays

Grafana Dashboard

A Grafana dashboard is released alongside the service, providing a standardized way to monitor the Farm-Data service. This dashboard can be found here, next to previous releases.

| Name | Version | Link |
|---|---|---|
| Aggregation Overview | v1.0.0 | download |