Container Data Collection Prerequisites


Data Collection from an In-Cluster Prometheus Deployment

You can quickly deploy the data forwarder and all of the required prerequisite software using a Helm chart. See Kubex Automation Stack Helm Chart.

The following software is required for Densify container data collection and optimization.

Densify account—Contact Densify for details of your subscription or sign up for a free trial.

See www.densify.com/product/trial.

Kubernetes or OpenShift must be deployed.

Running cAdvisor as part of the kubelet provides the workload and configuration data required by Densify.

kube-state-metrics—This service listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects. It provides orchestration and cluster-level metrics such as deployments, pod metrics, and resource reservations. The collected metrics give Densify a complete picture of how your containers are set up, for example ReplicaSets, Deployments, and pod and container labels.

Requires v1.5.0 or later. See additional considerations when using v2.x.
https://github.com/kubernetes/kube-state-metrics
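
The version requirement above can be checked with a short script. The following is a minimal sketch, not a Densify tool: the function names and the returned status strings are hypothetical, and it assumes the version is available as a tag such as v2.10.1 (for example, taken from the kube-state-metrics container image tag).

```python
import re

# Minimum kube-state-metrics version stated in this document.
MIN_VERSION = (1, 5, 0)


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as 'v2.10.1' or '1.9.8' into (major, minor, patch)."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_ksm_version(tag: str) -> str:
    """Classify a version tag (the status strings are hypothetical labels)."""
    version = parse_version(tag)
    if version < MIN_VERSION:
        return "unsupported"
    if version[0] >= 2:
        # v2.x is supported, but the additional considerations apply.
        return "supported-v2"
    return "supported"
```

A result of "supported-v2" is a reminder to review the additional considerations for v2.x before relying on the collected data.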

Prometheus or a supported observability platform—Collects metrics from configured targets at given intervals and provides the monitoring and data aggregation layer. It must be deployed and configured to collect kube-state-metrics and cAdvisor/kubelet metrics. See additional considerations when using an observability platform.

https://prometheus.io

When you deploy Prometheus and kube-state-metrics using a standard operator, some of the metrics that Densify needs for analysis may be excluded (that is, placed on a deny list). Refer to Prometheus-Data for the list of metrics that Densify requires for analysis.
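
One way to detect excluded metrics is to compare the metric names that Prometheus currently knows about against the required list. The sketch below assumes that your Prometheus HTTP API is reachable and uses its /api/v1/label/__name__/values endpoint. The three metric names are an illustrative sample only (one each from kube-state-metrics, cAdvisor, and Node Exporter), not the list that Densify requires, and the function names are hypothetical.

```python
import json
import urllib.request

# Illustrative sample only. The authoritative list is in the
# Prometheus-Data reference, not in this sketch.
SAMPLE_REQUIRED = {
    "kube_pod_container_info",            # kube-state-metrics
    "container_cpu_usage_seconds_total",  # cAdvisor / kubelet
    "node_cpu_seconds_total",             # Node Exporter
}


def missing_metrics(available, required=SAMPLE_REQUIRED):
    """Return the required metric names absent from 'available', sorted."""
    return sorted(set(required) - set(available))


def fetch_metric_names(base_url: str, timeout: float = 10.0) -> list[str]:
    """List every metric name known to the Prometheus server at base_url."""
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned status {payload.get('status')!r}")
    return payload["data"]
```

For example, missing_metrics(fetch_metric_names("http://localhost:9090")) returns the sample metrics that are not being collected; the URL is a placeholder for your own Prometheus address. Any name that is reported may be on the operator's deny list.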

Node Exporter—This agent is deployed on every node to collect data about the nodes on which the containers are running. It provides host-related metrics such as CPU, memory, and network usage.

https://github.com/prometheus/node_exporter

The following item is not mandatory but provides additional environment information for Densify's container optimization analysis.

openshift-state-metrics—Expands upon kube-state-metrics by adding metrics for OpenShift-specific resources and provides additional details such as Cluster Resource Quotas (CRQ).

https://github.com/openshift/openshift-state-metrics

The data forwarder is only supported on Linux OS and x64 architecture.
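
To see which nodes meet this requirement, you can inspect the well-known node labels kubernetes.io/os and kubernetes.io/arch (Kubernetes reports x64 as amd64). The following is a minimal sketch that assumes the input has the shape produced by kubectl get nodes -o json; the function name and the sample node names are hypothetical.

```python
import json


def unsupported_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes that are not Linux on x64 (amd64)."""
    names = []
    for item in node_list.get("items", []):
        labels = item["metadata"].get("labels", {})
        os_ok = labels.get("kubernetes.io/os") == "linux"
        arch_ok = labels.get("kubernetes.io/arch") == "amd64"
        if not (os_ok and arch_ok):
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample in the shape of `kubectl get nodes -o json`.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node-a",
                "labels": {"kubernetes.io/os": "linux",
                           "kubernetes.io/arch": "amd64"}}},
  {"metadata": {"name": "node-b",
                "labels": {"kubernetes.io/os": "linux",
                           "kubernetes.io/arch": "arm64"}}}
]}
""")
print(unsupported_nodes(sample))  # prints ['node-b']
```

Nodes reported by this check are not suitable for running the data forwarder pod.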

After deploying the data forwarder, contact [email protected] to enable your Densify instance with container optimization.

Data Collection for GPU

Note the following additional prerequisites for collecting GPU data:

NVIDIA-device-plugin—This plugin allows containers to access the NVIDIA GPUs. It must be installed on all your Kubernetes clusters to allocate NVIDIA GPU resources to workloads and to provide the GPU data.
dcgm-exporter—This Prometheus exporter exposes GPU metrics from the Data Center GPU Manager (DCGM). It is required to collect GPU data such as utilization, memory usage, and power usage from NVIDIA GPUs. The dcgm-exporter can be deployed as a DaemonSet, where each node with an NVIDIA GPU runs a pod that exposes these metrics in a format that Prometheus can scrape and that the Densify data forwarder then collects.

GPU data collection is currently supported on the following platforms:

Additional Considerations

If your cloud provider has not already deployed the NVIDIA plugin, you can deploy it yourself using the gpu-operator. Without this plugin, the Kubernetes cluster cannot support NVIDIA GPU data collection, meaning the Densify data forwarder will have no GPU data to collect.

Additionally, the dcgm-exporter must be deployed—either by your cloud provider, by you directly, or as part of the gpu-operator. Like the NVIDIA plugin, the dcgm-exporter is required for collecting GPU metrics.

Densify’s All-in-One (AIO) Helm chart includes an option to deploy the dcgm-exporter as a subchart, controlled by a configuration value (which is disabled by default). The AIO Helm chart also provides the option to deploy the gpu-operator.

If you are using the AIO Helm chart, the bundled Prometheus instance will automatically scrape GPU metrics from the dcgm-exporter. However, if you're using your own Prometheus deployment, you must ensure it is configured to scrape GPU data from the dcgm-exporter.
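
If you use your own Prometheus deployment, you can confirm that it is scraping the dcgm-exporter by asking Prometheus how many GPU series it holds. The sketch below uses the Prometheus instant-query endpoint /api/v1/query. It assumes that DCGM_FI_DEV_GPU_UTIL is enabled in your exporter's counter list; the function names and the example address are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this counter is enabled in the dcgm-exporter configuration.
GPU_METRIC = "DCGM_FI_DEV_GPU_UTIL"


def build_query_url(base_url: str, metric: str = GPU_METRIC) -> str:
    """Build an instant-query URL that counts the series of one metric."""
    query = urllib.parse.urlencode({"query": f"count({metric})"})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_count(payload: dict) -> int:
    """Extract the count from an instant-query response (0 if no series)."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned status {payload.get('status')!r}")
    result = payload["data"]["result"]
    if not result:
        # count() over a metric with no series yields an empty vector.
        return 0
    return int(float(result[0]["value"][1]))


def gpu_series_in_prometheus(base_url: str, timeout: float = 10.0) -> int:
    """Query Prometheus and return the number of GPU utilization series."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=timeout) as resp:
        return series_count(json.load(resp))
```

A result of 0 from gpu_series_in_prometheus("http://localhost:9090") suggests that the scrape configuration for the dcgm-exporter is missing or that the exporter is not running on the GPU nodes.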

Refer to Densify's GitHub repository for configuration details.