Additional Considerations

Configuration for kube-state-metrics v2.x and Higher

Kubernetes Labels in kube-state-metrics

In kube-state-metrics v2.x, features have been added to improve the performance of both kube-state-metrics itself and the Prometheus server that collects the resulting data.

One of these improvements affects the use of Kubernetes object labels and annotations as Prometheus labels on the kube-state-metrics data points. In kube-state-metrics v2.x, the default settings no longer collect Kubernetes object labels or annotations; you need to enable their collection explicitly using command-line options.

Densify Container Data Collection and Kubernetes Labels

Though Densify's container data collection works without Kubernetes object labels exposed as kube-state-metrics labels, you may want to enable them for the following use cases:

  • Node group data collection requires the node labels.
  • Data collection for other Kubernetes objects also attempts to collect their labels and annotations, which can be used to sort and filter containers in the UI or API and to create customized reports.

Node Groups

Node groups are not a Kubernetes feature; rather, they are implemented by the public cloud provider's Kubernetes solution (e.g. AWS EKS, GCP GKE, Azure AKS) or by 3rd-party tools used to provision the Kubernetes cluster (e.g. eksctl, kops).

Collecting node group data is only meaningful if you can match it to the public cloud provider's node group data (e.g. AWS ASGs). In this case, enable node group data collection in kube-state-metrics v2.x or higher as follows:

  1. Add the following command-line argument to the kube-state-metrics container (see the manifest sketch after these steps):

     ["--metric-labels-allowlist=nodes=[*]"]

  2. Optionally, if the performance of kube-state-metrics and/or Prometheus is a concern, you can replace the wildcard (*) with a comma-separated list of specific node labels. This requires knowledge of the node labels available in the cluster, which depends on the cloud provider's Kubernetes solution and/or the 3rd-party tool used to provision the cluster, and their versions.
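
How you add this argument depends on how kube-state-metrics is deployed (raw manifests, Helm chart, operator). As a rough sketch only, assuming a plain v2.x Deployment manifest (the image tag is an example), the argument goes into the container's args:

     # Fragment of the kube-state-metrics Deployment pod template (illustrative)
     spec:
       containers:
         - name: kube-state-metrics
           image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.10.0   # example tag
           args:
             - --metric-labels-allowlist=nodes=[*]   # exposes node labels on kube_node_labels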

Labels of Other Kubernetes Objects

In addition to node labels, Densify attempts to collect the labels of other Kubernetes objects (namespaces, pods, deployments, replicasets, daemonsets, statefulsets, jobs, cronjobs and horizontal pod autoscalers), as well as namespace annotations. This data can be used as sort/filter criteria and to generate custom reports.

  1. To collect this data with kube-state-metrics v2.x or higher, add the following command-line arguments to the kube-state-metrics container (see the sketch after these steps):

     ["--metric-labels-allowlist=nodes=[*],namespaces=[*],pods=[*],deployments=[*],replicasets=[*],daemonsets=[*],statefulsets=[*],jobs=[*],cronjobs=[*],horizontalpodautoscalers=[*]", "--metric-annotations-allowlist=namespaces=[*]"]

  2. Optionally, you can specify only the Kubernetes object labels that you need. Contact [email protected] for details.
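
As with the node-label example above, these arguments belong in the kube-state-metrics container's args. A sketch of the corresponding fragment, assuming the same plain Deployment manifest (trim the object list to only what you need):

           args:
             - --metric-labels-allowlist=nodes=[*],namespaces=[*],pods=[*],deployments=[*],replicasets=[*],daemonsets=[*],statefulsets=[*],jobs=[*],cronjobs=[*],horizontalpodautoscalers=[*]
             - --metric-annotations-allowlist=namespaces=[*]   # namespace annotations only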

Legacy Kube-State-Metrics

Both Kubernetes and kube-state-metrics have changed over the years, with features being added and removed. These changes have resulted in some kube-state-metrics metrics being removed or replaced by new ones.

Densify's container data collection supports kube-state-metrics version 1.5 or higher. If your monitoring stack is running an older version, some of the metrics listed on Densify's GitHub page are absent; in this case, Densify collects the older metrics instead. The table below summarizes the deprecated metrics and their replacements.

After deploying the data forwarder, contact [email protected] to enable container optimization for your Densify instance.

Using an Observability Platform

When using an observability platform, data is collected from multiple clusters and/or other sources. The incoming data must be identifiable by a unique set of labels (name:value) for each Kubernetes/OpenShift cluster from which Densify is collecting data.

The set of labels is typically set using "global.external_labels" in the configuration of the source Prometheus server/OTEL collector that sends the data to the observability platform. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file.
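
As a minimal sketch, assuming a Prometheus server that remote-writes to the platform, the per-cluster configuration might look like the following. The label name "cluster" and the endpoint URL are placeholders; any label set that uniquely identifies each cluster will do.

     # prometheus.yml fragment (one per cluster) - illustrative only
     global:
       external_labels:
         cluster: prod-us-east-1                                  # placeholder; must be unique per cluster
     remote_write:
       - url: https://observability.example.com/api/v1/write     # placeholder endpoint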

The following applies per cluster, for both in-cluster Prometheus and observability platforms:

  • kube-state-metrics: only one instance of kube-state-metrics can be scraped.
  • openshift-state-metrics (OpenShift clusters only): only one instance of openshift-state-metrics can be scraped.
  • cAdvisor: typically runs within each node's kubelet, for all cluster nodes.
  • Node exporter: exports data from all of a cluster's nodes.

Note:  For cAdvisor and node exporter, Densify needs data from all nodes in the cluster, but ONLY from the cluster nodes and not from other instances (e.g. other nodes, VMs or cloud instances being monitored that do not belong to the cluster).

Collecting node exporter data from virtual machines and/or cloud instances that are not cluster nodes will cause data integrity issues.
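
One way to keep the scope correct is to rely on Kubernetes service discovery, so that only the cluster's own nodes are scraped. The following is a rough sketch of a cAdvisor scrape job using the node role; the job name, TLS settings and paths are typical but should be treated as assumptions and adapted to your environment. Node exporter is normally scraped via its DaemonSet pods, which likewise restricts collection to cluster nodes.

     # prometheus.yml fragment - cAdvisor scoped to the cluster's nodes (illustrative)
     scrape_configs:
       - job_name: kubernetes-cadvisor              # example job name
         scheme: https
         metrics_path: /metrics/cadvisor            # cAdvisor metrics served by the kubelet
         kubernetes_sd_configs:
           - role: node                             # discovers only this cluster's nodes
         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
         tls_config:
           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt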

Egress Traffic Security Considerations

A Kubernetes or OpenShift cluster may run in an environment where egress (outgoing) web traffic is subject to network security devices or services. Such devices or services may perform "web filtering" by:

  • Replacing the target certificate with their own (often self-signed) certificate for the purpose of traffic inspection;
  • Manipulating the HTTP request body and/or headers;
  • Manipulating the HTTP response body and/or headers.

If the data forwarder is deployed in such a cluster, this may impact:

  • Collecting data from an external observability platform;
  • Uploading the data to Densify.

Issue: Self-signed Certificate Failure

If you encounter a failure while collecting data from an external observability platform or uploading data to Densify, and the Data Forwarder logs indicate the following:

failed to verify certificate: x509: certificate signed by unknown authority

or similar text, the certificate should be examined as follows:

  1. In the same cluster (and same namespace), examine the logs.
  2. If, at the end of the log file, you see text similar to:

     --- openssl log:

     ...

     verify error:num=...:self-signed certificate in certificate chain

     ...

then it is likely that the genuine certificate has been replaced with the self-signed certificate of a network security device/service.

This issue can also occur when you are using in-cluster authenticated Prometheus and the CA certificate is not configured correctly. In that case, resolve the issue by fixing the configuration.

Resolution

Turn off web filtering, either globally or for the selected target.

Issue: Request/Response Manipulation

If you are collecting data from an external observability platform, this issue may result in failure to collect your container data.

When uploading data to Densify, this issue may result in failure to upload any data due to authentication failure. Please verify that the username and (encrypted) password are correct, and then contact [email protected] for assistance.

If the Data Forwarder logs include text similar to:

{"level":"fatal","pkg":"default","error":"HTTP status code: 400, Message: message: Unauthorized, status: 400",...,"message":"failed to initialize Densify client"}

and the HTTP status code for Unauthorized is 400 (Bad Request) rather than 401, it is likely that the request has been manipulated. An incorrect username/(encrypted) password combination returns a 401 status code.

Resolution

Turn off web filtering, either globally or for the selected target.