Container Data Forwarder with Prometheus

#410170

The Data Forwarder is an on-demand container that collects your container environment data from Kubernetes via Prometheus and forwards that data to Densify for analysis.

This topic shows you how to configure the Data Forwarder with a Prometheus server for data collection, using YAML files. The data collection frequency is configurable through the Data Forwarder configmap and cronjob files. You can also configure the Data Forwarder with an authenticated Prometheus server. See Container Data Forwarder with Authenticated Prometheus.

After deploying the Data Forwarder, contact [email protected] to enable your Densify instance with container optimization.

Prerequisites

The following software is required for Densify container data collection and optimization.

  1. Densify account—Contact Densify for details of your subscription or sign up for a free trial.
  2. Kubernetes or OpenShift must be deployed.
    • cAdvisor, running as part of the kubelet, provides the workload and configuration data required by Densify by default.
  3. kube-state-metrics—This service monitors the Kubernetes API server and generates metrics from the various objects inside the individual Kubernetes components. It provides orchestration and cluster-level metrics such as deployments, pod metrics, and resource reservations. The collected metrics allow Densify to get a complete picture of how your containers are set up (e.g. ReplicaSets, Deployments, pod and container labels).
  4. Prometheus—Collects metrics from configured targets at given intervals and provides the monitoring/data aggregation layer. It must be deployed and configured to collect kube-state-metrics and cAdvisor/kubelet metrics.

The following items are not mandatory but provide additional environment information for Densify's container optimization analysis.

  1. Node Exporter—An agent deployed on every node that collects data about the nodes on which the containers are running. It provides the required host-related metrics such as CPU, memory, network, etc.
  2. openshift-state-metrics—Expands upon kube-state-metrics by adding metrics for OpenShift-specific resources and provides additional details such as Cluster Resource Quotas (CRQ).

When deploying Prometheus and kube-state-metrics using a standard operator, some of the metrics that Densify needs for analysis may be excluded (i.e. on a deny list). Refer to Prometheus-Data.md for details of the required metrics.
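If your Prometheus deployment applies a deny list through metric relabeling, verify that none of the metric families Densify needs are being dropped. The fragment below is an illustrative sketch only: the metric names shown are common cAdvisor and kube-state-metrics series, not Densify's complete required list, which is documented in Prometheus-Data.md.

```yaml
# Example Prometheus scrape_config fragment: a "keep" rule that retains
# only the listed metric families. If your operator generates a similar
# "drop" rule, make sure it does not match metrics Densify requires.
metric_relabel_configs:
  - source_labels: [__name__]
    regex: '(container_cpu_usage_seconds_total|container_memory_usage_bytes|kube_pod_info|kube_node_info)'
    action: keep
```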

Contact [email protected] for configuration details.

Deploying Data Forwarder for Data Collection

To deploy the Data Forwarder, you need to:

  1. Download 3 files from Densify's GitHub page. See Downloading Data Forwarder YAML Files.
  2. Update the configuration file to add both Prometheus and Densify connection parameters. See Configuring the Data Forwarder.

  3. Test the Data Forwarder functionality by deploying configmap.yml and pod.yml to your Kubernetes cluster. The created pod needs connectivity to your Densify instance, either directly or through a proxy. See Testing the Data Forwarder.
  4. If the Data Forwarder sent metrics to your Densify instance properly, then schedule cronjob.yml to run the pod regularly. See Scheduling the Data Forwarder.

Downloading Data Forwarder YAML Files

  1. Navigate to: https://github.com/densify-dev/Container-Optimization-Data-Forwarder/tree/master/examples/CronJob and save the following files to your local working directory:
    • configmap.yml
    • pod.yml
    • cronjob.yml

Note: Save the raw version of the files to avoid any unwanted characters (i.e. click on the top right Raw button from the GitHub text viewer to open the raw text file in a new browser tab, then save the file to your local working directory).

Configuring the Data Forwarder

  1. Open configmap.yml and edit the parameters outlined in the table below.
  2. All other sections of the configmap.yml file are optional. Depending on your environment, you may need to configure additional parameters in configmap.yml. Refer to the configmap.yml file directly to see descriptions of the additional settings.

Table: Data Forwarder Settings in configmap.yml

Host Definition Section

host
  Description: Specify your Densify server host (e.g. myCompany.densify.com). You may need to specify the Densify server's DNS name, fully qualified domain name (FQDN) or IP address.
  Value: <host>

protocol
  Description: Specify the protocol used to connect to the Densify REST API interface. Specify http or https.
  Value: https

port
  Description: Specify the TCP/IP port used to connect to the Densify server. You should not need to change this port number. See the Densify URL.
  Value: 443

endpoint
  Description: The connection endpoint for the API. You can leave the default value.
  Value: /CIRBA/api/v2/

user
  Description: Specify the Densify user account that the Data Forwarder will use. This user must already exist in your Densify instance, must have API access privileges, and is authenticated by the Densify server. Contact [email protected] for the Densify user and epassword required to connect to your Densify instance.
  Value: <user name>

password
  Description: Specify the password associated with the user indicated above, in plaintext.
  Value: <password>

epassword
  Description: Specify the epassword (i.e. encrypted password) for the Densify user. The password must be encrypted and supersedes any value specified in the password field, above. Typically, the epassword is used; comment out the password line if it is not used. [email protected] will typically provide a Densify username and corresponding epassword when you are set up for container data collection.
  Value: <encrypted password>

Specify Settings for Prometheus to use for Data Collection

prometheus_address
  Description: Specify the Prometheus address. Typically, the Data Forwarder is deployed in the same cluster as the Prometheus server, so you specify the internal service name of Prometheus, in the format <service name>.<namespace>.svc. If the Data Forwarder is not deployed in the same cluster as Prometheus, specify a fully qualified domain name (e.g. kubemain.int.cirba.com).
  Value: <service name>.<namespace>.svc

prometheus_port
  Description: Specify your Prometheus service connection port. The default port is 9090. Ensure that this port is the web port associated with the Prometheus service name specified in prometheus_address.
  Value: 9090

prometheus_protocol
  Description: Optionally, specify http or https.
  Value: https

cluster_name
  Description: Optionally, specify a name by which to identify this cluster within Densify. Providing a name is highly recommended for ease of management and reporting.
  Value: <cluster_name>

node_group_list
  Description: Optionally, specify the node group label reference. Uncomment the setting and then specify a value, e.g. node_group_list label_clm_nodepooltype

Client Transfer Settings

zipname
  Description: Specify a name for the compressed file, such as the cluster name, the Prometheus server name, or any other name that identifies the cluster data. The zipname is prepended to the transferred file names.
  Value: <zip file name>
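Putting the settings above together, the edited values might look like the following. All values shown are placeholders; the exact file layout (including the data key the settings live under) is defined by the downloaded configmap.yml, which remains the authoritative template.

```yaml
# Hypothetical example values only -- substitute your own.
host densify.myCompany.com
protocol https
port 443
endpoint /CIRBA/api/v2/
user myDensifyUser
# password <plaintext password>   # commented out; epassword is used instead
epassword aaaabbbbccccdddd
prometheus_address prometheus-server.monitoring.svc
prometheus_port 9090
prometheus_protocol https
cluster_name production-east
zipname production-east
```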

Configuring the Data Forwarder to Use a Proxy Host

If you need to configure data collection through a proxy server, then you need to specify the connection details in the corresponding section of the file.

The following settings are only applicable if the first setting, proxyhost, is defined.

Table: Data Forwarder Proxy Host Settings

Proxy Host Definition Section

proxyhost
  Description: If defined, the forwarder routes all Densify traffic through this proxy host. If not defined or blank, all of the proxy-related parameters below are ignored.
  Value: <proxy.host.com>

proxyport
  Description: Specify the port used to connect to the proxy host.
  Value: 443

proxyprotocol
  Description: Specify the protocol used to communicate between the proxy and the application. Specify http or https.
  Value: https

proxyuser
  Description: Specify the user name to provide to the proxy, when authentication is required. If you are using NTLM authentication, do not specify the domain prefix (i.e. specify only ‘user’, not ‘domain\user’).
  Value: <user name>

proxypassword
  Description: When authentication is required, specify the password associated with proxyuser, indicated above, in plaintext.

eproxypassword
  Description: Specify the epassword (i.e. encrypted password) for the proxy user. The password must be encrypted and supersedes any value specified in the proxypassword field, above.
  Value: <encrypted password>

proxyauth
  Description: Specify the authentication scheme used by the proxy. Specify Basic (username/password) or NTLM. If you select NTLM, you also need to specify the proxyserver and proxydomain parameters, below. If you are using an unauthenticated proxy host, leave this setting commented out.
  Value: Basic

proxyserver
  Description: Specify the proxy server name.

proxydomain
  Description: Specify the proxy server’s domain.
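For an authenticated proxy, the corresponding fragment of the settings might look like the following. Values are placeholders, and the section is only read when proxyhost is set.

```yaml
# Hypothetical proxy values only -- substitute your own.
proxyhost proxy.myCompany.com
proxyport 443
proxyprotocol https
proxyauth Basic
proxyuser proxySvcAccount
eproxypassword eeeeffffgggghhhh
```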

Testing the Data Forwarder

Once configmap.yml has been updated with your connection parameters, you can create and schedule the pod. A test pod is created via the pod.yml file. This pod contains all of the utilities needed to connect to Prometheus, collect metrics, compress them, and send the compressed data to Densify.

  1. Open a terminal session to run kubectl commands for your Kubernetes cluster.
  2. Optionally, create a namespace for the Data Forwarder and its related .yml files. For example, to create the namespace "collector", execute the following command:

    kubectl create namespace collector

    Remember to create all your data collection pods in this namespace to organize your running pods (i.e. append the -n collector option to the commands below).

  3. Create the configmap using the following command:

    kubectl create -f configmap.yml

    Note: If you are using a namespace with your data collection pods, add the -n <namespace> option to your kubectl commands. For example:
    kubectl create -f configmap.yml -n collector

  4. Create the pod using the following command:

    kubectl create -f pod.yml -n collector

    This command creates the pod according to the details provided by configmap.yml. The jobs inside the pod collect, compress, and transfer data to your Densify instance. Once the data has been collected and sent to your Densify instance, the pod terminates.

  5. Run the following command to review the pod status:

    kubectl get pods -n collector

    The command returns the details of the pod. The status will be similar to:

    NAME     READY  STATUS     RESTARTS  AGE
    densify  0/1    Completed  0         1m

  6. Use the pod name to review the log file for the job execution:

    kubectl logs densify -n collector

    Review the content of the log file to verify successful completion of the data collection and transfer of the compressed data file to your Densify instance.

    Sample Log File

            will upload file/directory 'data'
            no modifier supplied
            will zip contents of 'data' to 'gke.zip'
            compressing 46 file(s)...
            zipped file: data
            uploading gke.zip; contents of 66 file(s)...
            completed.

    The number of compressed and uploaded files (displayed at the end of the log) depends on the types of pods running in your cluster and on whether you have enabled the optional node-exporter component. Typically, a successful data transfer uploads between 20 and 70 files.

    If only 7 files are uploaded, this indicates a transfer problem. Review the log to determine the issue.

  7. Optionally, once you have reviewed the logs you can remove the pod, as it is no longer required:

    kubectl delete -f pod.yml

Editing configmap.yml

You can make changes to the configmap.yml file as required. After editing the file you must delete and recreate the config map to pick up any changes:

kubectl delete -f configmap.yml -n <namespace>

kubectl create -f configmap.yml -n <namespace>

You do not need to recreate the job or pod files. The next, scheduled run will start sending the updated set of data. If you want to test your changes you can run the pod.yml immediately:

kubectl create -f pod.yml -n <namespace>

Scheduling the Data Forwarder

Schedule the pod to run at the same interval you are using for data collection, as defined in the configmap.yml.

  1. Create the CronJob object using the following command:

    kubectl create -f cronjob.yml

    Note: If you created and used a namespace for configmap.yml, then you need to use the same namespace for the cronjob.yml file. For example:
    kubectl create -f cronjob.yml -n collector

    Similar to pod.yml, this command creates the pod according to the details provided by configmap.yml. Additionally, it creates the cron job that runs the pod hourly, on the hour.

  2. Use the get pods command to review the last 3 jobs that were executed successfully. If a job fails for any reason, the pod history is retained.
  3. Notify [email protected] that the Data Forwarder is running.
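The hourly schedule comes from the schedule field in cronjob.yml, which uses standard cron syntax. If you need a different collection interval, adjust this expression and keep it consistent with the interval configured in configmap.yml. The sketch below only illustrates where the field lives; the downloaded cronjob.yml is the authoritative manifest, and its field layout may differ by release.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: densify
spec:
  # Standard cron fields: minute hour day-of-month month day-of-week.
  # "0 * * * *" fires at minute 0 of every hour, i.e. hourly on the hour.
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: densify
              image: densify/container-optimization-data-forwarder:latest
          restartPolicy: Never
```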

Note: If you want to configure the Data Forwarder container to use a specific version of the image, refer to https://hub.docker.com/r/densify/container-optimization-data-forwarder/tags for a list of all the available versions. Use the "latest" tag to get the latest version and to get updates as new features are released. If you have issues with using the image from the "latest" tag, then pull the most recent release from the tags list.
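To pin a specific version, edit the container image reference in pod.yml and cronjob.yml. The fragment below is illustrative; the surrounding structure comes from the downloaded manifests, and you should replace "latest" with an actual tag from the Docker Hub tags page.

```yaml
containers:
  - name: densify
    # Replace "latest" with a specific tag to pin the image version.
    image: densify/container-optimization-data-forwarder:latest
```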

Building and Configuring the Data Forwarder

You may need to build your own version of the Data Forwarder container, for example if your security policies do not allow you to pull containers from Docker Hub. You can obtain the required code from GitHub:

https://github.com/densify-dev/Container-Optimization-Data-Forwarder

You can customize the code as required to conform to your security standards or reference your proprietary base images and then build your own version of the Densify Data Forwarder container.