
Adding and Customizing Support Bundles

This topic describes how to add a default support bundle specification to a release for your application. It also describes how to optionally customize the default support bundle specification based on your application's needs. For more information about support bundles, see About Preflight Checks and Support Bundles.

The information in this topic applies to Helm chart- and standard manifest-based applications installed with the Helm CLI or with Replicated KOTS.

Add the Specification to a Manifest File

This section describes how to add an empty support bundle specification to a manifest file. An empty support bundle specification automatically includes the default clusterInfo and clusterResources collectors.

You do not need to manually include the clusterInfo or clusterResources collectors in the specification.

After you create this empty support bundle specification, you can test the support bundle by following the instructions in Generating a Support Bundle. You can also optionally customize the support bundle specification by adding collectors and analyzers or editing the default collectors. For more information, see (Optional) Customize the Specification below.

You can add the support bundle specification to a Kubernetes Secret or a SupportBundle custom resource. The type of manifest file that you use depends on your application type (Helm chart- or standard manifest-based) and installation method (Helm CLI or KOTS).

Use the following table to determine which type of manifest file to use for creating a support bundle specification:

                                      Helm CLI            KOTS v1.94.2 and Later          KOTS v1.94.1 and Earlier
Helm Chart-Based Application          Kubernetes Secret   Kubernetes Secret               SupportBundle Custom Resource
Standard Manifest-Based Application   N/A                 SupportBundle Custom Resource   SupportBundle Custom Resource

Kubernetes Secret

You can define support bundle specifications in a Kubernetes Secret for the following installation types:

  • Installations with the Helm CLI
  • Helm chart-based applications installed with KOTS v1.94.2 and later

In your Helm chart templates directory, add the following YAML to a Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  labels:
    troubleshoot.sh/kind: support-bundle
  name: example
stringData:
  support-bundle-spec: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: support-bundle
    spec:
      collectors: []
      analyzers: []

As shown above, the Secret must include the following:

  • The label troubleshoot.sh/kind: support-bundle
  • A stringData field with a key named support-bundle-spec

(KOTS Only) SupportBundle Custom Resource

You can define support bundle specifications in a SupportBundle custom resource for the following installation types:

  • Standard manifest-based applications installed with KOTS
  • Helm chart-based applications installed with KOTS v1.94.1 and earlier

In a release for your application, add the following YAML to a new support-bundle.yaml manifest file:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: example
spec:
  collectors: []
  analyzers: []

For more information about the SupportBundle custom resource, see Preflight and Support Bundle.

(Optional) Customize the Specification

You can optionally customize the support bundles for your application by:

  • Adding collectors and analyzers
  • Editing or excluding the default clusterInfo and clusterResources collectors

For examples of collectors and analyzers defined in Kubernetes Secrets and SupportBundle custom resources, see Example Specifications below.

Add Collectors

Collectors gather information from the cluster, the environment, the application, or other sources. Collectors generate output that is then used by the analyzers that you define.

In addition to the default clusterInfo and clusterResources collectors, the Troubleshoot open source project includes several collectors that you can include in the specification to gather more information from the installation environment. To view all the available collectors, see All Collectors in the Troubleshoot documentation.
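As a sketch, a specification that adds the Troubleshoot logs collector alongside the default collectors might look like the following. The app=my-app selector and the my-app-namespace namespace are assumptions for illustration; replace them with values that match your application:

```yaml
spec:
  collectors:
    # The default clusterInfo and clusterResources collectors are
    # included automatically and do not need to be listed here.
    - logs:
        # Hypothetical label selector and namespace for illustration.
        selector:
          - app=my-app
        namespace: my-app-namespace
        limits:
          maxLines: 10000
  analyzers: []
```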

Commonly used collectors include the logs, http, runPod, mysql, and redis collectors, several of which appear in the examples below.

Add Analyzers

Analyzers use the data from the collectors to generate output for the support bundle. Good analyzers clearly identify failure modes and provide troubleshooting guidance for the user. For example, if you can identify a log message from your database component that indicates a problem, you should write an analyzer that checks for that log and provides a description of the error to the user.
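Following the database example above, a minimal sketch of such an analyzer pairs a logs collector with a textAnalyze analyzer. The app=database selector, log file path, and error string are assumptions for illustration:

```yaml
spec:
  collectors:
    - logs:
        # Hypothetical selector for a database Pod.
        selector:
          - app=database
  analyzers:
    - textAnalyze:
        checkName: Database connection errors
        # Hypothetical log file path and error message.
        fileName: database-*/database.log
        regex: "FATAL: could not connect"
        outcomes:
          - fail:
              when: "true"
              message: The database logs contain connection errors. Confirm that the database is running and reachable from the application.
          - pass:
              when: "false"
              message: No connection errors found in the database logs.
```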

The Troubleshoot open source project includes several analyzers that you can include in the specification. To view all the available analyzers, see the Analyze section of the Troubleshoot documentation.

Commonly used analyzers include the textAnalyze, deploymentStatus, clusterVersion, and nodeResources analyzers, several of which appear in the examples below.

Customize the Default clusterResources Collector

You can edit the default clusterResources collector using the following properties:

  • namespaces: The list of namespaces where the resources and information are collected. If the namespaces key is not specified, then the clusterResources collector defaults to collecting information from all namespaces. The default namespace cannot be removed, but you can specify additional namespaces.

  • ignoreRBAC: When true, the clusterResources collector does not check for RBAC authorization before collecting resource information from each namespace. This is useful when your cluster uses authorization webhooks that do not support SelfSubjectRuleReviews. Defaults to false.

For more information, see Cluster Resources in the Troubleshoot documentation.

The following example shows how to specify the namespaces where the clusterResources collector collects information:

spec:
  collectors:
    - clusterResources:
        namespaces:
          - default
          - my-app-namespace
        ignoreRBAC: true

The following example shows how to use Helm template functions to set the namespace:

spec:
  collectors:
    - clusterResources:
        namespaces: {{ .Release.Namespace }}
        ignoreRBAC: true

The following example shows how to use the Replicated Namespace template function to set the namespace:

spec:
  collectors:
    - clusterResources:
        namespaces: '{{repl Namespace }}'
        ignoreRBAC: true

For more information, see Namespace in Static Context.

Exclude the Default Collectors

Although Replicated recommends including the default clusterInfo and clusterResources collectors because they collect a large amount of data to help with installation and debugging, you can optionally exclude them.

The following example shows how to exclude both the clusterInfo and clusterResources collectors from your support bundle specification:

spec:
  collectors:
    - clusterInfo:
        exclude: true
    - clusterResources:
        exclude: true

Example Specifications

This section includes common examples of support bundle specifications. For more examples, see the Troubleshoot example repository in GitHub.

Check API Deployment Status

The examples below use the deploymentStatus analyzer to check the number of ready replicas for a Deployment running in the cluster. The deploymentStatus analyzer uses data from the default clusterResources collector.

For more information, see Deployment Status and Cluster Resources in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - deploymentStatus:
            name: api
            namespace: default
            outcomes:
              - fail:
                  when: "< 1"
                  message: The API deployment does not have any ready replicas.
              - warn:
                  when: "= 1"
                  message: The API deployment has only a single ready replica.
              - pass:
                  message: There are multiple replicas of the API deployment ready.

Check HTTP Requests

If your application has its own API that serves status, metrics, performance data, and so on, this information can be collected and analyzed.

The examples below use the http collector and the textAnalyze analyzer to check that an HTTP request to the Slack API at https://api.slack.com/methods/api.test made from the cluster returns a successful response containing "status": 200,.

For more information, see HTTP and Regular Expression in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - http:
            collectorName: slack
            get:
              url: https://api.slack.com/methods/api.test
      analyzers:
        - textAnalyze:
            checkName: Slack Accessible
            fileName: slack.json
            regex: '"status": 200,'
            outcomes:
              - pass:
                  when: "true"
                  message: "Can access the Slack API"
              - fail:
                  when: "false"
                  message: "Cannot access the Slack API. Check that the server can reach the internet and check [status.slack.com](https://status.slack.com)."

Check Kubernetes Version

The examples below use the clusterVersion analyzer to check the version of Kubernetes running in the cluster. The clusterVersion analyzer uses data from the default clusterInfo collector.

For more information, see Cluster Version and Cluster Info in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - clusterVersion:
            outcomes:
              - fail:
                  message: This application relies on Kubernetes features only present in 1.16.0 and later.
                  uri: https://kubernetes.io
                  when: "< 1.16.0"
              - warn:
                  message: Your cluster is running a version of Kubernetes that is out of support.
                  uri: https://kubernetes.io
                  when: "< 1.24.0"
              - pass:
                  message: Your cluster meets the recommended and required versions of Kubernetes.

Check Node Resources

The examples below use the nodeResources analyzer to check that the minimum requirements are met for memory, CPU cores, number of nodes, and ephemeral storage. The nodeResources analyzer uses data from the default clusterResources collector.

For more information, see Cluster Resources and Node Resources in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - nodeResources:
            checkName: One node must have 2 GB RAM and 1 CPU core
            filters:
              allocatableMemory: 2Gi
              cpuCapacity: "1"
            outcomes:
              - fail:
                  when: "count() < 1"
                  message: Cannot find a node with sufficient memory and CPU
              - pass:
                  message: Sufficient CPU and memory is available
        - nodeResources:
            checkName: Must have at least 3 nodes in the cluster
            outcomes:
              - fail:
                  when: "count() < 3"
                  message: This application requires at least 3 nodes
              - warn:
                  when: "count() < 5"
                  message: This application recommends at least 5 nodes.
              - pass:
                  message: This cluster has enough nodes.
        - nodeResources:
            checkName: Each node must have at least 40 GB of ephemeral storage
            outcomes:
              - fail:
                  when: "min(ephemeralStorageCapacity) < 40Gi"
                  message: Nodes in this cluster do not have at least 40 GB of ephemeral storage.
                  uri: https://kurl.sh/docs/install-with-kurl/system-requirements
              - warn:
                  when: "min(ephemeralStorageCapacity) < 100Gi"
                  message: Nodes in this cluster are recommended to have at least 100 GB of ephemeral storage.
                  uri: https://kurl.sh/docs/install-with-kurl/system-requirements
              - pass:
                  message: The nodes in this cluster have enough ephemeral storage.

Check Node Status

The following examples use the nodeResources analyzer to check the status of the nodes in the cluster. The nodeResources analyzer uses data from the default clusterResources collector.

For more information, see Node Resources and Cluster Resources in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - nodeResources:
            checkName: Node status check
            outcomes:
              - fail:
                  when: "nodeCondition(Ready) == False"
                  message: "Not all nodes are online."
              - warn:
                  when: "nodeCondition(Ready) == Unknown"
                  message: "Not all nodes are online."
              - pass:
                  message: "All nodes are online."

Collect Logs Using Multiple Selectors

The examples below use the logs collector to collect logs from various Pods where application workloads are running. They also use the textAnalyze analyzer to analyze the logs for a known error.

For more information, see Pod Logs and Regular Expression in the Troubleshoot documentation.

You can use the selector attribute of the logs collector to find Pods that have the specified labels. Depending on the complexity of an application's labeling schema, you might need a few different declarations of the logs collector, as shown in the examples below. You can include the logs collector as many times as needed.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-nginx
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-api
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-frontend
        - logs:
            selector:
              - app=postgres
      analyzers:
        - textAnalyze:
            checkName: Axios Errors
            fileName: slackernews-frontend-*/slackernews.log
            regex: "error - AxiosError"
            outcomes:
              - pass:
                  when: "false"
                  message: "Axios errors not found in logs"
              - fail:
                  when: "true"
                  message: "Axios errors found in logs"

Collect Logs Using limits

The examples below use the logs collector to collect Pod logs from the Pod where the application is running. These specifications use the limits field to set a maxAge and maxLines to limit the output provided.

For more information, see Pod Logs in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - logs:
            selector:
              - app.kubernetes.io/name=myapp
            namespace: {{ .Release.Namespace }}
            limits:
              maxAge: 720h
              maxLines: 10000

Collect Redis and MySQL Server Information

The following examples use the redis and mysql collectors to collect information about Redis and MySQL servers running in the cluster.

For more information, see Redis and MySQL in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - mysql:
            collectorName: mysql
            uri: 'root:my-secret-pw@tcp(localhost:3306)/mysql'
            parameters:
              - character_set_server
              - collation_server
              - init_connect
              - innodb_file_format
              - innodb_large_prefix
              - innodb_strict_mode
              - log_bin_trust_function_creators
        - redis:
            collectorName: my-redis
            uri: rediss://default:replicated@server:6380

Run and Analyze a Pod

The examples below use the textAnalyze analyzer to check that a command successfully executes in a Pod running in the cluster. The Pod specification is defined in the runPod collector.

For more information, see Run Pods and Regular Expression in the Troubleshoot documentation.

apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - runPod:
            collectorName: "static-hi"
            podSpec:
              containers:
                - name: static-hi
                  image: alpine:3
                  command: ["echo", "hi static!"]
      analyzers:
        - textAnalyze:
            checkName: Said hi!
            fileName: /static-hi.log
            regex: 'hi static'
            outcomes:
              - fail:
                  message: Didn't say hi.
              - pass:
                  message: Said hi!