Installation Troubleshooting

This topic provides solutions for common installation issues.

Services in Pending state

Issue: Services are in Pending state after you run kubectl get pods -n xvm.

Reason: Labels for the region and zone nodes are incorrectly configured.

Solution:
  1. Check the label configuration: kubectl get nodes --show-labels.
  2. Remove the incorrect label: kubectl label node <node-name> <label-name>-.
  3. Configure the correct label: kubectl label node <node-name> <label-name>=<value>.

    For example: kubectl label node zonal1-node-xvm region=reg1.

  4. On the control plane VM, perform the following:
    cd /etc/xvm/controlplane_node_binaries
    ./loadPodsInControlPlaneNode.sh
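
After the script completes, you can recheck the pod status with the same command as in the issue description; the pods in the xvm namespace should now report Running rather than Pending:

    kubectl get pods -n xvm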
    

Services in ErrImagePull or ImagePullBackOff state

Issue: Services are in ErrImagePull or ImagePullBackOff state after you run kubectl get pods -n xvm.

Reason: You tried to create a pod that references an image name or tag that does not exist. This issue can be caused by a version mismatch or by an incorrect label configuration.
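
Before you choose a solution, you can confirm which image name or tag the pod is trying to pull by describing the failing pod; the Events section of the output shows the image reference that could not be pulled (the pod name is a placeholder):

    kubectl describe pod <pod-name> -n xvm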

Solution for a version mismatch:
  1. Determine whether the Docker image exists on the region and zone nodes: docker images.
  2. Load the same version of Visibility Manager on all nodes in the cluster.
  3. Load the correct Docker image.
    docker load < /etc/xvm/<ABC>_node_binaries/<docker_image.tar.gz>
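
For example, the check-and-load sequence might look like the following, where <image_name> is a placeholder for the Visibility Manager image name and <ABC> is the same placeholder used in step 3:

    docker images | grep <image_name>                                  # check whether the expected image and tag are already present
    docker load < /etc/xvm/<ABC>_node_binaries/<docker_image.tar.gz>   # load the missing image from the binaries directory
    docker images | grep <image_name>                                  # the image should now be listed with the expected tag
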
Solution for an incorrect label configuration:
  1. Check the label configuration: kubectl get nodes --show-labels.
  2. Remove the incorrect label: kubectl label node <node-name> <label-name>-.
  3. Configure the correct label: kubectl label node <node-name> <label-name>=<value>.

    For example: kubectl label node zonal1-node-xvm region=reg1.

  4. On the control plane VM, perform the following:
    cd /etc/xvm/controlplane_node_binaries
    ./loadPodsInControlPlaneNode.sh
    

crms-ms does not start and pod is in CrashLoopBackOff state

Issue: The crms-ms service does not start or does not run correctly, and its pods are in CrashLoopBackOff state.

Reason: An incorrect parameter may be configured in the /opt/crms/locations.csv file.

Solution:
  1. Correct the locations.csv file on all region nodes as described in Configure and Verify the System.

    Do not insert spaces after commas or in any fields other than the geographical location fields. Ensure that the IP addresses are valid. Zone names and host names can consist of numeric characters and lowercase alphabetic characters. For example (a quick check for spaces after commas is shown after step 2):

    usa,region-1,10.37.138.187,east-zone,10.37.138.187,zone-187,Duff,36.4467° N,84.0674° W
    usa,region-1,10.37.138.188,west-zone,10.37.138.187,zone-188,Las Vegas,36.1699° N,115.1398° W
    
  2. On the control plane VM, perform the following:
    cd /etc/xvm/controlplane_node_binaries
    ./loadPodsInControlPlaneNode.sh
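
A quick way to catch the most common locations.csv formatting mistake described in step 1, a space after a comma, is to search the file on each region node before you rerun the script; any output points to an offending line:

    grep -n ', ' /opt/crms/locations.csv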
    

Nodes not listed after you run kubectl get nodes

Issue: Nodes are not listed after you run kubectl get nodes on the control plane VM.

Reason: There are two possibilities: an incorrect join token was applied to the region and zone VMs, or the host name is incorrectly configured on the region and zone VMs (for example, the same host name is used for more than one node in the xvmconf file).

Solution for an incorrect join token:
  1. On the control plane VM, create the join token:
    kubeadm token create --print-join-command
  2. On the nodes that were not listed, reset any existing kubeadm configuration: kubeadm reset.
  3. On the nodes that were not listed, run the kubeadm join command that you created in step 1 (see the example sequence after step 4).

    For more information, see Configure and Verify the System.

  4. After you run the join command, continue with the Configure and Verify the System process.
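
For example, on a node that was not listed, the reset-and-join sequence from steps 2 and 3 might look like the following; the kubeadm join line is the command printed on the control plane in step 1, shown here with placeholder values:

    kubeadm reset
    kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
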
Solution for an incorrect host name:
  1. Verify the host name on all nodes in the cluster: cat /etc/hosts.
  2. Correct the xvmconf file in this location: vi /etc/xvmconf.
  3. Rerun the startup script:
    /etc/xvm/<node>_node_binaries/<node>_startup.sh
  4. Set node labels for each region and zone by using the kubectl label nodes <hostname> <label> syntax, where the label is region=reg1 for regions 1, 2, and 3, and zone=reg1-zone1, zone=reg1-zone2, and so on for the different zones in a region. For example:
    kubectl label nodes cat-region1-evm region=reg1
    kubectl label nodes cat-region2-evm region=reg1
    kubectl label nodes cat-region3-evm region=reg1
    kubectl label nodes cat-zone1-evm zone=reg1-zone1
    

    The final command differs from the others in two ways: it sets the zone label instead of the region label, and it assigns the zone-specific value reg1-zone1.

  5. On the control plane VM, perform the following:
    cd /etc/xvm/controlplane_node_binaries
    ./loadPodsInControlPlaneNode.sh
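
After the script completes, you can confirm that the issue is resolved: kubectl get nodes on the control plane VM should now list every region and zone node, and the labels applied in step 4 should appear in the --show-labels output:

    kubectl get nodes
    kubectl get nodes --show-labels | grep -E 'region=|zone='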
    

Cluster node reconfiguration

You can reconfigure any of the region or zone nodes for reasons such as the following:
  • Change the management IP address
  • Change the host name
  • Change any parameters in the xvmconf file
Take the following steps:
  1. As needed, update the configuration parameters in the xvmconf file: vi /etc/xvmconf.
  2. To delete a region or zone node from the control plane, take the following steps (see the example at the end of this procedure):
    1. Run the following on the control plane node:
      kubectl cordon <node-name>
      kubectl drain <node-name> --ignore-daemonsets
      kubectl delete node <node-name>
    2. On the region or zone node, run kubeadm reset.

    The drain command waits for graceful termination; do not operate on the node's VM until the command completes. To return the node to service, run kubectl uncordon <node-name>, which makes the node schedulable again.

  3. From the node where you updated xvmconf, run the startup script:
    /etc/xvm/<ABC>_node_binaries/<ABC>_startup.sh

    <ABC> can be a region or a zone. For example: /region_startup.sh.

  4. On the control plane VM, create a new join token:
    kubeadm token create --print-join-command

    Alternatively, you can reuse a previously used join token if it is available.

  5. On the region or zone node, run the kubeadm join command that you created in step 4.
  6. Verify that the nodes have joined the cluster: kubectl get nodes.
  7. Set node labels for each region and zone by using the kubectl label nodes <hostname> <label> syntax, where the label is region=reg1 for regions 1, 2, and 3, and zone=reg1-zone1, zone=reg1-zone2, and so on for the different zones in a region. For example:
    kubectl label nodes cat-region1-evm region=reg1
    kubectl label nodes cat-region2-evm region=reg1
    kubectl label nodes cat-region3-evm region=reg1
    kubectl label nodes cat-zone1-evm zone=reg1-zone1
    

    The final command differs from the others in two ways: it sets the zone label instead of the region label, and it assigns the zone-specific value reg1-zone1.

  8. On any region node, verify that leader election is complete:
    patronictl -c /opt/app/patroni/etc/postgresql.yml list postgres

    The output identifies the node with the Leader role and displays an * in the Pending Restart column.

  9. On the leader region node, restart the patroni service:
    sudo systemctl restart patroni
  10. On any region node, ensure nothing is in Pending Restart state:
    patronictl -c /opt/app/patroni/etc/postgresql.yml list postgres
  11. On the control plane VM, perform the following:
    cd /etc/xvm/controlplane_node_binaries
    ./loadPodsInControlPlaneNode.sh
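
For example, to remove the zone node cat-zone1-evm (the example host name used earlier in this procedure) from the cluster in step 2 before reconfiguring it, the commands on the control plane node might look like the following:

    kubectl cordon cat-zone1-evm
    kubectl drain cat-zone1-evm --ignore-daemonsets
    kubectl delete node cat-zone1-evm

Then, on cat-zone1-evm itself, run kubeadm reset as described in step 2 before continuing with the rest of the procedure.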