NSX-T – NCP Integration with Openshift 4.6 – The Easy Way

By Jörg Walz | 24. March 2021

Introduction

If you have been following the blog posts on this site, you know we already integrated NSX-T with Openshift 4.4 using NCP’s support for Openshift operators (see https://www.vrealize.it/2020/09/29/nsx-t-ncp-integration-with-openshift-4-4-the-easy-way).

In the meantime, NCP 3.1.1 was released, which supports Openshift 4.6. Since 4.6 introduces a new ignition format version, I took the opportunity to refresh this blog, add the configuration steps for putting the API load balancing on NSX-T, and cover the configuration required for TLS offloading.

The NCP operator is also published on the Redhat Openshift Operator Hub (https://catalog.redhat.com/software/operators/detail/5ef0f362701a9cb8c147cf4b). That makes the installation much simpler, as you will see below.

High-Level Installation Walkthrough

Let’s first review what the high-level tasks are to get it working:

  1. Prepare a small jumphost VM for all the installation tasks and install the required installer files
  2. Prepare the required DNS host entries
  3. Configure NSX-T networking constructs to host the cluster
  4. Deploy a Redhat Container Host image as template within your vSphere environment
  5. Prepare the Openshift install config and modify it for NCP. This will create the cluster manifests and ignition files.
  6. Deploy an Openshift cluster as user-provided infrastructure with bootstrap, control-plane and compute hosts using Terraform
  7. Let the bootstrap host provision the cluster and finalize the remaining cluster deployment.

Detailed Installation Walkthrough

1. Jumphost Preparation and Pre-Requisites

For my lab, I have downloaded a CentOS 7.8 minimal ISO and created a VM based on it. If you like, you can grab the ISO here: http://isoredirect.centos.org/centos/7/isos/x86_64/, but any other linux-based VM should work as well.

As we are going to use a couple of scripts and Terraform as well, it makes sense to have at least Python and Terraform installed:

sudo yum install python-pip
sudo yum install unzip
sudo yum install wget
export TERRAFORM_VERSION=0.13.6
curl -O -L https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_amd64.zip
unzip terraform_${TERRAFORM_VERSION}_linux_amd64.zip -d ~/bin/
terraform -v
Terraform v0.13.6

To keep things tidy, let’s create a directory structure for the Openshift deployment. You don’t have to, but since you might run several deployments over time, it makes sense to have at least one directory per deployment:

[localadmin@oc-jumphost ~]$ tree openshift/ -L 1
openshift/
├── config-files
├── deployments
├── downloads
├── installer-files
└── scripts
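
If you want to create that layout in one go, a simple one-liner does it (assuming you are working in your home directory):

mkdir -p ~/openshift/{config-files,deployments,downloads,installer-files,scripts}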

Download the following items to the downloads folder, extract them into the installer-files directory, and move the clients and installer to your binary folder (at the time of this writing, the current version of Openshift 4.6 is 4.6.8, so that is what I have used for the installer and clients). We are also going to use Terraform for the vSphere UPI deployment. In the previous posts, I used the terraform scripts from the official openshift-install git repository (https://github.com/openshift/installer.git), but I encountered two problems with it:

  • Openshift 4.6 uses a new ignition file format (3.1), which requires restructuring the bootstrap URL configuration: “append” is no longer possible, you have to use “merge” instead. A good explanation can be found here: https://medium.com/@ganshug/redhat-openshift-container-platform-v4-6-is-almost-here-3eea4c1a974d. This means the older terraform scripts no longer work.
  • The newer terraform scripts in the Openshift git repository don’t work well for a local vSphere UPI setup. They make IPAM, DNS and LB assumptions that don’t match a lab environment, so you would have to change a lot in those scripts.

Long story short, I moved my installation over to a terraform script from Alex Kretzschmar. The repository is located at https://github.com/IronicBadger/ocp4 and is easy to customize to your environment.

cd openshift/downloads
wget -c https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.8/openshift-install-linux.tar.gz
wget -c https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.8/openshift-client-linux.tar.gz
cd ../installer-files
tar -xf ../downloads/openshift-client-linux.tar.gz
tar -xf ../downloads/openshift-install-linux.tar.gz
sudo cp {oc,kubectl,openshift-install} /usr/bin/
git clone https://github.com/IronicBadger/ocp4.git

Now you should have the openshift-install, oc and kubectl commands available.
The next step is to create SSH keys, as we will need them to SSH into the RHCOS container hosts:

ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

Next, we also need the RHCOS 4.6 OVA and the NSX-T NCP containers.
Download the RHCOS OVA from here:
https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.8/rhcos-4.6.8-x86_64-vmware.x86_64.ova
You will probably want to download it to a location from which you can upload the OVA to your VMware vCenter.
As for the NSX-T NCP container, you need a myVMware account; you can download it from here:
https://my.vmware.com/en/web/vmware/downloads/details?downloadGroup=NSX-T-PKS-311&productId=982.
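
To fetch the OVA directly to the downloads folder, something like this works (the NCP zip, in contrast, has to be downloaded manually through the myVMware portal):

cd ~/openshift/downloads
wget -c https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.8/rhcos-4.6.8-x86_64-vmware.x86_64.ova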

We will also need the configuration files for the NCP network operator. These can be downloaded using git from this location:

cd ~/openshift/installer-files/
git clone https://github.com/vmware/nsx-container-plugin-operator.git

Put the NCP container zip into the downloads folder as well and extract it into the installer-files folder. From the extracted content, we only need the Redhat (RHEL) NCP container image, so we can remove the items we don’t need:

cd ~/openshift/installer-files/
unzip ../downloads/nsx-container-3.1.1.17559186.zip
rm -r nsx-container-3.1.1.17559186/PAS/
rm -r nsx-container-3.1.1.17559186/OpenvSwitch/
rm nsx-container-3.1.1.17559186/Kubernetes/nsx-ncp-ubuntu-3.1.1.17559186.tar
rm nsx-container-3.1.1.17559186/Kubernetes/nsx-ncp-photon-3.1.1.17559186.tar

During the Openshift installation process, the NCP operator container image will be downloaded automatically, as it is publicly available on Docker Hub. The NCP container image itself is required as well, but it is not publicly available. Therefore, you have to provide the NCP container image on a private image registry, or temporarily push it to a private Docker Hub repository.
In my case, I already have a private image registry running, based on Harbor (see https://goharbor.io/), so I placed the NCP image there:

cd ~/openshift/installer-files/nsx-container-3.1.1.17559186/Kubernetes/
docker image load -i nsx-ncp-rhel-3.1.1.17559186.tar
docker tag registry.local/3.1.1.17559186/nsx-ncp-rhel harbor.corp.local/library/nsx-ncp
docker push harbor.corp.local/library/nsx-ncp

Last, we need to get a pull secret from Redhat, which allows the container hosts to download the needed containers during the deployment. The pull secret requires a Redhat account (you can register for a free developer account if you don’t have a corporate subscription).
Go to https://cloud.redhat.com/openshift/install/vsphere/user-provisioned and download your pull secret.

As a preparation, I also strongly recommend creating a TLS certificate for the openshift apps. If you don’t do this up-front, you can’t provide the certificate during the installation. In that case, the openshift routes for the openshift apps (like Console, Prometheus etc.) will not be placed on the NCP loadbalancer, because NCP doesn’t create a self-signed certificate automatically.

To create this certificate, you can use openssl. The certificate SAN needs to point to the wildcard cluster domain. As you can see below, my apps domain URL is *.apps.openshift4.corp.local. Here are the commands required to generate this certificate:

export COMMONNAME=*.openshift4.corp.local
openssl req -newkey rsa:2048 -x509 -nodes -keyout openshift.key -new -out openshift.crt -subj /CN=$COMMONNAME -reqexts SAN -extensions SAN -config <(cat ./openshift-cert.cnf <(printf "[SAN]\nsubjectAltName=DNS:$COMMONNAME")) -sha256 -days 365
openssl x509 -in openshift.crt -text -noout

The command above generates a self-signed certificate, saves it to openshift.crt and the key to openshift.key, based on the input variables from the file openshift-cert.cnf. The cnf file can be prepared beforehand and contains whatever you would like to put into the cert. Mine looks like this:

[ req ]
default_bits = 4096
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[ req_distinguished_name ]

countryName = DE
stateOrProvinceName = BW
localityName = Stuttgart
organizationName = NSX
commonName = *.openshift4.corp.local

[ req_ext ]

subjectAltName = @alt_names

[alt_names]

SIDENOTE: Take a look at the resulting certificate. Newer versions of OpenSSL automatically generate self-signed certificates with the option basicConstraints=CA:TRUE. That means it generates a CA certificate, which is not what we want, because NSX-T will reject such a certificate as a server certificate. If your OpenSSL has that option set, you have to override it in the cnf file.
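
A quick way to check whether your OpenSSL added that constraint is to inspect the generated certificate; if the output shows CA:TRUE, adjust the cnf file and regenerate:

openssl x509 -in openshift.crt -noout -text | grep -A1 "Basic Constraints"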

2. DNS Preparation

Let’s first take a look at what we are planning to deploy. The default set consists of 3 control-plane nodes and 3 compute nodes. As we are going to use the user-provisioned way of deploying the cluster in vSphere, we also need to take care of the DNS entries.
We are also going to use the NSX-T infrastructure for all possible elements, like networking, load balancing and the DHCP server, except for DNS, which most likely already exists in your environment. Our final topology will look like this (during bootstrap, one additional VM, called bootstrap, is needed):

Openshift expects each deployment to have a separate cluster ID, which needs to correlate with the respective DNS zone. In my example, my base DNS domain is corp.local and my Openshift cluster name is openshift4.
Therefore, I have to create DNS entries for each node (control-plane-0 – 2, compute-0 – 2, bootstrap) in a DNS zone called openshift4.corp.local.
In addition, we need records for the etcd hosts, the Openshift API, and SRV records for the etcd service. Here’s the complete list of DNS records that are needed:

control-plane-0.openshift4.corp.local 172.16.170.100
control-plane-1.openshift4.corp.local 172.16.170.101
control-plane-2.openshift4.corp.local 172.16.170.102

compute-0.openshift4.corp.local 172.16.170.110
compute-1.openshift4.corp.local 172.16.170.111
compute-2.openshift4.corp.local 172.16.170.112

bootstrap.openshift4.corp.local 172.16.170.99

etcd-0.openshift4.corp.local 172.16.170.100
etcd-1.openshift4.corp.local 172.16.170.101
etcd-2.openshift4.corp.local 172.16.170.102

The following 2 entries point to the API virtual IP on the NSX-T load balancer that we will configure in section 3 (during the bootstrap phase, the load balancer forwards these to the bootstrap host):

api.openshift4.corp.local 172.16.10.100
api-int.openshift4.corp.local 172.16.10.100

A wildcard DNS entry needs to be in place for the OpenShift 4 ingress router, which is also a load balanced endpoint.

*.apps.openshift4.corp.local 172.16.172.1

In addition, you’ll also need to add SRV records for the etcd service:

_etcd-server-ssl._tcp.openshift4.corp.local
0 10 2380 etcd-0.openshift4.corp.local.
0 10 2380 etcd-1.openshift4.corp.local.
0 10 2380 etcd-2.openshift4.corp.local.

Unlike in the previous blog, I will not use round-robin DNS to load balance the API records. Instead, the DNS records for api.openshift4.corp.local and api-int.openshift4.corp.local point to the virtual IP address of an NSX-T load balancer service. You will see in section 3 how it is configured. The benefit is that we don’t need a separate HAProxy VM, because the functionality is already included in NSX-T.

As for the DNS entry for *.apps.openshift4.corp.local, the IP address refers to the first IP address from the Ingress IP pool that we will configure in step 3. NCP handles the Ingress LB for the openshift apps and allocates the first free IP from that pool for the newly created cluster. If you are not sure yet, you can add this DNS entry after the Ingress LB has been created on NSX-T during the installation.
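Before moving on, it is worth verifying the records from the jumphost. A minimal sanity check with dig (part of the bind-utils package on CentOS) could look like this; the names match my lab, so adjust them to your domain:

# forward records for the API and one node
dig +short api.openshift4.corp.local
dig +short control-plane-0.openshift4.corp.local
# any name under the wildcard apps domain should resolve to the ingress IP
dig +short test.apps.openshift4.corp.local
# etcd SRV records
dig +short -t SRV _etcd-server-ssl._tcp.openshift4.corp.local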

3. Configure NSX-T networking constructs to host the cluster

Let’s refer to the topology:


In NSX-T, we will create a base topology to which the cluster hosts will be attached. For that, we create a separate T1 router to which all OCP segments will be attached, plus a segment for the cluster hosts. Last, a DHCP server will be created so the cluster hosts get dynamic IP addresses during bootup.
As an optional exercise, I have also created an Ingress IP pool and an Egress NAT pool for NCP to consume. NCP can create these dynamically as well, but I prefer the pre-provisioned way to be on the safe side.

Assuming you have already deployed NSX-T on the vSphere cluster and configured a T0 router, let me quickly walk you through the creation of the components above:

Configure T1 for OCP Hosts
– Log in to NSX-T Manager
– Click on the Networking tab
– Connectivity > Tier-1 Gateways
– Add Tier-1 Gateway

Configure Segment for OCP Hosts
– Click on the Networking tab
– Connectivity > Segments
– Add Segment

Configure DHCP Server
– Click on the Networking tab
– IP Management > DHCP
– Add DHCP Profile

Attach the DHCP Server to the OCP-Management segment
– Click on the Networking tab
– Connectivity > Segments
– click edit on the OCP-Management segment
– click edit DHCP config

Configure Ingress IP Pool and Egress NAT Pool
– Click on the Networking tab
– IP Management -> IP Address Pools
– add 2 IP Address Pools

Just make sure that your T1 route advertisement settings are configured correctly (advertise Connected Segments, NAT IPs and LB VIPs) and verify your T0 route redistribution settings. If you use BGP routing, you need to redistribute the corresponding routes as well.
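
If you prefer to double-check the created objects via the API instead of the UI, the NSX-T Policy API can list them. A minimal sketch, assuming curl and jq are installed on the jumphost and using the manager address and admin account that also appear later in the NCP config (curl will prompt for the password):

NSX_MGR=192.168.110.200
# list Tier-1 gateways (the new T1 for OCP should show up)
curl -sk -u admin https://${NSX_MGR}/policy/api/v1/infra/tier-1s | jq -r '.results[].display_name'
# list the IP pools created for ingress and egress
curl -sk -u admin https://${NSX_MGR}/policy/api/v1/infra/ip-pools | jq -r '.results[].display_name'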

Configure Loadbalancing for API and Machine Config Server
Technically, you could run the installation using plain DNS load balancing for the Openshift API, but you would miss out on the health monitoring part. Therefore, it makes sense to put this load balancing on NSX-T as well. Best practice is to configure a separate T1 router for load balancing to reduce the load on the traffic T1.

  • Configure 3 groups for the Openshift bootstrap, compute and control-plane nodes, plus one group combining bootstrap and control-plane nodes.
  • Configure a load balancer on a different T1 router.
  • Configure 2 virtual servers:
    • One service on port 6443 for the openshift API, pointing to a server pool consisting of the combined bootstrap/control-plane group.
    • One service on port 22623 for the machine config server, pointing to another server pool consisting of the same combined bootstrap/control-plane group.
(Screenshots: Loadbalancer, Virtual Service 1, Virtual Service 2, Server Pools, Active Monitors for the ocp-control-and-boot API and machine config pools.)

One thing I didn’t explicitly mention: you need to pick a virtual IP address for these services. It comes from a different IP range, since the load balancer sits on a different T1 router. Be sure to enable LB VIP advertisement from that T1 to the T0 router. For reference, my virtual servers use IP address 172.16.10.100, which is also where the DNS records for api.openshift4.corp.local and api-int.openshift4.corp.local point.
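
Once the bootstrap VM is up (step 6), you can quickly verify that the VIP actually forwards traffic. A simple TCP check from the jumphost, assuming nc (nmap-ncat) is installed:

nc -zv api.openshift4.corp.local 6443
nc -zv api.openshift4.corp.local 22623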

4. Deploy a Redhat Container Host image as template within your vSphere environment

After all these preparation steps, we can now start to get things up and running on the vSphere side.
Upload the RHCOS OVA from step 1 to your vCenter. You probably know how to do this: navigate to the vCenter Web UI, choose the vSphere cluster where you would like to deploy the VM and click Deploy OVF Template.
In the OVF Template wizard, just be sure to select the OCP-Management segment that we just created in NSX-T as the destination network:

You can leave the settings regarding ignition empty, as these will be configured by the terraform installer later.

After the VM has been uploaded, convert it to a template and name the template “rhcos-4.6” (the terraform configuration refers to that name).
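
If you prefer the command line, the govc CLI (from the govmomi project) can handle the upload and template conversion as well. A sketch, assuming govc is installed and the GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD/GOVC_DATASTORE environment variables point at the environment described above:

# upload the OVA and name the VM rhcos-4.6
govc import.ova -name=rhcos-4.6 ~/openshift/downloads/rhcos-4.6.8-x86_64-vmware.x86_64.ova
# if the OVA network mapping needs adjusting, generate and edit an options file
# with "govc import.spec" and pass it via -options before importing
# convert the uploaded VM into a template
govc vm.markastemplate rhcos-4.6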

5. Prepare the Openshift install config and modify it for NCP

In this step, we are going to configure the openshift installation files on your linux jumphost that we prepared in step 1.
Referring to the directory structure, move to the directory openshift/config-files and create an install-config.yaml file.

[localadmin@oc-jumphost ~]$ tree openshift/ -L 1
openshift/
├── config-files
├── deployments
├── downloads
├── installer-files
└── scripts
[localadmin@oc-jumphost ~]$ cd ~/openshift/config-files/

Here’s what my install-config.yaml looks like:

apiVersion: v1
baseDomain: corp.local
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: control-plane
  replicas: 3
metadata:
  name: openshift4
networking:
  networkType: ncp
  clusterNetwork:
  - cidr: 10.4.0.0/16
    hostPrefix: 23
  machineCIDR: 172.16.170.0/24
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    vcenter: vcsa-01a.corp.local
    username: administrator@corp.local
    password: VMware1!
    datacenter: DC-SiteA
    defaultDatastore: ds-site-a-nfs03
fips: false
pullSecret: 'ENTER YOUR PULL-SECRET HERE'
sshKey: 'ENTER YOUR SSH KEY HERE'
proxy:
additionalTrustBundle: | 
    -----BEGIN CERTIFICATE-----
    'ENTER YOUR REGISTRY CA CERT HERE'
    -----END CERTIFICATE-----

Couple of comments regarding these settings:
– compute replicas: 0 – As we provide the VMs ourselves, we set this to 0.
– clusterNetwork – This is the pod network that NCP will deploy for internal pod communication.
– machineCIDR – This needs to match the OCP segment IP range that we configured on NSX-T (in this case: 172.16.170.0/24).
– password – Enter your vSphere password here.
– pullSecret – Enter the Redhat pull secret that you obtained in step 1. Make sure you enclose it in single quotes.
– sshKey – Enter the contents of your ~/.ssh/id_rsa.pub file from step 1. Make sure you enclose it in single quotes.
– proxy – Only needed if you deploy the NCP container image from a private registry. As of Openshift 4.4, the only way to provide additional trusted CA certificates is through the proxy configuration, even if the proxy setting itself is empty. You can remove the proxy setting if you deploy the NCP container image on the public Docker Hub.
– additionalTrustBundle – Only needed if you deploy the NCP container image from a private registry. Here, you enter the CA certificate that can verify the private registry’s server certificate (in my case, the CA cert that signed the server certificate for harbor.corp.local). Without it, the NCP image download will fail because the openshift hosts can’t validate the private registry certificate. You can remove the additionalTrustBundle setting if you deploy the NCP container image on the public Docker Hub.

The next step is to prepare the NCP operator config files accordingly. These are located in the deploy/openshift4 folder of the nsx-container-plugin-operator git repository.

[localadmin@oc-jumphost config-files]$ cd ~/openshift/installer-files/nsx-container-plugin-operator/deploy/openshift4
[localadmin@oc-jumphost openshift4]$ ls
configmap.yaml
lb-secret.yaml
namespace.yaml
nsx-secret.yaml
operator.nsx.vmware.com_ncpinstalls_crd.yaml
operator.nsx.vmware.com_v1_ncpinstall_cr.yaml
operator.yaml
role.yaml
role_binding.yaml
service_account.yaml

With the operator support, we only need to modify 3 files:

Modify configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: nsx-ncp-operator-config
  namespace: nsx-system-operator
data:
  ncp.ini: |

    [vc]

    [coe]
    adaptor = openshift4
    cluster = openshift4
    loglevel = WARNING
    nsxlib_loglevel = WARNING
    enable_snat = True

    [DEFAULT]

    [nsx_v3]
    policy_nsxapi = True
    nsx_api_managers = 192.168.110.200
    nsx_api_user = admin
    nsx_api_password = ENTER_YOUR_NSX_PW_HERE
    insecure = True
    subnet_prefix = 24
    log_dropped_traffic = True
    log_firewall_traffic = DENY
    use_native_loadbalancer = True
    l4_lb_auto_scaling = True
    pool_algorithm = WEIGHTED_ROUND_ROBIN
    service_size = SMALL
    external_ip_pools = oc4-external-ip-pool
    top_tier_router = T1-OCP
    single_tier_topology = True
    external_ip_pools_lb = oc4-external-lb-pool
    overlay_tz = 1b3a2f36-bfd1-443e-a0f6-4de01abc963e
    edge_cluster = a3a95653-5d1c-44b2-86e4-d9a279370618

    [ha]

    [k8s]
    apiserver_host_ip = api-int.openshift4.corp.local
    apiserver_host_port = 6443
    client_token_file = /var/run/secrets/kubernetes.io/serviceaccount/token
    ca_file = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    loglevel = WARNING

    [nsx_kube_proxy]

    [nsx_node_agent]
    ovs_bridge = br-int
    ovs_uplink_port = ens192

All the other settings are commented out, so NCP takes the default values for everything else. If you are interested in all the settings, the original file in the directory is quite large and has each config item explained.

Couple of comments regarding these settings:
– nsx_api_password – Put the NSX admin user password here.
– overlay_tz – Put the UUID of the overlay transport zone here (see the API sketch below if you need to look it up).
– edge_cluster – Put the UUID of the edge cluster here.
– service_size – For a PoC, a SMALL load balancer should be fine. For a production deployment, you would rather use a MEDIUM or LARGE LB.
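
If you don’t have the two UUIDs at hand, they can be read from the NSX manager API. A minimal sketch, assuming curl and jq on the jumphost and the manager address from the nsx_api_managers setting above (curl will prompt for the password):

NSX_MGR=192.168.110.200
# UUID of the overlay transport zone (for overlay_tz)
curl -sk -u admin https://${NSX_MGR}/api/v1/transport-zones | jq -r '.results[] | select(.transport_type=="OVERLAY") | "\(.display_name) \(.id)"'
# UUID of the edge cluster (for edge_cluster)
curl -sk -u admin https://${NSX_MGR}/api/v1/edge-clusters | jq -r '.results[] | "\(.display_name) \(.id)"'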

Modify operator.yaml. The only thing you need to modify here is the location where you have placed the NCP image.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsx-ncp-operator
  namespace: nsx-system-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nsx-ncp-operator
  template:
    metadata:
      labels:
        name: nsx-ncp-operator
    spec:
      hostNetwork: true
      serviceAccountName: nsx-ncp-operator
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
      containers:
        - name: nsx-ncp-operator
          image: vmware/nsx-container-plugin-operator:latest
          command: ["/bin/bash", "-c", "nsx-ncp-operator --zap-time-encoding=iso8601"]
          imagePullPolicy: Always
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "nsx-ncp-operator"
            - name: NCP_IMAGE
              value: "harbor.corp.local/library/nsx-ncp:latest"
            - name: WATCH_NAMESPACE
              value: "nsx-system-operator"

Modify lb-secret.yaml. In this file, you place the certificate you created in step 1 for the openshift apps. This enables NCP to use the certificate as the Ingress LB certificate and build the corresponding route configurations. Please be aware that the certificate and key entries are expected to be base64 encoded, so first encode both files as follows:

base64 -w0 openshift.crt
base64 -w0 openshift.key

Take those outputs and put them into lb-secret.yaml:

apiVersion: v1
data: 
  tls.crt: <<COPY THE BASE64 CRT FILE IN HERE>>
  tls.key: <<COPY THE BASE64 KEY FILE IN HERE>>
kind: Secret
metadata: {name: lb-secret, namespace: nsx-system-operator}
type: kubernetes.io/tls
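
As a sanity check before committing the values to the secret, make sure the certificate and key actually belong together; with openssl, the modulus hashes must be identical:

openssl x509 -noout -modulus -in openshift.crt | openssl md5
openssl rsa -noout -modulus -in openshift.key | openssl md5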

Now, we are ready to create the openshift installer manifests and ignition files. For each deployment, the openshift installer will create files in a specific folder structure. So let’s create a new directory for this deployment and copy the install-config.yaml into that folder.

cd ~/openshift/deployments/
mkdir ncp-oc4-vsphere
cp ../config-files/install-config.yaml ncp-oc4-vsphere/

With the next step, we create the openshift manifests:

openshift-install create manifests --dir=ncp-oc4-vsphere

If you don’t want application pods scheduled on the control-plane nodes, the openshift docs suggest editing ncp-oc4-vsphere/manifests/cluster-scheduler-02-config.yml and setting mastersSchedulable: false. You can also remove the machine API manifests, because we use a user-provisioned environment:

sed -i 's/mastersSchedulable: true/mastersSchedulable: false/g' ncp-oc4-vsphere/manifests/cluster-scheduler-02-config.yml
rm ncp-oc4-vsphere/openshift/99_openshift-cluster-api_worker-machineset-0.yaml
rm ncp-oc4-vsphere/openshift/99_openshift-cluster-api_master-machines-0.yaml
rm ncp-oc4-vsphere/openshift/99_openshift-cluster-api_master-machines-1.yaml
rm ncp-oc4-vsphere/openshift/99_openshift-cluster-api_master-machines-2.yaml

Next we need to move the NCP operator config files into the manifest folder and then create the ignition configs:

cp ../installer-files/nsx-container-plugin-operator/deploy/openshift4/*.yaml ncp-oc4-vsphere/manifests/
openshift-install create ignition-configs --dir=ncp-oc4-vsphere

If you now take a look in the ncp-oc4-vsphere folder, you will find 3 important files: bootstrap.ign, master.ign and worker.ign. We need to do a couple of things with these files, as they are required to get the bootstrap cluster up and running (see the example commands after this list).
-> bootstrap.ign: This file needs to be placed on a web server that is reachable from the bootstrap host we are going to deploy. You are free to pick whatever web server you like.
-> master.ign: The file is needed in the next step for the terraform deployment. Copy it to the folder ~/openshift/installer-files/ocp4/clusters/4.6/
-> worker.ign: The file is needed in the next step for the terraform deployment. Copy it to the folder ~/openshift/installer-files/ocp4/clusters/4.6/
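
In my lab, the copy steps look roughly like this; the web server address comes from the bootstrap pointer file in step 6, and /var/www/html is just an assumed document root, so adjust both to your setup:

cd ~/openshift/deployments/
# bootstrap.ign goes to the web server (assumed document root /var/www/html)
scp ncp-oc4-vsphere/bootstrap.ign root@192.168.110.12:/var/www/html/
# master.ign and worker.ign go next to the terraform files
cp ncp-oc4-vsphere/master.ign ncp-oc4-vsphere/worker.ign ~/openshift/installer-files/ocp4/clusters/4.6/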

Important Notes:
(1) The Openshift installer includes a certificate in these ign files for the initial deployment. That certificate is only valid for 24 hours. If you don’t get your cluster up and running within 24 hours, you need to generate new manifests and ignition configs.

(2) If you have to start over from a previous deployment, you can simply delete the contents of the ncp-oc4-vsphere folder, but be aware of 2 hidden files, .openshift_install.log and .openshift_install_state.json, where Openshift keeps installation state. Unless you also delete these two files, the certificates will not be renewed.

6. Deploy an Openshift cluster as user-provided infrastructure with bootstrap, control-plane and compute hosts using Terraform

We are now ready to deploy the bootstrap, control-plane and compute nodes to our vSphere environment, and we will use terraform to apply all the settings for us. To that end, we need to tell terraform what to do. Let’s move into the clusters/4.6 folder of the ocp4 terraform repository we cloned in step 1.

cd ~/openshift/installer-files/ocp4/clusters/4.6/

In this folder, we need to create a file called terraform.tfvars

## Node IPs
loadbalancer_ip = "192.168.5.160"
coredns_ip = "192.168.110.10"
bootstrap_ip = "172.16.170.99"
master_ips = ["172.16.170.100", "172.16.170.101", "172.16.170.102"]
worker_ips = ["172.16.170.110", "172.16.170.111"]

## Cluster configuration
rhcos_template = "rhcos-4.6"
cluster_slug = "openshift4"
cluster_domain = "corp.local"
machine_cidr = "172.16.170.0/24"
netmask ="255.255.255.0"

## DNS
local_dns = "192.168.110.10" # probably the same as coredns_ip
public_dns = "192.168.110.10" # e.g. 1.1.1.1
gateway = "172.16.170.1"

## Ignition paths
## Expects `openshift-install create ignition-configs` to have been run
## probably via generate-configs.sh
bootstrap_ignition_path = "bootstrap.ign"
master_ignition_path = "master.ign"
worker_ignition_path = "worker.ign"
 
Couple of comments regarding these settings:
– loadbalancer_ip – Just a dummy address; we will comment out the HAProxy config in main.tf.
– bootstrap_ip, master_ips, worker_ips – For my setup, I have chosen static IP addresses for the nodes. Even so, you still need the DHCP server from step 3 to hand out dynamic IP addresses during the RHCOS bootup. If you don’t want to use static IP addresses, you can also configure static MAC bindings in the NSX-T DHCP server to achieve the same.
– bootstrap_ignition_path – Path to your bootstrap pointer file, see below for more details.
– master_ignition_path – Path to your master ignition file. We copied the generated file into this folder, so just use the local path.
– worker_ignition_path – Path to your worker ignition file. We copied the generated file into this folder, so just use the local path.
– rhcos_template – This needs to match the name you gave the RHCOS template in step 4.

Next, modify the main.tf file. Comment out the modules lb, lb_vm, coredns, and dns_vm:

...
/*
module "lb" {
  source = "../../modules/ignition_haproxy"
...
...
...
module "lb_vm" {
...
...
...
module "coredns" {
...
...
module "dns_vm" {
...
...
}

*/

Next, create a file called vsphere.yaml, where you put your vSphere credentials:

vsphere-user: administrator@corp.local
vsphere-password: "XXXXX"
vsphere-server: vcsa-01a.corp.local
vsphere-dc: DC-SiteA
vsphere-cluster: Compute-Cluster

Then, create a small pointer file called bootstrap.ign in the same folder. It points the bootstrap VM to the HTTP server hosting the generated bootstrap.ign, so modify the URL according to your setup.

{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://192.168.110.12/bootstrap.ign" 
        }
      ]
    },
    "version": "3.1.0"
  }
}
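
Before kicking off terraform, it is worth checking that the full bootstrap.ign is actually reachable at that URL from your network; a quick header check:

curl -sI http://192.168.110.12/bootstrap.ign | head -n 5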

Last (not strictly required, but necessary in my setup), modify the file modules/rhcos-static/main.tf to allow a longer timeout for the VM cloning. Terraform uses a default of 30 minutes for cloning. In my lab it took longer, which made Terraform abort, so I set it to 120 minutes to be on the safe side.

...  
  clone {
    template_uuid = var.template
    timeout = 120
  }
...

Finally, let’s get things rolling and make Terraform deploy the nodes:

cd ~/openshift/installer-files/ocp4/clusters/4.6/
terraform init
terraform apply -auto-approve

In your vSphere environment, you should now see cloning tasks spawning. Eventually, you will see the following VMs in the inventory:

Sidenote: In case something goes wrong or the deployment does not succeed, you have the option to roll back. Use the command terraform destroy -auto-approve to remove the deployed items. If you need to restart the terraform deployment from scratch, first delete all the terraform.tfstate* files in the clusters/4.6 directory.

Compared to the previous installation without operator support, there are no further steps to be taken. We don’t need to copy the NCP images manually and we don’t need to tag any VMs in NSX-T anymore; this is all done automatically by the operator.

7. Let the bootstrap host provision the cluster and finalize the remaining cluster deployment.

We are pretty close now. First, the bootstrap node will start deploying the openshift cluster on the control-plane nodes. We can monitor that process with the following command:

cd ~/openshift/deployments/
openshift-install wait-for bootstrap-complete --dir=ncp-oc4-vsphere --log-level debug

Let’s wait now until the openshift installer signals that the bootstrap process is complete:

DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources

You can now remove the bootstrap node through terraform:

cd ~/openshift/installer-files/ocp4/clusters/4.6/
terraform apply -auto-approve -var 'bootstrap_complete=true'

At this point, take a look at the NSX-T load balancer. The server pool members are adjusted automatically, because we use dynamic groups. As soon as the bootstrap node is removed, the pools will only point to the control-plane nodes.

Let’s finalize the deployment:

cd ~/openshift/deployments/
openshift-install --dir=ncp-oc4-vsphere/ wait-for install-complete --log-level=DEBUG

There are a couple of commands that you can use during the installation phase to see details on the progress:

export KUBECONFIG=~/openshift/deployments/ncp-oc4-vsphere/auth/kubeconfig
oc get nodes
oc project nsx-system
oc get pods   (this should show you all NCP pods)
watch -n5 oc get clusteroperators

As NCP fires up, it creates all the required networks and load balancers in NSX-T for this installation. Under Segments, you should find one segment for each Openshift project. If all the operators are running, there should be 49 segments (including the OCP-Management segment).

Under Load Balancing, two Ingress virtual servers have been deployed as well. NCP has auto-allocated an IP address from the LB pool for them.

DONE!!

Well, almost: you still need to tell Openshift about the image registry and where to find storage in your vSphere cluster. Please refer to https://docs.openshift.com/container-platform/4.4/installing/installing_vsphere/installing-vsphere-network-customizations.html#installation-vsphere-config-yaml_installing-vsphere-network-customizations. I did the following:

Tell OC that image registry is managed

oc project openshift-image-registry
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState": "Managed"}}'

Use ephemeral (emptyDir) storage for the image registry – only suitable for PoCs:
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

Further Links

In this blog, I focused on the NSX-T integration part and therefore did not elaborate further on Openshift specifics or config variables. If you would like to drill down further, or use HAProxy to handle the API load balancing, here are a couple of links:


https://docs.openshift.com/container-platform/4.4/installing/installing_vsphere/installing-vsphere-network-customizations.html

https://labs.consol.de/container/platform/openshift/2020/01/31/ocp43-installation-vmware.html

https://github.com/yasensim/nsxt-ocp4


4 thoughts on “NSX-T – NCP Integration with Openshift 4.6 – The Easy Way”

  1. Travis

    I have been trying to replicate this with OKD (tried 4.4, 4.5, 4.6 and 4.7) using this article and your previous article as a basis. I am able to bootstrap the cluster (using terraform UPI from the okd repo). Regardless of any configuration steps I take, the deployment ends up stuck due to a number of the ncp pods pending or crashlooping.

    I am quite curious if you are able to (or have previously) verified this works with OKD.

    According to OKD docs, NCP / NSX-T is supported, but according to NSX-T / NCP documents, it is only supported on RHCOS and it does not list FCOS as a supported OS.

    Any advice is appreciated.

  2. dokyung

    Hi, I have a question.

    Let me know, I have a harbor certificate issue.

    Isn’t the information on the harbor image out of this information? And what are the important issues of additional trust?

    1. Jörg Walz Post author

      The NCP image needs to be hosted on a registry, as it isn’t provided in a public location, so I have put the NCP image on a Harbor registry. What you need to do is tell the Openshift installer about the certificate of your Harbor installation, because otherwise it will throw a certificate unknown error during the container download. The CA certificate of the Harbor registry needs to be placed in the “additionalTrustBundle” section of the install-config.yaml file.

  3. Pingback: NSX-T – NCP Integration with Openshift 4.8 – The Super-Easy Way » vrealize.it - TechBlog VMware SDDC
