IAM Access in Kubernetes: Installing kiam in production
In the last post, we compared kiam and kube2iam head-to-head. While kube2iam was declared the winner of that comparison, I feel that the case for kiam is too compelling, and the setup too complicated, not to share my experience setting it up in production.
Overview
This post will cover everything you need to get kiam running in production: creating IAM roles, configuring cert-manager for TLS, annotating your Kubernetes resources, deploying the kiam server and agent, and testing the result.
For an overview of the motivation behind the creation of kiam, read this blog post by its creator.
Step 1: Create IAM Roles
The first step to using kiam is to create IAM roles for your pods. Kiam recommends using a dedicated IAM role for the server deployment to further control access to your other roles, so we will start with that role. The steps here are mostly regurgitated from the IAM docs page in the kiam GitHub project.
Create a role named kiam_server with the following trust relation and inline policy, where YOUR_MASTER_ROLE is replaced with the ARN of the role your master nodes run with.
Trust relation:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "YOUR_MASTER_ROLE"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Inline Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "*"
    }
  ]
}
Get the ARN of the role you just created for the next steps. It should look something like arn:aws:iam::111111111111:role/kiam_server.
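If you manage IAM from the command line rather than the console, the whole step looks roughly like this; the JSON file names are placeholders for the trust relation and inline policy shown above, saved locally:
# create the role with the trust relation above
aws iam create-role \
  --role-name kiam_server \
  --assume-role-policy-document file://kiam_server_trust.json
# attach the inline policy that lets the server assume your pod roles
aws iam put-role-policy \
  --role-name kiam_server \
  --policy-name assume_pod_roles \
  --policy-document file://kiam_server_policy.json
# print the ARN for the later steps
aws iam get-role --role-name kiam_server --query 'Role.Arn' --output text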
Now create the roles for your pods. Each role needs a policy granting only the permissions the pod requires to perform its function, e.g. listing S3 objects, writing to DynamoDB, or reading from SQS. For each role you create, update the trust relationship so that the kiam_server role you created above can assume the individual pod roles. Replace KIAM_SERVER_ARN with the ARN you retrieved previously.
Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "KIAM_SERVER_ARN"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Next, make sure the master nodes in your cluster are able to assume the kiam_server role by adding an inline policy to the role your masters run with. Scoping this policy to the kiam_server role alone restricts the masters from assuming other roles and prevents pods running on a master from arbitrarily assuming any role.
Inline policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "KIAM_SERVER_ARN"
    }
  ]
}
Step 2: Configure cert-manager
Follow the instructions on this page to install the main cert-manager resources with either regular manifests or helm.
Once cert-manager is configured in its namespace, you can then create your first CA issuer. There are instructions on the cert-manager documentation site, but I will go through these steps here as well to include steps specific to kiam’s TLS setup.
First, generate a CA private key and self-signed certificate:
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=kiam" -days 3650 -reqexts v3_req -extensions v3_ca -out ca.crt
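Optionally, sanity-check the generated CA certificate before continuing:
# confirm the expected subject and the ~10 year lifetime
openssl x509 -in ca.crt -noout -subject -dates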
Next, save the CA key pair as a secret in Kubernetes:
kubectl create secret tls kiam-ca-key-pair \
--cert=ca.crt \
--key=ca.key \
--namespace=cert-manager
Create a ClusterIssuer so that certificates can be issued in multiple namespaces using the CA key pair we just created:
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: kiam-ca-issuer
spec:
  ca:
    secretName: kiam-ca-key-pair
Issue a certificate for the kiam agent:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: kiam-agent
  namespace: kube-system
spec:
  secretName: kiam-agent-tls
  issuerRef:
    name: kiam-ca-issuer
    kind: ClusterIssuer
  commonName: kiam
Next, issue a certificate for the server. Since cert-manager does not support IP SANs at this time, we will change the cert to use localhost instead of 127.0.0.1:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: kiam-server
  namespace: kube-system
spec:
  secretName: kiam-server-tls
  issuerRef:
    name: kiam-ca-issuer
    kind: ClusterIssuer
  commonName: kiam
  dnsNames:
  - kiam-server
  - kiam-server:443
  - localhost
  - localhost:443
  - localhost:9610
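Apply the ClusterIssuer and both Certificate manifests (the file names below are simply whatever you saved them as) and give cert-manager a moment to issue them:
kubectl apply -f kiam-ca-issuer.yaml -f kiam-agent-certificate.yaml -f kiam-server-certificate.yaml
kubectl -n kube-system get certificates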
You can check that everything is set up correctly by looking at the secrets created by cert-manager to ensure they exist in the correct namespace:
kubectl -n kube-system get secret kiam-agent-tls -o yaml
kubectl -n kube-system get secret kiam-server-tls -o yaml
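You can also decode the issued server certificate to confirm that the SANs include kiam-server and localhost:
kubectl -n kube-system get secret kiam-server-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'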
Step 3: Annotate Kubernetes resources
Kiam requires that you annotate namespaces and pods for roles to be assumed. The namespace configuration uses a regular expression to limit which roles can be assumed per namespace, and the default is to not allow any roles.
apiVersion: v1
kind: Namespace
metadata:
  name: default
  annotations:
    iam.amazonaws.com/permitted: ".*"
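If you would rather not edit the namespace manifest, the equivalent imperative command is below (add --overwrite if the annotation already exists):
kubectl annotate namespace default "iam.amazonaws.com/permitted=.*"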
For pods, you just add an annotation to the pod's metadata. Kiam will automatically detect the base ARN for your roles using the master's role, but you can also specify a full ARN (beginning with arn:aws:iam) if you need to assume roles in other AWS accounts.
annotations:
  iam.amazonaws.com/role: MY_ROLE_NAME
Step 4: Deploy kiam server
Now you are ready to deploy the kiam server component. Start by configuring RBAC. The reference file for the DaemonSet and service can be found here, but we need to modify it since the default configuration will not work with cert-manager.
First, change the --cert, --key, and --ca options to point to the file names created by cert-manager. Then change the --server-address from 127.0.0.1:443 to localhost:443 so that the health checks pass; this works around the cert-manager IP SANs limitation mentioned earlier. Next, set the --assume-role-arn flag to the KIAM_SERVER_ARN from earlier so that the server pods use that role to get credentials for your other roles. Pick a tagged release from here to use as the image tag, since latest should not be used in production. Finally, the ssl-certs volume will likely need its host path changed depending on the OS of your Kubernetes masters; since my cluster was installed using kops on Debian images, the correct hostPath for me was /etc/ssl/certs. Putting it all together, we end up with:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: kiam-server
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kiam
        role: server
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: kiam-server
      nodeSelector:
        kubernetes.io/role: master
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: tls
        secret:
          secretName: kiam-server-tls
      containers:
      - name: kiam
        image: quay.io/uswitch/kiam:b07549acf880e3a064e6679f7147d34738a8b789
        imagePullPolicy: Always
        command:
        - /kiam
        args:
        - server
        - --level=info
        - --bind=0.0.0.0:443
        - --cert=/etc/kiam/tls/tls.crt
        - --key=/etc/kiam/tls/tls.key
        - --ca=/etc/kiam/tls/ca.crt
        - --role-base-arn-autodetect
        - --assume-role-arn=arn:aws:iam::111111111111:role/kiam_server
        - --sync=1m
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs
        - mountPath: /etc/kiam/tls
          name: tls
        livenessProbe:
          exec:
            command:
            - /kiam
            - health
            - --cert=/etc/kiam/tls/tls.crt
            - --key=/etc/kiam/tls/tls.key
            - --ca=/etc/kiam/tls/ca.crt
            - --server-address=localhost:443
            - --gateway-timeout-creation=1s
            - --timeout=5s
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command:
            - /kiam
            - health
            - --cert=/etc/kiam/tls/tls.crt
            - --key=/etc/kiam/tls/tls.key
            - --ca=/etc/kiam/tls/ca.crt
            - --server-address=localhost:443
            - --gateway-timeout-creation=1s
            - --timeout=5s
          initialDelaySeconds: 3
          periodSeconds: 10
          timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: kiam-server
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: kiam
    role: server
  ports:
  - name: grpclb
    port: 443
    targetPort: 443
    protocol: TCP
Deploying the kiam server by itself should not cause any changes in your cluster. It is the agent that modifies iptables and causes requests to the metadata API to be routed to the kiam servers.
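Assuming you saved the manifest above as kiam-server.yaml (the file name is up to you), roll it out and confirm that a server pod is running and ready on each master:
kubectl apply -f kiam-server.yaml
kubectl -n kube-system get pods -l app=kiam,role=server -o wide
kubectl -n kube-system logs -l app=kiam,role=server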
Note: If your code uses the AWS Java SDK to make API calls, you must set the --session-duration flag on the server to longer than 15 minutes (e.g. 60 minutes). The AWS Java SDK tries to refresh credentials that expire within the next 15 minutes, and kiam's default session duration is 15 minutes. You can keep up on this issue here and here. If this is not configured correctly, every API call made with the AWS Java SDK will attempt to retrieve fresh credentials, putting an enormous load on the kiam agents, the kiam servers, and your pods.
Step 5: Deploy kiam agent
Since the agent will modify iptables on your Kubernetes nodes, I would advise adding a node to your cluster that is tainted so you can do a controlled test of the agent and server together. With such a complex setup, there is a high chance that something is configured incorrectly with TLS or the IAM roles, and you will want to be able to handle that without affecting your production workload. So first add a node, and then taint it so other pods will not run on it:
kubectl taint nodes NEW_NODE_NAME kiam=kiam:NoSchedule
Now we can configure the agent component using this reference file. First, again update the --cert, --key, and --ca options to point to the file names created by cert-manager. Set the hostPath for the ssl-certs volume as you did before, and use the same image tag as in the server config. The --host-interface argument must be updated to match the interface name prefix for your CNI; a table of the supported options is on GitHub. Lastly, set nodeName to NEW_NODE_NAME so that the agent runs only on the newly added tainted node and other nodes are unaffected if you run into issues. You should end up with something like:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: kiam-agent
spec:
  template:
    metadata:
      labels:
        app: kiam
        role: agent
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      nodeSelector:
        kubernetes.io/role: node
      nodeName: NEW_NODE_NAME
      tolerations:
      - key: kiam
        value: kiam
        effect: NoSchedule
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: tls
        secret:
          secretName: kiam-agent-tls
      - name: xtables
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
      containers:
      - name: kiam
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
        image: quay.io/uswitch/kiam:b07549acf880e3a064e6679f7147d34738a8b789
        imagePullPolicy: Always
        command:
        - /kiam
        args:
        - agent
        - --iptables
        - --host-interface=cali+
        - --json-log
        - --port=8181
        - --cert=/etc/kiam/tls/tls.crt
        - --key=/etc/kiam/tls/tls.key
        - --ca=/etc/kiam/tls/ca.crt
        - --server-address=kiam-server:443
        - --gateway-timeout-creation=1s
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs
        - mountPath: /etc/kiam/tls
          name: tls
        - mountPath: /var/run/xtables.lock
          name: xtables
        livenessProbe:
          httpGet:
            path: /ping
            port: 8181
          initialDelaySeconds: 3
          periodSeconds: 3
Now you can create the agent and verify that only a single agent is running on your new node. There should be no change to your pods running on other nodes.
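For example, assuming the manifest above is saved as kiam-agent.yaml:
kubectl apply -f kiam-agent.yaml
kubectl -n kube-system get pods -l app=kiam,role=agent -o wide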
Step 6: Test
At this point, you will want to test that everything works. Deploy a pod to the quarantined node and use the AWS CLI from inside it to test access to your resources. While you are doing this, watch the logs of the kiam agent and server pods to debug any issues you encounter. Here's an example of a deployment where you can specify a role and then test access:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: aws-iam-tester
  labels:
    app: aws-iam-tester
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: aws-iam-tester
  template:
    metadata:
      labels:
        app: aws-iam-tester
      annotations:
        iam.amazonaws.com/role: TEST_ROLE_NAME
    spec:
      nodeSelector:
        kubernetes.io/role: node
      nodeName: NEW_NODE_NAME
      tolerations:
      - key: kiam
        value: kiam
        effect: NoSchedule
      containers:
      - name: aws-iam-tester
        image: garland/aws-cli-docker:latest
        imagePullPolicy: Always
        command:
        - /bin/sleep
        args:
        - "3600"
        env:
        - name: AWS_DEFAULT_REGION
          value: us-east-1
The pod will exit after an hour, and you can use kubectl to get a TTY in the pod:
kubectl exec -it POD_NAME /bin/sh
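Inside the pod, a couple of quick checks will tell you whether kiam is intercepting the metadata API and handing out the annotated role (the commands below assume the image ships the AWS CLI and wget; adjust to whatever is available):
# should print TEST_ROLE_NAME rather than your node's role
wget -qO- http://169.254.169.254/latest/meta-data/iam/security-credentials/
# should show an assumed-role ARN for TEST_ROLE_NAME
aws sts get-caller-identity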
Once you are satisfied that your roles work, and that the kiam agent and server are correctly set up, you can then deploy the agent to every node.
Full kiam deployment
Remove the nodeName key and the kiam:kiam tolerations from your agent DaemonSet to allow it to run on every node. I also recommend changing the server deployment to log only warning-level messages with the --level=warn command arg, or you will end up with a very large amount of info-level logs in a production setup.
Once the agent is installed on each node, you should roll out an update to critical pods to ensure that those pods begin using kiam for authentication immediately. Check your application logs and the kiam server logs for any IAM errors. If you encounter issues, you can delete the agent from all nodes, which will automatically remove the iptables rule and allow your pods to authenticate in the way they did previously.
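For reference, backing out is a single command, using the DaemonSet name from the manifest above:
kubectl -n kube-system delete daemonset kiam-agent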
As I mentioned in the previous post, I saw a performance drop using kiam. Monitor the pods that use AWS services heavily to see if there is an impact. Based on the number of calls being made between the kiam agent and servers, you may see an increase in cross-AZ traffic in AWS, which is billed. Billing totals are updated at least daily, so check for a few days to make sure there is nothing unusual on that front.
Finally, remove the tainted node that we created for testing, or remove its taint so that pods can be scheduled to it and it becomes a regular part of your cluster.
Conclusion
You should now have kiam running in your production cluster. The kiam setup is very long and very painful, but hopefully you were able to get through it without too much IAM or TLS debugging. I found the #kiam slack channel useful when setting it up in my cluster, and recommend you ask specific implementation questions there.
In the next post, we will cover setting up kube2iam in production. Remember to follow the kiam and cert-manager projects on github to support their efforts.