How do I fix ErrImagePull while pulling pod's images from ACR in AKS?

As an engineer, I want to pull the container images used by my pods' deployments on Azure Kubernetes Service (AKS) from a private Azure Container Registry (ACR). I created an Azure Resource Group (RG), an AKS cluster and an ACR with Azure CLI; the small bash script below creates and deletes these resources.

This assumes that az cli is already installed (see the official documentation for installation) and that you are logged in with az login. Run az account list to see the logged-in accounts and make sure the intended subscription is the active one, so there are no surprises about where the following resources get created. The script below rolls out and rolls back everything needed to follow along with this article; skip this part if you already have AKS and ACR up and running.



#!/usr/bin/env bash

AKS=AKS_NAME
RG=RESOURCE_GROUP
LOCATION=LOCATION
ACR=ACR_NAME
NODE_COUNT=1
SKU=Basic

actionchooser() {
    select action in "Create resources" "Delete resources" "Exit"
    do
        case $action in
            "Create resources") create;;
            "Delete resources") delete;;
            "Exit") exit;;
            *) echo "Please use the option from the menu";;
        esac
    done
}

create() {
    # Resource group, cluster and registry, then fetch the cluster credentials.
    az group create -n "$RG" -l "$LOCATION"
    az aks create -n "$AKS" -g "$RG" --node-count "$NODE_COUNT"
    az acr create -n "$ACR" -g "$RG" --sku "$SKU"
    az aks get-credentials -n "$AKS" -g "$RG"
    kubectl get nodes -o wide
    kubectl get pods,daemonset,svc,secret --all-namespaces
}

delete() {
    # Each delete asks for confirmation before removing the resource.
    az aks delete -n "$AKS" -g "$RG"
    az acr delete -n "$ACR" -g "$RG"
    az group delete -n "$RG"
}

actionchooser

echo "👌"
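
Before running the script, the subscription check mentioned above can be wrapped in two small helpers. This is only a sketch: the function names active_subscription and ensure_subscription are made up for illustration, and the subscription name passed in is whatever yours is called.

```shell
# Print the name of the subscription az currently treats as active.
active_subscription() {
    az account show --query name -o tsv
}

# Switch the active subscription only when it differs from the wanted one ($1).
ensure_subscription() {
    if [ "$(active_subscription)" != "$1" ]; then
        az account set --subscription "$1"
    fi
}
```

Calling ensure_subscription "MY_SUBSCRIPTION" before the create step keeps the resources out of the wrong subscription.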


Several remarks:

  • This can also be done with Azure PowerShell or directly in Azure Portal;
  • Consider following the Azure naming convention when giving names to your resources;
  • ACR's SKU can be Basic in most cases (anonymous pull access, used in option 1 below, needs Standard or Premium); otherwise the SKU does not affect the end result of this article;
  • --node-count is set to 1 in order to keep the cluster at just 1 node for demo purposes.
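
On the naming remark: ACR registry names are restricted to 5-50 alphanumeric characters, so a quick check before az acr create can save a failed run. A minimal sketch, where valid_acr_name is a hypothetical helper name:

```shell
# Succeed when $1 is 5-50 characters long and purely alphanumeric,
# which is the documented constraint for ACR registry names.
valid_acr_name() {
    printf '%s' "$1" | grep -Eq '^[A-Za-z0-9]{5,50}$'
}
```

For example, valid_acr_name "$ACR" || echo "invalid registry name" could be placed at the top of create().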

Adding image to ACR

I am going to authenticate with my private registry, then pull nginx:latest, tag it for the registry and push it to the private ACR repository:



az acr login -n ACR_NAME
docker pull nginx
docker tag nginx ACR_NAME.azurecr.io/mynginx
docker push ACR_NAME.azurecr.io/mynginx


We also verify that the image has arrived in our ACR (the same can be done in Azure Portal under ACR > Repositories).



az acr repository list -n ACR_NAME

[
  "mynginx"
]
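
The same verification can be scripted. The sketch below assumes the repository names arrive one per line on stdin, which is what az acr repository list prints with -o tsv; repo_listed is a hypothetical helper name.

```shell
# Succeed when the repository name ($1) appears as a whole line on stdin.
repo_listed() {
    grep -Fxq -- "$1"
}

# Usage (needs az and a logged-in session):
#   az acr repository list -n ACR_NAME -o tsv | repo_listed mynginx && echo "image is in ACR"
```

Matching whole lines (-x) avoids a false positive when another repository merely starts with the same name.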


Creating pods with the image from ACR

I am going to use az aks get-credentials to authenticate with my AKS cluster and then, with kubectl, submit two pods: one with an image from the public Docker Hub registry and one with the image from the private ACR:



az aks get-credentials -n AKS_NAME -g RESOURCE_GROUP
kubectl run mynginx1 --image=nginx --image-pull-policy=Always  --dry-run=client -o yaml  > mynginx1.yaml
kubectl run mynginx2 --image=ACR_NAME.azurecr.io/mynginx --image-pull-policy=Always  --dry-run=client -o yaml  > mynginx2.yaml
kubectl apply -f mynginx1.yaml
kubectl apply -f mynginx2.yaml
kubectl get pods 

NAME       READY   STATUS         RESTARTS   AGE
mynginx1   1/1     Running        0          14s
mynginx2   0/1     ErrImagePull   0          2s


With kubectl run I generated two YAML manifests, shown below: 1) nginx from the public Docker Hub; 2) the image from the private ACR. I also add --image-pull-policy=Always so the image is always pulled from the registry and cached image layers do not interfere with the experiments that follow. Note that Kubernetes already defaults to Always when the tag is omitted or is :latest, as it is here; for any other tag the default is IfNotPresent, so setting the policy explicitly keeps the behaviour the same if a tag is added later.

Pod with image from public registry (authentication is not required)



apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: mynginx1
  name: mynginx1
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: mynginx1
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}


Pod with image from private registry (authentication is required)



apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: mynginx2
  name: mynginx2
spec:
  containers:
  - image: ACR_NAME.azurecr.io/mynginx
    imagePullPolicy: Always
    name: mynginx2
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
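
Both manifests set imagePullPolicy explicitly. For reference, the rule Kubernetes applies when the field is left out can be sketched as a small function. This is an illustration of the documented defaulting rule, not how the API server implements it, and default_pull_policy is a hypothetical name.

```shell
# Print the imagePullPolicy Kubernetes would default to for image reference $1.
default_pull_policy() {
    # A digest pins the content, so the default is IfNotPresent.
    case $1 in
        *@*) echo IfNotPresent; return ;;
    esac
    # Look only at the last path component, so a registry port such as
    # localhost:5000/app is not mistaken for a tag.
    case ${1##*/} in
        *:latest) echo Always ;;
        *:*)      echo IfNotPresent ;;
        *)        echo Always ;;  # no tag is treated as :latest
    esac
}
```

Both images used in this article carry no tag, so they fall into the last branch.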


As we can see, pod mynginx2 fails with ErrImagePull. We can explore this error further with kubectl describe pod mynginx2 to see what exactly causes the issue:



kubectl describe pod mynginx2
Name:         mynginx2
Namespace:    default
Priority:     0
Node:         aks-nodepool1-36348178-vmss000000/10.240.0.4
Start Time:   Sun, 10 Oct 2021 22:04:51 +0200
Labels:       run=mynginx2
Annotations:  
Status:       Pending
IP:           10.244.0.10
IPs:
  IP:  10.244.0.10
Containers:
  mynginx2:
    Container ID:   
    Image:          ACR_NAME.azurecr.io/mynginx
    Image ID:       
    Port:           
    Host Port:      
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7zgfm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-7zgfm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7zgfm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  3m41s                 default-scheduler  Successfully assigned default/mynginx2 to aks-nodepool1-36348178-vmss000000
  Normal   Pulling    2m6s (x4 over 3m40s)  kubelet            Pulling image "ACR_NAME.azurecr.io/mynginx"
  Warning  Failed     2m6s (x4 over 3m40s)  kubelet            Failed to pull image "ACR_NAME.azurecr.io/mynginx": rpc error: code = Unknown desc = failed to pull and unpack image "ACR_NAME.azurecr.io/mynginx:latest": failed to resolve reference "ACR_NAME.azurecr.io/mynginx:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized
  Warning  Failed     2m6s (x4 over 3m40s)  kubelet            Error: ErrImagePull
  Normal   BackOff    114s (x6 over 3m40s)  kubelet            Back-off pulling image "ACR_NAME.azurecr.io/mynginx"
  Warning  Failed     99s (x7 over 3m40s)   kubelet            Error: ImagePullBackOff



There it is: the kubelet reports "failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized". The pull was attempted anonymously and the registry rejected it, which simply means that having AKS and ACR in the same tenant and subscription is not enough for the cluster to pull images from the registry. Let's fix the issue.
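
When several pods fail, it helps to sort the kubelet messages by likely cause. The sketch below is a heuristic: only the "401 Unauthorized" pattern comes from the output above, the other patterns are assumptions about typical container runtime messages, and classify_pull_error is a hypothetical helper name.

```shell
# Map a kubelet "Failed to pull image" message ($1) to a rough cause.
classify_pull_error() {
    case $1 in
        *"401 Unauthorized"*) echo "auth" ;;               # seen in the describe output above
        *"not found"*)        echo "missing-image" ;;      # assumed wording for a wrong repository or tag
        *"no such host"*)     echo "bad-registry-name" ;;  # assumed wording for a DNS failure
        *)                    echo "unknown" ;;
    esac
}

# Usage (needs kubectl and cluster access):
#   kubectl get events --field-selector involvedObject.name=mynginx2
```

An "auth" result is the case this article deals with; the options below are three ways to resolve it.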

Option 1: enabling anonymous pull access

For development purposes I can enable anonymous (public) pull access on the ACR. Unfortunately it is not available on the Basic SKU (only on Standard and Premium), so I am going to upgrade my ACR and change its SKU first.

Keep in mind that this makes your ACR images available to anyone in the world, since it allows anonymous pulls! Consider such a configuration only if you maintain a public registry (like Docker Hub) or if you are stuck troubleshooting and want to rule out some silly misconfiguration.



az acr update -n ACR_NAME --sku Standard
az acr update -n ACR_NAME --anonymous-pull-enabled


Since pulls are anonymous now, the kubelet on the node will retry pulling the image for pod mynginx2 (it keeps retrying with back-off), and it should succeed this time.
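
Because this setting exposes the registry, it is worth confirming its state before and after the experiment. A minimal sketch, assuming az acr show reports the setting under the anonymousPullEnabled property; anonymous_pull_enabled is a hypothetical helper name.

```shell
# Succeed when the registry named $1 currently allows anonymous pulls.
# Assumption: the registry JSON exposes the flag as "anonymousPullEnabled".
anonymous_pull_enabled() {
    [ "$(az acr show -n "$1" --query anonymousPullEnabled -o json)" = "true" ]
}

# Usage (needs az and a logged-in session):
#   anonymous_pull_enabled ACR_NAME && echo "WARNING: registry is publicly pullable"
```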

Let's disable anonymous pull access and set the SKU back to Basic. Let's also delete pod mynginx2 so we can repeat the experiment with the other options.



az acr update -n ACR_NAME --anonymous-pull-enabled false
az acr update -n ACR_NAME --sku Basic
kubectl delete pod mynginx2
pod "mynginx2" deleted


Option 2: Azure AD Managed Identity

If we examine the cluster in the Portal (probably the easiest way to do this), we can see that a Managed Identity was also created for us together with the AKS cluster. Managed identities simplify working with cloud resources: we don't have to think about authentication or storing credentials, we only define which identity is trusted by which resource and with which role.

Managed Identity - AKS

That managed identity doesn't have any role assignment yet, but we can add one with the following command, then create the pod and see if the ACR image can be pulled. Keep in mind that the following command requires the Owner, Azure account administrator, or Azure co-administrator role on the Azure subscription.



az aks update -n AKS_NAME -g RESOURCE_GROUP --attach-acr ACR_NAME
kubectl apply -f mynginx2.yaml


If we examine this Managed Identity a bit further under Azure AD > Enterprise Applications, we can see that a managed identity is a special type of service principal; with the command above we assigned the AcrPull role to it, scoped to our ACR resource. Check this in the portal by going to MC_RESOURCE_GROUP_AKS_NAME_westeurope > AKS_NAME-agentpool > Azure role assignments.

Managed Identity - role assignments

So kubectl get pods should now show a successful pull.



NAME       READY   STATUS    RESTARTS   AGE
mynginx1   1/1     Running   0          66m
mynginx2   1/1     Running   0          47s
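
The role assignment can also be inspected from the command line instead of the portal. The sketch below builds the resource ID of the registry, which is the scope the AcrPull assignment targets; acr_scope is a hypothetical helper name and the arguments are placeholders.

```shell
# Print the ARM resource ID of a registry.
# $1 = subscription ID, $2 = resource group, $3 = registry name
acr_scope() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerRegistry/registries/%s\n' \
        "$1" "$2" "$3"
}

# Usage (needs az and a logged-in session):
#   SUB=$(az account show --query id -o tsv)
#   az role assignment list --scope "$(acr_scope "$SUB" RESOURCE_GROUP ACR_NAME)" -o table
#   az aks check-acr -n AKS_NAME -g RESOURCE_GROUP --acr ACR_NAME.azurecr.io
```

The last command asks AKS itself to validate that the cluster can reach and authenticate against the registry.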


I am going to delete the pod and detach the ACR integration so we can explore a further option:



az aks update -n AKS_NAME -g RESOURCE_GROUP --detach-acr ACR_NAME
kubectl delete pod mynginx2


Option 3: Kubernetes Secret

We create a Kubernetes Secret of type docker-registry and reference it explicitly in our pod through the imagePullSecrets property. Let's generate the secret from the registry's admin access keys.



# Placeholder: registry name without the .azurecr.io suffix
ACR_NAME=ACR_NAME
ACR_SERVER=$ACR_NAME.azurecr.io
EMAIL=

az acr update -n "$ACR_NAME" --admin-enabled true

ACR_USERNAME=$(az acr credential show -n "$ACR_NAME" --query="username" -o tsv)
ACR_PASSWORD=$(az acr credential show -n "$ACR_NAME" --query="passwords[0].value" -o tsv)

kubectl create secret docker-registry mysecret --docker-server="$ACR_SERVER" --docker-username="$ACR_USERNAME" --docker-password="$ACR_PASSWORD" --docker-email="$EMAIL" --dry-run=client -o yaml > mysecret.yaml


Now if we check mysecret.yaml, we can see that it contains a base64-encoded string under data > .dockerconfigjson. Such secrets can also be created from other credentials. I am going to create the secret on the cluster, generate a new version of the pod manifest that references mysecret, and then create the pod.



kubectl apply -f mysecret.yaml
kubectl run mynginx2 --image=ACR_NAME.azurecr.io/mynginx --image-pull-policy=Always --overrides='{ "spec": { "imagePullSecrets": [{"name": "mysecret"}] } }' --dry-run=client -o yaml > mynginx2.yaml
kubectl apply -f mynginx2.yaml
kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
mynginx1   1/1     Running   0          120m
mynginx2   1/1     Running   0          93s


Great, the secret works.
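
To see what the base64 string in .dockerconfigjson holds, here is a rough sketch of how such a document is put together. The helper names docker_auth and docker_config_json are hypothetical, and the JSON is assembled without escaping, so it only suits credentials free of quotes and backslashes.

```shell
# Encode "username:password" the way the "auth" field stores it.
docker_auth() {
    printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Assemble a minimal config for one registry.
# $1 = registry server, $2 = username, $3 = password
docker_config_json() {
    printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
        "$1" "$2" "$3" "$(docker_auth "$2" "$3")"
}

# Usage (needs kubectl and cluster access), to decode the real secret:
#   kubectl get secret mysecret -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
```

The decoded output shows the admin password in clear text, which is one reason this option is a poor fit for production.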

Don't use this option in production when working with AKS and ACR; consider using Managed Identity (option 2) instead and let Azure take care of the trust between the two services.

Okay, these were 3 different options to authenticate and pull images from ACR into an AKS cluster. There is a list of available authentication options that I encourage you to review in order to learn more about service principals, the AKS cluster service principal, the various options with managed identity and repository-scoped access tokens.



