Shield On-premises Kubernetes Deployment Guide
Prerequisites
Site Reliability Engineers (SREs)
To ensure seamless deployment and ongoing management of a highly available Shield instance in a production
environment, a dedicated team of skilled SREs is essential. They should have in-depth expertise in
your on-premises infrastructure and the permissions needed to carry out their tasks effectively.
Arthur Platform License
- The Arthur team will provide a license for the products and features you have purchased
Azure OpenAI
- Azure OpenAI service with at least one GPT-3.5 Turbo model endpoint
- A secure network route between AWS infrastructure and Azure OpenAI with OpenAI endpoint credentials made available
- Token limits, configured appropriately for your use cases
DNS URLs with SSL certs
Certificates signed by a well known trusted authority:
- For Shield (e.g. https://shield.mycompany.com)
- For Arthur Auth service (e.g. https://shield-auth.mycompany.com) - not required for API-only install
NOTE: The SSL certificates may NOT be self-signed.
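If your chart configuration expects the certificates to be provided as Kubernetes TLS secrets, they can be loaded into the cluster ahead of the install. This is only a sketch; the secret names and file names below are placeholders, and the names your values.yaml actually references may differ.
# Hypothetical secret names; align them with your values.yaml ingress TLS configuration
kubectl -n shield create secret tls shield-tls \
  --cert=shield.mycompany.com.crt --key=shield.mycompany.com.key
kubectl -n shield create secret tls shield-auth-tls \
  --cert=shield-auth.mycompany.com.crt --key=shield-auth.mycompany.com.key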
AWS Elastic Kubernetes Service (EKS) Cluster
The chart is tested on Kubernetes version 1.31.
- CPU node group (m8g.large x 2)
- Memory: 16 GiB
- CPU: 4 cores
- GPU node group (g4dn.2xlarge x 2)
- Memory: 64 GiB
- CPU: 16 cores
- GPUs: 2
- Nginx ingress controller
- Metrics server
- A dedicated namespace (e.g. shield)
- A kubectl workstation with admin privileges
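If the Nginx ingress controller and metrics server are not already present on the cluster, they can be installed from their upstream Helm charts. A minimal sketch, assuming the default chart values are acceptable for your environment:
# Nginx ingress controller (upstream chart)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  -n ingress-nginx --create-namespace
# Metrics server (upstream chart)
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server -n kube-system
# Dedicated namespace for Shield
kubectl create namespace shield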
Postgres database
A managed Postgres database such as AWS RDS/Aurora is recommended.
- The latest available Postgres version on AWS
- arthur_shield and arthur_auth databases (the names are configurable)
- pgvector, an open-source vector similarity search extension for PostgreSQL. The extension must be available, and the Shield application needs credentials with permission to run CREATE EXTENSION IF NOT EXISTS vector.
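As a quick check that the extension can be enabled with the credentials Shield will use, you can run the statement manually. This sketch assumes psql is available and that <connection_string> points at the arthur_shield database:
# Run as the Shield application user to confirm it has the CREATE EXTENSION permission
psql "<connection_string>" -c "CREATE EXTENSION IF NOT EXISTS vector;"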
Usage tracker access
The AWS IAM policy below must be attached to the IAM role of your Shield Kubernetes service account so that Arthur can track
token usage on your Shield deployment. This allows your Shield instances to send the processed token counts to Arthur's
AWS SQS queue. The only data sent to the SQS queue is the number of tokens; we do not collect any raw data.
If the tracker must be disabled, set shieldUsageTrackerEnabled in the values.yaml file to disabled.
{
  "Statement": [
    {
      "Action": "sqs:sendmessage",
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:us-east-2:451018380405:arthur-bi-queue"
    }
  ],
  "Version": "2012-10-17"
}
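One way to attach the policy, assuming the JSON above is saved to arthur-usage-tracker-policy.json and <shield_sa_role_name> is the IAM role already associated with the Shield service account (e.g. via IRSA). This is only a sketch; use whatever IAM tooling your organization standardizes on:
aws iam put-role-policy \
  --role-name <shield_sa_role_name> \
  --policy-name arthur-usage-tracker \
  --policy-document file://arthur-usage-tracker-policy.json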
Arthur Container Registry Access
- Arthur will provide credentials for pulling Docker containers from our public Container Repository
- There must be a network route available to connect to our Arthur Repository via repository.arthur.ai and docker.arthur.ai
Private Container Registry (Optional but highly recommended)
Arthur suggests hosting the Shield container images on your private container registry, like AWS ECR, for enhanced
speed and reliability during each deployment and scaling-out process. Below is an example of how you can pull them.
docker login docker.arthur.ai -u <username>
docker pull docker.arthur.ai/arthur/auth:1.0.42
docker pull docker.arthur.ai/arthur/shield:<shield_version_number>
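To mirror the images into a private registry such as ECR, the pulled images can be retagged and pushed. A sketch, assuming an ECR repository already exists and that <aws_account_id> and <region> are filled in for your account:
# Authenticate to ECR, then retag and push the Shield image
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
docker tag docker.arthur.ai/arthur/shield:<shield_version_number> \
  <aws_account_id>.dkr.ecr.<region>.amazonaws.com/arthur/shield:<shield_version_number>
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/arthur/shield:<shield_version_number>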
How to configure your AWS EKS cluster with a GPU node group for Shield
This section is a guide to help you configure your existing AWS EKS cluster with a GPU node group for Shield.
To perform the steps, you need AWS CLI with admin level permissions for the target AWS account.
- Prepare base64-encoded user data for bootstrapping the EKS GPU nodes with the below script.
Replace ${CLUSTER_NAME} with your EKS cluster name. The script is tested on macOS.
export USER_DATA_BASE64=$(cat <<'EOF' | base64 -b 0
#!/bin/bash
set -ex
# EKS Bootstrap script
/etc/eks/bootstrap.sh ${CLUSTER_NAME}
# CloudWatch Agent setup
yum install -y amazon-cloudwatch-agent
cat <<CWAGENT > /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
{
  "agent": {
    "run_as_user": "root"
  },
  "metrics": {
    "aggregation_dimensions": [["InstanceId"]],
    "metrics_collected": {
      "nvidia_gpu": {
        "append_dimensions": {
          "EKSClusterName": "${CLUSTER_NAME}",
          "EKSNodeGroupType": "arthur-shield-eks-gpu",
          "ImageId": "$(curl -s http://169.254.169.254/latest/meta-data/ami-id)",
          "InstanceId": "$(curl -s http://169.254.169.254/latest/meta-data/instance-id)",
          "InstanceType": "$(curl -s http://169.254.169.254/latest/meta-data/instance-type)"
        },
        "measurement": [
          "utilization_gpu",
          "utilization_memory",
          "memory_total",
          "memory_used",
          "memory_free",
          "power_draw"
        ]
      }
    }
  }
}
CWAGENT
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
-a fetch-config \
-m ec2 \
-c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
-s
systemctl enable amazon-cloudwatch-agent
systemctl restart amazon-cloudwatch-agent
EOF
)
- Make sure your AWS CLI is configured with the correct AWS account and region
- Add the following permissions to your EKS node IAM role so that the GPU metrics can be shipped to CloudWatch from the GPU nodes
"Action": [
"cloudwatch:ListMetrics",
"cloudwatch:PutMetricData",
"cloudwatch:PutMetricStream"
],
- Look up the AMI ID for the latest GPU-optimized AMI
aws ssm get-parameters \
--names /aws/service/eks/optimized-ami/<kubernetes-version>/amazon-linux-2-gpu/recommended/image_id \
--region us-east-2
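Optionally, the AMI ID can be captured directly into the variable used in the next step. This assumes Kubernetes 1.31 (the version the chart is tested on) and the us-east-2 region from the example above:
export IMAGE_ID=$(aws ssm get-parameters \
  --names /aws/service/eks/optimized-ami/1.31/amazon-linux-2-gpu/recommended/image_id \
  --region us-east-2 \
  --query 'Parameters[0].Value' \
  --output text)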
- Create a launch template for the GPU nodes.
Replace REPLACE_ME_CLUSTER_NAME with your EKS cluster name.
Replace REPLACE_ME_AMI_ID with the AMI ID you found in the previous step.
Make sure the $USER_DATA_BASE64 is correctly set from step 1.
export CLUSTER_NAME=REPLACE_ME_CLUSTER_NAME
export IMAGE_ID=REPLACE_ME_AMI_ID
export LAUNCH_TEMPLATE_NAME=arthur-shield-eks-gpu
export NODEGROUP_NAME=arthur-shield-eks-gpu
export INSTANCE_TYPE=g4dn.2xlarge
export VOLUME_SIZE=60
aws ec2 create-launch-template \
--launch-template-name ${LAUNCH_TEMPLATE_NAME} \
--version-description "Arthur Shield EKS GPU nodes" \
--launch-template-data "{
    \"ImageId\": \"${IMAGE_ID}\",
    \"InstanceType\": \"${INSTANCE_TYPE}\",
    \"BlockDeviceMappings\": [
      {
        \"DeviceName\": \"/dev/xvda\",
        \"Ebs\": {
          \"VolumeSize\": ${VOLUME_SIZE},
          \"Encrypted\": true
        }
      }
    ],
    \"TagSpecifications\": [
      {
        \"ResourceType\": \"instance\",
        \"Tags\": [
          {
            \"Key\": \"Name\",
            \"Value\": \"${NODEGROUP_NAME}\"
          },
          {
            \"Key\": \"kubernetes.io/cluster/${CLUSTER_NAME}\",
            \"Value\": \"owned\"
          }
        ]
      }
    ],
    \"UserData\": \"${USER_DATA_BASE64}\"
  }"
- Create an EKS node group with the launch template created in the previous step.
Replace REPLACE_ME_SUBNET_1_ID, REPLACE_ME_SUBNET_2_ID, REPLACE_ME_SUBNET_3_ID, and REPLACE_ME_NODE_ROLE_ARN with the correct values.
export MIN_NODES=2
export MAX_NODES=2
export DESIRED_NODES=2
export SUBNET_1_ID=REPLACE_ME_SUBNET_1_ID
export SUBNET_2_ID=REPLACE_ME_SUBNET_2_ID
export SUBNET_3_ID=REPLACE_ME_SUBNET_3_ID
export NODE_ROLE_ARN=REPLACE_ME_NODE_ROLE_ARN
export LAUNCH_TEMPLATE_VERSION=1
LAUNCH_TEMPLATE_ID=$(aws ec2 describe-launch-templates \
--filters Name=launch-template-name,Values=${LAUNCH_TEMPLATE_NAME} \
--query 'LaunchTemplates[0].LaunchTemplateId' \
--output text)
aws eks create-nodegroup \
--cluster-name ${CLUSTER_NAME} \
--nodegroup-name ${NODEGROUP_NAME} \
--scaling-config minSize=${MIN_NODES},maxSize=${MAX_NODES},desiredSize=${DESIRED_NODES} \
--subnets ${SUBNET_1_ID} ${SUBNET_2_ID} ${SUBNET_3_ID} \
--launch-template id=${LAUNCH_TEMPLATE_ID},version=${LAUNCH_TEMPLATE_VERSION} \
--node-role ${NODE_ROLE_ARN} \
--labels capability=gpu \
--tags "k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/${CLUSTER_NAME}=owned"
- Configure autoscaling policies for the node group. Wait until the node group is created before running the below commands.
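One way to block until the node group is ready, assuming the variables from the previous step are still set:
aws eks wait nodegroup-active \
  --cluster-name ${CLUSTER_NAME} \
  --nodegroup-name ${NODEGROUP_NAME}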
export AUTOSCALING_GROUP_NAME=$(aws eks describe-nodegroup \
--cluster-name ${CLUSTER_NAME} \
--nodegroup-name ${NODEGROUP_NAME} \
--query 'nodegroup.resources.autoScalingGroups[0].name' \
--output text)
AUTOSCALING_ARN=$(aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names ${AUTOSCALING_GROUP_NAME} \
--query 'AutoScalingGroups[0].AutoScalingGroupARN' \
--output text)
# Define queries for CloudWatch alarms
export CPU_UTILIZATION_QUERY="SELECT AVG(CPUUtilization) FROM SCHEMA(\\\"AWS/EC2\\\", AutoScalingGroupName) WHERE AutoScalingGroupName = '${AUTOSCALING_GROUP_NAME}'"
export CPU_ALARM_NAME="arthur-shield-eks-cpu-utilization-alarm"
export GPU_UTILIZATION_QUERY="SELECT AVG(nvidia_smi_utilization_gpu) FROM SCHEMA(CWAgent,EKSClusterName,ImageId,InstanceId,InstanceType,EKSNodeGroupType,arch,host,index,name) WHERE EKSClusterName = '${CLUSTER_NAME}' AND EKSNodeGroupType = 'arthur-shield-eks-gpu'"
export GPU_ALARM_NAME="arthur-shield-eks-gpu-utilization-alarm"
# Create scale-out policy for GPU
export SCALE_OUT_POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name ${AUTOSCALING_GROUP_NAME} \
--policy-name gpu-utilization-scale-out-policy \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[{
"MetricIntervalLowerBound": 0,
"ScalingAdjustment": 1
}]' \
--query 'PolicyARN' \
--output text)
aws cloudwatch put-metric-alarm \
--alarm-name ${GPU_ALARM_NAME}-scale-out \
--alarm-description "Triggers autoscaling when average GPU utilization exceeds 40%" \
--metrics '[{
"Id": "gpu_util",
"Expression": "'${GPU_UTILIZATION_QUERY}'",
"Period": 120,
"ReturnData": true
}]' \
--threshold 40 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions ${SCALE_OUT_POLICY_ARN}
# Create scale-out policy for CPU
export CPU_SCALE_OUT_POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name ${AUTOSCALING_GROUP_NAME} \
--policy-name cpu-utilization-scale-out-policy \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[{
"MetricIntervalLowerBound": 0,
"ScalingAdjustment": 1
}]' \
--query 'PolicyARN' \
--output text)
aws cloudwatch put-metric-alarm \
--alarm-name ${CPU_ALARM_NAME}-scale-out \
--alarm-description "Triggers autoscaling when average CPU utilization exceeds 60%" \
--metrics '[{
"Id": "cpu_util",
"Expression": "'${CPU_UTILIZATION_QUERY}'",
"Period": 120,
"ReturnData": true
}]' \
--threshold 60 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions ${CPU_SCALE_OUT_POLICY_ARN}
Note: For faster scaling, consider using warm instances.
- Optionally, create a scale-in policy for the GPU node group
# Create scale-in policy for GPU
export SCALE_IN_POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name ${AUTOSCALING_GROUP_NAME} \
--policy-name gpu-utilization-scale-in-policy \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[{
"MetricIntervalUpperBound": 0,
"ScalingAdjustment": -1
}]' \
--query 'PolicyARN' \
--output text)
aws cloudwatch put-metric-alarm \
--alarm-name ${GPU_ALARM_NAME}-scale-in \
--alarm-description "Triggers autoscaling when average GPU utilization is below 10%" \
--metrics '[{
"Id": "gpu_util",
"Expression": "'${GPU_UTILIZATION_QUERY}'",
"Period": 120,
"ReturnData": true
}]' \
--threshold 10 \
--comparison-operator LessThanThreshold \
--evaluation-periods 1 \
--alarm-actions ${SCALE_IN_POLICY_ARN}
# Create scale-in policy for CPU
export CPU_SCALE_IN_POLICY_ARN=$(aws autoscaling put-scaling-policy \
--auto-scaling-group-name ${AUTOSCALING_GROUP_NAME} \
--policy-name cpu-utilization-scale-in-policy \
--policy-type StepScaling \
--adjustment-type ChangeInCapacity \
--step-adjustments '[{
"MetricIntervalUpperBound": 0,
"ScalingAdjustment": -1
}]' \
--query 'PolicyARN' \
--output text)
aws cloudwatch put-metric-alarm \
--alarm-name ${CPU_ALARM_NAME}-scale-in \
--alarm-description "Triggers autoscaling when average CPU utilization is below 5%" \
--metrics '[{
"Id": "cpu_util",
"Expression": "'${CPU_UTILIZATION_QUERY}'",
"Period": 120,
"ReturnData": true
}]' \
--threshold 5 \
--comparison-operator LessThanThreshold \
--evaluation-periods 1 \
--alarm-actions ${CPU_SCALE_IN_POLICY_ARN}
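To confirm that the alarms and scaling policies were created, they can be listed back; a small sketch:
aws cloudwatch describe-alarms --alarm-name-prefix arthur-shield-eks
aws autoscaling describe-policies --auto-scaling-group-name ${AUTOSCALING_GROUP_NAME}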
- Label the CPU node group with capability=cpu
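For an existing managed node group, the label can be added in place. A sketch, assuming <cpu_nodegroup_name> is the name of your CPU node group:
aws eks update-nodegroup-config \
  --cluster-name ${CLUSTER_NAME} \
  --nodegroup-name <cpu_nodegroup_name> \
  --labels 'addOrUpdateLabels={capability=cpu}'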
How to install Shield using the Helm Chart
- Create Kubernetes secrets
# WARNING: Do NOT set up secrets this way in production.
# Use a secure method such as sealed secrets and external secret store providers.
kubectl -n shield create secret generic auth-admin-console-secret \
  --from-literal=password='<password>'
kubectl -n shield create secret generic postgres-secret \
  --from-literal=username='<username>' \
  --from-literal=password='<password>'
kubectl -n shield create secret docker-registry arthur-repository-credentials \
  --docker-server='docker.arthur.ai' \
  --docker-username='<username>' \
  --docker-password='<password>' \
  --docker-email=''
kubectl -n shield create secret generic shield-secret-api-key \
  --from-literal=key='<api_key>'
# Connection strings for Azure OpenAI GPT model endpoints (many may be specified)
# Must be in the form:
# "DEPLOYMENT_NAME1::OPENAI_ENDPOINT1::SECRET_KEY1,DEPLOYMENT_NAME2::OPENAI_ENDPOINT2::SECRET_KEY2"
kubectl -n shield create secret generic shield-secret-open-ai-gpt-model-names-endpoints-keys \
  --from-literal=keys='<your_gpt_keys>'
# Set these to any random UUID
kubectl -n shield create secret generic shield-app-secret-key \
  --from-literal=key='<app_key>'
kubectl -n shield create secret generic shield-auth-client-secret \
  --from-literal=secret='<auth_client_secret>'
- Prepare the Arthur Shield Helm Chart configuration file, values.yaml, by downloading the file below to the directory
where you will run helm install, and populate the values accordingly:
https://arthur-helm.s3.us-east-2.amazonaws.com/charts/<shield_version_number>/values.yaml
- Install the Arthur Shield Helm Chart
# Arthur will provide credentials to the private Helm repository
ARTHUR_REPOSITORY_USERNAME=<ARTHUR_REPOSITORY_USERNAME>
ARTHUR_REPOSITORY_PASSWORD=<ARTHUR_REPOSITORY_PASSWORD>
helm repo add arthur-shield https://repository.arthur.ai/repository/charts-stable \
  --username ${ARTHUR_REPOSITORY_USERNAME} --password ${ARTHUR_REPOSITORY_PASSWORD}
helm repo update
helm upgrade --install -n shield -f values.yaml arthur-shield arthur-shield/arthur-shield --version <shield_version_number>
- Configure DNS
- Create an A record that routes the Arthur Auth service ingress DNS URL to the Auth load balancer created by the ingress. The Shield service pod will not start until this step is complete.
- Create an A record that routes the Arthur Shield service ingress DNS URL to the Shield load balancer created by the ingress. The load balancer hostnames can be looked up as sketched below.
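A sketch of looking up the load balancer hostnames from the cluster (the ingress resource names depend on the chart):
kubectl -n shield get ingress
# The ADDRESS column shows the load balancer hostname to point each record (A/alias or CNAME) at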
- Verify that all the pods are running with:
kubectl get pods -n shield
You should see both the auth and shield pods in the Running state. Please also inspect the logs.
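To inspect the logs, the pod names from the previous command can be used; a minimal sketch with placeholder pod names:
kubectl -n shield logs <shield_pod_name> --tail=100
kubectl -n shield logs <auth_pod_name> --tail=100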
How to get started using Shield
API
To get started with Shield API endpoints, open your browser and go to the interactive API documentation via your
Shield DNS URL (e.g. https://shield.mycompany.com/docs). Authenticate with your admin key to create your first API
key according to these instructions.
UI
The UI can be accessed via the root path of your Shield DNS URL (e.g. https://shield.mycompany.com). To access the
Shield Admin UI, a Shield admin user must be created with the right role according to the instructions here.
FAQs
The usage of my Azure OpenAI endpoint is going beyond my quota. What do I do?
Azure OpenAI has a quota called Tokens-per-Minute (TPM). It limits the number of tokens a single model deployment can
process within a minute in the region where it is deployed. To get a larger effective quota for Shield, you can deploy
additional models in other regions and have Arthur Shield round-robin across multiple Azure OpenAI endpoints. In
addition, you can request a model quota increase from Azure for the desired regions.
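For example, the endpoint connection-string secret from the install step accepts multiple comma-separated entries, so additional regional deployments can be added there. A sketch with hypothetical deployment names and endpoint hostnames, assuming the same secret format shown earlier:
# Recreate the secret with two regional endpoints for round-robin (values are placeholders)
kubectl -n shield create secret generic shield-secret-open-ai-gpt-model-names-endpoints-keys \
  --from-literal=keys='gpt-35-turbo::https://<eastus-resource>.openai.azure.com::<key1>,gpt-35-turbo::https://<westus-resource>.openai.azure.com::<key2>' \
  --dry-run=client -o yaml | kubectl apply -f -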