
GPU Insights

GPU discovery, allocation pressure, placement, quota, fragmentation, and expensive-capacity usage.

Permissions: viewing checks requires insights: view. Opening a linked object or logs requires the corresponding resource permission, and the BeaverDeck ServiceAccount must be allowed to read the Kubernetes resources used by the check. Suppressing a finding requires insights: edit and affects all users.

Data Evaluated

Node nvidia.com/gpu allocatable capacity, active pod GPU requests, scheduling state, selected namespaces, and ResourceQuotas.
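The inputs listed above can be illustrated with a minimal Python sketch that totals allocatable and requested GPUs from objects shaped like Kubernetes API JSON. This is not BeaverDeck's implementation. The function names are hypothetical, "active" is assumed here to mean a pod whose phase is neither Succeeded nor Failed, and the per-pod count follows the usual Kubernetes effective-request rule (the larger of the summed application containers and the largest init container).

```python
GPU_RESOURCE = "nvidia.com/gpu"


def _gpu_count(container):
    """GPU count for one container, falling back to limits if requests are absent."""
    resources = container.get("resources", {})
    value = resources.get("requests", {}).get(GPU_RESOURCE)
    if value is None:
        # For extended resources, Kubernetes treats the limit as the request.
        value = resources.get("limits", {}).get(GPU_RESOURCE, 0)
    return int(value)


def pod_gpu_request(pod):
    """Effective GPU request: max(sum of app containers, largest init container)."""
    spec = pod.get("spec", {})
    app_total = sum(_gpu_count(c) for c in spec.get("containers", []))
    init_max = max((_gpu_count(c) for c in spec.get("initContainers", [])), default=0)
    return max(app_total, init_max)


def is_active(pod):
    """Assumption: a pod is 'active' unless it has already finished."""
    return pod.get("status", {}).get("phase") not in ("Succeeded", "Failed")


def node_allocatable_gpus(node):
    """Allocatable nvidia.com/gpu on a node, or 0 if the node advertises none."""
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(GPU_RESOURCE, 0))


def summarize(nodes, pods):
    """Total allocatable GPUs and total GPUs requested by active pods."""
    return {
        "allocatable": sum(node_allocatable_gpus(n) for n in nodes),
        "requested": sum(pod_gpu_request(p) for p in pods if is_active(p)),
    }
```

A result with a nonzero requested total and a zero allocatable total corresponds to the situation the GPU Capacity Discovery check describes.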

Checks

| Check | When it reports | Alert severity |
| --- | --- | --- |
| GPU Capacity Discovery (gpu-capacity-discovery) | Selected namespaces contain active GPU requests, but no Node advertises allocatable nvidia.com/gpu capacity. | Critical |
| GPU Allocation Pressure (gpu-allocation-pressure) | Active pod GPU requests reach at least 80% of allocatable GPUs on a GPU node or across the selected namespaces. The severity becomes critical at 95%. | Warning at 80%; critical at 95% |
| GPU Node Scheduling (gpu-node-cordoned) | A Node advertises GPU capacity and is marked unschedulable. | Warning |
| GPU Idle Allocation (gpu-node-idle-allocation) | A Node advertises GPU capacity, but no active pod in the selected namespaces requests a GPU on that node. | Warning |
| GPU Node Workload Mix (non-gpu-pods-on-gpu-node) | An active, non-DaemonSet Pod without a GPU request is scheduled on a GPU node. | Warning |
| GPU Fragmentation (gpu-fragmentation) | A GPU Pod has been Pending for at least 5 minutes, total free GPU capacity across schedulable GPU nodes is sufficient, but no single node has enough free GPUs for that Pod. | Warning |
| Namespace GPU Usage (gpu-namespace-usage) | A selected namespace has one or more active GPU workload requests. BeaverDeck reports the total requested GPU count and whether a GPU quota exists. | Informational passing check |
| GPU Quota (gpu-quota) | A selected namespace has active GPU requests but no ResourceQuota hard limit for requests.nvidia.com/gpu, limits.nvidia.com/gpu, or nvidia.com/gpu. | Warning |
| GPU Pod Pending (gpu-pod-pending) | An active Pod requests GPUs and remains Pending for at least 5 minutes. | Warning |
| GPU Pod Readiness (gpu-pod-unready) | A GPU-requesting Pod is assigned to a node but remains not Ready for at least 10 minutes from its Ready-condition transition or creation time. | Warning |
| GPU Pod Requests (gpu-pod-requests) | An active GPU-requesting Pod has an init or application container without a CPU request or memory request. | Warning |
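
The thresholds for GPU Allocation Pressure and the condition for GPU Fragmentation can be sketched in Python as follows. This only illustrates the rules as stated in the table, not BeaverDeck's actual code. The function names, the integer inputs, and the choice to return None when nothing is reported are assumptions made for the example.

```python
def pressure_severity(requested, allocatable):
    """Classify allocation pressure using the 80% and 95% thresholds.

    Returns "critical", "warning", or None when below 80%.
    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    if allocatable <= 0:
        # Assumption: with no allocatable GPUs there is no ratio to evaluate.
        return None
    if requested * 100 >= 95 * allocatable:
        return "critical"
    if requested * 100 >= 80 * allocatable:
        return "warning"
    return None


def is_fragmented(pod_gpus, free_per_node, pending_seconds, min_pending_seconds=300):
    """True when free GPUs are sufficient in total but on no single node.

    pod_gpus: GPUs requested by the Pending pod.
    free_per_node: free GPU counts on schedulable GPU nodes (hypothetical input).
    pending_seconds: how long the pod has been Pending.
    """
    if pending_seconds < min_pending_seconds:
        return False  # the pod has not been Pending for 5 minutes yet
    if not free_per_node:
        return False  # no schedulable GPU nodes, so nothing to fragment
    enough_in_total = sum(free_per_node) >= pod_gpus
    fits_on_one_node = max(free_per_node) >= pod_gpus
    return enough_in_total and not fits_on_one_node
```

For example, a pod requesting 4 GPUs against nodes with 2, 2, and 1 free GPUs has 5 free GPUs available in total but cannot fit on any one node, so `is_fragmented(4, [2, 2, 1], 600)` returns True.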

Open an individual check for risk context, recommended response, and limitations. Passing checks are visible when Show all checks is enabled in BeaverDeck.
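
As a further illustration of the GPU Quota and GPU Pod Requests rows in the table above, the following Python sketch checks objects shaped like Kubernetes API JSON. It is a simplified reading of the table, not BeaverDeck's implementation. The function names are hypothetical, and "without a CPU request or memory request" is assumed to mean that either one is missing.

```python
GPU_QUOTA_KEYS = (
    "requests.nvidia.com/gpu",
    "limits.nvidia.com/gpu",
    "nvidia.com/gpu",
)


def has_gpu_quota(resource_quotas):
    """True if any ResourceQuota in the namespace sets a hard GPU limit."""
    for quota in resource_quotas:
        hard = quota.get("spec", {}).get("hard", {})
        if any(key in hard for key in GPU_QUOTA_KEYS):
            return True
    return False


def containers_missing_requests(pod):
    """Names of init or application containers lacking a CPU or memory request."""
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    missing = []
    for container in containers:
        requests = container.get("resources", {}).get("requests", {})
        # Assumption: a container is flagged if either request is absent.
        if "cpu" not in requests or "memory" not in requests:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

A namespace with active GPU requests for which `has_gpu_quota` returns False matches the GPU Quota condition, and a GPU-requesting Pod with a non-empty `containers_missing_requests` result matches the GPU Pod Requests condition.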