Microsoft.ML C#: GPU not found in K8s/Docker container


I have created a .NET app that uses Microsoft.ML.OnnxRuntime.Gpu for inference. Now I'm trying to deploy it to Azure Kubernetes Service (AKS).

We have set up a node with a Tesla T4 GPU and confirmed it is visible:

(screenshot: nvidia-smi output from the cluster node, showing the Tesla T4 as device 0)

So we know that T4 is visible under ID = 0.

This is basically my code that works locally on my Windows machine:

    MLContext _mlContext = new();
    var estimator = _mlContext.Transforms.ApplyOnnxModel(
        modelFile: _modelFile,
        inputColumnNames: _inputColumnNames,
        outputColumnNames: _outputColumnNames,
        gpuDeviceId: gpuId);
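
As a side note, the same ApplyOnnxModel overload also exposes a fallbackToCpu flag. A minimal sketch of how I could call it while debugging (same fields as above), just to confirm the rest of the pipeline still runs when the GPU cannot be initialized:

    // Same transform, but allowed to fall back to CPU scoring if the
    // requested GPU device cannot be initialized inside the container.
    var estimatorWithFallback = _mlContext.Transforms.ApplyOnnxModel(
        modelFile: _modelFile,
        inputColumnNames: _inputColumnNames,
        outputColumnNames: _outputColumnNames,
        gpuDeviceId: gpuId,
        fallbackToCpu: true);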

but when we push the image to ACR and deploy it to the K8s cluster, we get an exception:

System.InvalidOperationException: GPU with ID 0 is not found.
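
To narrow down whether it is ML.NET or ONNX Runtime itself that cannot see the device inside the container, I'm considering a small standalone check along these lines (just a sketch; "model.onnx" is a placeholder path):

    using System;
    using Microsoft.ML.OnnxRuntime;

    // List the execution providers available in this OnnxRuntime build;
    // "CUDAExecutionProvider" should appear alongside "CPUExecutionProvider".
    foreach (var provider in OrtEnv.Instance().GetAvailableProviders())
        Console.WriteLine(provider);

    // Try to create a session on GPU 0 directly, bypassing ML.NET entirely.
    using var cudaOptions = SessionOptions.MakeSessionOptionWithCudaProvider(0);
    using var session = new InferenceSession("model.onnx", cudaOptions);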

Here is some additional information that I think may help.

My local machine nvidia-smi log:

(screenshot: nvidia-smi output from my local Windows machine)

Libraries used:

    <PackageVersion Include="Microsoft.ML" Version="3.0.1" />
    <PackageVersion Include="Microsoft.ML.ImageAnalytics" Version="3.0.1" />
    <PackageVersion Include="Microsoft.ML.OnnxRuntime" Version="1.20.1" />
    <PackageVersion Include="Microsoft.ML.OnnxRuntime.Gpu" Version="1.20.1" />
    <PackageVersion Include="Microsoft.ML.OnnxTransformer" Version="3.0.1" />

Base image used in our Dockerfile:

nvidia/cuda:12.3.2-runtime-ubuntu22.04

My local nvcc --version output:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2024 NVIDIA Corporation
    Built on Wed_Oct_30_01:18:48_Pacific_Daylight_Time_2024
    Cuda compilation tools, release 12.6, V12.6.85
    Build cuda_12.6.r12.6/compiler.35059454_0
