Hey everyone! I was just skimming through some other people's inference benchmarks and noticed that the driver version is usually mentioned. It made me wonder how relevant it actually is. My prod server runs Debian 12, so the packaged NVIDIA drivers are rather old, but I’d prefer not to mess with the drivers if it won’t bring a benefit. Do any of you have experience with this, or have you done some testing?

  • keepthepace@tarte.nuage-libre.fr · 18 days ago

    The CUDA version is what matters the most (assuming you are on NVIDIA). Later CUDA versions have optimizations that earlier ones don’t, and that may in turn dictate the actual driver version you can use.
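
    A quick way to see both numbers on your machine (a rough sketch assuming a PyTorch-based stack and that nvidia-smi is installed; adapt to whatever your engine actually uses):

    ```python
    # Compare the CUDA runtime the framework was built against with the
    # driver installed on the host (sketch, PyTorch assumed).
    import subprocess
    import torch

    # CUDA runtime the framework was built against
    print("torch built with CUDA:", torch.version.cuda)

    # Driver version reported by the host driver
    driver = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
    print("NVIDIA driver:", driver)

    # Plain `nvidia-smi` also prints the highest CUDA version that driver
    # supports in its header; the framework's CUDA must not exceed it.
    ```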

    I guess some models will simply deactivate some optimizations if you don’t have an appropriate version, though mostly I’ve seen them just fail outright in that case :-/

    If you compare a model running on CUDA 11 against one running on CUDA 12, people may point out that the comparison could be unfair, though that is generally nitpicking.

    If you are worried that your performance is not optimal, look in the log for messages like “<fast attention scheme XYZ> was deactivated because <cudaSuperOptimizedMegaSparseMatMult> was not available”.
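
    With a PyTorch-based engine you can also ask PyTorch directly which fast attention backends ended up enabled (a minimal sketch; your engine’s log wording will differ, but the idea is the same):

    ```python
    # Check which scaled-dot-product attention backends PyTorch enabled
    # on this driver/CUDA/GPU combination (sketch, PyTorch assumed).
    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("GPU:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))

    print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
    print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
    print("math (fallback) SDP enabled:", torch.backends.cuda.math_sdp_enabled())
    ```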

    • robber@lemmy.ml (OP) · 18 days ago

      I see. When I run the inference engine containerized, will the container be able to use its own version of CUDA, or will it use the host’s version?

      • keepthepace@tarte.nuage-libre.fr · 17 days ago

        I am not sure; I have tried to avoid this whole situation over the last few years :-) IIRC the container can ship its own CUDA version, but double-check that.
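
        If it helps, here is a small sketch you could run inside the container to see which is which (assuming the image ships PyTorch and nvidia-smi and you start it through the NVIDIA Container Toolkit, e.g. with --gpus all):

        ```python
        # Run inside the container: the CUDA runtime comes from the image,
        # while the driver is mounted in from the host by the NVIDIA
        # Container Toolkit (sketch, PyTorch assumed).
        import subprocess
        import torch

        print("Container's CUDA runtime (from the image):", torch.version.cuda)

        host_driver = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        ).stdout.strip()
        print("Host driver (mounted into the container):", host_driver)
        ```

        The catch is that the host driver still has to be new enough for whatever CUDA version the image brings along.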