llama.cpp (Cortex)
Overview
Jan uses llama.cpp for running local AI models. You can find its settings in Settings > Local Engine > llama.cpp.
These settings are intended for advanced users. You may want to check them when:
- Your AI models are running slowly or not working
- You've installed new hardware (like a graphics card)
- You want to tinker & test performance with different backends
Engine Version and Updates
- Engine Version: View the current version of the llama.cpp engine
- Check Updates: Check whether a newer engine version is available and install it (a manual check is sketched below)
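If you want to compare against the upstream llama.cpp release outside of Jan, a rough manual check is possible. The sketch below is illustrative only: it assumes the ggerganov/llama.cpp GitHub repository as the reference point, and Jan's built-in update check may use a different source or versioning scheme.

```python
# Minimal sketch: query GitHub for the latest upstream llama.cpp release tag.
# Assumption: ggerganov/llama.cpp is the relevant upstream repo; Jan's own
# "Check Updates" button may track a different build or repository.
import json
import urllib.request

def latest_llama_cpp_tag() -> str:
    url = "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["tag_name"]

if __name__ == "__main__":
    print("Latest upstream release:", latest_llama_cpp_tag())
```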
Available Backends
Jan offers different llama.cpp backend variants depending on your operating system and hardware. You can:
- Download different backends as needed
- Switch between backends for different hardware configurations
- View currently installed backends in the list
⚠️
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.
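If you are unsure what hardware you have, a quick check of the operating system, CPU architecture, and GPU driver presence usually settles it. The snippet below is a rough sketch using standard tools (nvidia-smi for NVIDIA GPUs, Python's platform module for OS and architecture); it is not part of Jan and only approximates what to look for.

```python
# Rough sketch: gather the hardware facts that determine which backend fits.
import platform
import shutil

def hardware_summary() -> dict:
    return {
        "os": platform.system(),      # "Darwin", "Windows", or "Linux"
        "arch": platform.machine(),   # e.g. "arm64", "x86_64", "AMD64"
        # CUDA variants only make sense if an NVIDIA driver is installed.
        "nvidia_gpu": shutil.which("nvidia-smi") is not None,
    }

print(hardware_summary())
```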
macOS
- mac-arm64: For Apple Silicon Macs (M1/M2/M3)
- mac-amd64: For Intel-based Macs
Windows
- win-cuda: For NVIDIA GPUs using CUDA
- win-cpu: For CPU-only operation
- win-directml: For DirectML acceleration (AMD/Intel GPUs)
- win-opengl: For OpenGL acceleration
Linux
- linux-cuda: For NVIDIA GPUs using CUDA
- linux-cpu: For CPU-only operation
- linux-rocm: For AMD GPUs using ROCm
- linux-openvino: For Intel GPUs/NPUs using OpenVINO
- linux-vulkan: For Vulkan acceleration