
In this guide, I will walk through the steps to run large language models (LLMs) on your local machine using Ollama and Open WebUI.

Setting Up Ollama

  1. Install Ollama: Ollama is an open-source platform that allows you to run LLMs on your local machine. You can download and install it from the official website.
  2. Choose a Model: Ollama offers a variety of LLMs to choose from. I chose the phi4 model.
  3. Run the Model: Open a command prompt and run:
    ollama run phi4
    
  4. That’s all! A quick way to verify the model is working is sketched after this list.
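If you want to confirm the model is installed and responding outside the interactive prompt, here are two quick checks (a minimal sketch; Ollama's REST API listens on localhost:11434 by default):

# List the models pulled to this machine
ollama list

# Send a one-shot prompt to the local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Why is the sky blue?",
  "stream": false
}'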

Setting Up Open WebUI

Open WebUI is an extensible, feature-rich AI platform that operates offline with an OpenAI-compatible interface. It’s recommended to use Docker for running Open WebUI.

  1. Install Docker: If you haven’t already, download and install Docker.
  2. Run Open WebUI: In my case, I will run Open WebUI with an Nvidia GPU, so I use this command:
    docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:cuda
    

    Alternatively, you can run a single image that bundles Open WebUI with Ollama, including GPU support:

    docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:ollama
    
  3. Access Open WebUI: Navigate to http://localhost:3000 in your web browser. (A quick command-line check is sketched below.)
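Before relying on the browser, it is worth confirming the container came up cleanly. These are standard Docker commands; the container name matches the --name flag used above:

# The container should show as running with port 3000 mapped to 8080
docker ps --filter name=open-webui

# Tail the logs if the UI does not load
docker logs -f open-webui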

Nvidia CUDA Toolkit

To monitor GPU usage, use the Nvidia command-line utility:

nvidia-smi -l 1

This displays information such as GPU utilization and power usage at the interval given by -l (here, every second). If your LLM isn’t utilizing the GPU, consider installing the Nvidia CUDA Toolkit. A quick way to confirm where a model is actually running is sketched below.
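Assuming the toolkit is installed, nvcc reports its version, and ollama ps shows whether a loaded model sits on the GPU or the CPU (on my setup the PROCESSOR column reads "100% GPU" when CUDA is picked up):

# Confirm the CUDA toolkit is installed (nvcc ships with it)
nvcc --version

# Check which processor a loaded model is running on
ollama ps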

Integration with Continue VSCode Plugin

You can integrate a local LLM into the Continue VSCode plugin as an AI code assistant. Detailed instructions are available in the Open WebUI documentation. Here is an example config.json snippet:

  "models": [
    {
      "title": "qwen:14b",
      "provider": "openai",
      "model": "qwen2.5-coder:14b",
      "apiBase": "http://localhost:3000/ollama/v1",
      "apiKey": "sk-YOUR-API-KEY"
    }
  ],

The API key can be found in Open WebUI under Settings > Account.
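To confirm the apiBase and apiKey work before wiring up Continue, you can call the OpenAI-compatible chat completions endpoint directly. This is a minimal sketch; the model name and key are the same placeholders used in the config above:

curl http://localhost:3000/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR-API-KEY" \
  -d '{"model": "qwen2.5-coder:14b", "messages": [{"role": "user", "content": "Say hello"}]}'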
