Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Notes only for now:
https://github.com/open-webui/open-webui
https://docs.openwebui.com/getting-started/env-configuration/#general
Nvidia GPU, Linux:
mkdir -p /opt/ollama
mkdir -p /opt/open-webui
mkdir -p /opt/openedai-speech/tts-voices
mkdir -p /opt/openedai-speech/tts-config
mkdir -p /opt/pipelines
mkdir -p /opt/docker-ssl-proxy
mkdir -p /opt/faster-whisper-server
name: ollama
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - /opt/ollama:/root/.ollama
    ports:
      - 11434:11434
    #runtime: nvidia
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              #device_ids: ['0']
              count: 1
              capabilities: [gpu]
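To start the stack and confirm the container actually sees the GPU, something like the following should work (the compose file name docker-ollama.yml is an assumption, matching the commands used further below):

docker compose -f docker-ollama.yml up -d
docker exec -ti ollama nvidia-smi        # should list the Nvidia GPU
docker exec -ti ollama ollama --version  # confirms the CLI responds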
name: open-webui
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    ports:
      - 3000:8080
    volumes:
      - /opt/open-webui:/app/backend/data
    restart: unless-stopped
    extra_hosts:
      host.docker.internal: host-gateway
    environment:
      - WEBUI_NAME=CustomGPTName
      - TZ=Europe/London
      - RAG_EMBEDDING_MODEL_TRUST_REMOTE_CODE=True # allow sentence-transformers to execute code, e.g. for alibaba-nlp/gte-large-en-v1.5
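To bring the UI up and watch the startup log (the compose file name is again an assumption):

docker compose -f docker-openwebui.yml up -d
docker logs -f open-webui   # once up, browse to http://localhost:3000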
name: openedai-speech
services:
  openedai-speech:
    image: ghcr.io/matatonic/openedai-speech
    container_name: openedai-speech
    ports:
      - "8060:8000"
    volumes:
      - /opt/openedai-speech/tts-voices:/app/voices
      - /opt/openedai-speech/tts-config:/app/config
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    environment:
      - TZ=Europe/London
      - TTS_HOME=voices
      - HF_HOME=voices
      # - PRELOAD_MODEL=xtts
      # - PRELOAD_MODEL=xtts_v2.0.2
      # - PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
In open-webui, under Settings → Audio, point text-to-speech at the openedai-speech container: set the TTS engine to OpenAI, the API base URL to http://host.docker.internal:8060/v1, and any non-empty API key (openedai-speech does not check it).
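To test TTS without the UI, the OpenAI-style speech endpoint can be called directly (model and voice below are openedai-speech defaults; adjust to taste):

curl http://localhost:8060/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "Hello from openedai-speech"}' \
  -o hello.mp3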
name: pipelines
services:
  pipelines:
    image: ghcr.io/open-webui/pipelines:main
    container_name: pipelines
    volumes:
      - /opt/pipelines:/app/pipelines
    ports:
      - 9099:9099
    restart: always
    extra_hosts:
      host.docker.internal: host-gateway
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
https://zohaib.me/extending-openwebui-using-pipelines/
Under Settings → Connections, add the pipelines endpoint: API URL http://host.docker.internal:9099 and API key 0p3n-w3bu! (the pipelines default).
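To confirm the pipelines service answers before wiring it into the UI (assuming the default API key; installed pipelines are listed as OpenAI-style models):

curl http://localhost:9099/v1/models -H "Authorization: Bearer 0p3n-w3bu!"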
git clone https://github.com/fedirz/faster-whisper-server
name: faster-whisper-server
services:
  faster-whisper-server-cuda:
    image: fedirz/faster-whisper-server:latest-cuda
    build:
      dockerfile: Dockerfile.cuda   # relative to the build context
      context: ./faster-whisper-server
      platforms:
        - linux/amd64
    volumes:
      - /opt/faster-whisper-server/:/root/.cache/huggingface
    restart: unless-stopped
    ports:
      - 8010:8000
    develop:
      watch:
        - path: faster_whisper_server
          action: rebuild
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
Then go to Settings → Audio and set speech-to-text to the OpenAI engine: API base URL http://host.docker.internal:8010/v1, any non-empty API key, and the Whisper model to use (e.g. Systran/faster-distil-whisper-large-v3).
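To test transcription directly, bypassing the browser (sample.wav is any short audio file you have; the model name is an assumption, use whichever Whisper model you configured):

curl http://localhost:8010/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=Systran/faster-distil-whisper-large-v3"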
NOTE: speech-to-text requires an https connection to open-webui, as browsers do not allow microphone access over plain http (localhost excepted)!
mkdir /opt/docker-ssl-proxy/
cd /opt/docker-ssl-proxy/
openssl req -subj '/CN=hostname.example.com' -x509 -newkey rsa:4096 -nodes -keyout key.pem -out cert.pem -days 365
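To double-check the generated certificate's subject and expiry:

openssl x509 -in cert.pem -noout -subject -enddate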
Save the proxy configuration as e.g. /opt/docker-ssl-proxy/proxy.conf (nginx loads every *.conf in that directory):

server {
  listen 80;
  server_name _;
  return 301 https://$host$request_uri;
}
server {
  listen 443 ssl;
  ssl_certificate /etc/nginx/conf.d/cert.pem;
  ssl_certificate_key /etc/nginx/conf.d/key.pem;
  location / {
    proxy_pass http://host.docker.internal:3000;
  }
}
name: nginx-proxy
services:
  nginx-proxy:
    image: nginx
    container_name: nginx-proxy
    ports:
      - 80:80
      - 443:443
    volumes:
      - /opt/docker-ssl-proxy:/etc/nginx/conf.d
    restart: unless-stopped
    extra_hosts:
      host.docker.internal: host-gateway
    environment:
      - TZ=Europe/London
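Once the proxy is up, both listeners can be checked from a shell (the compose file name is an assumption; -k skips verification of the self-signed certificate):

docker compose -f docker-nginx-proxy.yml up -d
curl -I http://localhost/    # expect a 301 redirect to https
curl -kI https://localhost/  # expect the open-webui front page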
To pull an Ollama model, it is better to use ollama directly, as the web interface doesn't handle stalled downloads well:
docker exec -ti ollama ollama pull modelname:tag
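For example, with an illustrative model and tag:

docker exec -ti ollama ollama pull llama3:8b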
To update all previously pulled ollama models, use this bash script:
#!/bin/bash
# update every model currently known to the ollama container
docker exec -ti ollama ollama list | tail -n +2 | awk '{print $1}' | while read -r model; do
  echo "Updating model: $model..."
  docker exec -t ollama ollama pull "$model"
  echo "--"
done
echo "All models updated."
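To run it unattended, save it, make it executable, and add a crontab entry (paths below are assumptions; drop the -i from docker exec -ti in the script for cron runs, since there is no interactive terminal):

chmod +x /opt/ollama/update-models.sh
# crontab -e: refresh all models every Monday at 04:00
0 4 * * 1 /opt/ollama/update-models.sh >> /var/log/ollama-update.log 2>&1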
AMD GPU on Windows:
name: ollama
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    volumes:
      - /p/Docker_Volumes/ollama:/root/.ollama
    ports:
      - 11434:11434
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
name: webui
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    ports:
      - 3000:8080
    volumes:
      - /p/Docker_Volumes/openwebui:/app/backend/data
    restart: always
    extra_hosts:
      host.docker.internal: host-gateway
    environment:
      - WEBUI_AUTH=false
Create the respective Docker volume folders:
# p/Docker_Volumes = P:\Docker_Volumes
mkdir P:\Docker_Volumes
# docker install - choose the WSL2 backend
# cmd line
docker compose -f docker-openwebui.yml up -d
docker compose -f docker-ollama.yml up -d
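To verify both containers are running:

docker ps --filter name=ollama --filter name=open-webui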
To update all Ollama models on Windows, use this PowerShell command (adjust for the hostname/IP Ollama is running on):
(Invoke-RestMethod http://localhost:11434/api/tags).Models.Name.ForEach{ ollama pull $_ }
# or, if running in docker:
(Invoke-RestMethod http://localhost:11434/api/tags).Models.Name.ForEach{ docker exec -t ollama ollama pull $_ }
Ollama also exposes an OpenAI-compatible chat completions API, which can be tested with curl:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ]
  }'

Example response:

{"id":"chatcmpl-957","object":"chat.completion","created":1722601457,"model":"llama3","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there! It's great to meet you! I'm here to help with any questions or tasks you might have. What brings you to this virtual space today? Are you looking for recommendations, seeking answers to a specific question, or maybe looking for some inspiration? Let me know, and I'll do my best to assist you."},"finish_reason":"stop"}],"usage":{"prompt_tokens":23,"completion_tokens":68,"total_tokens":91}}