Ollama + OpenWebUI on Railway
Self-hosted ChatGPT alternative in 10 minutes
The Problem
ChatGPT Plus costs $20/user/month, sends your data to OpenAI, and locks you into one model. Enterprise teams need a self-hosted alternative that runs Llama 3.3 and other OSS models with proper auth, multi-user support, and data sovereignty — but production deployment guides are thin.
The Solution
One-click deploy a containerized Ollama + OpenWebUI stack on Railway with persistent volume for models, PostgreSQL for users/auth, and optional Cloudflare tunnel for SSL. Add GPU via RunPod serverless for heavy workloads. Zero vendor lock-in, zero per-token cost.
Overview
Deploy a fully self-hosted ChatGPT alternative with Ollama (LLM runtime) + OpenWebUI (polished chat interface) on Railway. Get a private, auth-protected AI chat for your team with zero per-token costs. Supports Llama 3.3, Mistral, Qwen, and any GGUF model. Includes GPU support via RunPod for production workloads.
Architecture
Components
OpenWebUI Chat Interface
gatewayReact/Svelte frontend with multi-user auth, conversation history, model switching, RAG support, and prompt library.
Service: Railway (Docker)
Ollama Runtime
ai-serviceLLM inference server supporting Llama 3.3, Mistral, Qwen, Phi, Gemma, and any GGUF model. Exposes OpenAI-compatible API.
Service: Railway (Docker)
PostgreSQL
databaseUser accounts, conversation history, settings, and RBAC. Managed Postgres with automated backups.
Service: Railway Postgres
Model Volume
storagePersistent disk for downloaded GGUF models (10-80GB). Survives container restarts and redeploys.
Service: Railway Volume
Cloudflare Tunnel
gatewayOptional SSL + custom domain with zero config. Zero-trust access and DDoS protection.
Service: Cloudflare
RunPod Serverless GPU
externalOptional: offload heavy models (70B+) to serverless GPU. Pay per second, auto-scale to zero.
Service: RunPod Serverless
Automated Backups
storageDaily snapshots of conversations, users, and settings. S3-compatible storage with retention policy.
Service: Railway Scheduled Jobs
Implementation Steps
Core Deploy
15 minutes
One-click deploy the core stack and pull your first model
Tasks
- Click Railway template for Ollama + OpenWebUI
- Configure persistent volume for model storage (50GB)
- Set OLLAMA_BASE_URL and WEBUI_SECRET_KEY env vars
- Pull first model via OpenWebUI (llama3.3:8b or mistral:7b)
- Verify chat works with default admin user
Deliverables
Authentication & Multi-User
30 minutes
Configure auth, add team members, and set model permissions
Tasks
- Enable ENABLE_SIGNUP=false to lock down registration
- Create admin and regular user accounts
- Configure model-level permissions (admin-only vs all users)
- Add conversation sharing and workspace features
- Connect PostgreSQL for persistent user state
Deliverables
Production Hardening
1 hour
Add custom domain, backups, monitoring, and optional GPU overflow
Tasks
- Configure Cloudflare tunnel for custom domain + SSL
- Set up daily PostgreSQL backups to S3
- Add Railway observability (logs, metrics, alerts)
- Optional: configure RunPod serverless GPU for 70B models
- Load test with concurrent users and verify performance
Deliverables
Code Examples
Railway Deployment Config
railway.json and docker-compose.yml for Ollama + OpenWebUI stack
# docker-compose.yml
version: '3.9'
services:
ollama:
image: ollama/ollama:latest
volumes:
- ollama-models:/root/.ollama
ports:
- '11434:11434'
environment:
- OLLAMA_KEEP_ALIVE=24h
- OLLAMA_HOST=0.0.0.0
openwebui:
image: ghcr.io/open-webui/open-webui:main
depends_on:
- ollama
ports:
- '3000:8080'
environment:
- OLLAMA_BASE_URL=http://ollama:11434
- WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
- ENABLE_SIGNUP=false
- DATABASE_URL=${DATABASE_URL}
volumes:
- openwebui-data:/app/backend/data
volumes:
ollama-models:
openwebui-data:
# railway.json
{
"$schema": "https://railway.app/railway.schema.json",
"build": { "builder": "DOCKERFILE" },
"deploy": {
"restartPolicyType": "ON_FAILURE",
"restartPolicyMaxRetries": 10,
"healthcheckPath": "/health"
}
}Pull Model Bootstrap Script
Shell script to pre-pull models on first deploy
#!/bin/bash
# bootstrap-models.sh — run on first deploy
set -e
MODELS=(
'llama3.3:8b'
'mistral:7b'
'qwen2.5-coder:7b'
)
echo 'Waiting for Ollama to be ready...'
until curl -sf http://ollama:11434/api/tags > /dev/null; do
sleep 2
done
for model in "${MODELS[@]}"; do
echo "Pulling $model..."
curl -X POST http://ollama:11434/api/pull \
-H 'Content-Type: application/json' \
-d "{\"name\": \"$model\"}"
done
echo 'All models pulled successfully'Cost Estimate
$80
per month
$960
per year
Assumptions: Small team (5-10 users), 7B-8B models on CPU, Occasional GPU overflow for 70B models, ~500 chats/day
Use Cases
Technologies
Ready to Build?
Deploy this architecture in minutes, or get the production-ready template with full source code.