OpenAI-compatible AI control plane

Xeno

Turn every trusted laptop, workstation, server, and accelerator host into one shared model-serving network.

Xeno gives teams a single governed endpoint for distributed inference, with routing policy, client identity, telemetry, dashboards, and SDKs built around the hardware and models they already own.

One endpoint: Chat, completions, embeddings, and model listing through OpenAI-compatible routes.
Fleet aware: Clients advertise hardware, providers, models, health, and telemetry.
Operator ready: Dashboards, request history, routing policies, tenants, credentials, and analytics.
Ollama, vLLM, OpenAI, Gemini, OpenAI-compatible, SQLite, PostgreSQL

What it does

Make AI capacity addressable wherever it already exists.

Xeno separates the public API surface from the machines doing the work. Applications call the Xeno server. Connected clients run beside local runtimes and accelerators, advertise what they can serve, execute routed requests, and report enough telemetry for operators to trust the fleet.

OpenAI-compatible API

Expose model listing, chat completions, completions, and embeddings behind one stable endpoint.

Authenticated clients

Tenant-bound clients connect over WebSockets and advertise machine inventory, providers, and models.

Policy-based routing

Use model and client match rules with RoundRobin, Random, FirstAvailable, LeastRecentlyUsed, or Adaptive selection.

Operational visibility

Inspect request history, headers, bodies, stream counters, routing decisions, host metrics, provider telemetry, and analytics.
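
To make the selection strategies named above concrete, here is a minimal Python sketch of four of them. It is not Xeno's implementation: the `Client` and `Policy` classes, their fields, and the use of glob patterns for match rules are all assumptions made for illustration. Adaptive selection is left out because its behavior is not described here.

```python
import random
from dataclasses import dataclass
from fnmatch import fnmatch
from itertools import count


@dataclass
class Client:
    """Hypothetical view of a connected client; field names are assumptions."""
    name: str
    models: list
    healthy: bool = True
    last_used: int = 0  # logical timestamp of the last request routed here


@dataclass
class Policy:
    """Hypothetical policy: glob match rules plus a selection strategy."""
    model_pattern: str
    client_pattern: str
    strategy: str  # RoundRobin, Random, FirstAvailable, or LeastRecentlyUsed
    cursor: int = 0  # round-robin position


_clock = count(1)  # logical clock used to order "last used" timestamps


def eligible(policy, model, clients):
    """Return healthy clients that serve the model and satisfy both match rules."""
    if not fnmatch(model, policy.model_pattern):
        return []
    return [
        c for c in clients
        if c.healthy and model in c.models and fnmatch(c.name, policy.client_pattern)
    ]


def select(policy, model, clients):
    """Pick one eligible client according to the policy's strategy, or None."""
    pool = eligible(policy, model, clients)
    if not pool:
        return None
    if policy.strategy == "FirstAvailable":
        chosen = pool[0]
    elif policy.strategy == "Random":
        chosen = random.choice(pool)
    elif policy.strategy == "RoundRobin":
        chosen = pool[policy.cursor % len(pool)]
        policy.cursor += 1
    elif policy.strategy == "LeastRecentlyUsed":
        chosen = min(pool, key=lambda c: c.last_used)
    else:
        raise ValueError(f"strategy not covered by this sketch: {policy.strategy}")
    chosen.last_used = next(_clock)
    return chosen
```

In this sketch every strategy draws from the same eligible pool, so changing a policy's strategy changes only which client is picked, never which clients are allowed to serve the model.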

How it works

A control plane in the middle. Workload runners at the edge.

The server owns identity, tenancy, routing policy, request history, telemetry, and the API. The client owns local execution against Ollama, vLLM, OpenAI, Gemini, or another OpenAI-compatible provider.

Figure: Xeno architecture showing OpenAI-compatible clients, the Xeno server control plane, and Xeno workload runner clients.

01

Connect nodes

Install the lightweight xeno client on machines with reachable model runtimes or accelerator capacity.

02

Advertise inventory

Each client reports provider endpoints, supported models, host health, GPU details, and provider telemetry.

03

Route requests

Applications call Xeno once. The server authenticates, checks permissions, applies policy, and selects an eligible client.

04

Observe outcomes

Operators inspect routing traces, selected clients, timing, token metadata, failures, and fleet utilization.
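
The four steps above can be sketched as one routing function that authenticates, checks permissions, filters eligible clients, and records a trace for operators to inspect. This is an illustrative Python sketch only: the in-memory tables, the `Decision` type, the status codes, and the first-eligible selection are assumptions, not Xeno's actual data model or behavior.

```python
from dataclasses import dataclass, field
from typing import Optional

# All data below is hypothetical and held in memory for illustration only.
TOKENS = {"tok_demo": "ten_default"}                  # bearer token -> tenant
TENANT_MODELS = {"ten_default": {"llama3.2:latest"}}  # tenant -> permitted models
CLIENTS = [
    {"name": "workstation-1", "tenant": "ten_default", "healthy": False,
     "models": {"llama3.2:latest"}},
    {"name": "gpu-host-2", "tenant": "ten_default", "healthy": True,
     "models": {"llama3.2:latest"}},
]


@dataclass
class Decision:
    status: int                   # HTTP-style status; the codes are assumptions
    client: Optional[str] = None  # name of the selected client, if any
    trace: list = field(default_factory=list)


def route(token, tenant, model, clients=CLIENTS):
    """Authenticate, check permissions, filter by eligibility, and select."""
    trace = []
    if TOKENS.get(token) != tenant:
        trace.append("authenticate: token not valid for tenant")
        return Decision(401, trace=trace)
    trace.append("authenticate: ok")

    if model not in TENANT_MODELS.get(tenant, set()):
        trace.append(f"permissions: {model} not permitted")
        return Decision(403, trace=trace)
    trace.append("permissions: ok")

    pool = [
        c for c in clients
        if c["tenant"] == tenant and c["healthy"] and model in c["models"]
    ]
    trace.append(f"policy: {len(pool)} eligible client(s)")
    if not pool:
        return Decision(503, trace=trace)

    chosen = pool[0]  # first eligible client, standing in for a configured policy
    trace.append(f"select: {chosen['name']}")
    return Decision(200, chosen["name"], trace)
```

Returning the trace alongside the decision mirrors the idea that operators should be able to see why a request landed on a particular client, or why it was rejected.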

Deployment options

Run it where your control boundary lives.

Xeno is designed for public-cloud coordination, private accelerator estates, and hybrid fleets that combine local model runtimes with hosted APIs.

Cloud control plane

Run a small Xeno server in cloud infrastructure, issue tenant-scoped client credentials, and connect trusted contributor machines from anywhere.

On-premises federation

Deploy internally, install clients on GPU hosts, and give agentic applications one governed API endpoint for private capacity.

Hybrid and burst

Blend local Ollama or vLLM with OpenAI, Gemini, and compatible endpoints while keeping application code stable.

Dashboards

See the fleet, not just the request.

Xeno ships with server and local client dashboards for live capacity, connected clients, advertised models, telemetry, routing policy, API exploration, request history, and tenant administration.

Figure: Xeno home dashboard with live request volume, success rate, connected clients, advertised models, and request activity.

Developer experience

Keep the app integration boring. Make the fleet powerful.

Applications use Xeno like a normal OpenAI-compatible endpoint. Operators can change routing, providers, and model placement without forcing application teams to chase machine-specific details.

OpenAI-compatible request

# -N disables curl's output buffering so streamed chunks print as they arrive.
curl -N http://localhost:9000/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "x-tenant-guid: ten_default" \
  -d '{
    "model": "llama3.2:latest",
    "messages": [
      { "role": "user", "content": "Explain an inverted index." }
    ],
    "stream": true
  }'
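
Because the request above sets "stream": true, the response arrives in pieces. The Python sketch below assumes Xeno follows the OpenAI streaming convention of server-sent events, where each `data:` line carries a JSON chunk and the stream ends with `data: [DONE]`. The `iter_content` helper and the sample lines are hypothetical and only illustrate how a consumer might reassemble the text.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed format, not captured server output.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"An inverted index "}}]}',
    'data: {"choices":[{"delta":{"content":"maps terms to documents."}}]}',
    "data: [DONE]",
]
answer = "".join(iter_content(sample))
```

Any HTTP client that can iterate over response lines could feed `iter_content` in the same way.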

JavaScript SDK

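import { XenoClient } from '@xeno-control-plane/sdk';

// Bearer token, read from the same TOKEN variable the curl example uses.
const token = process.env.TOKEN;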

const client = new XenoClient({
  baseUrl: 'http://localhost:9000',
  bearerToken: token,
  tenantId: 'ten_default'
});

const models = await client.listOpenAiModels();
const policies = await client.listRoutingPolicies();

Python SDK

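import os

from xeno_client import XenoClient

# Bearer token, read from the same TOKEN variable the curl example uses.
token = os.environ["TOKEN"]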

client = XenoClient(
    "http://localhost:9000",
    bearer_token=token,
    tenant_id="ten_default",
)

models = client.list_openai_models()
policies = client.list_routing_policies()

C# SDK

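using System;
using Xeno.Sdk;

// Bearer token, read from the same TOKEN variable the curl example uses.
string token = Environment.GetEnvironmentVariable("TOKEN")
    ?? throw new InvalidOperationException("TOKEN is not set");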

using XenoClient client = new XenoClient(
    "http://localhost:9000",
    bearerToken: token,
    tenantId: "ten_default");

XenoApiResponse models =
    await client.ListOpenAiModelsAsync();
XenoApiResponse policies =
    await client.ListRoutingPoliciesAsync();

Use cases

From spare GPUs to governed accelerator fleets.

Xeno is for teams that want placement, policy, and observability around model execution without rewriting every application.

Community GPU networks

Let trusted contributors attach home labs, offices, classrooms, and partner machines to a shared endpoint.

Enterprise AI platforms

Give internal teams a single inference API backed by managed GPU hosts, role-based access, and audit-friendly request history.

Research clusters

Expose heterogeneous workstations as one pool while keeping model inventory and machine details visible to operators.

Private and edge inference

Keep execution near sensitive data or branch users while retaining central visibility and policy control.

Included surfaces

Everything needed to operate the alpha end to end.

Multitenant server, REST API, OpenAPI and Swagger, Postman collection, C# SDK, JavaScript SDK, Python SDK, Client WebSocket protocol, Request history, Server telemetry, Local client dashboard, Docker deployment

Getting started

Clone it. Run Docker. Connect capacity.

Xeno is MIT-licensed and available on GitHub. The default Docker deployment starts PostgreSQL, initializes the system, and brings up the server and dashboard for local evaluation.

Docker quick start

git clone https://github.com/jchristn/xeno.git
cd xeno/docker
docker compose pull
docker compose up -d

# Dashboard: http://localhost:9100
# API:       http://localhost:9000
# Swagger:   http://localhost:9000/swagger
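
Containers can take a moment to come up after `docker compose up -d`. As a convenience, the Python sketch below polls the three URLs listed above until each one answers. The helper names, retry counts, and the rule that any HTTP response counts as "up" are assumptions for illustration; they are not part of Xeno.

```python
import time
import urllib.error
import urllib.request

# URLs taken from the quick start above.
ENDPOINTS = {
    "dashboard": "http://localhost:9100",
    "api": "http://localhost:9000",
    "swagger": "http://localhost:9000/swagger",
}


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server is listening, even if this path returns an error
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or not resolvable yet


def wait_until_ready(endpoints, probe=http_probe, attempts=30, delay=2.0,
                     sleep=time.sleep):
    """Poll until every endpoint responds; return names still unreachable."""
    pending = dict(endpoints)
    for _ in range(attempts):
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending:
            return []
        sleep(delay)
    return sorted(pending)
```

Calling `wait_until_ready(ENDPOINTS)` returns an empty list once all three surfaces respond, or the names of those that never did within the retry budget.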