Workers AI Chat API

Streaming chat and code generation powered by Cloudflare Workers AI. OpenAI-compatible for use in Cursor, Continue.dev, and other AI IDEs.

Overview

This API exposes two backends that share the same contract. Both support streaming SSE, model selection, and an OpenAI-compatible endpoint (/v1/chat/completions) for code generation in IDEs.

Base URL

https://api-ai.masiting.dev

Quick Start

# List available models
curl "https://api-ai.masiting.dev/api/models"

# Chat (streaming)
curl -N "https://api-ai.masiting.dev/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Say hello","stream":true}'

# OpenAI-compatible code generation
curl -N "https://api-ai.masiting.dev/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a hello world in Python"}],"stream":true}'

POST /api/chat

Custom chat endpoint that accepts either a prompt string or a messages array. Returns SSE with {"delta":"..."} chunks and a final {"done":true} event.

Request body

{
  "prompt": "Explain recursion in one sentence.",
  "messages": [{"role": "user", "content": "..."}],
  "model": "@cf/meta/llama-3.1-8b-instruct",
  "stream": true,
  "max_tokens": 512,
  "temperature": 0.7
}

Use either prompt or messages. Set stream: false for a single JSON response.
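The {"delta":"..."} stream above can be consumed with a small parser. This is a sketch: it assumes the SSE events arrive as standard `data:` lines carrying exactly the JSON shapes shown above (a live client would feed it `response.iter_lines()` or equivalent).

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from /api/chat SSE lines of the form:
    data: {"delta": "..."}  ...  data: {"done": true}
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            break
        if "delta" in payload:
            yield payload["delta"]

# Canned stream for illustration (a live call would read the HTTP response):
sample = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    'data: {"done": true}',
]
print("".join(iter_deltas(sample)))  # -> Hello
```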

GET /api/models

Returns supported models and the default model ID.

{
  "defaultModel": "@cf/meta/llama-3.1-8b-instruct",
  "models": [
    {"id": "@cf/meta/llama-3.1-8b-instruct", "label": "Llama 3.1 8B Instruct", "description": "..."},
    {"id": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq", "label": "DeepSeek Coder 6.7B Instruct", "description": "Code generation and editing."}
  ]
}
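A client can pick a model from this response without hard-coding IDs. The helper below (names are illustrative, not part of the API) prefers a model whose ID matches a substring and falls back to the advertised default:

```python
import json

# Canned copy of the GET /api/models response shape shown above:
models_response = json.loads("""
{
  "defaultModel": "@cf/meta/llama-3.1-8b-instruct",
  "models": [
    {"id": "@cf/meta/llama-3.1-8b-instruct", "label": "Llama 3.1 8B Instruct"},
    {"id": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq", "label": "DeepSeek Coder 6.7B Instruct"}
  ]
}
""")

def pick_model(resp, prefer_substring=None):
    """Return a model id, preferring one whose id contains the substring,
    else fall back to the advertised defaultModel."""
    if prefer_substring:
        for m in resp["models"]:
            if prefer_substring in m["id"]:
                return m["id"]
    return resp["defaultModel"]

print(pick_model(models_response, "coder"))  # -> @hf/thebloke/deepseek-coder-6.7b-instruct-awq
print(pick_model(models_response))           # -> @cf/meta/llama-3.1-8b-instruct
```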

POST /v1/chat/completions

OpenAI Chat Completions compatible. Use this endpoint for Cursor, Continue.dev, and other OpenAI-compatible tools. Defaults to the code model when model is omitted.

Request body

{
  "model": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq",
  "messages": [
    {"role": "system", "content": "You are a code assistant."},
    {"role": "user", "content": "Write a TypeScript function that adds two numbers."}
  ],
  "stream": true,
  "temperature": 0.4,
  "max_tokens": 512
}

Streaming response

SSE events with object: "chat.completion.chunk". Each chunk has choices[0].delta.content. Final chunk has finish_reason: "stop", followed by data: [DONE].
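The chunk format described above is the standard OpenAI streaming shape, so assembling the full completion looks like this (a sketch assuming `data:` lines exactly as described, ending with the `data: [DONE]` sentinel):

```python
import json

def iter_openai_content(sse_lines):
    """Yield choices[0].delta.content from chat.completion.chunk events,
    stopping at the data: [DONE] sentinel."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:
            yield content

# Canned stream for illustration:
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "print("}, "finish_reason": null}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "\\"hi\\")"}, "finish_reason": null}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {}, "finish_reason": "stop"}]}',
    'data: [DONE]',
]
print("".join(iter_openai_content(sample)))  # -> print("hi")
```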

GET /health

Returns {"ok": true}.

Integrate with Cursor

  1. Open Cursor and click the Settings icon (gear) in the bottom-left.
  2. Go to Models in the settings menu.
  3. Scroll to OpenAI API Keys.
  4. Enter your API key (if you add auth to this API) or a placeholder like not-needed if your endpoint has no auth.
  5. Enable Override OpenAI Base URL and set it to:
    https://api-ai.masiting.dev

    Do not include /v1 or /v1/chat/completions; Cursor appends the path automatically.

  6. Choose a model from the dropdown or enter a model ID manually, e.g. @hf/thebloke/deepseek-coder-6.7b-instruct-awq.
  7. Save and use Cursor as usual. Composer and chat will route through your Workers AI endpoint.
Note: Tab completion and some features may behave differently with custom endpoints. If you encounter issues, try the default models first.
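The base-URL rule in step 5 exists because OpenAI-compatible clients append the path themselves. A tiny illustration (the helper name is ours, not Cursor's) of why a trailing /v1 in the base breaks the request:

```python
def completion_url(base_url):
    """Mimic how OpenAI-compatible clients build the request URL:
    the /v1/chat/completions path is appended to the configured base."""
    return base_url.rstrip("/") + "/v1/chat/completions"

# Correct base (no /v1):
print(completion_url("https://api-ai.masiting.dev"))
# -> https://api-ai.masiting.dev/v1/chat/completions

# Base with /v1 included produces a doubled path:
print(completion_url("https://api-ai.masiting.dev/v1"))
# -> https://api-ai.masiting.dev/v1/v1/chat/completions
```

Note that some tools do the opposite (Kilo Code below expects the base to include /v1), so always check which convention your client follows.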

Integrate with Continue.dev

  1. Install the Continue extension in VS Code.
  2. Open the Continue config file: ~/.continue/config.json (or config.yaml).
  3. Add an OpenAI-compatible model with your base URL:
    {
      "models": [
        {
          "title": "Workers AI Code",
          "provider": "openai",
          "model": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq",
          "apiBase": "https://api-ai.masiting.dev/v1",
          "apiKey": "not-needed"
        }
      ]
    }
  4. Select Local Config in Continue and choose your new model from the dropdown.

Claude Code

Claude Code (and the Claude VS Code extension) uses Anthropic's API format, not OpenAI's. This API is OpenAI Chat Completions compatible, so it does not work directly with Claude Code's ANTHROPIC_BASE_URL.

Options: put an OpenAI-to-Anthropic translation layer (for example a LiteLLM proxy) in front of this endpoint, or use one of the OpenAI-compatible integrations in the sections below instead.
Integrate with Kilo Code (VS Code)

Kilo Code supports OpenAI-compatible providers. This API returns CORS headers and GET /v1/models so the extension can list models.

  1. Install the Kilo Code extension from the VS Code marketplace.
  2. Open Kilo Code settings (click the Kilo icon) and choose Use your own API key.
  3. Select provider: OpenAI Compatible.
  4. Set Base URL to: https://api-ai.masiting.dev/v1 (include /v1 so Kilo does not double-append paths).
  5. Set API Key to any placeholder (e.g. not-needed) if you have not enabled auth on this API.
  6. Set Model to one of: gpt-5.4, gpt-4, gpt-3.5-turbo, deepseek-coder, qwen-coder, or sqlcoder. If the model list loads, pick from the dropdown; otherwise type the model ID.
  7. Save and run a task. Chat/completions will use POST /v1/chat/completions.
No models listed? Ensure Base URL is exactly https://api-ai.masiting.dev/v1. Kilo fetches GET /v1/models from that base; this API supports it and returns CORS headers for browser/webview requests.
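The GET /v1/models response that Kilo fetches follows the usual OpenAI list shape. Assuming this API matches that shape (verify against the live endpoint), extracting the model IDs is straightforward:

```python
import json

# Assumed OpenAI-style shape for GET /v1/models (check against the live API):
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "deepseek-coder", "object": "model"},
    {"id": "qwen-coder", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # -> ['deepseek-coder', 'qwen-coder']
```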

Other IDEs & Tools

Any tool that supports an OpenAI-compatible API can use this endpoint. Typical configuration:

Base URL: https://api-ai.masiting.dev
API path: /v1/chat/completions (often auto-appended)
Model: @hf/thebloke/deepseek-coder-6.7b-instruct-awq, or any model from GET /api/models
API key: leave empty, or use your key if you add auth

Examples of OpenAI-compatible tools: Open WebUI, LocalAI, LiteLLM, Ollama (with OpenAI compatibility), and many VS Code / JetBrains extensions that offer "Custom OpenAI" or "OpenAI-compatible" providers.

Deploying the Docs

To host this documentation on the web: