Workers AI Chat API

Streaming chat and code generation powered by Cloudflare Workers AI. OpenAI-compatible for use in Cursor, Continue.dev, and other AI IDEs.

Overview

This API exposes two backends that share the same contract. Both support streaming SSE, model selection, and an OpenAI-compatible endpoint (/v1/chat/completions) for code generation in IDEs.

Base URL

https://api-ai.masiting.dev

Quick Start

# List available models
curl "https://api-ai.masiting.dev/api/models"

# Chat (streaming)
curl -N "https://api-ai.masiting.dev/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Say hello","stream":true}'

# OpenAI-compatible code generation
curl -N "https://api-ai.masiting.dev/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a hello world in Python"}],"stream":true}'

POST /api/chat

Custom chat endpoint that accepts either a prompt string or a messages array. Returns SSE with {"delta":"..."} chunks and a final {"done":true} event.

Request body

{
  "prompt": "Explain recursion in one sentence.",
  "messages": [{"role": "user", "content": "..."}],
  "model": "@cf/meta/llama-3.1-8b-instruct",
  "stream": true,
  "max_tokens": 512,
  "temperature": 0.7
}

Use either prompt or messages. Set stream: false for a single JSON response.
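The {"delta":"..."} stream above can be consumed with a small parser. This is a sketch: it assumes the SSE events arrive as standard `data:` lines carrying exactly the JSON shapes shown above (a live client would feed it `response.iter_lines()` or equivalent).

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from /api/chat SSE lines of the form:
    data: {"delta": "..."}  ...  data: {"done": true}
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            break
        if "delta" in payload:
            yield payload["delta"]

# Canned stream for illustration (a live call would read the HTTP response):
sample = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    'data: {"done": true}',
]
print("".join(iter_deltas(sample)))  # -> Hello
```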

GET /api/models

Returns supported models and the default model ID.

{
  "defaultModel": "@cf/meta/llama-3.1-8b-instruct",
  "models": [
    {"id": "@cf/meta/llama-3.1-8b-instruct", "label": "Llama 3.1 8B Instruct", "description": "..."},
    {"id": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq", "label": "DeepSeek Coder 6.7B Instruct", "description": "Code generation and editing."}
  ]
}
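A client can pick a model from this response without hard-coding IDs. The helper below (names are illustrative, not part of the API) prefers a model whose ID matches a substring and falls back to the advertised default:

```python
import json

# Canned copy of the GET /api/models response shape shown above:
models_response = json.loads("""
{
  "defaultModel": "@cf/meta/llama-3.1-8b-instruct",
  "models": [
    {"id": "@cf/meta/llama-3.1-8b-instruct", "label": "Llama 3.1 8B Instruct"},
    {"id": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq", "label": "DeepSeek Coder 6.7B Instruct"}
  ]
}
""")

def pick_model(resp, prefer_substring=None):
    """Return a model id, preferring one whose id contains the substring,
    else fall back to the advertised defaultModel."""
    if prefer_substring:
        for m in resp["models"]:
            if prefer_substring in m["id"]:
                return m["id"]
    return resp["defaultModel"]

print(pick_model(models_response, "coder"))  # -> @hf/thebloke/deepseek-coder-6.7b-instruct-awq
print(pick_model(models_response))           # -> @cf/meta/llama-3.1-8b-instruct
```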

POST /v1/chat/completions

OpenAI Chat Completions compatible. Use this endpoint for Cursor, Continue.dev, and other OpenAI-compatible tools. Defaults to the code model when model is omitted.

Request body

{
  "model": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq",
  "messages": [
    {"role": "system", "content": "You are a code assistant."},
    {"role": "user", "content": "Write a TypeScript function that adds two numbers."}
  ],
  "stream": true,
  "temperature": 0.4,
  "max_tokens": 512
}

Streaming response

SSE events with object: "chat.completion.chunk". Each chunk has choices[0].delta.content. Final chunk has finish_reason: "stop", followed by data: [DONE].
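The chunk format described above is the standard OpenAI streaming shape, so assembling the full completion looks like this (a sketch assuming `data:` lines exactly as described, ending with the `data: [DONE]` sentinel):

```python
import json

def iter_openai_content(sse_lines):
    """Yield choices[0].delta.content from chat.completion.chunk events,
    stopping at the data: [DONE] sentinel."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:
            yield content

# Canned stream for illustration:
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "print("}, "finish_reason": null}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "\\"hi\\")"}, "finish_reason": null}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {}, "finish_reason": "stop"}]}',
    'data: [DONE]',
]
print("".join(iter_openai_content(sample)))  # -> print("hi")
```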

GET /health

Returns {"ok": true}.

Integrate with Cursor

  1. Open Cursor and click the Settings icon (gear) in the bottom-left.
  2. Go to Models in the settings menu.
  3. Scroll to OpenAI API Keys.
  4. Enter your API key (if you add auth to this API) or a placeholder like not-needed if your endpoint has no auth.
  5. Enable Override OpenAI Base URL and set it to:
    https://api-ai.masiting.dev

    Do not include /v1 or /v1/chat/completions; Cursor appends the path automatically.

  6. Choose a model from the dropdown or enter a model ID manually, e.g. @hf/thebloke/deepseek-coder-6.7b-instruct-awq.
  7. Save and use Cursor as usual. Composer and chat will route through your Workers AI endpoint.
Note: Tab completion and some features may behave differently with custom endpoints. If you encounter issues, try the default models first.
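The base-URL rule in step 5 exists because OpenAI-compatible clients append the path themselves. A tiny illustration (the helper name is ours, not Cursor's) of why a trailing /v1 in the base breaks the request:

```python
def completion_url(base_url):
    """Mimic how OpenAI-compatible clients build the request URL:
    the /v1/chat/completions path is appended to the configured base."""
    return base_url.rstrip("/") + "/v1/chat/completions"

# Correct base (no /v1):
print(completion_url("https://api-ai.masiting.dev"))
# -> https://api-ai.masiting.dev/v1/chat/completions

# Base with /v1 included produces a doubled path:
print(completion_url("https://api-ai.masiting.dev/v1"))
# -> https://api-ai.masiting.dev/v1/v1/chat/completions
```

Note that some tools do the opposite (Kilo Code below expects the base to include /v1), so always check which convention your client follows.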

Integrate with Continue.dev

  1. Install the Continue extension in VS Code.
  2. Open the Continue config file: ~/.continue/config.json (or config.yaml).
  3. Add an OpenAI-compatible model with your base URL:
    {
      "models": [
        {
          "title": "Workers AI Code",
          "provider": "openai",
          "model": "@hf/thebloke/deepseek-coder-6.7b-instruct-awq",
          "apiBase": "https://api-ai.masiting.dev/v1",
          "apiKey": "not-needed"
        }
      ]
    }
  4. Select Local Config in Continue and choose your new model from the dropdown.

Claude Code

Claude Code (and the Claude VS Code extension) uses Anthropic's API format, not OpenAI's. This API is OpenAI Chat Completions compatible, so it does not work directly with Claude Code's ANTHROPIC_BASE_URL.

Options: put an OpenAI-to-Anthropic translation layer (for example a LiteLLM proxy) in front of this endpoint, or use one of the OpenAI-compatible integrations in the sections below instead.
Integrate with Kilo Code (VS Code)

Kilo Code supports OpenAI-compatible providers. This API returns CORS headers and GET /v1/models so the extension can list models.

  1. Install the Kilo Code extension from the VS Code marketplace.
  2. Open Kilo Code settings (click the Kilo icon) and choose Use your own API key.
  3. Select provider: OpenAI Compatible.
  4. Set Base URL to: https://api-ai.masiting.dev/v1 (include /v1 so Kilo does not double-append paths).
  5. Set API Key to any placeholder (e.g. not-needed) if you have not enabled auth on this API.
  6. Set Model to one of: gpt-5.4, gpt-4, gpt-3.5-turbo, deepseek-coder, qwen-coder, or sqlcoder. If the model list loads, pick from the dropdown; otherwise type the model ID.
  7. Save and run a task. Chat/completions will use POST /v1/chat/completions.
No models listed? Ensure Base URL is exactly https://api-ai.masiting.dev/v1. Kilo fetches GET /v1/models from that base; this API supports it and returns CORS headers for browser/webview requests.
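The GET /v1/models response that Kilo fetches follows the usual OpenAI list shape. Assuming this API matches that shape (verify against the live endpoint), extracting the model IDs is straightforward:

```python
import json

# Assumed OpenAI-style shape for GET /v1/models (check against the live API):
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "deepseek-coder", "object": "model"},
    {"id": "qwen-coder", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # -> ['deepseek-coder', 'qwen-coder']
```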

Other IDEs & Tools

Any tool that supports an OpenAI-compatible API can use this endpoint. Typical configuration:

Base URL: https://api-ai.masiting.dev
API path: /v1/chat/completions (often auto-appended)
Model: @hf/thebloke/deepseek-coder-6.7b-instruct-awq, or any model from GET /api/models
API key: leave empty, or use your key if you add auth

Examples of OpenAI-compatible tools: Open WebUI, LocalAI, LiteLLM, Ollama (with OpenAI compatibility), and many VS Code / JetBrains extensions that offer "Custom OpenAI" or "OpenAI-compatible" providers.

Deploying the Docs

To host this documentation on the web: