D-SKET Canvas was multi-vendor from day one: whether a user picks Claude, OpenAI, or Gemini, the product had to behave the same way. We expected a one-time library swap, but six months later we are running an 800-line abstraction layer of our own.
It was simple at first. We had the Vercel AI SDK plus each vendor’s official SDK, so we figured we only needed to unify the interface. For the first week, that was true: a single generateText({ model, prompt }) call worked across all three vendors.
Problem 1: Response Formats
To generate diagrams, we have to force JSON responses. A prompt like “make 5 nodes and 4 edges” has to come back as JSON that matches an exact schema, or the canvas cannot render it.
But the JSON-mode behavior across the three vendors was completely different:
- Claude enforces it via the tools parameter (the strictest and most reliable)
- OpenAI needs response_format: { type: 'json_object' } plus explicit instruction in the system prompt
- Gemini uses generationConfig.responseMimeType: 'application/json'
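As a rough illustration, the three mechanisms above can be produced from a single schema by one small function. This is a minimal sketch, not our production adapter: the names jsonModeParams and emit_diagram, the description string, and the exact wording of the system hint are made up for the example. The request fields themselves (tools, tool_choice, response_format, generationConfig) follow the vendors' public APIs as we understand them.

```typescript
type Vendor = "claude" | "openai" | "gemini";
type JsonSchema = Record<string, unknown>;

interface JsonModeRequest {
  // Vendor-specific fields to merge into the request body.
  params: Record<string, unknown>;
  // Extra system-prompt text, when the vendor needs JSON requested in words.
  systemHint?: string;
}

// Hypothetical helper: maps one schema onto each vendor's JSON-forcing mechanism.
function jsonModeParams(
  vendor: Vendor,
  schema: JsonSchema,
  toolName = "emit_diagram", // illustrative tool name
): JsonModeRequest {
  switch (vendor) {
    case "claude":
      // Claude: declare a tool whose input is the schema, then force that tool.
      return {
        params: {
          tools: [
            {
              name: toolName,
              description: "Return the diagram as structured JSON.",
              input_schema: schema,
            },
          ],
          tool_choice: { type: "tool", name: toolName },
        },
      };
    case "openai":
      // OpenAI JSON mode: the prompt must also ask for JSON explicitly.
      return {
        params: { response_format: { type: "json_object" } },
        systemHint:
          "Respond only with JSON matching this schema: " + JSON.stringify(schema),
      };
    case "gemini":
      return {
        params: { generationConfig: { responseMimeType: "application/json" } },
      };
  }
}

// Example schema for a diagram with nodes and edges (illustrative only).
const diagramSchema: JsonSchema = {
  type: "object",
  properties: {
    nodes: { type: "array", items: { type: "object" } },
    edges: { type: "array", items: { type: "object" } },
  },
  required: ["nodes", "edges"],
};
```

Calling jsonModeParams("openai", diagramSchema) returns both the request fields and the hint to append to the system prompt, so the caller does not need to know which vendor requires the extra instruction.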
We managed to unify all of this. The real problem blew up with streaming.
Problem 2: Streaming Tokens Mean Different Things
Receiving tokens in real time over SSE makes the user experience much better. For diagram generation, we can display each node on the canvas the moment it is created.
But what a stream chunk means differs from vendor to vendor.
An OpenAI stream chunk is usually 1–3 tokens. Claude bundles them into semantic units and sends long bursts at once. Gemini throws them out roughly a line at a time. To produce the same “typing-like effect” at a consistent speed for the user, we needed different buffering logic per vendor.
— Jo Bugeon (AI Engine Lead)
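One way to get a steady typing effect regardless of chunk size is to decouple arrival from display: chunks go into a buffer as they arrive, and a timer releases characters at a fixed rate. The sketch below is a simplified illustration of that idea, not our actual buffering policy. TypingBuffer, its method names, and the characters-per-second rate are all hypothetical.

```typescript
// Rate-based release buffer: push() accepts chunks of any size,
// tick() releases characters at a constant rate.
class TypingBuffer {
  private pending: string[] = [];
  private credit = 0; // fractional characters carried between ticks
  private readonly charsPerSecond: number;

  constructor(charsPerSecond: number) {
    this.charsPerSecond = charsPerSecond;
  }

  // Called whenever a vendor chunk arrives, whatever its size.
  push(chunk: string): void {
    // Array.from splits by code point, so emoji are not cut in half.
    this.pending.push(...Array.from(chunk));
  }

  // Called from a timer; returns the text to display for this tick.
  tick(elapsedMs: number): string {
    this.credit += (elapsedMs / 1000) * this.charsPerSecond;
    const count = Math.min(Math.floor(this.credit), this.pending.length);
    this.credit -= count;
    const released = this.pending.splice(0, count).join("");
    // Do not bank credit while idle, or the next chunk would appear in one burst.
    if (this.pending.length === 0) this.credit = 0;
    return released;
  }

  // Called when the stream ends; releases whatever is left.
  flush(): string {
    this.credit = 0;
    return this.pending.splice(0).join("");
  }
}
```

With this shape, the per-vendor difference can shrink to a single number (the release rate) or disappear entirely, because a 2-token chunk and a whole-line chunk are both drained at the same pace.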
Problem 3: Errors Have Different Shapes
This was the most frustrating part. The same kind of error (rate limit, token overflow, content policy violation) came back with completely different response codes and message structures.
// OpenAI: HTTP 429
{ "error": { "type": "rate_limit_exceeded", "code": "rate_limit", ... } }
// Claude: HTTP 429
{ "type": "error", "error": { "type": "rate_limit_error", ... } }
// Gemini: HTTP 200 (!) but with error embedded
{ "promptFeedback": { "blockReason": "SAFETY" } }
Gemini returning a 200 while embedding the error in the response body was genuinely surprising. Status-code checks alone miss it, so we had to add a second validation pass over the response body in an axios interceptor.
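The check itself can be small. Here is a hedged sketch that classifies the three payloads shown above, including the Gemini case where the status is 200. The function name classifyResponse and the VendorResponse shape are invented for this example, and only the cases from this post are handled; detecting token overflow, network, and parse errors is left out because the exact vendor codes are not covered here.

```typescript
type Vendor = "openai" | "claude" | "gemini";
type ErrorKind =
  | "rate" | "token" | "policy" | "network" | "auth" | "parse" | "unknown";

// Hypothetical minimal view of an HTTP response.
interface VendorResponse {
  status: number;
  body: unknown;
}

// Safely walks a nested path on an unknown JSON value.
function getPath(value: unknown, path: string[]): unknown {
  let current: unknown = value;
  for (const key of path) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Returns null when the response is a real success.
function classifyResponse(vendor: Vendor, res: VendorResponse): ErrorKind | null {
  // Gemini can report a blocked prompt inside a 200 response,
  // so the body is inspected before the status code.
  if (vendor === "gemini") {
    const blockReason = getPath(res.body, ["promptFeedback", "blockReason"]);
    if (typeof blockReason === "string") return "policy";
  }
  if (res.status === 429) return "rate";
  if (res.status === 401 || res.status === 403) return "auth";
  if (res.status >= 400) return "unknown";
  return null;
}
```

A function like this can be called from the response interceptor, so the rest of the application only ever sees one error vocabulary.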
So We Built Our Own Layer
Over three weeks we abstracted the following:
- Request adapter: a unified chat({ messages, schema, stream }) interface
- Response normalization: every vendor’s response reshaped into the same form (delta token, function call, finish reason)
- Error classification: sorted into 7 error types (rate, token, policy, network, auth, parse, unknown)
- Buffering policy: auto-adjusted so it “types at a similar speed” on the user’s screen
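To make the response normalization item concrete, here is a reduced sketch that turns one streamed chunk from each vendor into the same event shape. The ChatEvent type and normalizeChunk are illustrative names, function calls are omitted for brevity, and the chunk field names reflect the vendors' streaming formats as we understand them, so they should be checked against current documentation before reuse.

```typescript
type Vendor = "openai" | "claude" | "gemini";
type FinishReason = "stop" | "length" | "other";

// Normalized events (function calls omitted to keep the sketch short).
type ChatEvent =
  | { kind: "delta"; text: string }
  | { kind: "finish"; reason: FinishReason };

// Vendor finish reasons mapped to one vocabulary.
const FINISH_MAP = new Map<string, FinishReason>([
  ["stop", "stop"], ["length", "length"],           // OpenAI
  ["end_turn", "stop"], ["max_tokens", "length"],   // Claude
  ["STOP", "stop"], ["MAX_TOKENS", "length"],       // Gemini
]);

function toFinish(raw: string): FinishReason {
  return FINISH_MAP.get(raw) ?? "other";
}

// `any` is used because the chunk is untyped JSON from the wire.
function normalizeChunk(vendor: Vendor, chunk: any): ChatEvent[] {
  const events: ChatEvent[] = [];
  if (vendor === "openai") {
    const choice = chunk?.choices?.[0];
    const text = choice?.delta?.content;
    if (typeof text === "string" && text.length > 0) events.push({ kind: "delta", text });
    if (typeof choice?.finish_reason === "string") {
      events.push({ kind: "finish", reason: toFinish(choice.finish_reason) });
    }
  } else if (vendor === "claude") {
    if (chunk?.type === "content_block_delta" && chunk.delta?.type === "text_delta") {
      events.push({ kind: "delta", text: String(chunk.delta.text) });
    }
    if (chunk?.type === "message_delta" && typeof chunk.delta?.stop_reason === "string") {
      events.push({ kind: "finish", reason: toFinish(chunk.delta.stop_reason) });
    }
  } else {
    const candidate = chunk?.candidates?.[0];
    const parts: any[] = candidate?.content?.parts ?? [];
    const text = parts.map((p) => (typeof p?.text === "string" ? p.text : "")).join("");
    if (text.length > 0) events.push({ kind: "delta", text });
    if (typeof candidate?.finishReason === "string") {
      events.push({ kind: "finish", reason: toFinish(candidate.finishReason) });
    }
  }
  return events;
}
```

Everything downstream (the buffering policy, the canvas renderer) consumes only ChatEvent values, which is what lets the vendor stay invisible past this point.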
What We Are Getting Now
After a few months of operation, the biggest benefit of our own abstraction is that “swapping models is a single environment variable.” When a user downgrades from the Pro plan to Free, it automatically falls back to a cheaper model. When a specific vendor has an outage, it automatically routes to another vendor.
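The routing described above boils down to a small pure function: read the preferred vendor from the environment, pick the model tier from the plan, and skip any vendor currently marked unhealthy. The following is a sketch under assumptions. The variable name LLM_VENDOR, the fallback order, and the model identifiers are placeholders, not our real configuration.

```typescript
type Vendor = "claude" | "openai" | "gemini";
type Plan = "free" | "pro";

interface Route {
  vendor: Vendor;
  model: string;
}

// Placeholder model identifiers; real model names would go here.
const CATALOG: Record<Vendor, Record<Plan, string>> = {
  claude: { pro: "claude-large-placeholder", free: "claude-small-placeholder" },
  openai: { pro: "openai-large-placeholder", free: "openai-small-placeholder" },
  gemini: { pro: "gemini-large-placeholder", free: "gemini-small-placeholder" },
};

const DEFAULT_VENDOR: Vendor = "claude"; // assumed default
const FALLBACK_ORDER: Vendor[] = ["claude", "openai", "gemini"]; // assumed order

function isVendor(value: string | undefined): value is Vendor {
  return value === "claude" || value === "openai" || value === "gemini";
}

// env is passed in (for example process.env) so the function stays testable.
function pickRoute(
  env: Record<string, string | undefined>,
  plan: Plan,
  healthy: ReadonlySet<Vendor>,
): Route | null {
  const raw = env.LLM_VENDOR; // hypothetical variable name
  const preferred: Vendor = isVendor(raw) ? raw : DEFAULT_VENDOR;
  // Preferred vendor first, then the remaining vendors in fallback order.
  const order = [preferred, ...FALLBACK_ORDER.filter((v) => v !== preferred)];
  for (const vendor of order) {
    if (healthy.has(vendor)) return { vendor, model: CATALOG[vendor][plan] };
  }
  return null; // every vendor is down
}
```

Because the adapter interface is identical for every vendor, the caller can pass the returned route straight to the unified chat call without branching on which vendor was chosen.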
One more thing: we can now manage BYOK users’ API-key validation logic in one place. When a user enters an invalid key, we guide them with the same error message no matter which vendor it is.
What’s Next
Next quarter we are adding an adapter for on-premise LLMs (Llama, Qwen, etc.). Our more security-sensitive enterprise customers want to run models on in-house GPU servers. By unifying everything behind the same adapter interface, we can make the difference invisible to the user.
For what it’s worth, this abstraction layer is reused not only in D-SKET Canvas but also directly in our consulting projects. When we add AI features to a system we built as a systems integration (SI) project, the same code goes in, which is part of why running our own SaaS pays off here as well.