Why We Built Our Own LLM Abstraction Layer

D-SKET Canvas was multi-vendor from day one: whether a user picks Claude, OpenAI, or Gemini, the product has to behave the same way. We expected that to be a one-time library choice, but six months later we are running an 800-line abstraction layer of our own.

It looked simple at first. The Vercel AI SDK existed alongside each vendor’s official SDK, so we figured we only needed to unify the interface. For the first week, that held: a single generateText({ model, prompt }) call worked across all three vendors.

Problem 1: Response Formats

To generate diagrams, we have to force JSON responses. A prompt like “make 5 nodes and 4 edges” has to come back as JSON that matches an exact schema, or the canvas cannot render it.

But the JSON-mode behavior across the three vendors was completely different:

Claude enforces it via the tools parameter — the strictest and most reliable
OpenAI needs response_format: { type: 'json_object' } plus explicit instruction in the system prompt
Gemini uses generationConfig.responseMimeType: 'application/json'
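To make the three mechanisms concrete, here is a minimal sketch of per-vendor request shaping, assuming plain REST request bodies rather than any SDK. The helper buildJsonRequest, the tool name emit_diagram, and the placeholder model IDs are hypothetical; the vendor fields (tools and tool_choice, response_format, generationConfig.responseMimeType) are the ones listed above.

```typescript
// Sketch: build a "JSON only" request body for each vendor's REST API.
// buildJsonRequest, "emit_diagram", and the model IDs are hypothetical.

type Vendor = "claude" | "openai" | "gemini";
type JsonSchema = Record<string, unknown>;

function buildJsonRequest(
  vendor: Vendor,
  prompt: string,
  schema: JsonSchema,
): Record<string, unknown> {
  switch (vendor) {
    case "claude":
      // Declare one tool whose input_schema is the diagram schema and force
      // the model to call it; the tool input carries the structured JSON.
      return {
        model: "CLAUDE_MODEL_ID", // placeholder
        max_tokens: 4096,
        tools: [
          { name: "emit_diagram", description: "Return the diagram as structured data.", input_schema: schema },
        ],
        tool_choice: { type: "tool", name: "emit_diagram" },
        messages: [{ role: "user", content: prompt }],
      };
    case "openai":
      // JSON mode only guarantees syntactically valid JSON, so the schema goes
      // into the system prompt; the word "JSON" must appear in the messages.
      return {
        model: "OPENAI_MODEL_ID", // placeholder
        response_format: { type: "json_object" },
        messages: [
          { role: "system", content: `Reply only with JSON matching this schema: ${JSON.stringify(schema)}` },
          { role: "user", content: prompt },
        ],
      };
    case "gemini":
      // The model ID goes in the URL path, not the body; the JSON switch
      // lives in generationConfig.
      return {
        contents: [
          { role: "user", parts: [{ text: `${prompt}\nReturn JSON matching this schema: ${JSON.stringify(schema)}` }] },
        ],
        generationConfig: { responseMimeType: "application/json" },
      };
  }
}
```

In every case the caller still has to parse the reply and validate it against the schema; none of these switches removes that step.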

We managed to unify all of this. The real trouble started with streaming.

Problem 2: Streaming Tokens Mean Different Things

Receiving tokens in real time over SSE makes the user experience much better. For diagram generation, we can display each node on the canvas the moment it is created.

But what a stream chunk means differs from vendor to vendor.

An OpenAI stream chunk is usually 1–3 tokens. Claude groups tokens into semantic units and sends them in long bursts. Gemini emits roughly a line at a time. To give the user the same steady “typing” effect, we needed different buffering logic for each vendor.
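One way to decouple on-screen speed from vendor chunking is to re-slice the incoming text into fixed-size pieces and release them on a timer. This is a minimal sketch of that idea, assuming a fixed piece size and interval; rechunk and paced are hypothetical names, not our production code.

```typescript
// Sketch: normalize vendor chunk sizes into fixed-size pieces.
// Sizes count UTF-16 code units; a production version should avoid
// splitting surrogate pairs (e.g. emoji). Hangul syllables are single units.

function* rechunk(chunks: Iterable<string>, size: number): Generator<string> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    // Emit every complete piece currently buffered.
    while (buffer.length >= size) {
      yield buffer.slice(0, size);
      buffer = buffer.slice(size);
    }
  }
  // Flush whatever is left when the stream ends.
  if (buffer.length > 0) yield buffer;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Async version for a live SSE stream: same re-slicing, plus a fixed pause
// after each piece so on-screen speed no longer depends on the vendor.
async function* paced(
  stream: AsyncIterable<string>,
  size: number,
  intervalMs: number,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk;
    while (buffer.length >= size) {
      yield buffer.slice(0, size);
      buffer = buffer.slice(size);
      await sleep(intervalMs);
    }
  }
  if (buffer.length > 0) yield buffer;
}
```

With this shape, a long Claude burst is spread out over several intervals, while OpenAI’s small chunks simply accumulate until a full piece is ready. The trade-off is that a fixed interval can add latency when the vendor is already fast.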

— Jo Bugeon (AI Engine Lead)

Problem 3: Errors Have Different Shapes

This was the most frustrating part. The same kind of error (rate limit, token overflow, content policy violation) came back with completely different HTTP status codes and body structures.

// OpenAI: HTTP 429 (rate limit)
{ "error": { "type": "rate_limit_exceeded", "code": "rate_limit", ... } }

// Claude: HTTP 429 (rate limit)
{ "type": "error", "error": { "type": "rate_limit_error", ... } }

// Gemini: HTTP 200 (!), with a content-policy block embedded in the body
{ "promptFeedback": { "blockReason": "SAFETY" } }

Gemini returning a 200 with the error embedded in the response body was genuinely surprising. We had to add a second validation pass in an axios interceptor that inspects the body even when the status code says the request succeeded.
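A classifier that looks at both the status and the body catches the 200-with-block case along with the other two shapes. This is a minimal sketch: classifyResponse, toBody, and VendorBody are hypothetical names, the seven kinds are our taxonomy (rate, token, policy, network, auth, parse, unknown), and the substring checks are assumptions to be tuned against real vendor responses.

```typescript
// Sketch: classify a vendor response into one of seven error kinds,
// or null when it is a genuine success. Substring checks are assumptions.

type ErrorKind = "rate" | "token" | "policy" | "network" | "auth" | "parse" | "unknown";

interface VendorBody {
  error?: { type?: string; code?: string | number | null; message?: string }; // OpenAI, Claude
  promptFeedback?: { blockReason?: string }; // Gemini, even on HTTP 200
  candidates?: { finishReason?: string }[]; // Gemini
}

// Axios hands over a parsed object, or the raw string if JSON parsing failed.
function toBody(data: unknown): VendorBody | null {
  if (typeof data === "string") {
    try {
      data = JSON.parse(data);
    } catch {
      return null; // not JSON at all
    }
  }
  return (data ?? {}) as VendorBody;
}

function classifyResponse(status: number | undefined, data: unknown): ErrorKind | null {
  if (status === undefined) return "network"; // no HTTP response received
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate";

  const body = toBody(data);
  if (body === null) return "parse";

  // Gemini: HTTP 200, but the prompt or the candidate was blocked.
  if (body.promptFeedback?.blockReason) return "policy";
  if (body.candidates?.[0]?.finishReason === "SAFETY") return "policy";
  if (status >= 200 && status < 300) return null;

  const hint = [body.error?.type, body.error?.code, body.error?.message]
    .filter((part) => part != null)
    .join(" ")
    .toLowerCase();
  if (hint.includes("rate_limit")) return "rate";
  if (hint.includes("context_length") || hint.includes("too long")) return "token";
  if (hint.includes("policy") || hint.includes("safety")) return "policy";
  return "unknown";
}

// Possible axios wiring (not executed here). With the default validateStatus,
// 2xx responses reach the first handler; non-2xx and network failures reach
// the second, where error.response is undefined if nothing came back.
// client.interceptors.response.use(
//   (res) => { const kind = classifyResponse(res.status, res.data); if (kind !== null) throw new Error(kind); return res; },
//   (err) => { throw new Error(String(classifyResponse(err.response?.status, err.response?.data))); },
// );
```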

So We Built Our Own Layer

Over three weeks we abstracted the following:

Request adapter — a unified chat({ messages, schema, stream }) interface
Response normalization — every vendor’s response reshaped into the same form (delta token, function call, finish reason)
Error classification — sorted into 7 error types (rate, token, policy, network, auth, parse, unknown)
Buffering policy — auto-adjusted so it “types at a similar speed” on the user’s screen
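The request adapter and response normalization can be pictured as one request type in and one event type out. This simplified sketch assumes each vendor’s public streaming chunk format; ChatRequest, ChatEvent, FINISH, and the fromOpenAI, fromClaude, and fromGemini functions are hypothetical, and only text deltas and finish reasons are handled (function-call deltas are left out).

```typescript
// Sketch of the unified layer's types plus per-vendor stream-chunk
// normalizers. Names are hypothetical; function-call deltas are omitted.

interface ChatRequest {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  schema?: Record<string, unknown>; // when present, force JSON output
  stream: boolean;
}

type FinishReason = "stop" | "length" | "tool" | "filtered" | "other";

type ChatEvent =
  | { kind: "delta"; text: string }
  | { kind: "function_call"; name: string; argsJson: string }
  | { kind: "finish"; reason: FinishReason };

// Vendor finish reasons mapped onto one vocabulary (partial list).
const FINISH: Record<string, FinishReason> = {
  stop: "stop", end_turn: "stop", STOP: "stop",
  length: "length", max_tokens: "length", MAX_TOKENS: "length",
  tool_calls: "tool", tool_use: "tool",
  content_filter: "filtered", SAFETY: "filtered",
};
const finish = (raw: string): ChatEvent => ({ kind: "finish", reason: FINISH[raw] ?? "other" });

interface OpenAIChunk {
  choices: { delta: { content?: string | null }; finish_reason: string | null }[];
}
interface ClaudeStreamEvent {
  type: string; // "content_block_delta", "message_delta", "ping", ...
  delta?: { type?: string; text?: string; stop_reason?: string | null };
}
interface GeminiChunk {
  candidates?: { content?: { parts?: { text?: string }[] }; finishReason?: string }[];
}

function fromOpenAI(chunk: OpenAIChunk): ChatEvent[] {
  const choice = chunk.choices[0]; // may be absent, e.g. on a usage-only chunk
  const events: ChatEvent[] = [];
  if (choice?.delta.content) events.push({ kind: "delta", text: choice.delta.content });
  if (choice?.finish_reason) events.push(finish(choice.finish_reason));
  return events;
}

function fromClaude(event: ClaudeStreamEvent): ChatEvent[] {
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta" && event.delta.text) {
    return [{ kind: "delta", text: event.delta.text }];
  }
  if (event.type === "message_delta" && event.delta?.stop_reason) {
    return [finish(event.delta.stop_reason)];
  }
  return []; // message_start, content_block_start/stop, ping, ...
}

function fromGemini(chunk: GeminiChunk): ChatEvent[] {
  const candidate = chunk.candidates?.[0];
  const text = (candidate?.content?.parts ?? []).map((p) => p.text ?? "").join("");
  const events: ChatEvent[] = [];
  if (text) events.push({ kind: "delta", text });
  if (candidate?.finishReason) events.push(finish(candidate.finishReason));
  return events;
}
```

Everything downstream (the buffering policy, the canvas renderer) then consumes ChatEvent values and never sees a vendor-specific chunk.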

Why we didn't just use the Vercel AI SDK as-is: the SDK itself is excellent, but to handle BYOK (customers bringing their own API keys) and reliability for users in Korea (especially Gemini errors in the KR region), we needed a layer closer to our business logic. Structurally, our adapter sits on top of the SDK rather than replacing it.

What We Are Getting Now

After a few months in production, the biggest benefit of owning the abstraction is that swapping models takes a single environment variable. When a user downgrades from the Pro plan to Free, requests automatically fall back to a cheaper model. When a vendor has an outage, traffic is automatically routed to another vendor.
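A minimal sketch of that routing, assuming models are configured as “vendor:model” strings in environment variables. The variable names LLM_MODEL_PRO, LLM_MODEL_FREE, and LLM_FALLBACKS, the string format, and the isHealthy callback are all hypothetical.

```typescript
// Sketch: pick a model from environment variables, with plan-based
// selection and vendor-outage fallback. All names here are hypothetical.

type Vendor = "claude" | "openai" | "gemini";
type Plan = "pro" | "free";
interface ModelChoice { vendor: Vendor; model: string }

const VENDORS: readonly string[] = ["claude", "openai", "gemini"];
const isVendor = (v: string): v is Vendor => VENDORS.includes(v);

// Parse "vendor:model"; the model part may itself contain colons.
function parseChoice(value: string): ModelChoice {
  const i = value.indexOf(":");
  const vendor = value.slice(0, i);
  if (i <= 0 || !isVendor(vendor)) throw new Error(`expected "vendor:model", got "${value}"`);
  return { vendor, model: value.slice(i + 1) };
}

function resolveModel(
  plan: Plan,
  env: Record<string, string | undefined>,
  isHealthy: (vendor: Vendor) => boolean,
): ModelChoice {
  // The plan decides the primary model, so a downgrade to Free switches
  // to the cheaper entry without a code change.
  const primary = env[plan === "pro" ? "LLM_MODEL_PRO" : "LLM_MODEL_FREE"];
  if (!primary) throw new Error(`no model configured for plan "${plan}"`);
  const fallbacks = (env["LLM_FALLBACKS"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // Try the primary first, then each fallback, skipping unhealthy vendors.
  for (const choice of [primary, ...fallbacks].map(parseChoice)) {
    if (isHealthy(choice.vendor)) return choice;
  }
  throw new Error("no healthy vendor available");
}
```

In a Node server this might be called as resolveModel(user.plan, process.env, (v) => !outages.has(v)), where user and outages stand in for whatever plan lookup and health signal already exist.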

One more thing: BYOK users’ API-key validation logic now lives in one place. When a user enters an invalid key, they get the same guidance message no matter which vendor the key is for.
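Centralizing this can be as simple as probing the vendor with the user’s key and mapping the outcome onto one shared message. A minimal sketch, assuming a vendor-agnostic probe function supplied by the adapter layer; checkUserKey, the probe signature, and the message strings are hypothetical.

```typescript
// Sketch: validate a BYOK key through a vendor-agnostic probe and return
// one shared message. Vendors differ in which status codes signal a bad
// key, so isRejectedKey would need per-vendor checks in practice.

type Vendor = "claude" | "openai" | "gemini";
type ProbeResult = { status: number } | { networkError: true };
type KeyCheck = { ok: true } | { ok: false; message: string };

const MSG_INVALID_KEY = "We couldn't verify this API key. Please check that it was copied in full and try again.";
const MSG_RETRY_LATER = "We couldn't check this key right now. Please try again in a moment.";

const isRejectedKey = (status: number): boolean => status === 401 || status === 403;

async function checkUserKey(
  vendor: Vendor,
  key: string,
  probe: (vendor: Vendor, key: string) => Promise<ProbeResult>,
): Promise<KeyCheck> {
  const trimmed = key.trim();
  // Empty input never needs a network round trip.
  if (trimmed === "") return { ok: false, message: MSG_INVALID_KEY };
  const result = await probe(vendor, trimmed);
  if ("networkError" in result) return { ok: false, message: MSG_RETRY_LATER };
  if (isRejectedKey(result.status)) return { ok: false, message: MSG_INVALID_KEY };
  if (result.status >= 200 && result.status < 300) return { ok: true };
  // Anything else (429, 5xx, ...) says nothing definite about the key.
  return { ok: false, message: MSG_RETRY_LATER };
}
```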

What’s Next

Next quarter we are adding an adapter for on-premise LLMs (Llama, Qwen, etc.). A growing number of security-sensitive enterprise customers want to run models on their own in-house GPU servers. Because everything sits behind the same adapter interface, we can make that difference invisible to the user.
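Several self-hosted serving stacks (vLLM and Ollama among them) expose an OpenAI-compatible HTTP API, so one plausible route for Llama or Qwen is an adapter that reuses the OpenAI chat-completions request shape with a different base URL. The sketch below assumes that approach; LLMAdapter, registerAdapter, openAICompatible, getAdapter, the adapter ID, and the internal hostname are hypothetical.

```typescript
// Sketch: one adapter interface, with on-prem models registered next to
// cloud vendors. Only request building is shown; sending and streaming
// would reuse the same normalization as the other vendors.

interface Message { role: "system" | "user" | "assistant"; content: string }
interface HttpRequest { url: string; headers: Record<string, string>; body: unknown }

interface LLMAdapter {
  readonly id: string;
  buildChatRequest(messages: Message[], stream: boolean): HttpRequest;
}

const adapters = new Map<string, LLMAdapter>();

function registerAdapter(adapter: LLMAdapter): void {
  if (adapters.has(adapter.id)) throw new Error(`adapter "${adapter.id}" already registered`);
  adapters.set(adapter.id, adapter);
}

// Any server that speaks the OpenAI chat-completions protocol.
function openAICompatible(id: string, baseUrl: string, model: string, apiKey?: string): LLMAdapter {
  return {
    id,
    buildChatRequest(messages, stream) {
      const headers: Record<string, string> = { "Content-Type": "application/json" };
      // Some in-house deployments run without auth; send a key only if configured.
      if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
      return {
        url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
        headers,
        body: { model, messages, stream },
      };
    },
  };
}

// Callers only see adapter IDs, so cloud and on-prem look the same to them.
function getAdapter(id: string): LLMAdapter {
  const adapter = adapters.get(id);
  if (!adapter) throw new Error(`unknown adapter "${id}"`);
  return adapter;
}
```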

For what it’s worth, this abstraction layer is reused not only in D-SKET Canvas but also directly in our consulting projects. When we add AI features to a system we delivered as a system integration (SI) project, the same code goes in. That reuse is part of why running our own SaaS pays off here as well.