Translation quality

Vertana provides several features to improve translation quality beyond a single LLM pass. This guide covers document chunking, quality evaluation, iterative refinement, and best-of-N model selection.

Document chunking

Long documents are split into smaller chunks for translation. This improves quality by keeping each translation request within the model's optimal context window and allowing for chunk-level quality control.

Why chunking matters

LLMs have context limits and tend to perform better with focused input. Chunking:

Keeps each request within optimal token limits
Enables progress tracking for long documents
Allows quality evaluation and refinement per chunk
Enables parallel translation of independent chunks

Media type-aware chunking

Vertana selects chunkers based on the document's media type:

text/plain: Splits on paragraph boundaries and sentences.
text/markdown: Preserves Markdown structure: headings, code blocks, lists.
text/html: Preserves HTML tags and structure.

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";

const result = await translate(
  openai("gpt-4o"),
  "ko",
  markdownDocument,
  { mediaType: "text/markdown" }
);
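
To make the idea of media type-aware chunking concrete, here is a minimal sketch of how a chunker could be looked up by media type. It is not Vertana's implementation: the Chunker type, the two chunker functions, and selectChunker are hypothetical names invented for this illustration, and the Markdown splitter only handles headings and fenced code blocks.

```typescript
// Hypothetical chunker signature for this sketch (not Vertana's interface).
type Chunker = (text: string) => string[];

// Plain text: split on blank lines, i.e. paragraph boundaries.
const plainTextChunker: Chunker = (text) =>
  text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);

// Markdown: start a new chunk at each heading, but never inside a code fence.
const markdownChunker: Chunker = (text) => {
  const fence = "`".repeat(3); // the three-backtick fence marker
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of text.split("\n")) {
    if (line.trim().startsWith(fence)) inFence = !inFence;
    const isHeading = !inFence && /^#{1,6}\s/.test(line);
    if (isHeading && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());
  return chunks.filter((chunk) => chunk.length > 0);
};

// Registry keyed by media type; unknown types fall back to plain text
// (the fallback is an assumption of this sketch).
const chunkers: Record<string, Chunker> = {
  "text/plain": plainTextChunker,
  "text/markdown": markdownChunker,
};

function selectChunker(mediaType: string): Chunker {
  return chunkers[mediaType] ?? plainTextChunker;
}
```

Keeping the lookup in a plain record means supporting another media type only requires registering one more function.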

Controlling chunk size

The contextWindow option controls chunk size by capping the number of tokens per chunk:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";

const result = await translate(
  openai("gpt-4o"),
  "ko",
  longDocument,
  {
    contextWindow: {
      type: "explicit",
      maxTokens: 4096,  // Tokens per chunk
    },
  }
);
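
As a rough illustration of what a per-chunk token budget means, the sketch below packs paragraphs greedily into chunks that stay under a limit. It is not Vertana's chunker: chunkByTokenBudget and estimateTokens are hypothetical helpers, and the four-characters-per-token estimate is a crude assumption that real tokenizers will not match exactly.

```typescript
// Crude token estimate (assumption): roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily pack paragraphs into chunks whose estimated size stays within
// maxTokens. A single paragraph larger than the budget is kept whole as its
// own oversized chunk rather than being cut mid-paragraph.
function chunkByTokenBudget(text: string, maxTokens: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (current !== "" && estimateTokens(candidate) > maxTokens) {
      // Adding this paragraph would exceed the budget: close the chunk.
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

A smaller budget yields more, shorter chunks, which gives finer-grained evaluation at the cost of less surrounding context per request.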

Disabling chunking

For short texts, or when you want the model to see the full document as context, disable chunking by setting chunker to null:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";

const result = await translate(
  openai("gpt-4o"),
  "ko",
  shortText,
  { chunker: null }
);

Translation evaluation

Vertana can evaluate translation quality using an LLM judge. Evaluation produces a score and identifies specific issues.

Evaluation criteria

Translations are evaluated on four dimensions:

accuracy: Does the translation accurately convey the original meaning?
fluency: Is the translation natural and readable in the target language?
terminology: Are domain-specific terms translated correctly and consistently?
style: Does the translation maintain the appropriate tone?

Evaluation result structure

Evaluation returns a score and a list of issues:

import type { EvaluationResult, TranslationIssue } from "@vertana/core/evaluation";

const result: EvaluationResult = {
  score: 0.85,  // 0-1, where 1 is perfect
  issues: [
    {
      type: "terminology",
      description: "Term 'chunk' inconsistently translated",
      location: { start: 45, end: 52 },
    },
  ],
};
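
One way to consume such a result is to tally issues per type for a log line or report. The sketch below defines local stand-in interfaces that only mirror the fields shown above, so it runs without the library. The real types from "@vertana/core/evaluation" may carry more fields, and countIssuesByType and summarize are hypothetical helpers, not part of Vertana.

```typescript
// Local stand-ins mirroring the fields shown above (assumption: the real
// types may be stricter or carry additional fields).
interface TranslationIssue {
  type: string;
  description: string;
  location?: { start: number; end: number };
}

interface EvaluationResult {
  score: number;
  issues: TranslationIssue[];
}

// Count how many issues were reported for each issue type.
function countIssuesByType(result: EvaluationResult): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const issue of result.issues) {
    counts[issue.type] = (counts[issue.type] ?? 0) + 1;
  }
  return counts;
}

// Render a one-line summary such as "score 0.85, issues: terminology: 1".
function summarize(result: EvaluationResult): string {
  const counts = countIssuesByType(result);
  const parts = Object.keys(counts).map((type) => `${type}: ${counts[type]}`);
  const issues = parts.length > 0 ? parts.join(", ") : "none";
  return `score ${result.score.toFixed(2)}, issues: ${issues}`;
}
```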

Score interpretation

Score	Quality
0.9+	Excellent
0.7-0.9	Good
0.5-0.7	Acceptable
< 0.5	Poor
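
The table can be encoded as a small lookup, for example to label scores in a report. describeScore is a hypothetical helper, and assigning the boundary values 0.9, 0.7, and 0.5 to the higher band is an assumption, since the table leaves the boundaries open.

```typescript
type QualityLabel = "Excellent" | "Good" | "Acceptable" | "Poor";

// Map a 0-1 quality score to the label from the table above.
// Assumption: a score exactly on a boundary belongs to the higher band.
function describeScore(score: number): QualityLabel {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score >= 0.9) return "Excellent";
  if (score >= 0.7) return "Good";
  if (score >= 0.5) return "Acceptable";
  return "Poor";
}
```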

Iterative refinement

The refinement feature iteratively improves translations through an evaluate-fix loop. Each iteration identifies issues and produces a revised translation, and the loop stops once the quality threshold is met or the iteration limit is reached.

Enabling refinement

Enable refinement with the refinement option:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";

const result = await translate(
  openai("gpt-4o"),
  "ko",
  document,
  { refinement: true }
);

console.log(result.qualityScore);         // Final quality score
console.log(result.refinementIterations); // Number of iterations performed

Customizing refinement

Fine-tune the refinement process with RefinementOptions:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";

const result = await translate(
  openai("gpt-4o"),
  "ko",
  document,
  {
    refinement: {
      maxIterations: 5,       // Maximum refinement passes (default: 3)
      qualityThreshold: 0.95, // Stop when score exceeds this (default: 0.9)
    },
  }
);

maxIterations: Maximum number of refinement attempts per chunk (default: 3).
qualityThreshold: Target quality score; refinement stops when reached (default: 0.9).

How refinement works

1. Translate the document (or chunk)
2. Evaluate the translation quality
3. If score < threshold and iterations < max:
- Identify specific issues
- Generate improved translation addressing those issues
- Re-evaluate
- Repeat until the score meets the threshold or the iteration limit is reached
4. Return the final refined translation
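
The loop above can be sketched as follows. This is a simplified illustration, not Vertana's internals: the Evaluation, RefineDeps, and RefineResult shapes and the refine function are hypothetical, and the evaluate and improve callbacks stand in for the LLM calls. Only the default values (3 iterations, threshold 0.9) come from the options described above.

```typescript
// Hypothetical shapes for this sketch; not Vertana's internal API.
interface Evaluation {
  score: number;
  issues: string[];
}

interface RefineDeps {
  // Scores a translation and lists its issues (an LLM judge in practice).
  evaluate: (translation: string) => Promise<Evaluation>;
  // Produces a revised translation addressing the given issues.
  improve: (translation: string, issues: string[]) => Promise<string>;
}

interface RefineResult {
  translation: string;
  score: number;
  iterations: number;
}

async function refine(
  initial: string,
  deps: RefineDeps,
  maxIterations = 3,
  qualityThreshold = 0.9,
): Promise<RefineResult> {
  let translation = initial;
  let evaluation = await deps.evaluate(translation);
  let iterations = 0;

  // Keep fixing while the score is below the threshold and passes remain.
  while (evaluation.score < qualityThreshold && iterations < maxIterations) {
    translation = await deps.improve(translation, evaluation.issues);
    evaluation = await deps.evaluate(translation);
    iterations++;
  }

  return { translation, score: evaluation.score, iterations };
}
```

Because the initial evaluation happens before the loop, a translation that already meets the threshold is returned with zero iterations and no extra improve calls.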

Best-of-N selection

When you provide multiple language models, Vertana can generate translations from each and automatically select the best one.

Basic usage

Pass an array of models and enable best-of-N:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

const result = await translate(
  [openai("gpt-4o"), anthropic("claude-sonnet-4-20250514")],
  "ko",
  document,
  { bestOfN: true }
);

console.log(result.selectedModel); // The model that produced the best translation

Specifying an evaluator model

By default, the first model evaluates all candidates. To use a different evaluator, set evaluatorModel:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { google } from "@ai-sdk/google";

const result = await translate(
  [openai("gpt-4o"), anthropic("claude-sonnet-4-20250514")],
  "ko",
  document,
  {
    bestOfN: {
      evaluatorModel: google("gemini-2.0-flash"),
    },
  }
);

How selection works

1. Each model translates the text independently
2. The evaluator model scores each translation
3. Translations are ranked by score
4. The highest-scoring translation is selected
5. result.selectedModel indicates which model produced it
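
The ranking and selection steps can be sketched as below, taking the candidate scores as already computed by the evaluator. The Candidate shape and the rank and selectBest functions are hypothetical, and breaking ties in favor of the earlier model is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical candidate record: one translation per model, already scored.
interface Candidate {
  model: string;
  translation: string;
  score: number;
}

// Return a copy of the candidates ordered from highest to lowest score.
function rank(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => b.score - a.score);
}

// Pick the highest-scoring candidate. On a tie the earlier candidate wins
// (assumption of this sketch).
function selectBest(candidates: Candidate[]): Candidate {
  let best: Candidate | undefined;
  for (const candidate of candidates) {
    if (best === undefined || candidate.score > best.score) {
      best = candidate;
    }
  }
  if (best === undefined) {
    throw new Error("selectBest needs at least one candidate");
  }
  return best;
}
```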

Combining features

These features can be combined for maximum quality:

import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

const result = await translate(
  [openai("gpt-4o"), anthropic("claude-sonnet-4-20250514")],
  "ko",
  document,
  {
    mediaType: "text/markdown",
    glossary: [
      { original: "Vertana", translated: "버타나" },
    ],
    dynamicGlossary: true,
    refinement: {
      maxIterations: 3,
      qualityThreshold: 0.9,
    },
    bestOfN: true,
    onProgress: (progress) => {
      console.log(`${progress.stage}: ${(progress.progress * 100).toFixed(0)}%`);
    },
  }
);

console.log("Quality score:", result.qualityScore);
console.log("Selected model:", result.selectedModel);
console.log("Refinement iterations:", result.refinementIterations);
console.log("Accumulated terms:", result.accumulatedGlossary?.length);

The processing order is:

1. Chunk the document
2. For each chunk:
- Gather context from sources
- Translate with each model
- Evaluate and select best
- Refine if below threshold
- Extract terms for dynamic glossary
3. Combine refined chunks

Tracking progress

Monitor the translation process with the onProgress callback:

import type { TranslationProgress } from "@vertana/facade";

function handleProgress(progress: TranslationProgress): void {
  switch (progress.stage) {
    case "chunking":
      console.log("Splitting document into chunks...");
      break;
    case "gatheringContext":
      console.log("Gathering context from sources...");
      break;
    case "translating":
      if (progress.totalChunks) {
        console.log(`Translating chunk ${progress.chunkIndex! + 1}/${progress.totalChunks}`);
      }
      break;
    case "refining":
      console.log(`Refining: iteration ${progress.iteration}/${progress.maxIterations}`);
      break;
    case "selecting":
      console.log(`Evaluating candidate ${progress.candidateIndex! + 1}/${progress.totalCandidates}`);
      break;
  }
}
