Glossary deep dive
Glossaries ensure consistent translation of specific terms throughout your documents. This guide covers all glossary features, from basic usage to dynamic extraction.
Why use glossaries?
Without glossaries, the same term might be translated differently in different parts of a document:
- "chunk" might become "청크" in one paragraph and "덩어리" in another
- Product names might be inconsistently transliterated
- Technical terms might lose their precise meaning
Glossaries solve this by defining explicit term mappings that Vertana applies consistently throughout the translation.
Basic usage
Provide a glossary through the glossary option:
import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
const result = await translate(
openai("gpt-4o"),
"ko",
"Vertana uses an agentic workflow to improve translation quality.",
{
glossary: [
{ original: "Vertana", translated: "버타나" },
{ original: "agentic", translated: "에이전틱" },
],
}
);Glossary entry structure
Each glossary entry has three fields:
import type { GlossaryEntry } from "@vertana/core/glossary";
const entry: GlossaryEntry = {
original: "chunk",
translated: "청크",
context: "In the context of text processing",
};original- The term in the source language to match.
translated- The translation to use in the target language.
context- Optional description of when this translation applies. This helps disambiguate terms with multiple meanings.
Context-aware entries
The context field helps when a term has multiple valid translations depending on usage:
import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
const result = await translate(
openai("gpt-4o"),
"ko",
"The bank processes transactions. We walked along the river bank.",
{
glossary: [
{
original: "bank",
translated: "은행",
context: "Financial institution",
},
{
original: "bank",
translated: "둑",
context: "Edge of a river",
},
],
}
);The LLM uses the context to choose the appropriate translation based on how the term is used in each sentence.
Dynamic glossary
For long documents, manually defining every term can be impractical. The dynamic glossary feature automatically extracts and accumulates terminology as translation progresses.
Enable it with the dynamicGlossary option:
import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
const result = await translate(
openai("gpt-4o"),
"ko",
longDocument,
{ dynamicGlossary: true }
);
// Access the accumulated glossary
console.log(result.accumulatedGlossary);
// [
// { original: "Vertana", translated: "버타나" },
// { original: "chunk", translated: "청크" },
// ...
// ]How it works
- After translating each chunk, Vertana extracts key terminology pairs
- Extracted terms are added to a running glossary
- Subsequent chunks receive the accumulated glossary, ensuring consistency
- The final glossary is returned in
result.accumulatedGlossary
Customizing dynamic glossary
Fine-tune the extraction with DynamicGlossaryOptions:
import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
const result = await translate(
openai("gpt-4o"),
"ko",
longDocument,
{
dynamicGlossary: {
maxTermsPerChunk: 15, // Extract more terms per chunk
extractorModel: extractorModel, // Use a different model for extraction
},
}
);maxTermsPerChunk- Maximum terms to extract from each chunk (default:
10). extractorModel- The model to use for term extraction. If not specified, uses the same model as translation.
Combining static and dynamic glossaries
You can provide an initial glossary while also enabling dynamic accumulation:
import { translate } from "@vertana/facade";
import { openai } from "@ai-sdk/openai";
const result = await translate(
openai("gpt-4o"),
"ko",
longDocument,
{
glossary: [
{ original: "Vertana", translated: "버타나" },
],
dynamicGlossary: true,
}
);
// result.accumulatedGlossary includes both the initial entry
// and any new terms extracted during translationGlossary file format
For the CLI and reusable glossaries, store entries in a JSON file:
[
{
"original": "Vertana",
"translated": "버타나"
},
{
"original": "agentic workflow",
"translated": "에이전틱 워크플로우"
},
{
"original": "chunk",
"translated": "청크",
"context": "A segment of text for processing"
}
]Use with the CLI:
vertana translate -t ko --glossary-file glossary.json document.mdOr load and use programmatically:
import { readFileSync } from "node:fs";
import type { Glossary } from "@vertana/core/glossary";
const glossary: Glossary = JSON.parse(
readFileSync("glossary.json", "utf-8")
);Best practices
Focus on important terms
Don't include every word. Focus on:
- Product names and brand terms
- Technical vocabulary specific to your domain
- Terms with multiple possible translations
- Proper nouns that need consistent transliteration
Provide context for ambiguous terms
When a term can mean different things, add context:
import type { Glossary } from "@vertana/core/glossary";
const glossary: Glossary = [
{
original: "model",
translated: "모델",
context: "Machine learning model",
},
{
original: "model",
translated: "모형",
context: "Physical or conceptual representation",
},
];Use dynamic glossary for long documents
For documents with many chunks, enable dynamic glossary to automatically maintain consistency without manually defining every term.
Reuse glossaries across projects
Save your domain-specific glossaries as JSON files and reuse them across related projects to maintain consistent terminology.