Web context
The @vertana/context-web package provides context sources that fetch and extract content from web pages. This is useful when translating documents that reference external articles or resources.
Installation
deno add jsr:@vertana/context-web
npm add @vertana/context-web
pnpm add @vertana/context-web
yarn add @vertana/context-web
bun add @vertana/context-web
Overview
This package provides two main context sources:
- fetchWebPage: A passive context source that fetches a single URL on demand. The LLM can call this tool when it needs additional context.
- fetchLinkedPages: A required context source factory that extracts all links from the source text and fetches their content before translation begins.
Both use Mozilla's Readability algorithm to extract the main article content from web pages, filtering out navigation, ads, and other noise.
fetchWebPage
A passive context source that the LLM can invoke when it needs to fetch a specific URL.
import { translate } from "@vertana/facade";
import { fetchWebPage } from "@vertana/context-web";
const text = `
This article discusses the concept explained at https://example.com/guide.
`;
// `model` is assumed to be a language model instance configured elsewhere.
const result = await translate(model, "ko", text, {
  contextSources: [fetchWebPage],
});
When the LLM encounters a reference it wants to understand better, it can call the fetch-web-page tool with the URL to retrieve the page content.
fetchLinkedPages
A factory function that creates a required context source. It extracts all URLs from the source text and fetches their content before translation begins.
import { translate } from "@vertana/facade";
import { fetchLinkedPages } from "@vertana/context-web";
const text = `
Check out https://example.com/article for background.
Also see https://example.com/reference for more details.
`;
const result = await translate(model, "ko", text, {
  contextSources: [
    fetchLinkedPages({
      text,
      mediaType: "text/plain",
    }),
  ],
});
Options
- text: The source text to extract links from.
- mediaType: The media type of the text ("text/plain", "text/markdown", or "text/html"). This affects how links are extracted.
- maxLinks: Maximum number of links to fetch. Defaults to 10.
- timeout: Timeout for each fetch request in milliseconds. Defaults to 10000.
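To make the effect of maxLinks and timeout concrete, here is a minimal sketch of how a link cap and a per-request timeout can be applied. The helpers selectLinks and fetchWithTimeout are hypothetical names and are not part of @vertana/context-web. Deduplicating links and skipping failed pages are assumptions of this sketch, not documented behavior of the package.

```typescript
// Hypothetical helpers for illustration; NOT the @vertana/context-web implementation.

/** Deduplicate URLs in order of first appearance and keep at most `maxLinks`. */
function selectLinks(urls: readonly string[], maxLinks = 10): string[] {
  // Assumption: duplicates are dropped before the cap is applied.
  return Array.from(new Set(urls)).slice(0, maxLinks);
}

/** Fetch a page as text, aborting the request after `timeoutMs` milliseconds. */
async function fetchWithTimeout(
  url: string,
  timeoutMs = 10000,
): Promise<string | undefined> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(url, { signal: controller.signal });
    return response.ok ? await response.text() : undefined;
  } catch {
    // Assumption: a timeout or network error skips the page instead of failing the run.
    return undefined;
  } finally {
    clearTimeout(timer);
  }
}
```

With a cap of 2, selectLinks keeps the first two distinct URLs and ignores the rest, so a document with many links triggers a bounded number of requests.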
Combining both sources
For best results, use both sources together. fetchLinkedPages provides context from links in the source text, while fetchWebPage allows the LLM to fetch additional URLs it discovers during translation.
import { translate } from "@vertana/facade";
import { fetchLinkedPages, fetchWebPage } from "@vertana/context-web";
const text = `
Read the introduction at https://example.com/intro.
`;
const result = await translate(model, "ko", text, {
  contextSources: [
    // Pre-fetch all links in the text
    fetchLinkedPages({ text, mediaType: "text/plain" }),
    // Allow LLM to fetch additional URLs on demand
    fetchWebPage,
  ],
});
extractLinks utility
The extractLinks function extracts URLs from text. It's used internally by fetchLinkedPages but is also exported for custom use cases.
import { extractLinks } from "@vertana/context-web";
// From plain text
const plainUrls = extractLinks(
  "Check https://example.com for info.",
  "text/plain"
);
// => ["https://example.com"]
// From Markdown
const mdUrls = extractLinks(
  "See [this article](https://example.com/article).",
  "text/markdown"
);
// => ["https://example.com/article"]
// From HTML
const htmlUrls = extractLinks(
  '<a href="https://example.com">Link</a>',
  "text/html"
);
// => ["https://example.com"]
CLI usage
The Vertana CLI provides the -L (--fetch-links) flag, which enables web context fetching:
vertana translate -t ko -L document.md
This automatically:
- Extracts all links from the input document
- Fetches and extracts content from linked pages
- Provides the content as context for translation
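The three steps above can be sketched in a few lines of TypeScript. This is an illustrative outline only, not the CLI's actual code: extractMarkdownLinks, gatherLinkedContext, and the PageFetcher type are hypothetical names, the regular expression handles only inline Markdown links, and the Readability-based content extraction is left out. The page fetcher is passed in as a parameter so the sketch can be run without network access.

```typescript
// Hypothetical sketch of the three steps; not the Vertana CLI implementation.

/** Loads one page and returns its text, or undefined if it could not be loaded. */
type PageFetcher = (url: string) => Promise<string | undefined>;

/** Step 1: collect http(s) targets of inline Markdown links, e.g. [label](https://...). */
function extractMarkdownLinks(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

/** Steps 2 and 3: fetch every linked page and join the results into one context string. */
async function gatherLinkedContext(
  markdown: string,
  fetchPage: PageFetcher,
): Promise<string> {
  const urls = extractMarkdownLinks(markdown);
  const pages = await Promise.all(urls.map((url) => fetchPage(url)));
  const sections: string[] = [];
  urls.forEach((url, index) => {
    const content = pages[index];
    // Pages that could not be loaded are left out of the context.
    if (content !== undefined) {
      sections.push(`Source: ${url}\n${content}`);
    }
  });
  return sections.join("\n\n");
}

// Usage with a stub fetcher standing in for a real HTTP request:
gatherLinkedContext(
  "Read [the intro](https://example.com/intro) first.",
  async (url) => `Contents of ${url}`,
).then((context) => console.log(context));
```

Injecting the fetcher keeps the link extraction and context assembly easy to test in isolation; a real fetcher would add a timeout and main-content extraction.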
See the CLI reference for more details.