Eino: Document Transformer Guide
Overview
Document Transformer is a component for transforming and processing documents. It performs operations such as splitting, filtering, merging, and more to produce documents tailored to specific needs. Typical scenarios include:
- Split long documents into smaller chunks for processing
- Filter document content by rules
- Convert document structure
- Extract specific parts of a document
Component Definition
Interface
Code:
eino/components/document/interface.go
type Transformer interface {
Transform(ctx context.Context, src []*schema.Document, opts ...TransformerOption) ([]*schema.Document, error)
}
Transform Method
- Purpose: transform input documents
- Params:
ctx: request context, also carries the Callback Managersrc: documents to processopts: options to configure behavior
- Returns:
[]*schema.Document: transformed documentserror: error during transformation
Document Struct
type Document struct {
// ID is the unique identifier
ID string
// Content is the document text
Content string
// MetaData stores metadata
MetaData map[string]any
}
Key fields:
- ID: unique identifier
- Content: actual text
- MetaData: metadata such as:
- source info
- vector representation (for retrieval)
- score (for ranking)
- sub-index (for hierarchical retrieval)
- other custom metadata
Common Options
Transformer uses TransformerOption for optional parameters. There are currently no global/common options; each implementation defines its own specific options, wrapped via WrapTransformerImplSpecificOptFn into TransformerOption.
Usage
Standalone
Code:
eino-ext/components/document/transformer/splitter/markdown/examples/headersplitter
import (
"github.com/cloudwego/eino/schema"
"github.com/cloudwego/eino-ext/components/document/transformer/splitter/markdown"
)
// init transformer (markdown example)
transformer, _ := markdown.NewHeaderSplitter(ctx, &markdown.HeaderConfig{
Headers: map[string]string{
"##": "",
},
})
markdownDoc := &schema.Document{
Content: "## Title 1\nHello Word\n## Title 2\nWord Hello",
}
// transform
transformedDocs, _ := transformer.Transform(ctx, []*schema.Document{markdownDoc})
for idx, doc := range transformedDocs {
log.Printf("doc segment %v: %v", idx, doc.Content)
}
In Orchestration
// in Chain
chain := compose.NewChain[[]*schema.Document, []*schema.Document]()
chain.AppendDocumentTransformer(transformer)
// in Graph
graph := compose.NewGraph[[]*schema.Document, []*schema.Document]()
graph.AddDocumentTransformerNode("transformer_node", transformer)
Options and Callbacks
Callback Example
Code:
eino-ext/components/document/transformer/splitter/markdown/examples/headersplitter
import (
"github.com/cloudwego/eino/callbacks"
"github.com/cloudwego/eino/components/document"
"github.com/cloudwego/eino/compose"
"github.com/cloudwego/eino/schema"
callbacksHelper "github.com/cloudwego/eino/utils/callbacks"
"github.com/cloudwego/eino-ext/components/document/transformer/splitter/markdown"
)
handler := &callbacksHelper.TransformerCallbackHandler{
OnStart: func(ctx context.Context, info *callbacks.RunInfo, input *document.TransformerCallbackInput) context.Context {
log.Printf("input access, len: %v, content: %s\n", len(input.Input), input.Input[0].Content)
return ctx
},
OnEnd: func(ctx context.Context, info *callbacks.RunInfo, output *document.TransformerCallbackOutput) context.Context {
log.Printf("output finished, len: %v\n", len(output.Output))
return ctx
},
}
helper := callbacksHelper.NewHandlerHelper().
Transformer(handler).
Handler()
chain := compose.NewChain[[]*schema.Document, []*schema.Document]()
chain.AppendDocumentTransformer(transformer)
run, _ := chain.Compile(ctx)
outDocs, _ := run.Invoke(ctx, []*schema.Document{markdownDoc}, compose.WithCallbacks(helper))
for idx, doc := range outDocs {
log.Printf("doc segment %v: %v", idx, doc.Content)
}
Existing Implementations
- Markdown Header Splitter: split by markdown headers — Splitter - markdown
- Text Splitter: split by length or separators — Splitter - semantic
- Document Filter: filter by rules — Splitter - recursive
Implement Your Own
Consider the following when implementing a custom Transformer:
- Option handling
- Callback handling
Option Mechanism
type MyTransformerOptions struct {
ChunkSize int
Overlap int
MinChunkLength int
}
func WithChunkSize(size int) document.TransformerOption {
return document.WrapTransformerImplSpecificOptFn(func(o *MyTransformerOptions) {
o.ChunkSize = size
})
}
func WithOverlap(overlap int) document.TransformerOption {
return document.WrapTransformerImplSpecificOptFn(func(o *MyTransformerOptions) {
o.Overlap = overlap
})
}
Callback Handling
type TransformerCallbackInput struct {
Input []*schema.Document
Extra map[string]any
}
type TransformerCallbackOutput struct {
Output []*schema.Document
Extra map[string]any
}
Full Implementation Example
type MyTransformer struct {
chunkSize int
overlap int
minChunkLength int
}
func NewMyTransformer(config *MyTransformerConfig) (*MyTransformer, error) {
return &MyTransformer{
chunkSize: config.DefaultChunkSize,
overlap: config.DefaultOverlap,
minChunkLength: config.DefaultMinChunkLength,
}, nil
}
func (t *MyTransformer) Transform(ctx context.Context, src []*schema.Document, opts ...document.TransformerOption) ([]*schema.Document, error) {
// 1. handle Option
options := &MyTransformerOptions{
ChunkSize: t.chunkSize,
Overlap: t.overlap,
MinChunkLength: t.minChunkLength,
}
options = document.GetTransformerImplSpecificOptions(options, opts...)
// 2. before-transform callback
ctx = callbacks.OnStart(ctx, info, &document.TransformerCallbackInput{
Input: src,
})
// 3. perform transform
docs, err := t.doTransform(ctx, src, options)
// 4. handle error and finish callback
if err != nil {
ctx = callbacks.OnError(ctx, info, err)
return nil, err
}
ctx = callbacks.OnEnd(ctx, info, &document.TransformerCallbackOutput{
Output: docs,
})
return docs, nil
}
func (t *MyTransformer) doTransform(ctx context.Context, src []*schema.Document, opts *MyTransformerOptions) ([]*schema.Document, error) {
// implement document transform logic
return docs, nil
}
Notes
- Preserve and manage metadata when transforming documents; keep original metadata and add custom metadata as needed.
References
Last modified
December 16, 2025
: fix: improve readability of websocket and swagger docs (#1480) (f63ff55)