Splitter - recursive
Overview
The recursive splitter is an implementation of the Document Transformer interface that recursively splits long documents into smaller chunks of a target size. It follows the Eino: Document Transformer Guide.
How It Works
- Try splitting the document using separators in order
- If the current separator cannot produce chunks under the target size, use the next separator
- Merge the split fragments so that each chunk is close to the target size
- Maintain the configured overlap between adjacent chunks during merging
Usage
Initialization
Initialize via NewSplitter with configuration:
splitter, err := recursive.NewSplitter(ctx, &recursive.Config{
    ChunkSize:   1000,                                    // required: target chunk size
    OverlapSize: 200,                                     // optional: overlap size
    Separators:  []string{"\n\n", "\n", "。", "!", "?"}, // optional: separator list
    LenFunc:     nil,                                     // optional: custom length func
    KeepType:    recursive.KeepTypeEnd,                   // optional: keep-type strategy
})
Parameters:
- ChunkSize: required; target chunk size
- OverlapSize: overlap between adjacent chunks, used to keep context
- Separators: ordered list of separators, highest priority first
- LenFunc: custom length function; defaults to len()
- KeepType: separator keep strategy, one of:
  - KeepTypeNone: do not keep separators
  - KeepTypeStart: keep the separator at the start of a fragment
  - KeepTypeEnd: keep the separator at the end of a fragment
Complete Example
package main

import (
    "context"

    "github.com/cloudwego/eino-ext/components/document/transformer/splitter/recursive"
    "github.com/cloudwego/eino/schema"
)

func main() {
    ctx := context.Background()

    // Create the splitter with a target size, overlap, and separator priority.
    splitter, err := recursive.NewSplitter(ctx, &recursive.Config{
        ChunkSize:   1000,
        OverlapSize: 200,
        Separators:  []string{"\n\n", "\n", "。", "!", "?"},
        KeepType:    recursive.KeepTypeEnd,
    })
    if err != nil {
        panic(err)
    }

    // Prepare the document to split.
    docs := []*schema.Document{{
        ID: "doc1",
        Content: `This is the first paragraph, with some content.
This is the second paragraph. This paragraph has multiple sentences! These sentences are separated by punctuation.
This is the third paragraph. Here is more content.`,
    }}

    // Run the transformer and print each resulting fragment.
    results, err := splitter.Transform(ctx, docs)
    if err != nil {
        panic(err)
    }
    for i, doc := range results {
        println("fragment", i+1, ":", doc.Content)
    }
}
Advanced Usage
Custom length function:
splitter, err := recursive.NewSplitter(ctx, &recursive.Config{
    ChunkSize: 1000,
    LenFunc: func(s string) int {
        // use unicode rune count instead of byte length
        return len([]rune(s))
    },
})
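The difference between byte length and rune count matters most for non-ASCII text. The sketch below compares the two for a short Chinese string; `runeLen` is a hypothetical helper name. It has the same `func(string) int` shape as the LenFunc above and uses the standard library's `utf8.RuneCountInString`, which counts runes without allocating a `[]rune`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen counts Unicode code points rather than bytes.
func runeLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "你好。"
	// Each of these three characters takes 3 bytes in UTF-8.
	fmt.Println(len(s), runeLen(s)) // 9 3
}
```

With the default len(), a ChunkSize of 1000 would therefore hold only about a third as many such characters as it would with a rune-based length function.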
Adjust overlap strategy:
splitter, err := recursive.NewSplitter(ctx, &recursive.Config{
    ChunkSize:   1000,
    OverlapSize: 300,                   // larger overlap to keep more context
    KeepType:    recursive.KeepTypeEnd, // keep separator at end of fragments
})
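A larger overlap keeps more context but also produces more chunks. As a rough guide, the sketch below estimates the chunk count for an idealized sliding window in which each chunk is exactly ChunkSize long and advances by ChunkSize minus OverlapSize. `estimateChunks` is a hypothetical helper, and the real count will differ because the splitter cuts at separator boundaries.

```go
package main

import (
	"errors"
	"fmt"
)

// estimateChunks returns the number of fixed-size windows needed to cover
// total units of text when consecutive windows share overlap units.
func estimateChunks(total, chunkSize, overlap int) (int, error) {
	if chunkSize <= 0 || overlap < 0 || overlap >= chunkSize {
		return 0, errors.New("need chunkSize > 0 and 0 <= overlap < chunkSize")
	}
	if total <= 0 {
		return 0, nil
	}
	if total <= chunkSize {
		return 1, nil
	}
	step := chunkSize - overlap // how far each new window advances
	// Ceiling division of the text remaining after the first overlap.
	return (total - overlap + step - 1) / step, nil
}

func main() {
	for _, overlap := range []int{200, 300} {
		n, err := estimateChunks(5000, 1000, overlap)
		if err != nil {
			panic(err)
		}
		fmt.Printf("overlap=%d -> about %d chunks\n", overlap, n)
	}
	// overlap=200 -> about 6 chunks
	// overlap=300 -> about 7 chunks
}
```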
Custom separators:
splitter, err := recursive.NewSplitter(ctx, &recursive.Config{
    ChunkSize: 1000,
    Separators: []string{
        "\n\n", // blank line (paragraph)
        "\n",   // newline
        "。",    // period
    },
})
Last modified: December 16, 2025