Performance Tuning

XLFill ships with three processing modes and a compiled template API. Choose the right combination for your workload, or let XLFill decide automatically.

| Mode | Best for | Speedup | Memory savings | Tradeoffs |
| --- | --- | --- | --- | --- |
| Sequential (default) | < 10K rows | Baseline | Baseline | None — full feature support |
| Streaming | > 10K rows, simple templates | 3x faster | 60% less memory | No formula remapping, no images, no hyperlinks |
| Parallel | > 100 rows, CPU-bound expressions | Scales with cores | Similar to sequential | Fixed-height areas only, mutex overhead |

Streaming writes output rows incrementally via excelize’s StreamWriter instead of holding the entire workbook in memory. Ideal for large, formula-free data exports.

```go
xlfill.Fill("template.xlsx", "report.xlsx", data,
    xlfill.WithStreaming(true),
)
```

Benchmark (1,000 rows):

|  | Sequential | Streaming |
| --- | --- | --- |
| Time | 27.5 ms | 8.9 ms |
| Memory | 8.3 MB | 3.3 MB |
| Allocs | 110K | 43K |

Limitations:

  • Formula reference remapping is skipped (formulas are written verbatim)
  • Hyperlinks are silently written as plain text
  • Images are not supported (returns error)
  • Single-sheet output only
  • Rows must be written in ascending order (guaranteed by template processing)

Parallel mode runs jx:each iterations concurrently using goroutines. Each goroutine gets an independent context clone and a pre-computed row offset.

```go
xlfill.Fill("template.xlsx", "report.xlsx", data,
    xlfill.WithParallelism(4), // 4 goroutines
)
```

When it kicks in:

  • Direction must be DOWN (column-wise expansion can’t be pre-offset)
  • Area must be fixed-height (no nested jx:each or jx:repeat that change output height)
  • Item count must be >= the parallelism value
  • Otherwise, falls back to sequential automatically — no error, no config change needed
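The eligibility rules above can be condensed into a small predicate. This is a sketch of the documented fallback decision, not the library's internal code:

```go
package main

import "fmt"

// parallelEligible mirrors the documented rules: parallel mode runs only
// for DOWN-direction, fixed-height areas with at least as many items as
// goroutines; anything else silently falls back to sequential.
func parallelEligible(direction string, fixedHeight bool, itemCount, parallelism int) bool {
	return direction == "DOWN" && fixedHeight && itemCount >= parallelism
}

func main() {
	fmt.Println(parallelEligible("DOWN", true, 1000, 4))  // true
	fmt.Println(parallelEligible("RIGHT", true, 1000, 4)) // false: column-wise expansion
	fmt.Println(parallelEligible("DOWN", true, 2, 4))     // false: fewer items than goroutines
}
```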

Safety guarantees:

  • All Transformer writes go through a ConcurrentTransformer mutex wrapper
  • Each goroutine gets a Context.Clone() with independent evaluation state
  • Progress reporting uses atomic counters
  • Panics in goroutines are recovered and reported as errors
  • Cancellation propagates immediately via context.WithCancel
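The mutex wrapper and atomic progress counter can be pictured in plain Go. This sketch uses a toy `cellWriter` interface in place of XLFill's real transformer type; only the pattern matches the ConcurrentTransformer described above:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cellWriter is a stand-in for the transformer's write surface.
type cellWriter interface {
	SetValue(cell string, v any)
}

// concurrentWriter serializes all writes with a mutex, the same idea
// as wrapping a Transformer in a ConcurrentTransformer.
type concurrentWriter struct {
	mu sync.Mutex
	w  cellWriter
}

func (c *concurrentWriter) SetValue(cell string, v any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.w.SetValue(cell, v)
}

// mapWriter is a trivial in-memory backend for the example.
type mapWriter struct{ cells map[string]any }

func (m *mapWriter) SetValue(cell string, v any) { m.cells[cell] = v }

func main() {
	w := &concurrentWriter{w: &mapWriter{cells: map[string]any{}}}
	var processed atomic.Int64 // progress reporting via atomic counter

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			w.SetValue(fmt.Sprintf("A%d", i+1), i)
			processed.Add(1)
		}(i)
	}
	wg.Wait()
	fmt.Println(processed.Load()) // 8
}
```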

Streaming and parallel modes are mutually exclusive. If both are set, parallel takes precedence. For truly massive outputs, use streaming: it wins on both speed and memory.

Instead of choosing manually, let XLFill analyze your template and pick the optimal mode:

```go
xlfill.Fill("template.xlsx", "report.xlsx", data,
    xlfill.WithAutoMode(map[string]any{
        "itemCount": len(employees), // hint: how many items
    }),
)
```

Decision logic:

| Item count | Streaming-eligible? | Parallel-eligible? | Result |
| --- | --- | --- | --- |
| >= 10,000 | Yes | — | Streaming |
| >= 100 | — | Yes (multi-core) | Parallel (capped at 8 goroutines) |
| >= 1,000 | Yes | No | Streaming (fallback) |
| Any | No | No | Sequential |

Streaming blockers: formulas, images, hyperlinks, multisheet output, direction=RIGHT.

Parallel blockers: nested each/repeat, multisheet output, direction=RIGHT, no each commands.
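The decision table corresponds roughly to this function. It is a sketch of the documented logic with the core count passed in explicitly, not the library's actual implementation:

```go
package main

import "fmt"

// suggestMode reproduces the documented auto-mode decision table.
func suggestMode(itemCount, cores int, streamingOK, parallelOK bool) string {
	switch {
	case itemCount >= 10000 && streamingOK:
		return "streaming"
	case itemCount >= 100 && parallelOK && cores > 1:
		return "parallel" // goroutine count is capped at 8
	case itemCount >= 1000 && streamingOK:
		return "streaming" // fallback when parallel is blocked
	default:
		return "sequential"
	}
}

func main() {
	fmt.Println(suggestMode(50000, 8, true, true)) // streaming
	fmt.Println(suggestMode(500, 8, true, true))   // parallel
	fmt.Println(suggestMode(5000, 8, true, false)) // streaming
	fmt.Println(suggestMode(50, 8, true, true))    // sequential
}
```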

For more control, call SuggestMode directly and inspect the recommendation:

```go
suggestion, err := xlfill.SuggestMode("template.xlsx", map[string]any{
    "itemCount": 50000,
})
fmt.Println(suggestion.Mode)    // "streaming"
fmt.Println(suggestion.Reasons) // ["large dataset (>=10K items)", "template is streaming-compatible"]

// Apply the suggestion
opts := suggestion.Apply()
xlfill.Fill("template.xlsx", "report.xlsx", data, opts...)
```

When generating the same report with different data (batch jobs, API endpoints, queue workers), parse the template once and reuse:

```go
compiled, err := xlfill.Compile("template.xlsx",
    xlfill.WithRecalculateOnOpen(true),
)
if err != nil {
    log.Fatal(err)
}

// Fill with different data sets — template bytes cached in memory
for i, dataset := range datasets {
    compiled.Fill(dataset, fmt.Sprintf("report_%d.xlsx", i))
}
```

Each Fill call creates a fresh transformer from cached bytes — no file I/O. Options like streaming, parallel, auto-mode, and strict mode are all propagated to each fill.

For long-running fills, use Go’s standard context.Context for cancellation and timeouts:

```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

err := xlfill.Fill("template.xlsx", "report.xlsx", data,
    xlfill.WithContext(ctx),
    xlfill.WithProgressFunc(func(p xlfill.FillProgress) {
        fmt.Printf("Processed %d rows on %s\n", p.ProcessedRows, p.CurrentSheet)
    }),
)
```

Progress works with all modes — sequential, streaming, and parallel (using atomic counters for thread safety).

Several commands use deferred execution — they collect their configuration during template processing but apply their effects only after all rows are written. This is both a performance optimization and a correctness requirement: these commands need to know the final output row count to set correct ranges.

Deferred commands:

| Command | Why deferred |
| --- | --- |
| jx:table | Table range must cover all output rows |
| jx:chart | Chart data ranges must reference final row positions |
| jx:conditionalFormat | Format rules must span the entire output range |
| jx:group | Outline group ranges depend on final row positions |
| jx:definedName | Named ranges must cover all output rows |
| jx:sparkline | Data ranges must reference final cell positions |

How it works: During template processing, each deferred command records a DeferredAction with its template-relative area and attributes. After all rows are written, the engine replays these actions with adjusted row offsets. This means:

  • No wasted work if a jx:if excludes the area containing the deferred command
  • Correct ranges even with nested loops that expand to variable heights
  • Compatible with streaming mode (deferred actions run after the stream is finalized)
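The record-then-replay step can be pictured like this. It is a simplified sketch: XLFill's DeferredAction also carries command-specific attributes, which are omitted here:

```go
package main

import "fmt"

// deferredAction records a template-relative row range plus a callback
// that applies the command (table, chart, ...) once final rows are known.
type deferredAction struct {
	firstRow, lastRow int // template-relative
	apply             func(first, last int)
}

// replay extends each recorded range by the number of rows the loops
// added, then applies the command against the final positions.
func replay(actions []deferredAction, extraRows int) {
	for _, a := range actions {
		a.apply(a.firstRow, a.lastRow+extraRows)
	}
}

func main() {
	var ranges [][2]int
	actions := []deferredAction{
		// A table spanning template rows 1-2 (header + one item row).
		{firstRow: 1, lastRow: 2, apply: func(f, l int) {
			ranges = append(ranges, [2]int{f, l})
		}},
	}
	// 1,000 items expanded the single item row into 1,000 rows: 999 extra.
	replay(actions, 999)
	fmt.Println(ranges) // [[1 1001]]
}
```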

Performance impact: Deferred execution adds negligible overhead — typically < 1ms for a report with multiple tables, charts, and conditional formats. The alternative (applying during expansion and then re-adjusting ranges) would be both slower and more error-prone.

These optimizations happen automatically — no configuration needed:

| Optimization | Impact |
| --- | --- |
| Differential context map | Loop variable updates modify the cached map in-place instead of rebuilding. Eliminates ~30K map copies for 10K rows. |
| Expression compilation cache | Expressions are compiled once via sync.Map. Subsequent evaluations hit the cache (~5M evals/sec). |
| Pre-allocated slices | Comment and formula cell lists are pre-sized during template loading. Reduces GC pressure for large templates. |
| Atomic progress counters | Area.rowsProcessed uses atomic.Int64 — safe for parallel mode with zero contention. |
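The expression cache follows the standard compile-once pattern around `sync.Map`. A sketch, where `compiledExpr` is a stand-in for the real compiled form:

```go
package main

import (
	"fmt"
	"sync"
)

// compiledExpr stands in for a parsed/compiled expression.
type compiledExpr struct{ src string }

var exprCache sync.Map // expression source -> *compiledExpr

// compileExpr compiles each distinct expression once; concurrent callers
// passing the same source all receive the same cached value.
func compileExpr(src string) *compiledExpr {
	if v, ok := exprCache.Load(src); ok {
		return v.(*compiledExpr)
	}
	c := &compiledExpr{src: src} // real code would parse src here
	actual, _ := exprCache.LoadOrStore(src, c)
	return actual.(*compiledExpr)
}

func main() {
	a := compileExpr("employee.Salary * 1.1")
	b := compileExpr("employee.Salary * 1.1")
	fmt.Println(a == b) // true: the cache returns the same compiled value
}
```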
Practical recommendations:

  1. Benchmark your actual template — the examples above use a simple 3-column template. Complex expressions, formulas, and nested loops change the equation.
  2. Streaming is the biggest win — if your template is compatible, streaming mode gives 3x speedup and 60% less memory with zero code changes.
  3. Auto-mode is safe — it only selects modes your template supports. No silent failures.
  4. Compile for batch — if you generate the same report more than once, Compile pays for itself on the second fill.
  5. Use context.Context — always set a timeout for server-side report generation to prevent runaway fills.

For raw benchmark numbers and scaling characteristics:

Performance Benchmarks →

For error handling and validation:

Error Handling →