
feat: Enhance tool calling, classification, and progress reporting#55

Merged
veerareddyvishal144 merged 1 commit into Fast-Editor:feature/model-router from
MichaelAnders:feature/model-router
Feb 23, 2026

Conversation

@MichaelAnders
Contributor

CRITICAL: All test scripts now require the NODE_ENV=test environment variable. This ensures tests run in test mode with proper isolation from production code paths (e.g., it disables live API calls and enables mocking and test fixtures). See the package.json test:* scripts.

Apply comprehensive improvements to Lynkr's tool execution pipeline, including per-model tool parsers (based on vLLM), LLM-based classification, real-time progress monitoring, and advanced agent routing with dual-provider support for cloud-based tool execution.

Key Changes

Test Infrastructure

  • NODE_ENV=test: Added to ALL test scripts (test:unit, test:memory, etc.)
    • Ensures isolated test environment without production side effects
    • Enables test fixtures and mocking frameworks
    • Prevents accidental API calls during testing
    • IMPORTANT: This is a breaking change if tests are run without this var
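
For reference, the updated scripts take a form like the following (the exact test runner commands are illustrative; see the actual test:* scripts in package.json for the real ones). Note that inline VAR=value assignments require a POSIX shell; on Windows a helper such as cross-env would be needed:

```json
{
  "scripts": {
    "test:unit": "NODE_ENV=test node --test test/unit/",
    "test:memory": "NODE_ENV=test node --test test/memory/"
  }
}
```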

New Major Features

Tool Calling & Parsing (vLLM-Inspired)

  • Per-model tool parsers for GLM-4.7, Qwen3, and generic models
    • Implementation follows vLLM's ToolParser hierarchy (Apache 2.0 license)
    • GLM-4.7 parser: Handles native XML format + fallback patterns
    • Qwen3 parser: Markdown extraction with robust error handling
    • Generic parser: Extensible base for any model format
  • Ollama fallback handling for malformed responses
  • Tool call deduplication and cleaning
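
The parser hierarchy above can be pictured with a minimal sketch. The class and method names below are illustrative and do not match Lynkr's actual src/parsers/ API; the generic parser shown is a deliberately naive fallback:

```javascript
// Illustrative sketch of a vLLM-style per-model tool parser hierarchy.
// Class/method names are hypothetical, not Lynkr's actual API.
class BaseToolParser {
  // Returns an array of { name, arguments } tool calls, or [] if none found.
  parse(modelOutput) {
    throw new Error("parse() must be implemented by subclasses");
  }
}

class GenericToolParser extends BaseToolParser {
  // Fallback: look for a JSON object containing "name" anywhere in the text.
  parse(modelOutput) {
    const match = modelOutput.match(/\{[\s\S]*"name"[\s\S]*\}/);
    if (!match) return [];
    try {
      const call = JSON.parse(match[0]);
      return call.name ? [{ name: call.name, arguments: call.arguments ?? {} }] : [];
    } catch {
      return []; // malformed JSON: report no tool calls rather than throwing
    }
  }
}

// Registry selects a parser per model, falling back to the generic one.
const PARSERS = { generic: new GenericToolParser() };
function getParser(model) {
  return PARSERS[model] ?? PARSERS.generic;
}
```

Model-specific subclasses (GLM-4.7's XML format, Qwen3's markdown extraction) would override parse() and register themselves in the registry.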

Dual-Provider Tool Execution

  • TOOL_EXECUTION_PROVIDER: Route tool calls to specialized providers
    • Enables using cheap/fast/local models for chat while using reliable models (Claude Sonnet) for tool calling
    • Reduces token usage and improves tool accuracy
  • TOOL_EXECUTION_COMPARE_MODE: Compare tool calls from both providers
  • OLLAMA_CLOUD_ENDPOINT: Support for cloud-based Ollama models
    • Enables "cloud-only" setups without local Ollama
    • Automatic routing: cloud models use cloud endpoint
    • Hybrid support: mix local and cloud models in same session
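
The routing decision described above can be sketched as follows. The function name and selection logic are illustrative, but the environment variables mirror the ones this PR introduces:

```javascript
// Hypothetical sketch of dual-provider routing: tool-calling turns go to a
// dedicated provider when one is configured; everything else stays on the
// default (cheap/fast/local) chat provider.
function pickProvider(request, env) {
  if (request.needsTools && env.TOOL_EXECUTION_PROVIDER) {
    return {
      provider: env.TOOL_EXECUTION_PROVIDER,
      // TOOL_EXECUTION_MODEL overrides the chat model for tool execution.
      model: env.TOOL_EXECUTION_MODEL || request.model,
    };
  }
  return { provider: env.MODEL_PROVIDER || "ollama", model: request.model };
}
```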

Tool Classification

  • LLM-based tool needs classification (whitelist + LLM fallback)
  • Per-model classification accuracy with pattern matching
  • Tool execution provider routing based on classification
  • Workspace access permission system for external file operations
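
The whitelist-first, LLM-fallback flow can be sketched like this. The function name and the whitelist patterns are illustrative, not the shipped config/tool-whitelist-*.json contents:

```javascript
// Sketch of whitelist + LLM-fallback tool-needs classification.
// Fast path: cheap pattern match; slow path: ask a small LLM only when
// the patterns are inconclusive. Patterns here are illustrative.
const TOOL_WHITELIST = [
  /\b(read|write|edit|list)\b.*\bfile\b/i,
  /\brun\b.*\bcommand\b/i,
];

async function needsTools(prompt, classifyWithLlm) {
  if (TOOL_WHITELIST.some((pattern) => pattern.test(prompt))) return true;
  return classifyWithLlm(prompt);
}
```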

Progress Reporting (Real-Time Monitoring)

  • WebSocket server (port 8765) broadcasting execution events
  • Progress events: agent loop, model invocation, tool execution
  • Built-in Python listener (tools/progress-listener.py)
    • Color-coded output with timestamps
    • Agent hierarchy tracking (parent/child relationships)
    • Token and duration metrics
    • Remote monitoring support
  • Event tracking for debugging and observability

Configuration Enhancements

New environment variables (see .env.example for defaults):

  • OLLAMA_CLOUD_ENDPOINT: Cloud Ollama instance URL
  • OLLAMA_API_KEY: Cloud Ollama API authentication
  • TOOL_EXECUTION_PROVIDER: Provider for tool calling decisions
  • TOOL_EXECUTION_MODEL: Model override for tool execution
  • TOOL_EXECUTION_COMPARE_MODE: Enable provider comparison
  • POLICY_MAX_DURATION_MS: Single agent loop turn timeout
  • POLICY_TOOL_LOOP_THRESHOLD: Max tool results before termination
  • POLICY_MAX_TOOL_CALLS_PER_REQUEST: Parallel tool call limit
  • TOOL_NEEDS_CLASSIFICATION_*: Classification whitelist and LLM config
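
Reading the POLICY_* limits might look like the sketch below; the fallback values shown are assumptions for illustration, not Lynkr's shipped defaults (see .env.example for those):

```javascript
// Illustrative sketch of loading POLICY_* limits from the environment,
// falling back to defaults when a variable is unset or not a number.
// The default values here are assumptions, not Lynkr's actual defaults.
function loadPolicy(env) {
  const num = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    maxDurationMs: num(env.POLICY_MAX_DURATION_MS, 120000),
    toolLoopThreshold: num(env.POLICY_TOOL_LOOP_THRESHOLD, 50),
    maxToolCallsPerRequest: num(env.POLICY_MAX_TOOL_CALLS_PER_REQUEST, 25),
  };
}
```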

Files Changed

61 files modified (9,061 insertions, 432 deletions):

Test Infrastructure

  • package.json: NODE_ENV=test on all test:* scripts

Core Parser System (vLLM-Based)

  • src/parsers/base-tool-parser.js: Base class hierarchy
  • src/parsers/glm47-tool-parser.js: GLM-4.7 tool parsing
  • src/parsers/generic-tool-parser.js: Extensible generic parser
  • src/parsers/index.js: Parser registry and selection

Tool Execution & Classification

  • src/tools/tool-call-cleaner.js: Response cleanup and deduplication
  • src/tools/tool-classification-*.js: Classification system
  • src/agents/tool-agent-mapper.js: Tool-agent relationship mapping

Provider & Routing

  • src/clients/ollama-utils.js: Dual endpoint support (local + cloud)
  • src/api/router.js: Provider routing and conversion
  • src/providers/context-window.js: NEW - Context detection

Progress & Observability

  • src/progress/server.js: NEW - WebSocket server
  • src/progress/emitter.js: NEW - Event broadcasting
  • src/progress/client.js: NEW - Client monitoring
  • tools/progress-listener.py: NEW - Python listener tool

Configuration & Documentation

  • .env.example: OLLAMA_CLOUD_ENDPOINT, TOOL_EXECUTION_*, POLICY_*
  • config/tool-whitelist-*.json: Classification patterns

Tests (14 new files, 490/490 passing)

  • Tool parser tests (GLM, Qwen3, generic)
  • Tool classification and accuracy tests
  • Dual endpoint and cloud Ollama tests
  • Tool execution provider tests with comparison mode
  • Subagent auto-spawning tests
  • Progress reporting integration tests

Attribution

  • Per-model tool parsers: Based on vLLM's tool calling implementation (Apache License 2.0, https://github.com/vllm-project/vllm)
  • Progress reporting: Real-time WebSocket event system
  • Agent routing: Dual-provider architecture for cost optimization

Co-Authored-By: Claude Haiku 4.5, Sonnet 4.6, Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: GLM-4.7-cloud <noreply@z.ai> on Ollama
@MichaelAnders
Contributor Author

Two things:

  1. vLLM tool calling implementations are added; please adjust the code if needed to reflect this properly (I mentioned vLLM under "Attribution").
  2. With these changes, GLM-4.7 can be used for some code analysis & corrections. Sometimes it gets stuck ("Let me do XYZ..." responses), which can be overcome with "do XYZ". That is a WIP I'll look into (I have several ideas), but this has to be merged first. Then other models will be enabled using the vLLM parser implementations, which are really good!

@veerareddyvishal144

Thanks @MichaelAnders for your contribution. I am merging it.

@veerareddyvishal144 veerareddyvishal144 merged commit e260fb2 into Fast-Editor:feature/model-router Feb 23, 2026
1 check passed
@MichaelAnders
Contributor Author

Ok, so once you've merged e260fb2 into main branch I can contribute more fixes + enhancements.

@veerareddyvishal144

[Screenshot: 2026-02-23 at 6:52:17 PM]

This is the issue I am running into. As of now, tool calling in Ollama works in the code on the main branch. Can you please fix it?

@MichaelAnders
Contributor Author

MichaelAnders commented Feb 23, 2026

I will restore the previous behavior and use the new tool parsers (vLLM-based) only where they have been implemented.

So I went back to the main branch (the old behaviour, which you say worked for you with the prompts you supplied).

To reproduce: which MODEL_PROVIDER are you using (I have openrouter available), and what is your MODEL_DEFAULT, etc.?

@MichaelAnders
Contributor Author

MichaelAnders commented Feb 24, 2026

Ok, this is getting weird...

  • I cloned both main-branch and feature/model-router
  • I was able to reproduce the failing tool calls in feature/model-router
  • I changed feature/model-router and "if 1 == 0"'d some of my code sections to see what gives

What I noticed when repeating the same tests multiple times:

  1. I got horrible results even after commenting my code, but I got a response.
  2. main-branch never worked.
  3. The new "progress listener" showed huge context from previous prompts.

As I've very often experienced bad results due to the chat history being added automatically, I decided to get rid of it as a potential source of "pollution/noise" that we can never sync on; the old noise will always confuse the LLMs. To prevent that, I added new code in src-server.js. Feel free to add it to the main branch; I think it will help everyone who reports issues.

In my opinion, we should also have an option to dump ALL used parameters at runtime, to reproduce issues 1:1. For security reasons we need to be careful with API keys and never log them at all ;) So if someone would add that as well, cool! I have enough to work on with my feature/model-router for now.

```js
// Clear SQLite context databases BEFORE initializing.
// Controlled by the LYNKR_CLEAR_SQLITE_CONTEXT environment variable.
const fs = require("fs");
const path = require("path");

if (process.env.LYNKR_CLEAR_SQLITE_CONTEXT === "true") {
  const dataDir = path.join(__dirname, "..", "data");
  const sqliteDatabases = ["sessions.db", "lynkr.db", "budgets.db", "prompt-cache.db"];

  try {
    let deletedCount = 0;
    for (const dbFile of sqliteDatabases) {
      const dbPath = path.join(dataDir, dbFile);
      if (fs.existsSync(dbPath)) {
        fs.unlinkSync(dbPath);
        deletedCount++;
      }
    }
    if (deletedCount > 0) {
      console.log(`[STARTUP] Cleared ${deletedCount} SQLite database file(s) from ${dataDir}`);
    }
  } catch (err) {
    console.error(`[STARTUP] Failed to clear SQLite context: ${err.message}`);
  }
}

const loggingMiddleware = require("./api/middleware/logging");
```
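
The parameter-dump idea mentioned above could be sketched like this; the function name and the secret-matching patterns are my own illustration, not code from this PR:

```javascript
// Sketch of "dump all parameters, but never log API keys": redact any
// variable whose name looks secret before writing the dump.
// The name patterns are illustrative and likely incomplete.
function redactedConfigDump(env) {
  const SECRET = /(API_?KEY|TOKEN|SECRET|PASSWORD)/i;
  const out = {};
  for (const [key, value] of Object.entries(env)) {
    out[key] = SECRET.test(key) ? "[REDACTED]" : value;
  }
  return out;
}
```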

Then I tried it again. The result was... surprising?
[Screenshot: main_branch]

[Screenshot: revised_feature_model_router]

I find this interesting because on the main branch I am unable to reproduce the issue you ran into; instead I get no response at all.
I do see this in my logs, though. It's not a new issue, and I'm not sure whether I fixed it in feature/model-router or earlier (which then obviously didn't work):

Failed to parse tool arguments
    env: "development"
    err: {
      "type": "SyntaxError",
      "message": "Unexpected non-whitespace character after JSON at position 16 (line 1 column 17)",
      "stack":
          SyntaxError: Unexpected non-whitespace character after JSON at position 16 (line 1 column 17)
              at JSON.parse (<anonymous>)
              at parseArguments (/home/user/readd_old_tools/main_branch/src/tools/index.js:136:17)
              at normaliseToolCall (/home/user/readd_old_tools/main_branch/src/tools/index.js:149:16)
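
The SyntaxError above ("Unexpected non-whitespace character after JSON") is typical of a model appending prose after its JSON arguments. A tolerant parser can recover the leading object and discard the trailing text; this is an illustrative sketch, not the actual parseArguments in src/tools/index.js:

```javascript
// Recover the first balanced JSON object from text that may have trailing
// junk after it (a common malformed-model-output pattern). Returns the
// parsed object, or null if no balanced object is found.
function parseLeadingJson(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character inside a string
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  return null; // unbalanced: no complete object
}
```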

feature/model-router, along with at least this one regression you detected and I "removed" for now, is able to use the tools properly again.

I'm giving you my env parameters (launch.json); maybe you can try with them and see what happens. Some of them will be ignored, as they are not used without my new code, but I assume that shouldn't be an issue:

  "env": {
    "LYNKR_CLEAR_SQLITE_CONTEXT": "true"
    ,"LOG_LEVEL": "debug"
    ,"LOG_FILE": "./logs/lynkr.log"
    ,"NODE_ENV": "development"
    ,"PORT": "8081"
    ,"FALLBACK_ENABLED": "false"
    ,"MODEL_PROVIDER": "openrouter"
    ,"OPENROUTER_TRANSFORMS": "middle-out"
    ,"OPENROUTER_MODEL": "minimax/minimax-m2"
    ,"MODEL_DEFAULT": "minimax/minimax-m2"
    ,"OPENROUTER_API_KEY": "sk-or..."
    ,"TOPIC_DETECTION_MODEL": "skip"
    ,"POLICY_MAX_STEPS": "10000"
    ,"POLICY_MAX_DURATION_MS": "500000"
    ,"OLLAMA_KEEP_ALIVE": "-1"
    ,"OLLAMA_MAX_HISTORY_MESSAGES": "0"
    ,"OLLAMA_MODEL_POLL_INTERVAL_MS": "5000"
    ,"OLLAMA_MODEL_CHECK_TIMEOUT_MS": "3000"
    ,"OLLAMA_MAX_TOOLS_FOR_ROUTING": ""
    ,"OLLAMA_MODEL_LOAD_TIMEOUT_MS": "60000"
    ,"OLLAMA_STRIP_CONTEXT_FILES": "true"
    ,"OLLAMA_TIMEOUT_MS": "120000"
    ,"MEMORY_ENABLED": "false"
    ,"LLM_AUDIT_ENABLED": "true"
    ,"LLM_AUDIT_LOG_FILE": "./logs/llm-audit.log"
    ,"LLM_AUDIT_APP_LOG_LEVEL": "info"
    ,"LLM_AUDIT_MAX_USER_LENGTH": "0"
    ,"LLM_AUDIT_MAX_SYSTEM_LENGTH": "0"
    ,"LLM_AUDIT_MAX_RESPONSE_LENGTH": "0"
    ,"LLM_AUDIT_MAX_CONTENT_LENGTH": "100000000"
    ,"AGENTS_ENABLED": "true"
    ,"POLICY_MAX_TOOL_CALLS": "1000"
    ,"TOOL_EXECUTION_MODE": "server"
    ,"POLICY_MAX_TOOL_CALLS_PER_REQUEST": "1000"
    ,"PROGRESS_ENABLED": "true"
    ,"PROGRESS_PORT": "8765"
  }
