
feat: Enhance tool calling, classification, and progress reporting#55

Merged
veerareddyvishal144 merged 1 commit into Fast-Editor:feature/model-router from
MichaelAnders:feature/model-router
Feb 23, 2026

Conversation

@MichaelAnders
Contributor

CRITICAL: All test scripts now require the NODE_ENV=test environment variable. This ensures tests run in test mode with proper isolation from production code paths (e.g., it disables live API calls and enables mocking and test fixtures). See the package.json test:* scripts.

Apply comprehensive improvements to Lynkr's tool execution pipeline, including per-model tool parsers (based on vLLM), LLM-based classification, real-time progress monitoring, and advanced agent routing with dual-provider support for cloud-based tool execution.

Key Changes

Test Infrastructure

  • NODE_ENV=test: Added to ALL test scripts (test:unit, test:memory, etc.)
    • Ensures isolated test environment without production side effects
    • Enables test fixtures and mocking frameworks
    • Prevents accidental API calls during testing
    • IMPORTANT: This is a breaking change if tests are run without this var
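
For reference, the updated scripts take a form like the following (the exact test runner commands are illustrative; see the actual test:* scripts in package.json for the real ones). Note that inline VAR=value assignments require a POSIX shell; on Windows a helper such as cross-env would be needed:

```json
{
  "scripts": {
    "test:unit": "NODE_ENV=test node --test test/unit/",
    "test:memory": "NODE_ENV=test node --test test/memory/"
  }
}
```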

New Major Features

Tool Calling & Parsing (vLLM-Inspired)

  • Per-model tool parsers for GLM-4.7, Qwen3, and generic models
    • Implementation follows vLLM's ToolParser hierarchy (Apache 2.0 license)
    • GLM-4.7 parser: Handles native XML format + fallback patterns
    • Qwen3 parser: Markdown extraction with robust error handling
    • Generic parser: Extensible base for any model format
  • Ollama fallback handling for malformed responses
  • Tool call deduplication and cleaning
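
The parser hierarchy above can be pictured with a minimal sketch. The class and method names below are illustrative and do not match Lynkr's actual src/parsers/ API; the generic parser shown is a deliberately naive fallback:

```javascript
// Illustrative sketch of a vLLM-style per-model tool parser hierarchy.
// Class/method names are hypothetical, not Lynkr's actual API.
class BaseToolParser {
  // Returns an array of { name, arguments } tool calls, or [] if none found.
  parse(modelOutput) {
    throw new Error("parse() must be implemented by subclasses");
  }
}

class GenericToolParser extends BaseToolParser {
  // Fallback: look for a JSON object containing "name" anywhere in the text.
  parse(modelOutput) {
    const match = modelOutput.match(/\{[\s\S]*"name"[\s\S]*\}/);
    if (!match) return [];
    try {
      const call = JSON.parse(match[0]);
      return call.name ? [{ name: call.name, arguments: call.arguments ?? {} }] : [];
    } catch {
      return []; // malformed JSON: report no tool calls rather than throwing
    }
  }
}

// Registry selects a parser per model, falling back to the generic one.
const PARSERS = { generic: new GenericToolParser() };
function getParser(model) {
  return PARSERS[model] ?? PARSERS.generic;
}
```

Model-specific subclasses (GLM-4.7's XML format, Qwen3's markdown extraction) would override parse() and register themselves in the registry.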

Dual-Provider Tool Execution

  • TOOL_EXECUTION_PROVIDER: Route tool calls to specialized providers
    • Enables using cheap/fast/local models for chat while using reliable models (Claude Sonnet) for tool calling
    • Reduces token usage and improves tool accuracy
  • TOOL_EXECUTION_COMPARE_MODE: Compare tool calls from both providers
  • OLLAMA_CLOUD_ENDPOINT: Support for cloud-based Ollama models
    • Enables "cloud-only" setups without local Ollama
    • Automatic routing: cloud models use cloud endpoint
    • Hybrid support: mix local and cloud models in same session
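
The routing decision described above can be sketched as follows. The function name and selection logic are illustrative, but the environment variables mirror the ones this PR introduces:

```javascript
// Hypothetical sketch of dual-provider routing: tool-calling turns go to a
// dedicated provider when one is configured; everything else stays on the
// default (cheap/fast/local) chat provider.
function pickProvider(request, env) {
  if (request.needsTools && env.TOOL_EXECUTION_PROVIDER) {
    return {
      provider: env.TOOL_EXECUTION_PROVIDER,
      // TOOL_EXECUTION_MODEL overrides the chat model for tool execution.
      model: env.TOOL_EXECUTION_MODEL || request.model,
    };
  }
  return { provider: env.MODEL_PROVIDER || "ollama", model: request.model };
}
```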

Tool Classification

  • LLM-based tool needs classification (whitelist + LLM fallback)
  • Per-model classification accuracy with pattern matching
  • Tool execution provider routing based on classification
  • Workspace access permission system for external file operations
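
The whitelist-first, LLM-fallback flow can be sketched like this. The function name and the whitelist patterns are illustrative, not the shipped config/tool-whitelist-*.json contents:

```javascript
// Sketch of whitelist + LLM-fallback tool-needs classification.
// Fast path: cheap pattern match; slow path: ask a small LLM only when
// the patterns are inconclusive. Patterns here are illustrative.
const TOOL_WHITELIST = [
  /\b(read|write|edit|list)\b.*\bfile\b/i,
  /\brun\b.*\bcommand\b/i,
];

async function needsTools(prompt, classifyWithLlm) {
  if (TOOL_WHITELIST.some((pattern) => pattern.test(prompt))) return true;
  return classifyWithLlm(prompt);
}
```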

Progress Reporting (Real-Time Monitoring)

  • WebSocket server (port 8765) broadcasting execution events
  • Progress events: agent loop, model invocation, tool execution
  • Built-in Python listener (tools/progress-listener.py)
    • Color-coded output with timestamps
    • Agent hierarchy tracking (parent/child relationships)
    • Token and duration metrics
    • Remote monitoring support
  • Event tracking for debugging and observability

Configuration Enhancements

New environment variables (see .env.example for defaults):

  • OLLAMA_CLOUD_ENDPOINT: Cloud Ollama instance URL
  • OLLAMA_API_KEY: Cloud Ollama API authentication
  • TOOL_EXECUTION_PROVIDER: Provider for tool calling decisions
  • TOOL_EXECUTION_MODEL: Model override for tool execution
  • TOOL_EXECUTION_COMPARE_MODE: Enable provider comparison
  • POLICY_MAX_DURATION_MS: Single agent loop turn timeout
  • POLICY_TOOL_LOOP_THRESHOLD: Max tool results before termination
  • POLICY_MAX_TOOL_CALLS_PER_REQUEST: Parallel tool call limit
  • TOOL_NEEDS_CLASSIFICATION_*: Classification whitelist and LLM config
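
Reading the POLICY_* limits might look like the sketch below; the fallback values shown are assumptions for illustration, not Lynkr's shipped defaults (see .env.example for those):

```javascript
// Illustrative sketch of loading POLICY_* limits from the environment,
// falling back to defaults when a variable is unset or not a number.
// The default values here are assumptions, not Lynkr's actual defaults.
function loadPolicy(env) {
  const num = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    maxDurationMs: num(env.POLICY_MAX_DURATION_MS, 120000),
    toolLoopThreshold: num(env.POLICY_TOOL_LOOP_THRESHOLD, 50),
    maxToolCallsPerRequest: num(env.POLICY_MAX_TOOL_CALLS_PER_REQUEST, 25),
  };
}
```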

Files Changed

61 files modified (9,061 insertions, 432 deletions):

Test Infrastructure

  • package.json: NODE_ENV=test on all test:* scripts

Core Parser System (vLLM-Based)

  • src/parsers/base-tool-parser.js: Base class hierarchy
  • src/parsers/glm47-tool-parser.js: GLM-4.7 tool parsing
  • src/parsers/generic-tool-parser.js: Extensible generic parser
  • src/parsers/index.js: Parser registry and selection

Tool Execution & Classification

  • src/tools/tool-call-cleaner.js: Response cleanup and deduplication
  • src/tools/tool-classification-*.js: Classification system
  • src/agents/tool-agent-mapper.js: Tool-agent relationship mapping

Provider & Routing

  • src/clients/ollama-utils.js: Dual endpoint support (local + cloud)
  • src/api/router.js: Provider routing and conversion
  • src/providers/context-window.js: NEW - Context detection

Progress & Observability

  • src/progress/server.js: NEW - WebSocket server
  • src/progress/emitter.js: NEW - Event broadcasting
  • src/progress/client.js: NEW - Client monitoring
  • tools/progress-listener.py: NEW - Python listener tool

Configuration & Documentation

  • .env.example: OLLAMA_CLOUD_ENDPOINT, TOOL_EXECUTION_*, POLICY_*
  • config/tool-whitelist-*.json: Classification patterns

Tests (14 new files, 490/490 passing)

  • Tool parser tests (GLM, Qwen3, generic)
  • Tool classification and accuracy tests
  • Dual endpoint and cloud Ollama tests
  • Tool execution provider tests with comparison mode
  • Subagent auto-spawning tests
  • Progress reporting integration tests

Attribution

  • Per-model tool parsers: Based on vLLM's tool calling implementation (Apache License 2.0, https://github.com/vllm-project/vllm)
  • Progress reporting: Real-time WebSocket event system
  • Agent routing: Dual-provider architecture for cost optimization

Co-Authored-By: Claude Haiku 4.5, Sonnet 4.6, Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: GLM-4.7-cloud <noreply@z.ai> on Ollama
@MichaelAnders
Contributor Author

Two things:

  1. vLLM tool calling implementations are added; please adjust the code if needed to reflect this properly (I mentioned vLLM under "Attribution").
  2. With these changes, GLM-4.7 can be used for some code analysis & corrections. Sometimes it gets stuck ("Let me do XYZ..." responses), which can be overcome with "do XYZ". That is a WIP I'll look into (I have several ideas), but this has to be merged first. Then other models will be enabled using the vLLM parser implementations, which are really good!

@veerareddyvishal144

Thanks @MichaelAnders for your contribution. I am merging it.

@veerareddyvishal144 veerareddyvishal144 merged commit e260fb2 into Fast-Editor:feature/model-router Feb 23, 2026
1 check passed
@MichaelAnders
Contributor Author

Ok, so once you've merged e260fb2 into main branch I can contribute more fixes + enhancements.

@veerareddyvishal144

[Screenshot: 2026-02-23 at 6:52:17 PM]

This is the issue I am running into. As of now, tool calling in Ollama works in the code on the main branch. Can you please fix it?

@MichaelAnders
Contributor Author

MichaelAnders commented Feb 23, 2026

I will restore the previous behavior and use the new tool parsers (vLLM-based) only where they have been implemented.

So I went back to the main branch (the old behaviour, which you say worked for you with the prompts you supplied).

To reproduce: which MODEL_PROVIDER are you using (I have openrouter available), and what is your MODEL_DEFAULT, etc.?

@MichaelAnders
Contributor Author

MichaelAnders commented Feb 24, 2026

Ok, this is getting weird...

  • I cloned both main-branch and feature/model-router
  • I was able to reproduce the failing tool calls in feature/model-router
  • I changed feature/model-router and "if 1 == 0"'d some of my code sections to see what gives

What I noticed when repeating the same tests multiple times:

  1. I got horrible results even after commenting my code, but I got a response.
  2. main-branch never worked.
  3. The new "progress listener" showed huge context from previous prompts.

As I've very often experienced bad results due to the chat history being added automatically, I decided to get rid of it as a potential source of "pollution/noise" that we can never sync on; the old noise will always confuse the LLMs. To prevent that, I added new code in src-server.js. Feel free to add it to the main branch; I think it will help everyone who reports issues.

In my opinion, we should also have an option to dump ALL used parameters at runtime, to reproduce issues 1:1. For security reasons we need to be careful with API keys and never log them at all ;) So if someone would add that as well, cool! I have enough to work on with my feature/model-router for now.

```js
// Clear SQLite context databases BEFORE initializing.
// Controlled by the LYNKR_CLEAR_SQLITE_CONTEXT environment variable.
const fs = require("fs");
const path = require("path");

if (process.env.LYNKR_CLEAR_SQLITE_CONTEXT === "true") {
  const dataDir = path.join(__dirname, "..", "data");
  const sqliteDatabases = ["sessions.db", "lynkr.db", "budgets.db", "prompt-cache.db"];

  try {
    let deletedCount = 0;
    for (const dbFile of sqliteDatabases) {
      const dbPath = path.join(dataDir, dbFile);
      if (fs.existsSync(dbPath)) {
        fs.unlinkSync(dbPath);
        deletedCount++;
      }
    }
    if (deletedCount > 0) {
      console.log(`[STARTUP] Cleared ${deletedCount} SQLite database file(s) from ${dataDir}`);
    }
  } catch (err) {
    console.error(`[STARTUP] Failed to clear SQLite context: ${err.message}`);
  }
}

const loggingMiddleware = require("./api/middleware/logging");
```
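
The parameter-dump idea mentioned above could be sketched like this; the function name and the secret-matching patterns are my own illustration, not code from this PR:

```javascript
// Sketch of "dump all parameters, but never log API keys": redact any
// variable whose name looks secret before writing the dump.
// The name patterns are illustrative and likely incomplete.
function redactedConfigDump(env) {
  const SECRET = /(API_?KEY|TOKEN|SECRET|PASSWORD)/i;
  const out = {};
  for (const [key, value] of Object.entries(env)) {
    out[key] = SECRET.test(key) ? "[REDACTED]" : value;
  }
  return out;
}
```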

Then I tried it again. The result was... surprising?
[Screenshot: main_branch]

[Screenshot: revised_feature_model_router]

I find this interesting because on the main branch I am unable to reproduce the issue you ran into; instead I get no response at all.
I do see this in my logs, though. It's not a new issue, and I'm not sure whether I fixed it in feature/model-router or earlier (which then obviously didn't work):

Failed to parse tool arguments
    env: "development"
    err: {
      "type": "SyntaxError",
      "message": "Unexpected non-whitespace character after JSON at position 16 (line 1 column 17)",
      "stack":
          SyntaxError: Unexpected non-whitespace character after JSON at position 16 (line 1 column 17)
              at JSON.parse (<anonymous>)
              at parseArguments (/home/user/readd_old_tools/main_branch/src/tools/index.js:136:17)
              at normaliseToolCall (/home/user/readd_old_tools/main_branch/src/tools/index.js:149:16)
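
The SyntaxError above ("Unexpected non-whitespace character after JSON") is typical of a model appending prose after its JSON arguments. A tolerant parser can recover the leading object and discard the trailing text; this is an illustrative sketch, not the actual parseArguments in src/tools/index.js:

```javascript
// Recover the first balanced JSON object from text that may have trailing
// junk after it (a common malformed-model-output pattern). Returns the
// parsed object, or null if no balanced object is found.
function parseLeadingJson(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character inside a string
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  return null; // unbalanced: no complete object
}
```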

feature/model-router, along with at least this one regression you detected and I "removed" for now, is able to use the tools properly again.

I'm giving you my env parameters (launch.json); maybe you can try with them and see what happens. Some of them will be ignored, as they are not used without my new code, but I assume that shouldn't be an issue:

  "env": {
    "LYNKR_CLEAR_SQLITE_CONTEXT": "true"
    ,"LOG_LEVEL": "debug"
    ,"LOG_FILE": "./logs/lynkr.log"
    ,"NODE_ENV": "development"
    ,"PORT": "8081"
    ,"FALLBACK_ENABLED": "false"
    ,"MODEL_PROVIDER": "openrouter"
    ,"OPENROUTER_TRANSFORMS": "middle-out"
    ,"OPENROUTER_MODEL": "minimax/minimax-m2"
    ,"MODEL_DEFAULT": "minimax/minimax-m2"
    ,"OPENROUTER_API_KEY": "sk-or..."
    ,"TOPIC_DETECTION_MODEL": "skip"
    ,"POLICY_MAX_STEPS": "10000"
    ,"POLICY_MAX_DURATION_MS": "500000"
    ,"OLLAMA_KEEP_ALIVE": "-1"
    ,"OLLAMA_MAX_HISTORY_MESSAGES": "0"
    ,"OLLAMA_MODEL_POLL_INTERVAL_MS": "5000"
    ,"OLLAMA_MODEL_CHECK_TIMEOUT_MS": "3000"
    ,"OLLAMA_MAX_TOOLS_FOR_ROUTING": ""
    ,"OLLAMA_MODEL_LOAD_TIMEOUT_MS": "60000"
    ,"OLLAMA_STRIP_CONTEXT_FILES": "true"
    ,"OLLAMA_TIMEOUT_MS": "120000"
    ,"MEMORY_ENABLED": "false"
    ,"LLM_AUDIT_ENABLED": "true"
    ,"LLM_AUDIT_LOG_FILE": "./logs/llm-audit.log"
    ,"LLM_AUDIT_APP_LOG_LEVEL": "info"
    ,"LLM_AUDIT_MAX_USER_LENGTH": "0"
    ,"LLM_AUDIT_MAX_SYSTEM_LENGTH": "0"
    ,"LLM_AUDIT_MAX_RESPONSE_LENGTH": "0"
    ,"LLM_AUDIT_MAX_CONTENT_LENGTH": "100000000"
    ,"AGENTS_ENABLED": "true"
    ,"POLICY_MAX_TOOL_CALLS": "1000"
    ,"TOOL_EXECUTION_MODE": "server"
    ,"POLICY_MAX_TOOL_CALLS_PER_REQUEST": "1000"
    ,"PROGRESS_ENABLED": "true"
    ,"PROGRESS_PORT": "8765"
  }
