Methodology

How BuilderBox accurately tracks your coding time and distinguishes between human and AI-generated code.

Overview

BuilderBox uses a client-side accumulation approach for precise time tracking. Instead of inferring time from heartbeat gaps on the server, the VS Code extension tracks time locally and sends accumulated values with each heartbeat.

Privacy First

No code content is ever stored. Only file paths, timestamps, and metrics.

High Accuracy

Precise time tracking with client-side accumulation.

Time Calculation

Heartbeat System

BuilderBox sends "heartbeats" every 2 minutes while you're active in VS Code/Cursor. Each heartbeat contains:

  • Current file path and language
  • Accumulated time by activity type (coding, reviewing, terminal, prompting)
  • AI/human character counts
  • Change metrics (chars added/deleted, lines changed)
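
As a concrete sketch, the payload can be modeled as the interface below. The field names are illustrative assumptions, not BuilderBox's exact wire format:

```typescript
// Illustrative heartbeat payload. Field names are assumptions,
// not BuilderBox's exact wire format.
interface Heartbeat {
  filePath: string;      // e.g. "src/extension.ts"
  language: string;      // e.g. "typescript"
  timestamp: number;     // Unix epoch ms
  // Time accumulated since the previous heartbeat, per activity
  codingMs: number;
  reviewingMs: number;
  terminalMs: number;
  promptingMs: number;
  // Change metrics
  charsAdded: number;
  charsDeleted: number;
  linesChanged: number;
  // AI/human attribution
  aiChars: number;
  humanChars: number;
}
```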

Client-Side Accumulation

The extension tracks time precisely using a state machine:

State Machine Transitions:
├── Document edit → coding_ms accumulation
├── Editor focus (no edits for 5s) → reviewing_ms
├── Terminal focus → terminal_ms
├── AI panel focus → prompting_ms
└── Window blur → idle_ms
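
A minimal sketch of such an accumulator is shown below, assuming one running timer and a totals map; the real extension handles more transitions and edge cases:

```typescript
// Minimal activity accumulator sketch (assumed structure).
type Activity = "coding" | "reviewing" | "terminal" | "prompting" | "idle";

class SessionAccumulator {
  private current: Activity = "idle";
  private since: number = Date.now();
  private totals: Record<Activity, number> = {
    coding: 0, reviewing: 0, terminal: 0, prompting: 0, idle: 0,
  };

  // Called on every trigger: document edit, focus change, window blur.
  transition(next: Activity): void {
    const now = Date.now();
    this.totals[this.current] += now - this.since; // credit elapsed time
    this.current = next;
    this.since = now;
  }

  // Drain totals into a heartbeat and reset the counters.
  flush(): Record<Activity, number> {
    this.transition(this.current); // credit time up to now
    const snapshot = { ...this.totals };
    (Object.keys(this.totals) as Activity[]).forEach((k) => (this.totals[k] = 0));
    return snapshot;
  }
}
```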

Benefits over gap-based inference:

  • No time loss when switching between activities
  • Accurate tracking during long sessions
  • Precise breakdown by activity type

Session Detection

A coding session ends after 5 minutes (300 seconds) of inactivity. The server applies the same 300-second timeout when inferring sessions from heartbeat gaps, preserving backward compatibility with older extension versions that don't send accumulated time.
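
For those older clients, the fallback amounts to splitting heartbeats into sessions wherever the gap exceeds the timeout, roughly as in this sketch (the function name and shape are assumptions):

```typescript
// Gap-based session splitting for legacy clients (assumed logic).
const SESSION_TIMEOUT_MS = 300_000; // 300 seconds

function splitIntoSessions(timestamps: number[]): number[][] {
  const sorted = [...timestamps].sort((a, b) => a - b);
  const sessions: number[][] = [];
  let current: number[] = [];
  for (const ts of sorted) {
    const last = current[current.length - 1];
    if (last !== undefined && ts - last > SESSION_TIMEOUT_MS) {
      sessions.push(current); // gap too large: close the session
      current = [];
    }
    current.push(ts);
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}
```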

Activity Types

Coding

Active document edits, triggered by any text change in a file. This is the primary activity type, where most developers spend the bulk of their tracked time.

Reviewing

Viewing files without making edits. Includes scrolling, reading, and navigating code. Detected when you have a file open but haven't edited for 5+ seconds.

Terminal

Time spent in the integrated terminal. Tracked via terminal focus events and output detection. Includes running commands, reading logs, etc.

Prompting

Time spent in AI chat panels (Copilot Chat, Cursor Composer, Cline, etc.). Detected by monitoring when no text editor is active but the window is focused. Prompt submissions are tracked to enable accurate AI classification of subsequent edits.

AI Detection Methods

BuilderBox uses multiple detection techniques layered by confidence level. Higher-tier methods are tried first, with fallback to heuristics when API access isn't available.

Tier 1: Cursor Jump + Keystroke Timing (Highest confidence)

The most reliable detection for inline completions. When you accept a completion (via Tab, Enter, or click), the cursor jumps forward significantly:

  • Large cursor jumps (20+ chars) on the same line are detected
  • Must occur without recent keystrokes (150ms+ idle)
  • Multi-line forward jumps also captured for block completions
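
Put together, the check looks roughly like this sketch (thresholds are from the list above; the function shape is an assumption):

```typescript
// Cursor-jump completion check (assumed shape; thresholds from the docs).
const JUMP_CHARS = 20;
const KEYSTROKE_IDLE_MS = 150;

interface CursorPos { line: number; character: number; }

function looksLikeAcceptedCompletion(
  prev: CursorPos,
  next: CursorPos,
  msSinceLastKeystroke: number
): boolean {
  const idle = msSinceLastKeystroke >= KEYSTROKE_IDLE_MS;
  const sameLineJump =
    next.line === prev.line && next.character - prev.character >= JUMP_CHARS;
  const multiLineForwardJump = next.line > prev.line; // block completion
  return idle && (sameLineJump || multiLineForwardJump);
}
```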

Tier 2: Character-Level Timing (High confidence)

AI completions insert all characters in the same event loop tick (0ms between chars). Human typing has natural variance (50-300ms between keystrokes).

  • 20+ characters inserted after 150ms of no keystrokes = likely AI
  • Confidence scales with insertion size
  • Catches completions accepted via Enter or other methods
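
A sketch of this check follows; the thresholds come from the list above, while the exact confidence curve is an assumption for illustration:

```typescript
// Character-timing heuristic (thresholds from the docs; the exact
// confidence scaling is an illustrative assumption).
function classifyBurstInsertion(
  insertedChars: number,
  msSinceLastKeystroke: number
): { isAI: boolean; confidence: number } {
  if (insertedChars >= 20 && msSinceLastKeystroke >= 150) {
    // Confidence grows with insertion size, capped at 1.0.
    const confidence = Math.min(1, 0.6 + insertedChars / 500);
    return { isAI: true, confidence };
  }
  return { isAI: false, confidence: 0 };
}
```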

Tier 3: Multi-Cursor Detection (High confidence)

AI agents (Cline, Cursor Composer) often make simultaneous edits at multiple non-adjacent positions. Humans rarely do this.

  • Multiple content changes in single event
  • Changes at positions > 5 lines apart = agent activity
  • Strong signal for detecting agentic AI tools
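
Using the VS Code API, the check can be sketched as follows (the 5-line threshold comes from the list above; the function itself is illustrative):

```typescript
import * as vscode from "vscode";

// Multi-cursor agent-edit check (assumed logic).
function looksLikeAgentEdit(event: vscode.TextDocumentChangeEvent): boolean {
  const changes = event.contentChanges;
  if (changes.length < 2) return false; // need simultaneous edits
  const lines = changes
    .map((c) => c.range.start.line)
    .sort((a, b) => a - b);
  // Non-adjacent positions: more than 5 lines apart.
  return lines[lines.length - 1] - lines[0] > 5;
}
```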

Tier 3b: Prompt Window Detection (High confidence)

When using AI chat assistants (Cursor Composer, Copilot Chat, Cline), edits that occur shortly after sending a prompt are classified as AI-generated.

  • Tracks when user sends prompts to AI chat panels
  • Edits within 60 seconds of prompt submission = likely AI response
  • Dynamic window: larger edits (500+ chars) extend the window to 120 seconds
  • Confidence scales with time since prompt and edit size
  • Overrides keystroke timing check (prompts count as keystrokes)
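
The window logic can be sketched as follows; the thresholds come from the list above, while the function shape is an assumption:

```typescript
// Prompt-window classifier sketch (assumed shape).
const BASE_WINDOW_MS = 60_000;      // 60 seconds
const EXTENDED_WINDOW_MS = 120_000; // 120 seconds for large edits
const LARGE_EDIT_CHARS = 500;

function editInPromptWindow(
  lastPromptAt: number | null,
  editChars: number,
  now: number = Date.now()
): boolean {
  if (lastPromptAt === null) return false; // no prompt sent yet
  const windowMs =
    editChars >= LARGE_EDIT_CHARS ? EXTENDED_WINDOW_MS : BASE_WINDOW_MS;
  return now - lastPromptAt <= windowMs;
}
```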

Tier 4: Heuristic Fallback (Moderate confidence)

When direct detection isn't possible:

  • Large insertions (50+ chars) without recent keystrokes
  • Extension presence detection (is Copilot/Cursor installed?)
  • Code pattern analysis (AI often generates complete functions)
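
Extension presence, for instance, can be probed through the VS Code API. The sketch below uses the published marketplace IDs of a few AI extensions; treating their presence as a signal is the heuristic, not a guarantee:

```typescript
import * as vscode from "vscode";

// Extension-presence probe (IDs believed correct at time of writing).
const AI_EXTENSION_IDS = [
  "GitHub.copilot",
  "Continue.continue",
  "TabNine.tabnine-vscode",
  "Codeium.codeium",
];

function installedAIExtensions(): string[] {
  return AI_EXTENSION_IDS.filter(
    (id) => vscode.extensions.getExtension(id) !== undefined
  );
}
```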

Paste Exclusion

Paste operations are detected by comparing inserted text to clipboard content and are explicitly excluded from AI classification. Pasting code is a human action.

Exception for Cursor: large edits (50+ chars) with no recent keystrokes (500ms+) skip paste classification, as these are likely AI agent edits that happen to match clipboard content.
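
Both rules together can be sketched as a single classifier (assumed shape; thresholds from the text above):

```typescript
// Paste-vs-AI classification sketch (assumed logic).
function classifyLargeInsert(
  insertedText: string,
  clipboardText: string,
  msSinceLastKeystroke: number,
  isCursor: boolean
): "paste" | "candidate-ai" {
  // Cursor exception: large, keystroke-free edits skip paste
  // classification even when they match the clipboard.
  if (isCursor && insertedText.length >= 50 && msSinceLastKeystroke >= 500) {
    return "candidate-ai";
  }
  return insertedText === clipboardText ? "paste" : "candidate-ai";
}
```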

Supported AI Tools

| Tool | Detection Method | Confidence |
| --- | --- | --- |
| GitHub Copilot | Tab key + timing + extension API | Highest |
| Cursor Tab | Tab key + timing | Highest |
| Cursor Composer/Agent | Prompt window + multi-cursor + large edits + keystroke timing | High |
| Cline | Multi-cursor + extension detection | High |
| Continue | Extension detection + heuristics | Moderate |
| Tabnine | Extension detection + heuristics | Moderate |
| Codeium | Extension detection + heuristics | Moderate |
| Others | Heuristic fallback | Lower |

Data Flow

VS Code Extension
    ├── DocumentTracker (file changes)
    ├── InputMonitor (keystrokes, Tab key, paste)
    ├── SessionAccumulator (time by activity)
    ├── AIDetector (AI classification)
    └── HeartbeatBuilder → HeartbeatQueue
                              │
                              ▼
                    BuilderBox API (/api/heartbeats)
                              │
                              ▼
                    PostgreSQL (coding_heartbeats)
                              │
                              ▼
                    Aggregation (summaries, hourly stats)
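
On the client side, draining the queue to the API might look like the sketch below, reusing the Heartbeat shape from earlier; the base URL parameter and bearer-token auth are assumptions:

```typescript
// Queue flush sketch (endpoint path from the diagram; auth scheme
// and request shape are assumptions).
async function flushHeartbeats(
  baseUrl: string,
  queue: Heartbeat[],
  apiKey: string
): Promise<void> {
  if (queue.length === 0) return;
  const res = await fetch(`${baseUrl}/api/heartbeats`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify(queue),
  });
  if (!res.ok) {
    throw new Error(`Heartbeat flush failed: ${res.status}`);
  }
}
```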

What's Stored

  • File paths - Which files you worked on
  • Timestamps - When activity occurred
  • Metrics - Characters added/deleted, lines changed
  • AI attribution - Whether changes were AI-generated
  • Activity type - Coding, reviewing, terminal, prompting

What's NOT Stored

  • Actual code content
  • AI prompts or responses
  • Personal data beyond file paths
  • Screen recordings or screenshots

Relative Accuracy

| Category | Accuracy | Method |
| --- | --- | --- |
| Total coding time | Very High | Client-side accumulation |
| Reviewing time | Very High | Client-side accumulation |
| Terminal time | High | Output events + 60s threshold |
| Prompting time | High | AI panel detection |
| AI from inline completions | Highest | Cursor jump + keystroke timing |
| AI from agents | High | Multi-cursor + prompt window detection |

Note: Actual accuracy varies depending on your workflow, AI tools used, and coding patterns. Detection is most reliable for inline completions (Tab acceptance) and least reliable for tools without native API integration.