Methodology

How BuilderBox accurately tracks your coding time and distinguishes between human-written and AI-generated code.

Overview

BuilderBox uses a client-side accumulation approach for precise time tracking. Instead of inferring time from heartbeat gaps on the server, the VS Code extension tracks time locally and sends accumulated values with each heartbeat.

Privacy First

No code content is ever stored. Only file paths, timestamps, and metrics.

High Accuracy

~98% time tracking accuracy with client-side accumulation.

Time Calculation

Heartbeat System

BuilderBox sends "heartbeats" every 2 minutes while you're active in VS Code/Cursor. Each heartbeat contains:

  • Current file path and language
  • Accumulated time by activity type (coding, reviewing, terminal, prompting)
  • AI/human character counts
  • Change metrics (chars added/deleted, lines changed)
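
Putting those fields together, a heartbeat payload might look like the following sketch. The interface and field names here are illustrative assumptions, not the extension's actual schema:

```typescript
// Illustrative heartbeat shape -- field names are assumptions, not the real schema.
interface Heartbeat {
  filePath: string;
  language: string;
  timestamp: number;     // epoch ms when the heartbeat was built
  codingMs: number;      // time accumulated since the previous heartbeat
  reviewingMs: number;
  terminalMs: number;
  promptingMs: number;
  aiChars: number;       // characters attributed to AI
  humanChars: number;    // characters attributed to the human
  charsAdded: number;
  charsDeleted: number;
  linesChanged: number;
}

// Example payload for one 2-minute heartbeat interval.
const example: Heartbeat = {
  filePath: "src/index.ts",
  language: "typescript",
  timestamp: Date.now(),
  codingMs: 95_000,
  reviewingMs: 18_000,
  terminalMs: 5_000,
  promptingMs: 2_000,
  aiChars: 340,
  humanChars: 120,
  charsAdded: 460,
  charsDeleted: 35,
  linesChanged: 12,
};
```

Note that the per-activity times sum to the 2-minute heartbeat interval, and AI plus human characters account for all characters added.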

Client-Side Accumulation

The extension tracks time precisely using a state machine:

State Machine Transitions:
├── Document edit → coding_ms accumulation
├── Editor focus (no edits for 5s) → reviewing_ms
├── Terminal focus → terminal_ms
├── AI panel focus → prompting_ms
└── Window blur → idle_ms
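
The transitions above can be sketched as a small accumulator. This is a minimal illustration of the crediting logic, not the extension's implementation; the class and method names are assumptions:

```typescript
// Minimal sketch of client-side accumulation. Each transition credits the
// elapsed span to the state being left, so no time falls through the cracks.
type Activity = "coding" | "reviewing" | "terminal" | "prompting" | "idle";

class SessionAccumulator {
  private state: Activity = "idle";
  private stateStart = 0; // 0 means "no transition seen yet"
  readonly totals: Record<Activity, number> = {
    coding: 0, reviewing: 0, terminal: 0, prompting: 0, idle: 0,
  };

  transition(next: Activity, now: number): void {
    // The first transition only starts the clock; later ones credit
    // the elapsed time to the state we are leaving.
    if (this.stateStart > 0) this.totals[this.state] += now - this.stateStart;
    this.state = next;
    this.stateStart = now;
  }
}
```

For example, an edit event would call `transition("coding", now)`, and five seconds without edits while the editor is focused would call `transition("reviewing", now)`.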

Benefits over gap-based inference:

  • No time loss when switching between activities
  • Accurate tracking during long sessions
  • Precise breakdown by activity type

Session Detection

A coding session ends after 5 minutes of inactivity. The server uses a 300-second session timeout for backward compatibility with older extension versions that don't send accumulated time.
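
The cutoff can be sketched as a simple gap check against the previous heartbeat (the function shape is an assumption; only the 300-second timeout comes from the text above):

```typescript
// Sketch of the 5-minute inactivity cutoff: a heartbeat arriving more than
// SESSION_TIMEOUT_MS after the previous one starts a new session.
const SESSION_TIMEOUT_MS = 300_000; // 300 seconds

function isNewSession(prevHeartbeatAt: number | null, now: number): boolean {
  return prevHeartbeatAt === null || now - prevHeartbeatAt > SESSION_TIMEOUT_MS;
}
```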

Activity Types

Coding

Active document edits. Triggered by any text change in a file. This is the primary activity type for most developers.

Reviewing

Viewing files without making edits. Includes scrolling, reading, and navigating code. Detected when you have a file open but haven't edited for 5+ seconds.

Terminal

Time spent in the integrated terminal. Tracked via terminal focus events and output detection. Includes running commands, reading logs, etc.

Prompting

Time spent in AI chat panels (Copilot Chat, Cursor Composer, Cline, etc.). Detected by monitoring when no text editor is active but the window is focused. Prompt submissions are tracked to enable accurate AI classification of subsequent edits.

AI Detection Methods

BuilderBox uses multiple detection techniques layered by confidence level. Higher-tier methods are tried first, with fallback to heuristics when API access isn't available.

Tier 1: Cursor Jump + Keystroke Timing (~92% confidence)

The most reliable detection for inline completions. When you accept a completion (via Tab, Enter, or click), the cursor jumps forward significantly:

  • Large cursor jumps (20+ chars) on the same line are detected
  • Must occur without recent keystrokes (150ms+ idle)
  • Multi-line forward jumps also captured for block completions
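
The same-line case can be sketched as a predicate over the jump size and keystroke timing. The thresholds come from the rules above; the function shape is an assumption, and multi-line block completions are omitted for brevity:

```typescript
// Sketch of Tier 1: flag a completion-accept when the cursor jumps far
// on one line and no keystroke occurred in the last 150 ms.
const MIN_JUMP_CHARS = 20;
const KEYSTROKE_IDLE_MS = 150;

function looksLikeInlineCompletion(
  jumpChars: number,
  sameLine: boolean,
  msSinceLastKeystroke: number,
): boolean {
  return sameLine
    && jumpChars >= MIN_JUMP_CHARS
    && msSinceLastKeystroke >= KEYSTROKE_IDLE_MS;
}
```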

Tier 2: Character-Level Timing (~85-90% confidence)

AI completions insert all characters in the same event loop tick (0ms between chars). Human typing has natural variance (50-300ms between keystrokes).

  • 20+ characters inserted after 150ms of no keystrokes = likely AI
  • Confidence scales with insertion size
  • Catches completions accepted via Enter or other methods
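
A rough scoring function for this tier might look like the sketch below. The 20-character and 150 ms thresholds come from the text above; the exact confidence curve is illustrative, since the doc only says confidence scales with insertion size:

```typescript
// Sketch of Tier 2: a 20+ character burst after 150 ms of keystroke
// silence is scored as likely AI, with confidence rising toward ~0.90
// for larger insertions.
function tier2Confidence(insertedChars: number, msSinceLastKeystroke: number): number {
  if (insertedChars < 20 || msSinceLastKeystroke < 150) return 0;
  // Scale from 0.85 toward 0.90 as the insertion grows (illustrative curve).
  const scaled = 0.85 + Math.min(insertedChars / 1000, 1) * 0.05;
  return Math.min(scaled, 0.9);
}
```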

Tier 3: Multi-Cursor Detection (~90% confidence)

AI agents (Cline, Cursor Composer) often make simultaneous edits at multiple non-adjacent positions. Humans rarely do this.

  • Multiple content changes in single event
  • Changes at positions > 5 lines apart = agent activity
  • Strong signal for detecting agentic AI tools
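
As a sketch, the non-adjacency test reduces to checking the spread of edited line numbers within a single change event (the function shape is an assumption):

```typescript
// Sketch of Tier 3: simultaneous edits at positions more than 5 lines
// apart in one change event suggest an AI agent rather than a human.
function looksLikeAgentEdit(changedLines: number[]): boolean {
  if (changedLines.length < 2) return false;
  const sorted = [...changedLines].sort((a, b) => a - b);
  // If any pair is more than 5 lines apart, the extreme pair is too.
  return sorted[sorted.length - 1] - sorted[0] > 5;
}
```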

Tier 3b: Prompt Window Detection (~85-90% confidence)

When using AI chat assistants (Cursor Composer, Copilot Chat, Cline), edits that occur shortly after sending a prompt are classified as AI-generated.

  • Tracks when user sends prompts to AI chat panels
  • Edits within 60 seconds of prompt submission = likely AI response
  • Dynamic window: larger edits (500+ chars) extend window to 120 seconds
  • Confidence scales with time since prompt and edit size
  • Overrides keystroke timing check (prompts count as keystrokes)
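
The dynamic window rule can be sketched as follows, using only the thresholds stated above (60 s base window, extended to 120 s for 500+ character edits); the function shape is an assumption:

```typescript
// Sketch of Tier 3b: edits landing within a window after a prompt
// submission are attributed to AI; large edits extend the window.
function inPromptWindow(msSincePrompt: number, editChars: number): boolean {
  const windowMs = editChars >= 500 ? 120_000 : 60_000;
  return msSincePrompt >= 0 && msSincePrompt <= windowMs;
}
```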

Tier 4: Heuristic Fallback (~70-80% confidence)

When direct detection isn't possible:

  • Large insertions (50+ chars) without recent keystrokes
  • Extension presence detection (is Copilot/Cursor installed?)
  • Code pattern analysis (AI often generates complete functions)

Paste Exclusion

Paste operations are detected by comparing inserted text to clipboard content and are explicitly excluded from AI classification. Pasting code is a human action.
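
The override can be sketched as a final classification step that checks the clipboard before trusting any timing signal (the function and its arguments are illustrative assumptions):

```typescript
// Sketch of paste exclusion: inserted text matching the clipboard is
// classified as human, regardless of what the timing heuristics say.
function classifyInsertion(
  inserted: string,
  clipboard: string,
  timingSaysAI: boolean,
): "ai" | "human" {
  if (inserted.length > 0 && inserted === clipboard) return "human";
  return timingSaysAI ? "ai" : "human";
}
```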

Supported AI Tools

| Tool | Detection Method | Confidence |
| --- | --- | --- |
| GitHub Copilot | Tab key + timing + extension API | ~92% |
| Cursor Tab | Tab key + timing | ~92% |
| Cursor Composer | Prompt window + multi-cursor + large edits | ~90% |
| Cline | Multi-cursor + extension detection | ~90% |
| Continue | Extension detection + heuristics | ~80% |
| Tabnine | Extension detection + heuristics | ~80% |
| Codeium | Extension detection + heuristics | ~80% |
| Others | Heuristic fallback | ~70% |

Data Flow

VS Code Extension
    ├── DocumentTracker (file changes)
    ├── InputMonitor (keystrokes, Tab key, paste)
    ├── SessionAccumulator (time by activity)
    ├── AIDetector (AI classification)
    └── HeartbeatBuilder → HeartbeatQueue
                              │
                              ▼
                    BuilderBox API (/api/heartbeats)
                              │
                              ▼
                    PostgreSQL (coding_heartbeats)
                              │
                              ▼
                    Aggregation (summaries, hourly stats)

What's Stored

  • File paths - Which files you worked on
  • Timestamps - When activity occurred
  • Metrics - Characters added/deleted, lines changed
  • AI attribution - Whether changes were AI-generated
  • Activity type - Coding, reviewing, terminal, prompting

What's NOT Stored

  • Actual code content
  • AI prompts or responses
  • Personal data beyond file paths
  • Screen recordings or screenshots

Expected Accuracy

| Category | Accuracy | Method |
| --- | --- | --- |
| Total coding time | ~98% | Client-side accumulation |
| Reviewing time | ~98% | Client-side accumulation |
| Terminal time | ~95% | Output events + 60s threshold |
| Prompting time | ~90% | AI panel detection |
| AI from inline completions | ~92% | Cursor jump + keystroke timing |
| AI from agents | ~90% | Multi-cursor detection |

Note: These are estimated accuracies based on our testing. Actual accuracy may vary depending on your workflow, AI tools used, and coding patterns.