Deep Repo Navigation
Find the correct files in a large project and add a setting without creating a parallel system.
- Fails if the agent builds a duplicate system.
- Fails if the agent invents paths that do not exist.
- Fails if the agent edits the wrong module.
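As an illustration of the invented-paths criterion, a grader could check every path the agent cites as pre-existing against the actual repository tree. The sketch below is a minimal example under that assumption. The function name `invented_paths` and its arguments are hypothetical and are not part of any published WYBench tooling.

```python
from pathlib import Path


def invented_paths(repo_root, referenced_paths):
    """Return the referenced paths that do not exist under repo_root.

    Hypothetical helper: `referenced_paths` is assumed to hold relative
    paths the agent claimed were already present in the project.
    """
    root = Path(repo_root)
    return [p for p in referenced_paths if not (root / p).exists()]
```

A non-empty return value would indicate at least one path the agent made up. Files the agent legitimately creates during the task would need to be excluded before calling it.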
WYBench task packs are built to expose long-term agent reliability, not just one-shot patch ability.
Find the correct files in a large project and add a setting without creating a parallel system.
Read current rules, old notes, deprecated plans, and active decisions, then implement from current truth.
Run tests, inspect failures, fix the issue, rerun verification, and explain what changed.
Follow a dense requirement list without forgetting constraints in the middle or end.
Plan, implement, fix, and audit across sessions while preserving the original project rules.
Add a public benchmark page matching existing Lee Wyatt Corp chrome, login state, and layout patterns.
Each official result must include the initial prompt, model output, tool logs, final files changed, test results, human review notes, pass/fail reason, and verification status.
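The required contents of an official result can be treated as a record with eight mandatory fields. The sketch below shows one way to check a record for completeness. The field names are assumed snake_case renderings of the items listed above, and `missing_fields` is a hypothetical helper, not a documented WYBench interface.

```python
# Field names are assumptions derived from the list of required items.
REQUIRED_FIELDS = (
    "initial_prompt",
    "model_output",
    "tool_logs",
    "final_files_changed",
    "test_results",
    "human_review_notes",
    "pass_fail_reason",
    "verification_status",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a result record."""
    empty_values = (None, "", [], {})
    return [name for name in REQUIRED_FIELDS if record.get(name) in empty_values]
```

A record would count as complete only when `missing_fields` returns an empty list. Treating empty strings and empty lists as missing is a design choice in this sketch, so that a result with blank tool logs is not accepted as official.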