
THE CODEBASE INTELLIGENCE LAYER FOR THE AI ERA · OPEN SOURCE · HOSTED
Context your AI agent can use.
Signals your team can trust.
Index any repo once. Your AI agents get architecture, ownership, and decisions they can actually query. Your team gets the health, risk, and provenance to trust what those agents ship. Open source and self-hostable.
2,391 vs 64,039 tokens on the same task. ~27x fewer. Answer quality at parity.
2.3x the defects surfaced vs a leading commercial tool under the same review budget, same 2,770 files.
The only code-health score validated against real defects. Up to 0.90 ROC AUC per repo.
Measured on real repositories. Every number is reproducible on your own codebase.
One index. Every team it serves.
repowise indexes your repo once and serves the agents writing your code and the humans accountable for it. Find the path built for you.
developers
Give Claude Code, Cursor, and any MCP client a queryable model of your repo.
team leads
Flag the risky PRs, the hotspots, and the hidden coupling, on every pull request.
engineering leaders
See how much of your code AI wrote, whether it is healthy, and who owns it.
security
CVE triage that knows whether you actually call the vulnerable code.
Built for the agents writing your code,
and the humans accountable for it.
repowise indexes your repo once. The same intelligence makes your agents smarter and your code trustworthy.
Context they can use
Signals you can trust
One engine, three interfaces
Install once. Choose the interface that fits your workflow — or use all three. They share the same data, the same intelligence, the same stores.

CLI
For the solo developer

MCP Server
For AI-native workflows

Web UI
For the whole team
Most tools answer one question.
repowise answers five.
Graph structure, code health, git history, generated documentation, and architectural decisions — five layers that compound into genuine understanding.
Every dependency, ranked and traced
- Tree-sitter ASTs across 10+ languages → directed dependency graph
- PageRank and betweenness centrality surface critical symbols
- Edge types: imports, calls, inherits, implements, co-changes
- Scales to 30K+ nodes with an automatic SQLite-backed graph
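
As a rough illustration of the ranking idea above, here is a minimal power-iteration PageRank over a toy import graph. It is a sketch, not repowise's implementation: the file names, the edge list, and the `pagerank` helper are all hypothetical, and it uses only the standard library.

```python
# Toy import graph: an edge (A, B) means "A imports B".
# All file names are hypothetical.
EDGES = [
    ("api.py", "graph.py"),
    ("cli.py", "graph.py"),
    ("orchestrator.py", "graph.py"),
    ("orchestrator.py", "store.py"),
    ("graph.py", "store.py"),
]


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = {name: [] for name in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        # Rank held by files with no outgoing edges is spread evenly.
        dangling = sum(rank[name] for name in nodes if not out_links[name])
        new_rank = {name: (1 - damping) / n + damping * dangling / n for name in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


ranks = pagerank(EDGES)
for name, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {score:.3f}")
```

In this toy graph the files that everything else depends on, `store.py` and `graph.py`, rise to the top of the ranking.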
It knows which file breaks next — before it does
- State-of-the-art ranking: about 0.74 cross-project ROC AUC at calling which files are headed for a bug. On the same code and the same real defects, it matches or beats the best commercial tools and published academic models.
- One 1–10 score from 25 deterministic signals: tangled complexity, hidden coupling, missing tests, runaway churn, fragile ownership. No LLM, no cloud — under 30 seconds on a 3,000-file repo.
- The weights are learned from a real defect corpus, not hand-tuned — so it out-predicts “what changed recently” and “what broke before” by 10+ points, and matches published academic defect models on benchmarks it never saw.
- Ranks what to fix first by impact-for-effort, and alerts you the moment a file's health starts sliding.
proven on 21 real projects across 9 languages
History that writes the documentation
- Hotspot detection — top 25% churn + complexity files flagged
- Co-change partners: files that change together without imports
- Ownership from git blame — primary owner + top 3 contributors
- Significant commits filtered into generation prompts
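
The hotspot and co-change bullets above can be sketched in a few lines. This is an assumption-laden sketch: it reads "top 25% churn + complexity" as "in the top quartile on both measures", and the commit data, complexity numbers, import edges, and the minimum of two shared commits are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical inputs: each commit is the set of files it touched.
COMMITS = [
    {"graph.py", "orchestrator.py"},
    {"graph.py", "orchestrator.py"},
    {"graph.py"},
    {"graph.py", "store.py"},
    {"cli.py"},
    {"api.py", "graph.py"},
    {"orchestrator.py", "graph.py"},
    {"utils.py"},
]
COMPLEXITY = {
    "graph.py": 48, "orchestrator.py": 31, "models.py": 20, "api.py": 14,
    "store.py": 12, "cli.py": 9, "utils.py": 3, "config.py": 2,
}
IMPORTS = {("orchestrator.py", "store.py"), ("api.py", "graph.py")}


def top_quartile(scores):
    """Return the files whose score ranks in the top 25%."""
    keep = max(1, math.ceil(len(scores) * 0.25))
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:keep])


# Churn = number of commits touching each file (0 for untouched files).
churn = Counter(f for commit in COMMITS for f in commit)
churn_all = {f: churn.get(f, 0) for f in COMPLEXITY}
hotspots = top_quartile(churn_all) & top_quartile(COMPLEXITY)

# Co-change partners: pairs committed together at least twice
# with no import edge in either direction.
pair_counts = Counter()
for commit in COMMITS:
    for a, b in combinations(sorted(commit), 2):
        pair_counts[(a, b)] += 1

hidden_coupling = {
    pair: count
    for pair, count in pair_counts.items()
    if count >= 2 and pair not in IMPORTS and pair[::-1] not in IMPORTS
}

print("hotspots:", sorted(hotspots))
print("hidden coupling:", hidden_coupling)
```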
Wiki pages that stay fresh
- 9-level hierarchical generation: symbols → files → modules → repo
- Confidence scoring with git-informed decay — stale pages auto-regenerate
- RAG context via LanceDB or pgvector — each page knows its imports
- Resumable, crash-safe, idempotent — checkpoint after every page
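
One way to picture confidence decay is a half-life measured in commits. The formula, the half-life of 10 commits, and the 0.5 regeneration threshold below are illustrative assumptions, not repowise's published scoring rule.

```python
def page_confidence(initial, commits_since_generation, half_life_commits=10):
    """Halve confidence every `half_life_commits` commits to the page's sources.

    Hypothetical decay rule, chosen only to illustrate the idea.
    """
    return initial * 0.5 ** (commits_since_generation / half_life_commits)


def needs_regeneration(confidence, threshold=0.5):
    """A page is stale once its confidence drops below the threshold."""
    return confidence < threshold


fresh = page_confidence(0.9, commits_since_generation=0)
aged = page_confidence(0.9, commits_since_generation=10)

print(f"fresh={fresh:.2f} stale={needs_regeneration(fresh)}")
print(f"aged={aged:.2f} stale={needs_regeneration(aged)}")
```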
The why behind your architecture
- 4 capture sources: inline markers, git archaeology, README mining, CLI
- Staleness tracking — decisions age when governed files get commits
- get_why() searches decisions before you change anything
- Health dashboard: stale decisions, ungoverned hotspots, proposed reviews
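
Staleness tracking can be approximated by counting commits that touch a decision's governed files after the decision was recorded. The `Decision` class, the sample decision, and the commit log here are hypothetical, and this sketch is an assumption about the mechanism rather than the real data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Decision:
    title: str
    recorded_on: date
    governed_files: frozenset


def commits_since(decision, commit_log):
    """Count commits after the decision date that touch any governed file."""
    return sum(
        1
        for when, files in commit_log
        if when > decision.recorded_on and decision.governed_files & files
    )


# Hypothetical decision and commit history.
decision = Decision(
    "Keep the graph store in SQLite",
    date(2025, 1, 10),
    frozenset({"graph.py", "store.py"}),
)
COMMIT_LOG = [
    (date(2025, 1, 5), {"graph.py"}),            # before the decision, ignored
    (date(2025, 2, 1), {"graph.py", "cli.py"}),  # touches a governed file
    (date(2025, 2, 14), {"api.py"}),             # touches no governed file
    (date(2025, 3, 3), {"store.py"}),            # touches a governed file
]

age = commits_since(decision, COMMIT_LOG)
print(f"{decision.title!r} has aged by {age} commits")
```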
AI writes half your code.
Can you trust it?
repowise attributes commits to the agents that wrote them, then shows which of that code is a low-health hotspot owned by a single person. From your git history alone. No IDE plugins, no developer surveillance. It is the one view no other tool puts on one screen.
Code health intelligence on every PR
The Repowise PR Bot is a GitHub App that posts one deterministic comment per pull request — hotspots, hidden coupling, declining health, dead code. Zero LLM calls. Green PRs stay silent. Free for public/OSS repos; private repos require the Pro plan.
- One comment per PR. Edited in place on re-pushes.
- Silence rule. Stays quiet unless health degrades, a hotspot is touched, a co-change partner is missing, or dead code shifts.
- Zero LLM cost. Pure tree-sitter, NetworkX, and the 25-biomarker scorer.
- Free forever for OSS. Private repos unlock with the Pro plan.
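
The silence rule reads naturally as a single predicate. The sketch below assumes a hypothetical `PRSignals` record with one field per trigger named in the list above; the field names are illustrative and not the bot's actual schema.

```python
from dataclasses import dataclass


@dataclass
class PRSignals:
    # Hypothetical field names, one per trigger in the silence rule.
    health_before: float
    health_after: float
    hotspots_touched: int = 0
    missing_co_change_partners: int = 0
    dead_code_delta: int = 0


def should_comment(signals: PRSignals) -> bool:
    """Stay silent unless at least one of the four triggers fires."""
    return (
        signals.health_after < signals.health_before
        or signals.hotspots_touched > 0
        or signals.missing_co_change_partners > 0
        or signals.dead_code_delta != 0
    )


green_pr = PRSignals(health_before=7.0, health_after=7.0)
risky_pr = PRSignals(
    health_before=7.0,
    health_after=6.8,
    hotspots_touched=1,
    missing_co_change_partners=1,
)

print(should_comment(green_pr), should_comment(risky_pr))
```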
⚠️ Health: 7.0 → 6.8 (-0.2)
graph.py: 3.3 → 2.2 ▼ -1.1
untested hotspot, brain method, nested complexity
🔥 Hotspot touched
graph.py — 21 commits/90d, 13 dependents
primary owner: Raghav (62%)
🔗 Hidden coupling
graph.py co-changes with orchestrator.py (8×), not in this PR.
9 tools your AI agent already knows how to call
- get_overview(): Architecture summary, module map, entry points, tech stack.
- get_answer(): One-call RAG Q&A. Retrieves over the wiki, gates on confidence, returns a cited 2–5 sentence answer.
- get_context(): Docs, ownership, history, decisions, freshness for files, modules, or symbols. Pass multiple targets in one call.
- get_symbol(): Raw source bytes for one indexed symbol with exact line bounds, cheaper and safer than Read + offset math.
- search_codebase(): Semantic search over the full wiki using LanceDB or pgvector. Natural language queries.
- get_risk(): Hotspot score, dependents, co-change partners, risk summary. Also returns top 5 global hotspots.
- get_why(): Three modes: natural language search over decisions, path-based lookup, or health dashboard.
- get_dead_code(): Unreachable files, unused exports, zombie packages, sorted by confidence and cleanup impact.
- get_health(): Per-file health scores from 25 deterministic biomarkers, the worst files, and ranked refactoring targets.
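
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `get_risk`. The envelope follows the Model Context Protocol, but the `targets` argument name is an assumption, since this page does not document the tools' parameters.

```python
import json


def tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "targets" is a hypothetical argument name used for illustration only.
request = tool_call(1, "get_risk", {"targets": ["graph.py"]})
print(json.dumps(request, indent=2))
```

In practice the agent's MCP client builds and sends this message for you; the sketch only shows what travels over the wire.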
CLAUDE.md that writes itself
- Architecture overview from the real dependency graph
- Hotspot warnings with churn metrics and owners
- Key design decisions and architectural constraints
- Dead code summary with confidence scores
- Entry points, build commands, and tech stack
- Also generates cursor.md — same data, different format
The full picture, side by side
- Auto-generated docs, git intelligence, decision records, and MCP tools — one package
- Open-source (AGPL-3.0) and fully self-hostable
- 17/17 features vs 4-6/17 for any single competitor
| Feature | repowise | Google Code Wiki | DeepWiki | CodeScene | Sourcegraph |
|---|---|---|---|---|---|
| Self-hostable OSS | ✓ | — | — | — | — |
| Works with private repos | ✓ | — | ✓ | ✓ | ✓ |
| Auto-generated wiki (LLM) | ✓ | ✓ | ✓ | — | — |
| Git intelligence (hotspots / ownership / co-changes) | ✓ | — | — | ✓ | — |
| Dead code detection | ✓ | — | — | — | — |
| Architectural decision records | ✓ | — | — | — | — |
| MCP server for AI agents | ✓ | — | — | — | — |
| Semantic search | ✓ | ✓ | ✓ | — | ✓ |
| Doc freshness / confidence scoring | ✓ | — | — | — | — |
| CLAUDE.md auto-generation | ✓ | — | — | — | — |
| Codebase chat (agentic) | ✓ | ✓ | ✓ | — | — |
| Dependency graph visualization | ✓ | ✓ | ✓ | ✓ | ✓ |
| PR review bot (code-health comments) | ✓ | — | — | ✓ | — |
| Code-health score (per file) | ✓ | — | — | ✓ | — |
| AI code provenance (agent attribution) | ✓ | — | — | — | — |
| Provider choice (4 LLM providers) | ✓ | — | — | — | — |
| Privacy (code never leaves your infra) | ✓ | — | — | ✓ | ✓ |
Self-assessed against publicly documented features as of May 2026. Vendor capabilities change — please verify before committing to any tool.
No black box. No hand-waving.
Open source
AGPL-3.0. Every heuristic, biomarker, and scoring rule is public and inspectable.
Self-hosted, private
Run it on your own infrastructure. Bring your own API key or go fully offline. Code never leaves your machine.
Deterministic
Zero LLM in scoring. The same input produces the same output, every time. EU AI Act high-risk obligations do not apply.
Reproducible
Every benchmark on this site is measured on real repositories and reproducible on your own.
Questions, answered
What is repowise?
repowise is a codebase intelligence layer for the AI era. It indexes your repo once and serves both your AI coding agents (an architecture-aware wiki, dependency graph, decisions, and nine MCP tools) and the humans accountable for the code (a defect-validated code-health score, change risk, git intelligence, and agent provenance). It is open source and self-hostable.
Is repowise free and open source?
Yes. The core engine is open source under AGPL-3.0 and runs 100% locally. Install it with pip install repowise, then bring your own API key or run fully offline with a local model. There are paid hosted tiers for teams that want zero-ops hosting, private-repo PR comments, and managed re-indexing.
Which AI agents and editors does it work with?
repowise exposes your codebase over the Model Context Protocol, so it works with Claude Code, Cursor, Cline, and Codex, plus any other MCP-compatible client. One index serves every agent.
How much does repowise reduce my agent's token usage?
On paired benchmarks on real repositories, loading context through repowise used 96% fewer tokens (2,391 vs 64,039, about 27 times fewer), with 89% fewer file reads and 70% fewer tool calls, at answer quality on par with raw file exploration.
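
The headline percentages follow directly from the two token counts quoted above, as this short check shows. The variable names are illustrative.

```python
with_repowise = 2_391
raw_exploration = 64_039

reduction = 1 - with_repowise / raw_exploration  # share of tokens saved
ratio = raw_exploration / with_repowise          # how many times fewer

print(f"{reduction:.0%} fewer tokens, {ratio:.1f}x fewer")
# prints "96% fewer tokens, 26.8x fewer"
```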
Does my source code leave my machine?
No. repowise is self-hosted with zero telemetry. Source is processed transiently and never persisted, and you can bring your own LLM key or run fully offline via a local model. What is stored is the graph, non-reversible embeddings, generated wiki pages, and git metadata.
Does the code-health score actually predict bugs?
Yes, and it is validated. Across 21 repos and 9 languages the cross-project ROC AUC is 0.74 (up to 0.90 per repo), and ranking by repowise health surfaces 2.3x the defects of a leading commercial tool under the same review budget. Every heuristic is open source so you can reproduce it on your own repo.
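
For readers unfamiliar with ROC AUC: it is the probability that a randomly chosen file that later had a defect is ranked as riskier than a randomly chosen file that did not. The sketch below computes it by brute force on made-up data. The health scores, defect labels, and the `10 - health` risk conversion are assumptions for illustration, not benchmark data.

```python
def roc_auc(risk_scores, had_defect):
    """Share of (defective, clean) pairs where the defective file ranks riskier.

    Ties count as half a win.
    """
    positives = [r for r, d in zip(risk_scores, had_defect) if d]
    negatives = [r for r, d in zip(risk_scores, had_defect) if not d]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical health scores (1-10, higher is healthier) and later defect labels.
health = [2.2, 3.5, 6.8, 7.9, 4.1, 9.0]
defect = [True, True, False, False, False, True]

risk = [10 - h for h in health]  # low health means high risk
auc = roc_auc(risk, defect)
print(f"AUC = {auc:.3f}")
```

A score of 0.5 means the ranking is no better than chance, and 1.0 means every defective file was ranked above every clean one.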
Which languages are supported?
Fifteen languages across the headline tiers, with full pipeline depth for Python, TypeScript, JavaScript, Java, Kotlin, Go, Rust, C++, and C#, including framework-aware route-to-handler edges for the major web frameworks.
How does repowise stay up to date as I keep committing?
It updates incrementally. A post-commit hook, file watcher, or webhook re-indexes only what changed, typically a handful of wiki pages in seconds, and every MCP response carries a staleness envelope that warns when the index has diverged from HEAD.
Guides, comparisons, and deep dives

Is AI-written code buggier than human code? We blamed 112,000 commits to find out
We git-blamed 112,382 commits across 28 repos to test whether AI-agent code introduces more bugs than human code. After controlling for size, it doesn't, and its lines last longer.

Does our code-health score actually predict bugs? A leakage-free benchmark
We scored 21 repos six months before their bugs landed to test whether a deterministic code-health score predicts defects. AUC 0.737, and the honest caveats.

Process metrics beat structural metrics for predicting defects
Complexity and code smells are the metrics everyone reaches for. Across 25 biomarkers and 21 repos, the strongest defect predictors were evolutionary, not structural. Here's the data.
Three paths to codebase intelligence
- Self-host — free, forever
- pip install repowise: your machine, your server, your CI
- AGPL-3.0 · full feature set · code never leaves your infra
- Hosted SaaS — live now
- Managed indexing · team workspaces · semantic chat
- Pro at $15/mo with LLM credits included · Sign up free →
- Enterprise
- On-prem · SSO · role-based access · dedicated support · SLAs