
est/git2www-zerocopy


git2www-zerocopy

A zero-copy static asset server backed by Git packfiles. It serves compressed blob objects straight from .pack files over HTTP using Content-Encoding: deflate, sending the stored zlib data as-is with no runtime decompression.

How It Works

Traditional servers:

disk → read → inflate → memory → gzip → send

This server:

disk → packfile → sendfile() → socket

Git packfiles store blob objects as zlib-compressed payloads. The server reads the pack index to locate objects, then sends the raw compressed bytes directly to the client. No decompression, no re-compression.
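Concretely, a non-delta pack entry is a small variable-length header (a 3-bit type plus the inflated size, 7 more size bits per continuation byte) followed directly by the object's zlib stream. HTTP's deflate coding is the zlib format (RFC 1950 wrapping RFC 1951), so those bytes can go on the wire unchanged. The sketch below decodes that header under the standard packfile layout. The function names are illustrative and not taken from main.py.

```python
import zlib

# Object type codes defined by the packfile format.
OBJ_BLOB = 3
OBJ_OFS_DELTA = 6
OBJ_REF_DELTA = 7


def parse_entry_header(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode the pack entry header at buf[pos].

    Returns (type, inflated_size, data_start); data_start is the index of the
    first byte of the object's zlib stream.
    """
    byte = buf[pos]
    obj_type = (byte >> 4) & 0x07   # bits 6..4 of the first byte
    size = byte & 0x0F              # low 4 bits of the size
    shift = 4
    pos += 1
    while byte & 0x80:              # MSB set: another size byte follows
        byte = buf[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return obj_type, size, pos


def blob_data_start(buf: bytes, pos: int = 0) -> int:
    """Return where the servable zlib stream starts, rejecting non-blob entries."""
    obj_type, _size, data_start = parse_entry_header(buf, pos)
    if obj_type in (OBJ_OFS_DELTA, OBJ_REF_DELTA):
        raise ValueError("delta entry: repack with --window=0 --depth=0")
    if obj_type != OBJ_BLOB:
        raise ValueError(f"not a blob (type {obj_type})")
    return data_start


def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Inverse of parse_entry_header; used below only to build sample data."""
    out = bytearray()
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


# Demo: a fake blob entry whose payload is an ordinary zlib stream.
payload = b"console.log('hello');\n" * 20
entry = encode_entry_header(OBJ_BLOB, len(payload)) + zlib.compress(payload)
start = blob_data_start(entry)
assert zlib.decompress(entry[start:]) == payload  # what a deflate-aware client does
```

The slice `entry[start:]` in the demo is exactly the kind of byte run the server hands to the socket; nothing in the path inflates it.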

Constraints

  • Only serves blob objects (JS, CSS, WASM, SVG, fonts, etc.)
  • Requires packs without delta compression (git repack -a --window=0 --depth=0)
  • Client must support Content-Encoding: deflate (zlib-wrapped)
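Because the body is only ever the compressed stream, a client that does not advertise deflate cannot read it. A handler could check Accept-Encoding first. The helper below is hypothetical (accepts_deflate is not from main.py). It is deliberately stricter than RFC 9110, which treats a missing Accept-Encoding header as accepting any coding.

```python
from typing import Optional


def accepts_deflate(accept_encoding: Optional[str]) -> bool:
    """True if an Accept-Encoding value allows the 'deflate' coding.

    Stricter than RFC 9110 on purpose: a missing header is treated as
    'no deflate', since this server only sends the compressed stream.
    """
    if not accept_encoding:
        return False
    wildcard_q = None
    for item in accept_encoding.split(","):
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        coding = coding.lower()
        if coding == "deflate":
            return q > 0                   # an explicit entry wins
        if coding == "*":
            wildcard_q = q                 # remember the wildcard, keep looking
    return wildcard_q is not None and wildcard_q > 0
```

For example, `curl --compressed` typically sends a list such as `deflate, gzip`, which passes; plain `curl` sends no Accept-Encoding and would be rejected by this check.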

Quick Start

# Ensure all objects are packed with no deltas
git repack -a --window=0 --depth=0

# Start the server
python3 main.py

# Test
curl --compressed http://127.0.0.1:8080/README.md

Implementation Details

Startup

  1. Scans .git/objects/pack/ for .idx and .pack files
  2. Parses pack index (v2 format) to build SHA → offset lookup tables
  3. Runs git ls-tree -r HEAD to map file paths → blob SHAs
  4. If loose objects exist or no packfile is found, runs git repack -a --window=0 automatically
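Step 2 can be sketched as follows, assuming the documented .idx v2 layout: the magic bytes \377tOc, a version number, a 256-entry fan-out table, the sorted SHA-1s, their CRC32s, 4-byte offsets, and finally 8-byte offsets for any entry whose 4-byte value has the MSB set. PackIndexV2 is a hypothetical stand-in for the repository's PackIndex class.

```python
import bisect
import struct
from typing import Optional

IDX_MAGIC = b"\xfftOc"  # "\377tOc" at the start of every v2 .idx file


class PackIndexV2:
    """Minimal .idx v2 reader mapping SHA-1 -> offset in the matching .pack."""

    def __init__(self, data: bytes):
        if data[:4] != IDX_MAGIC or struct.unpack(">I", data[4:8])[0] != 2:
            raise ValueError("not a version 2 pack index")
        # Fan-out: fanout[b] = number of objects whose first SHA byte is <= b.
        self.fanout = struct.unpack(">256I", data[8:1032])
        n = self.fanout[255]
        pos = 1032
        self.shas = [data[pos + 20 * i:pos + 20 * (i + 1)] for i in range(n)]
        pos += 20 * n                      # sorted SHA-1 table
        pos += 4 * n                       # CRC32 table (not needed here)
        small = struct.unpack(f">{n}I", data[pos:pos + 4 * n])
        large_base = pos + 4 * n           # 8-byte offsets (for offsets >= 2 GiB)
        self.offsets = []
        for off in small:
            if off & 0x80000000:           # MSB set: index into 8-byte table
                j = large_base + 8 * (off & 0x7FFFFFFF)
                off = struct.unpack(">Q", data[j:j + 8])[0]
            self.offsets.append(off)

    def find(self, sha_hex: str) -> Optional[int]:
        """Return the pack offset for a hex SHA-1, or None if absent."""
        sha = bytes.fromhex(sha_hex)
        # The fan-out table narrows the search to SHAs sharing the first byte.
        lo = self.fanout[sha[0] - 1] if sha[0] else 0
        hi = self.fanout[sha[0]]
        i = bisect.bisect_left(self.shas, sha, lo, hi)
        if i < hi and self.shas[i] == sha:
            return self.offsets[i]
        return None
```

Usage: `PackIndexV2(open(idx_path, "rb").read()).find(blob_sha)`. A multi-pack store would hold one such index per .pack and try each in turn.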

Request Flow

GET /app.js
  → path lookup → blob SHA
  → ETag check (If-None-Match) → 304 if match
  → pack index lookup → offset in .pack
  → read header → find compressed range
  → sendfile() to socket
  → Content-Encoding: deflate
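The pack format does not record an entry's compressed length. One common way to find the end of the range, assumed here rather than confirmed for main.py, is to sort all offsets from the index: an entry ends where the next one starts, and the last entry ends just before the pack's trailing 20-byte SHA-1 checksum. The HTTP body is then the bytes from the end of the entry header up to that point, which also gives Content-Length. entry_end and send_range are hypothetical names.

```python
import bisect
import socket


def entry_end(sorted_offsets: list[int], offset: int, pack_size: int) -> int:
    """Byte position where the pack entry starting at `offset` ends."""
    i = bisect.bisect_right(sorted_offsets, offset)
    if i < len(sorted_offsets):
        return sorted_offsets[i]   # next entry starts right after this one
    return pack_size - 20          # last entry: stop before the SHA-1 trailer


def send_range(sock: socket.socket, pack_file, start: int, end: int) -> int:
    """Send pack bytes [start, end) to the client.

    socket.sendfile() uses os.sendfile() where the platform supports it and
    falls back to plain send() otherwise. pack_file must be opened in binary mode.
    """
    return sock.sendfile(pack_file, offset=start, count=end - start)
```

With start set to the first byte after the entry header and end from entry_end, the response can set `Content-Length: end - start` before calling send_range.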

ETag

The blob's SHA-1 hash is used directly as the ETag:

ETag: "30f915b07626f0bcd08d0a2728fcb806272dfee9"

Same content → same hash → no extra computation needed.
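Git's blob ID is the SHA-1 of a short header (the word blob, a space, the decimal size, a NUL byte) followed by the content, so the ETag changes exactly when the bytes do. If-None-Match can also carry several tags, weak W/ tags, or *, and RFC 9110 specifies weak comparison for it. A sketch of both pieces; the function names are hypothetical.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Blob object ID as Git computes it: SHA-1 of b'blob <size>\\0' + content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def etag_matches(if_none_match: str, blob_sha: str) -> bool:
    """Weak comparison of an If-None-Match header against ETag "<blob_sha>"."""
    value = if_none_match.strip()
    if value == "*":
        return True
    for tag in value.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):       # weak validators still match
            tag = tag[2:]
        if tag == f'"{blob_sha}"':
            return True
    return False
```

As a sanity check, `git_blob_sha(b"")` returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of the empty blob.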

Features

  • Zero-copy transfer via socket.sendfile(), which uses the kernel's sendfile(2) where the OS supports it
  • Multi-packfile support — searches all packs in .git/objects/pack/
  • 304 caching with Git blob hash as ETag
  • Threaded request handling
  • Zero dependencies — Python stdlib only
  • Auto-repack on startup if loose objects or missing packfiles
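One way to decide whether a repack is needed is git count-objects -v, whose count: line is the number of loose objects and whose packs: line is the number of packfiles. A sketch under that assumption; parse_count_objects, needs_repack, and repack are hypothetical helpers, and the repack command mirrors the Quick Start.

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, int]:
    """Turn `git count-objects -v` output ("key: value" lines) into a dict."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def needs_repack(repo: str = ".") -> bool:
    """True if the repo has loose objects or no packfiles at all."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = parse_count_objects(out)
    return stats.get("count", 0) > 0 or stats.get("packs", 0) == 0


def repack(repo: str = ".") -> None:
    """Same command as the Quick Start (no delta compression)."""
    subprocess.run(
        ["git", "-C", repo, "repack", "-a", "--window=0", "--depth=0"],
        check=True,
    )
```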

Limitations

  • Delta objects (OFS_DELTA, REF_DELTA) are not supported
  • Range requests are not supported (compressed stream)
  • No directory listing or index.html fallback
  • Single-threaded pack index parsing (startup only)

Architecture

main.py
├── PackIndex      — .idx v2 parser, SHA → pack offset
├── PackReader     — .pack reader, header parsing + sendfile
├── PackStore      — multi-pack lookup
├── get_blob_map() — git ls-tree → path → SHA
├── Handler        — HTTP GET/HEAD with ETag support
└── main()         — startup + serve
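get_blob_map() is described as running git ls-tree -r HEAD. Adding -z makes each record NUL-terminated, so paths containing spaces, newlines, or non-ASCII characters arrive unquoted. A sketch; the leading "/" on each key (to match request paths) and the names parse_ls_tree and load_blob_map are assumptions, not the repository's code.

```python
import subprocess


def parse_ls_tree(output: bytes) -> dict[str, str]:
    """Map "/path" -> blob SHA from `git ls-tree -r -z` output."""
    blobs = {}
    for record in output.split(b"\x00"):
        if not record:
            continue
        meta, _, path = record.partition(b"\t")   # "<mode> <type> <sha>\t<path>"
        _mode, obj_type, sha = meta.split(b" ")
        if obj_type == b"blob":                    # skip submodules ("commit")
            key = "/" + path.decode("utf-8", "surrogateescape")
            blobs[key] = sha.decode("ascii")
    return blobs


def load_blob_map(repo: str = ".", rev: str = "HEAD") -> dict[str, str]:
    """Run ls-tree on `rev` and build the path -> blob SHA map."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", rev],
        capture_output=True, check=True,
    ).stdout
    return parse_ls_tree(out)
```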

Testing

# Start server in one terminal
python3 main.py

# In another terminal
curl -v http://127.0.0.1:8080/README.md
curl -v --compressed http://127.0.0.1:8080/README.md

# Test caching
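# Keep only the quoted ETag value (drop the header name and trailing CR)
ETAG=$(curl -sI http://127.0.0.1:8080/README.md | grep -i '^etag:' | cut -d' ' -f2 | tr -d '\r')
curl -v -H "If-None-Match: $ETAG" http://127.0.0.1:8080/README.md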
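The same checks can be scripted with the standard library, in keeping with the project's zero-dependency approach. A sketch; fetch and check are hypothetical names. urllib neither decodes content codings nor treats 304 as success, so the code inflates the body itself and catches HTTPError for the conditional request.

```python
import urllib.error
import urllib.request
import zlib


def fetch(url, etag=None):
    """GET `url` asking for deflate; return (status, headers, raw body)."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "deflate"})
    if etag is not None:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.headers, resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:            # urllib reports 304 as an "error"
            return 304, err.headers, b""
        raise


def check(url):
    """Mirror the curl tests: compressed GET, then a conditional GET."""
    status, headers, body = fetch(url)
    assert status == 200, status
    assert headers.get("Content-Encoding") == "deflate", headers
    text = zlib.decompress(body)       # urllib does not decode deflate itself
    print(f"{len(body)} bytes on the wire, {len(text)} bytes inflated")
    status, _, _ = fetch(url, headers["ETag"])
    assert status == 304, status
    print("conditional GET with the ETag returned 304")
```

With the server running, call `check("http://127.0.0.1:8080/README.md")`.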

License

GPL-3.0

About

Serve Git packfiles over HTTP with zero copy
