Stories by イルカ Borovec on Medium

The Open-Source Shepherd: Why Your Best Work Is Building a Project That Doesn’t Need You

イルカ Borovec — Thu, 11 Jun 2026 20:01:01 GMT

Generated illustration specifically for this blogpost

There’s a path beyond just maintaining an open-source project: grow contributors, share ownership, and build a community that carries the project forward without you.

There’s a moment — and if you’ve been maintaining an open-source project long enough, you know it — when you realize you’ve been doing this wrong. For me, it came during my time leading open-source at Lightning AI. The project had a creator who set the original vision, and I was one of the people steering the ship alongside them. But when I looked at the PR queue on a Monday morning — 20+ open pull requests, half of them waiting on one of maybe two or three people who could approve — the bottleneck was obvious. It wasn’t that the code was complex or that the architecture decisions required deep context. It was that we’d built a culture where almost everything funneled through a handful of gatekeepers.

We had contributors, sure. But the pattern was always the same: someone would show up, submit a PR for a specific itch they needed scratched, and disappear. The drive-by contribution. Valuable, but not sustainable. What we needed wasn’t more one-time contributors — it was retention. People who stuck around, who understood the codebase well enough to review others, who felt enough ownership to care about the next release, not just their own feature.

A healthy project is busy with an active community and a steady stream of contributions. That’s the signal. But getting there means doing something that doesn’t come naturally to most technical people: investing in people as much as in code.


Maintainer vs. Shepherd

The word “maintainer” is telling. It implies upkeep — keeping the lights on, fixing what breaks, patching the roof. And patching the roof will always be part of the job, for both the maintainer and the shepherd. That’s the emergency action, and someone has to do it. But the difference is in everything else you build under that roof. The maintainer patches. The shepherd builds a whole living space where others want to stay and contribute.

I’ve started thinking of the role as shepherding. Not in some grandiose sense — just the honest observation that the job is less about the code and more about the flock. A shepherd doesn’t carry every sheep. They set direction, keep things moving, watch for danger, and — critically — they raise other shepherds.

The traditional maintainer mindset says, “I know this codebase best, so I should make the important decisions.” The shepherd mindset asks: “How do I make sure this project has five people who can make these decisions, not just me?”

This isn’t a philosophical distinction. It’s a practical one, and it changes what you do on a daily basis.


The Four Duties

Over the years — shepherding packages like TorchMetrics and PyTorch Lightning, building research tools like BIRL, and now working on projects like RF-DETR at Roboflow — I’ve come to see the role as four duties running in parallel. None of them is optional, and most maintainers only focus on the first two.

1. Codebase Health

This is the duty everyone recognizes. CI is green. Tests pass. Dependencies are current. Technical debt doesn’t spiral. You can look at the codebase on any given day, and it’s in a state where a new contributor could clone, build, and start reading without fighting the toolchain.

At Lightning, we invested heavily in ecosystem CI — automated compatibility testing across the dependency tree — because a broken downstream package is a broken trust relationship, even if it’s technically not your fault. I wrote about this approach separately, and later about building a custom CI bot on top of it, but the core insight is simple: codebase health isn’t just about your repo. It’s about every repo that depends on yours. I’ll be honest that some of the CI infrastructure involved private resources that I can’t fully disclose, but the principles — testing across your dependency tree, catching breaking changes before your users do — are universal and can be implemented with public tooling.

Here’s the part that connects to shepherding: if you’re the only person who can diagnose a CI failure, you haven’t solved the problem. You’ve just become the problem with a faster response time. Document your CI. Write runbooks. Make sure at least two other people can look at a red build and know what to do.

2. Release Direction

Someone has to decide what goes into the next release. Which features get prioritized? Which breaking changes are worth the migration cost? When do you say “not now” to a perfectly good PR because it doesn’t fit the roadmap?

This is where things get nuanced. Some projects, by design, have a single point of approval for releases — especially when the project is tied to a company’s product or marketing cadence. You can debate whether that’s still a community project in the fullest sense, but it’s a reality many of us work within. That said, even within those constraints, there’s a spectrum. The traditional maintainer rarely allows any discussion about release scope. The shepherd makes their reasoning legible.

A public roadmap is one of the most powerful shepherding tools you have. Not a vague wish list — actual priorities with reasoning. When the direction is clearly given, the execution becomes a swarm. Contributors can self-select into work that matters. They can propose features with confidence that they’ll be evaluated against a known framework, not against the maintainer’s unpublished preferences.

When I was working on API deprecation strategies at Lightning (which eventually led to building pyDeprecate — I wrote about the rationale in depth here), the hardest part wasn’t the technical implementation. It was establishing a process transparent enough that contributors could propose deprecations themselves, with confidence they’d be reviewed fairly. pyDeprecate itself never got much public traction — it’s a narrow-scope, single-author utility — but the process it codified mattered far more than the tool.

3. Mentorship

This is where most projects fall short, and I’ll be the first to admit I neglected it in the early stages. I treated mentorship as a nice-to-have, something I’d get to when the “real” work was done. It took a while to realize that mentorship is the real work, and an investment that snowballs fast. The time you spend teaching one contributor how to navigate the codebase pays off handsomely when they start reviewing PRs, answering issues, and onboarding the next person. The compounding is real.

Mentorship in OSS isn’t onboarding docs (though you need those). It’s the daily habit of reviewing code in a way that teaches, not just approves. It’s the difference between commenting “change this to X” and “here’s why X works better in this context, and here’s the pattern we follow across the codebase.” The first gets the PR merged. The second builds a contributor who doesn’t need you next time.

Concrete things that work:

Label issues honestly. “Good first issue” means something to newcomers. But also label the harder stuff — “needs design discussion” or “help wanted: experienced” — so growing contributors can find their next challenge. The worst thing you can do is leave the impression that only easy tasks are available for community members.
Review with context, not just corrections. When someone submits a PR that works but doesn’t follow the project’s patterns, that’s a teaching moment, not a rejection. Show them the pattern. Explain the why. Link to another PR that solved something similar. This takes longer than a quick “LGTM” or a terse “please fix,” but it compounds.
Give people real responsibility before you think they’re ready. I’ve seen contributors grow fastest when they were trusted with something slightly beyond their comfort zone — triaging a release, owning a module, being the point person on an issue area. Some of them stumbled. All of them learned faster than they would have by watching from the sidelines.

4. Community Warmth

This one sounds soft, but it’s load-bearing.

Open source runs on volunteer time. People contribute to projects where they feel welcomed, where their effort is acknowledged, and where disagreements are handled with respect. I’ve seen technically excellent projects bleed contributors because the issue tracker felt hostile, or because PRs sat unreviewed for weeks with no communication.

Community warmth isn’t about being nice for the sake of being nice. It’s about response time on first-time contributor PRs. It’s about saying “thank you for this, and here’s why we’re going a different direction” instead of closing an issue silently. It’s about writing release notes that credit contributors by name — as we do on RF-DETR, where each release calls out new contributors and their specific additions.

Sometimes shepherding means carrying the last mile for a contributor’s PR. They’ve done 90% of the work, but life got in the way, or they’re stuck on a tricky edge case. The maintainer mindset says “stale, closing.” The shepherd mindset says, “let me finish this for you, because the contribution matters and the person should feel their effort wasn’t wasted.” That single act — picking up someone’s work and seeing it through — does more for retention than any amount of documentation.

At RF-DETR, we’ve seen this pattern play out. Community members have contributed everything from pose estimation support to bug fixes in evaluation metrics. Some of those PRs needed significant guidance. Some needed us to carry the finishing touches. The ones where we invested that effort? Those contributors came back. The ones where we let PRs go stale? They didn’t.


The Paradox: Shepherding Yourself Out of a Job

Here’s the part that’s hard to internalize: the better you do this, the less essential you become. And that’s supposed to feel good.

I’ll be honest — it didn’t, at first. After years of being the person who knew every corner of the codebase, who could trace any bug to its origin, who held the institutional memory of every design decision, stepping back felt like losing something. There’s an identity wrapped up in being the person everyone turns to.

But here’s what I saw happen when I started actively working to make myself less necessary: the project got better. Not because I was bad at my job, but because five people with overlapping context make better decisions than one person with total context. They see blind spots you miss. They respond to issues in time zones you’re asleep in. They bring perspectives you don’t have.

The bus factor isn’t a joke metric. If you run git shortlog -sn on your repo and one name dominates every area, you have a fragile project, no matter how good that person is. The shepherd's duty is to make that distribution wider, not narrower.
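If you want a number rather than an eyeball check, here’s a minimal sketch. It parses that output, where each line is a commit count, a tab, and an author name, and reports the smallest group of authors responsible for more than half of all commits. The 50% threshold and the function names are my own illustrative choices. Commit counts are also only a rough proxy for who can actually review or release. When calling git from a script rather than a terminal, pass a revision explicitly, e.g. git shortlog -sn HEAD, because shortlog reads from standard input when that isn’t a terminal.

```python
def parse_shortlog(text: str) -> dict[str, int]:
    """Turn `git shortlog -sn` output into {author: commit_count}."""
    counts: dict[str, int] = {}
    for line in text.strip().splitlines():
        count, _, author = line.strip().partition("\t")
        counts[author.strip()] = int(count)
    return counts


def bus_factor(counts: dict[str, int], threshold: float = 0.5) -> int:
    """Smallest number of top authors who together made more than `threshold` of all commits."""
    total = sum(counts.values())
    covered = 0
    for n_authors, commits in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += commits
        if covered > threshold * total:
            return n_authors
    return len(counts)


sample = "   420\tAlice\n    35\tBob\n    20\tCarol\n     5\tDan\n"
counts = parse_shortlog(sample)
print(bus_factor(counts))  # 1: Alice alone made more than half of all commits
```

Running it per directory (git shortlog -sn HEAD -- some/path/) shows whether one name dominates every area or only a few.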

I’ve carried this lesson from Lightning to Roboflow. When I look at a project now, I don’t ask “how much can I contribute?” I ask “how quickly can I build a team that can carry this without me?” Those are fundamentally different questions, and they lead to fundamentally different daily decisions.

What This Looks Like in Practice

Let me be concrete about the daily habits, because the philosophy is useless without them.

When reviewing a PR from a repeat contributor: Resist the urge to rewrite their code in your style. If it works and is readable, approve it. Save your energy for architectural feedback. The goal is to let their voice into the codebase, not to make every file look like you wrote it.
When a new issue comes in that you could answer in 30 seconds: Wait. See if someone else picks it up. If they do, even if their answer isn’t perfect, let them finish. Jump in only to correct, not to override. This feels inefficient in the moment. Over months, it builds a team that handles triage without you.
When planning a release: Write the plan in a public discussion or issue. Tag people who’ve contributed to relevant areas. Ask for input on priorities. The release will be slower the first few times. After that, it’ll be faster, because people understand the process and have opinions.
When you’re about to take on a task yourself: Ask: “Is there someone who could do this with guidance?” If yes, guide them instead. Your duty isn’t to write the most code. It’s to make the most people productive.
When a contributor’s PR goes quiet: Reach out. Ask if they need help. Offer to pair or to carry the last stretch. Don’t just close it after 30 days of silence. The person behind that PR might be your next core contributor — if you show them their work matters.
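That last habit lends itself to a gentle bit of automation, as long as the output is a to-do list for a human and never an auto-close. Here’s a minimal sketch. It assumes you have already exported open PRs with a last-activity date. The PullRequest record and the 14- and 30-day windows are hypothetical choices, not any platform’s API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PullRequest:
    number: int
    author: str
    last_activity: date


def triage_quiet_prs(prs, today, nudge_after=14, carry_after=30):
    """Bucket quiet PRs for human follow-up; nothing here closes a PR."""
    buckets = {"nudge": [], "offer_to_finish": []}
    for pr in prs:
        idle_days = (today - pr.last_activity).days
        if idle_days >= carry_after:
            # Long silence: offer to pair, or to carry the last stretch yourself.
            buckets["offer_to_finish"].append(pr.number)
        elif idle_days >= nudge_after:
            # Short silence: a friendly "need any help?" comment.
            buckets["nudge"].append(pr.number)
    return buckets


prs = [
    PullRequest(101, "alice", date(2026, 5, 1)),
    PullRequest(102, "bob", date(2026, 5, 25)),
    PullRequest(103, "carol", date(2026, 6, 9)),
]
print(triage_quiet_prs(prs, today=date(2026, 6, 11)))
# {'nudge': [102], 'offer_to_finish': [101]}
```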

Audit Your Project

Open your project right now — the issue tracker, the PR queue, the last three releases — and ask yourself these questions.

If you disappeared tomorrow, who would merge the next PR? Who would cut the next release? If the answer is “nobody” or “it would stall,” that’s your signal.

Look at your contributor graph. Is it growing, flat, or shrinking? A project with a healthy shepherd has a contributor curve that trends upward independently of the shepherd’s own commit activity.
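Here’s a rough way to check this. It assumes you can export (date, author) pairs for merged work, for example with git log --format='%as %an', where %as is the author date as YYYY-MM-DD. The sketch counts distinct contributors per quarter with your own name excluded, so your commits can’t prop up the curve. The quarter bucketing and function names are illustrative choices.

```python
from collections import defaultdict
from datetime import date


def quarter(day: date) -> str:
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def contributors_per_quarter(records, exclude=()):
    """Distinct authors per quarter, ignoring names in `exclude`.

    Quarters where only excluded authors were active do not appear at all.
    """
    authors = defaultdict(set)
    for day, author in records:
        if author not in exclude:
            authors[quarter(day)].add(author)
    return {q: len(names) for q, names in sorted(authors.items())}


records = [
    (date(2026, 1, 5), "you"), (date(2026, 2, 1), "alice"),
    (date(2026, 4, 2), "you"), (date(2026, 4, 9), "alice"), (date(2026, 5, 3), "bob"),
    (date(2026, 7, 1), "alice"), (date(2026, 8, 1), "bob"), (date(2026, 9, 9), "carol"),
]
print(contributors_per_quarter(records, exclude={"you"}))
# {'2026-Q1': 1, '2026-Q2': 2, '2026-Q3': 3}
```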

Check your issue response times. Not yours — everyone else’s. Are other people answering questions? Are they answering them well? If the community is silent except when you speak, you haven’t built community. You’ve built an audience.

Count the number of people who can approve a PR with confidence. If it’s fewer than three, you have work to do — and that work is more important than any feature on your roadmap.

The best open-source leaders I’ve known share a common trait: they’re proud of what happens when they’re not in the room. Their legacy isn’t the clever algorithm or the elegant API. It’s the contributor who became a co-maintainer. It’s the process that runs smoothly without oversight. It’s the project that thrives precisely because it was built to not depend on any single person.

That’s what shepherding means. Not writing the most code. Not making the most decisions. Building something that outlives your involvement — and having the honesty to measure yourself by that standard.

If this resonates with you — if you’re wrestling with these questions in your own project — I’d love to hear from you. Reach out, share your experience, or just start a conversation. Sometimes the hardest part of shifting from maintainer to shepherd is knowing you’re not the only one figuring it out.

You can find my work and projects on GitHub.

The Open-Source Shepherd: Why Your Best Work Is Building a Project That Doesn’t Need You was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.

What Is Your Python Package’s Public API, Really?

イルカ Borovec — Thu, 19 Feb 2026 16:41:01 GMT

Generated illustration for public/private API.

The answer isn’t black or white — and reality shifts depending on how many people use your package. That’s the real challenge.

If you maintain a Python package, you’ve probably asked yourself: what exactly counts as my public API? Unlike Java or C++, Python doesn’t have public or private keywords. We just have conventions — underscores, __all__, documentation — which are basically gentlemen's agreements. But here's the thing: this seemingly abstract question has real consequences for how you work.

Here’s what I mean: sure, you can change anything you want — it’s your code. But the fallout is completely different depending on what you touch. Rename a private helper? Nobody even notices. Rename a public function without warning? Get ready for bug reports, frustrated users, and the worst kind of damage: people quietly deciding your package is too unpredictable and moving on.

Anything that’s actually public needs a proper deprecation cycle: warn users, give them time to adapt, then remove it. That’s extra releases, extra coordination, extra work. So when I ask “what is public?” I’m really asking, “what can’t I just change on a whim?” And in Python, that’s way less obvious than you’d think.

I’m going to walk through the three ways Python lets you define a public API, explain why they’re really a spectrum that depends on how many people use your code, and share a mental model I’ve found helpful for thinking about API boundaries.

What Does Python Actually Say?

Before we get into the messy reality, let’s check what Python officially recommends. PEP 8 — the style guide everyone references — actually has a whole section on “Public and Internal Interfaces.” Here’s what it says:

“Any backwards compatibility guarantees apply only to public interfaces.”

“Documented interfaces are considered public, unless the documentation explicitly declares them to be provisional or internal.”

“All undocumented interfaces should be assumed to be internal.”

“To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute.”

“Internal interfaces should still be prefixed with a single leading underscore.”

So PEP 8 mentions all three mechanisms — documentation, __all__, and underscores — but treats them as complementary layers, not pick-one-or-the-other options. In a perfect world, everyone would follow this to the letter, and there'd be no confusion. In reality? Most packages use bits and pieces of these, and that's where things get messy.

The Three Levels of “Public” (In Reality)

Python doesn’t actually enforce any of this. Most packages lean heavily on one mechanism and barely touch the others. Let me walk through them from strictest to most permissive.

Level 1: Documented in Docs Only

PEP 8 says: documented interfaces are public. This is the clearest possible boundary — if it’s not in the docs, it doesn’t exist. But it’s completely an honor system. Python won’t stop anyone from importing undocumented stuff.

The bigger problem? Documentation has a reputation issue. Many developers see docs as training wheels for people who can’t read code. They’ll go straight to your source, browse through the modules, and use whatever looks useful. And let’s be honest — docs drift. That function you refactored three months ago? The docs probably still describe the old signature. Sure, you can auto-generate docs from docstrings, but how many times have you revisited what was once documented as public and is now effectively private? The code changes, the docstrings get updated (maybe), but that overview page listing “Public APIs”? Yeah, nobody touched that in two years. So while “documented = public” is the gold standard in theory, in practice, it’s the weakest enforcement mechanism you have.

Level 2: Exported via __init__.py and __all__

A step closer to enforcement. PEP 8 recommends that modules explicitly declare their public API using __all__, and that imported names should be considered implementation details unless explicitly re-exported. The __all__ variable has a concrete runtime effect: it controls what from package import * pulls in. But that's all it gates — a user can still write from your_package.internal_module import something, and Python won't complain.

But here’s where it gets practical: in the real world (and if you check the docs for most well-maintained packages), the recommendation is usually to use explicit imports like from package import SpecificThing instead of the wildcard, so you don't drag unused ballast into your namespace. Import time is a separate story. Both from package import SpecificThing and import package execute the package's __init__.py in full, so any heavy dependency that __init__.py imports eagerly gets loaded either way. The savings come from keeping __init__.py lean and deferring heavy submodules until they're actually needed. So __all__ and a clean __init__.py aren't just about API boundaries; they're about performance and keeping your import times reasonable.
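One common way to keep a package’s top-level import cheap is to defer heavy submodules with a module-level __getattr__ (PEP 562, Python 3.7+). The sketch below writes a tiny throwaway package to a temp directory so it runs anywhere. The package name lazy_pkg and the submodule heavy are hypothetical stand-ins for something like a plotting or deep-learning backend.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT = '''
import importlib

def fast_util():
    return "cheap"

def __getattr__(name):
    # Runs only when normal lookup fails (PEP 562), i.e. on first access to `heavy`.
    if name == "heavy":
        return importlib.import_module(".heavy", __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

HEAVY = '''
# Stand-in for an expensive import such as a large third-party framework.
VALUE = 42
'''

# Write a throwaway package to a temp directory and make it importable.
root = Path(tempfile.mkdtemp())
(root / "lazy_pkg").mkdir()
(root / "lazy_pkg" / "__init__.py").write_text(INIT)
(root / "lazy_pkg" / "heavy.py").write_text(HEAVY)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import lazy_pkg

loaded_before = "lazy_pkg.heavy" in sys.modules
value = lazy_pkg.heavy.VALUE  # first access triggers the deferred import
loaded_after = "lazy_pkg.heavy" in sys.modules
print(loaded_before, value, loaded_after)  # False 42 True
```

Importing lazy_pkg costs only the lean __init__.py. The heavy submodule is loaded the first time someone touches lazy_pkg.heavy, and after that it behaves like a normal attribute.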

Level 3: Anything Without a Leading Underscore

This is what actually happens in the wild. Python says names starting with _ are private. Everything else? Fair game. (Even CPython itself struggles with this — PEP 689 had to formalize what "unstable" means for the C API, illustrating the same tension at the language level.) Users browsing your code, hitting tab-complete, or checking dir() will see every non-underscored name and assume it's meant for them. This is the weakest definition — but it's what your users fall back on when you haven't been clear about the other two.
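To see your module the way a tab-completing user does, compare the names without a leading underscore against what __all__ declares. Here’s a small sketch. The demo module is built in memory, and its contents are invented for illustration. Note that the imported os module shows up too, which is exactly how imports leak into the apparent API.

```python
import types


def apparent_api(module: types.ModuleType) -> set[str]:
    """Names a user sees via dir() that don't start with an underscore."""
    return {name for name in dir(module) if not name.startswith("_")}


def undeclared_names(module: types.ModuleType) -> set[str]:
    """Non-underscored names that are reachable but missing from __all__."""
    declared = set(getattr(module, "__all__", ()))
    return apparent_api(module) - declared


demo = types.ModuleType("demo")
exec(
    "import os\n"
    "__all__ = ['read_config']\n"
    "def read_config(path): return os.path.basename(path)\n"
    "def merge_dicts(a, b): return {**a, **b}\n",
    demo.__dict__,
)
print(sorted(undeclared_names(demo)))  # ['merge_dicts', 'os']
```

If you drop this into a test, a non-empty undeclared_names(your_module) is a cheap early warning that something slipped into the surface users can see.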

The Adoption Gradient: When Stuff Starts Breaking

Here’s the part that caught me off guard when I started maintaining packages: your public API isn’t defined by your conventions. It’s defined by how many people use your code. The more users you have, the bigger the blast radius when you change anything.

There’s actually a name for this, Hyrum’s Law (named after Hyrum Wright, a software engineer at Google):

“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”

Python makes this worse because there’s literally nothing stopping people. No access modifiers, no compiler errors, no sealed modules. A leading underscore is just a polite “hey, maybe don’t use this.” When someone discovers that your_package.utils._parse_header() does exactly what they need, they'll use it, build their workflow around it, and file an issue when you rename it.

What ends up happening: as your user base grows, your intended API and your actual API drift apart. You can still change whatever you want — but every name people depend on is another landmine. The trick is figuring out which names people are actually using.
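One practical way to find out is to scan whatever downstream code you can get hold of for what it imports from you. That includes dependents’ repositories, public examples, and snippets pasted into issues. Here’s a minimal sketch using the standard-library ast module. your_package is a placeholder. The sketch only sees import statements. It misses attribute chains like your_package.utils._parse_header(...) after a plain import your_package, so treat the counts as a lower bound.

```python
import ast
from collections import Counter


def imported_names(source: str, package: str) -> Counter:
    """Count names imported from `package` (or its submodules) in one source file."""
    counts: Counter = Counter()

    def ours(module: str) -> bool:
        return module == package or module.startswith(package + ".")

    for node in ast.walk(ast.parse(source)):
        # `from your_package.x import y` (absolute imports only).
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module and ours(node.module):
            for alias in node.names:
                counts[f"{node.module}.{alias.name}"] += 1
        # `import your_package.x`
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if ours(alias.name):
                    counts[alias.name] += 1
    return counts


downstream = """
from your_package import load
from your_package.utils import _parse_header
import your_package.io
"""
counts = imported_names(downstream, "your_package")
private = sorted(name for name in counts if name.rsplit(".", 1)[-1].startswith("_"))
print(private)  # ['your_package.utils._parse_header']
```

Summing the Counters over many files, e.g. sum((imported_names(src, "your_package") for src in sources), Counter()), gives a rough popularity ranking of your names. Any underscored entries are Hyrum’s Law in action.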

Generated illustration for the following section.

My Take: Three Tiers of Package Maturity

So what do you actually do about this? After working on my own packages and contributing to projects at different scales, I’ve settled on a simple mental model. It’s not a rule — just something that’s helped me figure out how strict I need to be at different stages.

Tier 1: The Home Project — Public = What’s in the Docs

When your package is basically a personal tool or a research project with maybe a dozen users, you’re in the Wild West. You probably know most of your users personally, and nobody’s expecting rock-solid stability. Think internal lab tools, proof-of-concept implementations, or that utility script that grew legs and became a package. If I needed to rename some internal function, I’d just do it and mention it in the changelog. To be honest, at this stage, there often isn’t much documentation. The “public API” is whatever made sense at the time — if it worked for the research paper or the specific use case, it stayed. Being super strict about API boundaries would’ve just slowed things down for no real benefit. At this stage, you’re optimizing for velocity, not compatibility.

Tier 2: The End-User Package — Public = What’s in __init__

Once people are installing your package from PyPI and using it directly, expectations change. They’re doing from your_package import SomeClass and expecting it to work across updates. Think pyDeprecate or cachier — tools people deliberately install and build into their code. At this stage, you start formalizing documentation. You’re writing proper guides, listing what’s public, and users actually expect that information to be there. Sometimes you define __all__ to make the boundaries explicit, sometimes you just keep __init__.py clean and let that do the talking. It helps IDEs autocomplete properly and makes your intent obvious. You're no longer just optimizing for velocity — you're balancing that with some stability guarantees.

Tier 3: The Dependency Package — Public = Everything without _

This is when you become the backbone for other people’s work. Other packages list you in their requirements.txt and users who've never heard of you get affected by your releases. PyTorch Lightning and TorchMetrics are in this category — thousands of downstream packages depend on them. At this point, you need to be aware that a single change can ripple through systems you never imagined your work would be used in. Someone built a production data pipeline around your package. Another team is using it in medical imaging software. A startup has it deep in its infrastructure. You find out about these use cases when something breaks, not before. At this scale, it doesn’t matter that you only documented five functions. If a hundred packages are importing some helper function because it’s convenient and doesn’t have an underscore, removing it is a breaking change. Every non-underscored name is now a promise.

You know how people install the entire scikit-learn package just to use train_test_split? That's the reality — users will depend on whatever is convenient, not just what you documented.

This is exactly why we built Ecosystem CI — to run downstream tests against every nightly build and catch these breaks before shipping.

What This Actually Means Day-to-Day

The bottom line: you can change whatever you want — it’s your code. But changing stuff people treat as public without warning means angry users, broken builds, or the slow death of trust that comes from being “that package that keeps breaking stuff.” The responsible move is deprecation: warn people, give them time, then remove it. That’s real work. And in Python, where “public” is fuzzy, you often don’t know upfront how much pain you’re signing up for.

Some habits that help:

Define __all__ early. It’s basically free and makes your intentions clear. Way easier to add names later than to yank something people already depend on.
Use underscores liberally. Any function, class, or module that’s not meant for users should start with _. It's the clearest signal Python gives you.
Keep __init__.py minimal. Only re-export what you actually want people using. Don’t do from .utils import *; it leaks internal names everywhere.
Be explicit in docs. Add a “Public API” section (see Griffe’s recommendations for a solid template). SemVer requires you to declare what’s public anyway — the clearer you are, the more your version numbers actually mean something.
Make deprecation easy on yourself. Deprecation is inevitable for any evolving package. Tools like pyDeprecate make it painless — one decorator, automatic forwarding, warnings that don’t spam logs, argument remapping. When it’s that easy, you stop avoiding necessary changes.
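To demystify what such a decorator does, here’s a hand-rolled sketch using only the standard library. It is not pyDeprecate’s API. The deprecated decorator, its since/removed_in/renamed_args parameters, and the calc_score/compute_score names are all hypothetical. The sketch forwards calls to the new function and remaps a renamed keyword argument. It also warns only once per deprecated function rather than on every call.

```python
import functools
import warnings


def deprecated(new_func, since: str, removed_in: str, renamed_args=None):
    """Hypothetical decorator: forward an old function to `new_func` with a warning."""
    renamed_args = renamed_args or {}

    def decorator(old_func):
        warned = False

        @functools.wraps(old_func)
        def wrapper(*args, **kwargs):
            nonlocal warned
            if not warned:
                # Warn once per deprecated function so loops don't flood the logs.
                warnings.warn(
                    f"`{old_func.__name__}` was deprecated in {since} and will be removed"
                    f" in {removed_in}; use `{new_func.__name__}` instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                warned = True
            # Translate old keyword names to the new function's names.
            kwargs = {renamed_args.get(k, k): v for k, v in kwargs.items()}
            return new_func(*args, **kwargs)

        return wrapper

    return decorator


def compute_score(preds, target, *, ignore_index=None):
    kept = [(p, t) for p, t in zip(preds, target) if t != ignore_index]
    return sum(p == t for p, t in kept) / len(kept)


@deprecated(compute_score, since="1.2", removed_in="2.0", renamed_args={"ignore": "ignore_index"})
def calc_score(preds, target, ignore=None):
    """Old name kept as a thin shim; this body never runs."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = calc_score([1, 0, 1], [1, 1, 1])
    second = calc_score([1, 0, 1], [1, 1, -1], ignore=-1)
print(round(first, 3), second, len(caught))  # 0.667 0.5 1
```

The old name keeps working for a release cycle while users get one clear pointer to the new one. A dedicated tool just saves you from rewriting this boilerplate for every rename.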

Bottom Line: Your Users Define Your API

Python’s flexibility cuts both ways. The “we’re all adults here” philosophy makes the language great to use, but it also means your internal implementation is one import away from becoming someone’s hard dependency. The earlier you accept this and set clear boundaries, the fewer nasty surprises you’ll hit when you need to refactor.

Here’s the reality:

Small project? Your docs are your API
People installing directly? Your __init__ is your API
Do other packages depend on you? Everything without an underscore is your API, like it or not

Know which tier you’re in, and you’ll know how careful you need to be — and whether that “quick refactor” is going to be silent or heard across the entire ecosystem.

Ever had your API boundaries bite you? I’d love to hear about it in the comments.

What Is Your Python Package’s Public API, Really? was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to organize a medical imaging challenge: Lessons from ANHIR

イルカ Borovec — Wed, 05 Nov 2025 19:47:15 GMT

A practical guide to running an academic challenge, from conception to publication, including mistakes I made so you don’t have to.

Continue reading on Data Science Collective »

BIRL: Benchmarking Image Registration Through Landmarks

イルカ Borovec — Fri, 24 Oct 2025 15:02:01 GMT

Generated illustration for this post

The ANHIR Challenge Problem

When the organizers of the ANHIR challenge at ISBI 2019 set out to compare automatic histological image registration methods, they faced a fundamental problem: how do you fairly evaluate algorithms that align tissue images with wildly different staining protocols, deformations, and artifacts? The answer became BIRL — a Python framework that turns the messy process of benchmarking into something systematic and reproducible.

The challenge itself was ambitious. Teams worldwide submitted methods to register 355 histological images spanning eight tissue types with eighteen different stains. Some images showed lung tissue with H&E staining, others displayed kidney sections with immunohistochemistry markers, and still others presented breast tissue with completely different protocols. The tissue could be torn, folded, or distorted during preparation. Staining intensity varied unpredictably. Yet somehow, algorithms needed to find correspondence between these radically different appearances of the same underlying anatomy.

Sample images from the dataset

Landmarks as the Universal Metric

BIRL’s solution was elegantly simple: use manually annotated landmarks — at least forty per image pair, ideally distributed uniformly across the tissue — and measure how far apart corresponding points end up after registration. This uniform distribution ensures the landmarks represent deformations across the entire tissue rather than clustering in easy-to-annotate regions. The Target Registration Error became the universal currency for comparing methods. When you normalize it by the image diagonal, you get a percentage that works across datasets of any size. The best methods in ANHIR achieved median errors around 0.44% of the image diagonal, successfully registering over 98% of landmark pairs across the entire dataset.
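The metric itself is simple enough to write out. Here’s a minimal sketch. It assumes landmarks are lists of (x, y) pixel coordinates, with the warped source points and the target points in the same order, and the image size given as (width, height). It computes each landmark’s TRE, normalizes it by the image diagonal, and summarizes with a median. The function name and input layout are illustrative, not BIRL’s actual API.

```python
import math
from statistics import median


def relative_tre(warped, target, image_size):
    """Per-landmark TRE divided by the image diagonal (relative TRE)."""
    if len(warped) != len(target):
        raise ValueError("landmark lists must pair up one-to-one")
    width, height = image_size
    diagonal = math.hypot(width, height)
    # TRE is the Euclidean distance between each warped landmark and its target.
    return [math.dist(p, q) / diagonal for p, q in zip(warped, target)]


warped = [(100.0, 100.0), (2000.0, 1500.0), (1000.0, 2000.0)]
target = [(130.0, 140.0), (2000.0, 1500.0), (950.0, 1880.0)]
rtre = relative_tre(warped, target, image_size=(3000, 4000))
print([round(r, 4) for r in rtre], f"median rTRE: {median(rtre):.2%}")
# [0.01, 0.0, 0.026] median rTRE: 1.00%
```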

What made this possible was BIRL’s design philosophy. Rather than forcing researchers into rigid workflows, the framework provides base classes that new methods can inherit from with minimal overhead. Want to test your novel registration algorithm? Override a couple of methods, specify your parameters in a YAML file, and run. The framework handles parallel execution across your dataset, checkpoints progress so you can resume after interruptions, and generates visualizations showing where alignments succeed or fail. It’s the kind of infrastructure that disappears into the background once you understand it, letting you focus on the actual registration problem.
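The inherit-and-override pattern is worth seeing in miniature. The sketch below is a hypothetical, generic version of the design rather than BIRL’s real classes or method names. The base class owns the loop over image pairs and skips pairs that already have results, which stands in for resuming from a checkpoint. A new method only has to implement register(). Images are placeholders (None) here, and parallel execution, file I/O, and visualization are left out.

```python
from abc import ABC, abstractmethod


class RegistrationBenchmark(ABC):
    """Hypothetical base class: owns the experiment loop; subclasses own the method."""

    def __init__(self, params: dict):
        self.params = params  # e.g. parsed from a YAML config file
        self.results: dict = {}  # pair index -> warped source landmarks

    @abstractmethod
    def register(self, source_image, target_image):
        """Return a function mapping source (x, y) points into target space."""

    def run(self, pairs):
        """pairs: iterable of (source_image, target_image, source_landmarks)."""
        for idx, (src_img, tgt_img, src_pts) in enumerate(pairs):
            if idx in self.results:
                continue  # finished earlier, e.g. restored from a checkpoint
            transform = self.register(src_img, tgt_img)
            self.results[idx] = [transform(p) for p in src_pts]
        return self.results


class ConstantShift(RegistrationBenchmark):
    """Toy 'method' that applies a fixed shift taken from its parameters."""

    def register(self, source_image, target_image):
        dx, dy = self.params["shift"]
        return lambda p: (p[0] + dx, p[1] + dy)


bench = ConstantShift({"shift": (5, -2)})
bench.results[0] = [(0, 0)]  # pretend pair 0 was already computed
out = bench.run([
    (None, None, [(1, 1)]),  # skipped: already in results
    (None, None, [(10, 10), (3, 4)]),
])
print(out)  # {0: [(0, 0)], 1: [(15, 8), (8, 2)]}
```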

Integrated Methods and What ANHIR Revealed

The framework came bundled with wrappers for established methods that represented the state of the art in 2019. There’s bUnwarpJ from the ImageJ ecosystem, elastix from the ITK medical imaging toolkit, ANTs with its sophisticated diffeomorphic transforms, and several others. Each integration shows you how to configure parameters, invoke the registration, and extract results in BIRL’s expected format. Running a benchmark becomes as straightforward as pointing to a CSV file describing your image pairs and launching the script.
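Reading such a pair table needs nothing beyond the standard csv module. Here’s a minimal sketch. The column names and file names are hypothetical placeholders, so check the framework’s documentation for the header it actually expects.

```python
import csv
import io

# Hypothetical header and rows; the real pair table may name its columns differently.
CSV_TEXT = """source_image,source_landmarks,target_image,target_landmarks
case01_stainA.jpg,case01_stainA.csv,case01_stainB.jpg,case01_stainB.csv
case02_stainA.jpg,case02_stainA.csv,case02_stainC.jpg,case02_stainC.csv
"""

REQUIRED = {"source_image", "source_landmarks", "target_image", "target_landmarks"}


def load_pairs(handle):
    """Read one dict per image pair and fail early if a column is missing."""
    reader = csv.DictReader(handle)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"pair table is missing columns: {sorted(missing)}")
    return list(reader)


pairs = load_pairs(io.StringIO(CSV_TEXT))
print(len(pairs), pairs[0]["target_image"])  # 2 case01_stainB.jpg
```

With a real file, you would pass open("pairs.csv", newline="") instead of the StringIO.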

However, the real insight from ANHIR — and what BIRL enabled researchers to discover — was the significant impact of domain adaptation. Off-the-shelf methods, configured for general medical imaging, consistently underperformed compared to approaches tailored for histological challenges. The winning methods employed multi-resolution strategies, starting with coarse alignment before refining at higher resolutions. They used robust initial registration steps to handle the extreme appearance differences between differently stained tissues. These weren’t necessarily novel algorithms; they were careful orchestrations of existing techniques, and BIRL made this kind of systematic comparison possible.
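To make the coarse-to-fine idea concrete, here is a deliberately tiny 1D sketch. It recovers a circular translation between two signals. It searches exhaustively only at the coarsest level of a pyramid, then refines within a small window at each finer level. Real histology pipelines work in 2D with far richer transforms and similarity measures. The toy signal, the pair-averaging pyramid, and the sum-of-squared-differences cost are all simplifying assumptions.

```python
import math


def downsample(signal):
    """Halve the resolution by averaging neighbouring samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def ssd(fixed, moving, shift):
    """Sum of squared differences after circularly shifting `moving` by `shift`."""
    n = len(fixed)
    return sum((fixed[i] - moving[(i - shift) % n]) ** 2 for i in range(n))


def best_shift(fixed, moving, candidates):
    return min(candidates, key=lambda s: ssd(fixed, moving, s))


def coarse_to_fine_shift(fixed, moving, levels=3, radius=2):
    pyramid = [(fixed, moving)]
    for _ in range(levels - 1):
        f, m = pyramid[-1]
        pyramid.append((downsample(f), downsample(m)))
    # Exhaustive search is affordable only on the smallest level.
    coarse_f, coarse_m = pyramid[-1]
    shift = best_shift(coarse_f, coarse_m, range(len(coarse_f)))
    # Each finer level doubles the estimate and searches a small window around it.
    for f, m in reversed(pyramid[:-1]):
        guess = 2 * shift
        shift = best_shift(f, m, range(guess - radius, guess + radius + 1))
    return shift % len(fixed)


n = 256
moving = [math.exp(-(((i - 100) / 10) ** 2)) + 0.5 * math.exp(-(((i - 180) / 6) ** 2)) for i in range(n)]
fixed = [moving[(i - 37) % n] for i in range(n)]
print(coarse_to_fine_shift(fixed, moving))  # 37
```

The full-resolution search only ever evaluates five candidate shifts. That is the whole appeal: the coarse level settles the gross alignment cheaply and robustly, and the fine levels only polish it.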

The Registration Landscape in 2025

Since the ANHIR challenge concluded and BIRL’s active development ended in 2020, the registration landscape has evolved considerably. As an academic project tied to a specific challenge, BIRL’s archival was natural and expected. Several tools have emerged or matured since then, many exceeding BIRL’s capabilities in specific domains while building on the evaluation principles it established:

VALIS (2021–present): Production-ready automated solution for whole slide images with groupwise alignment and modern feature detectors (DISK, DeDoDe), handles multi-gigapixel files effectively but is primarily CPU-based and may struggle with severely deformed tissues. Published in Nature Communications 2023, actively maintained with regular updates.
MMIR (2024–present): Cloud-based platform with plugin architecture and web GUI for collaborative clinical research, offering scalability and easy algorithm integration, though requiring Docker setup and potential custom development for specialized modalities. Published in BMC Medical Informatics and Decision Making in March 2024, represents a recent advancement in multimodal histological registration.
HistoReg (2021–present): Specialized for sequential variably-stained histology with integrated ANHIR evaluation, automatically processes various file formats but requires compilation with external dependencies, and is restricted to research use. Published in Applied Sciences 2021, actively maintained by CBICA.
Continuous Registration Challenge (2019–present): Maintains ongoing leaderboards for organ-specific datasets with standardized evaluation protocols, promoting reproducible progress but limited to predefined datasets and requiring code submissions rather than local experimentation. Established alongside ANHIR and continues as an active community resource.
elastix / SimpleElastix (2003–present): Highly configurable ITK-based toolkit with extensive transform options and proven performance across medical imaging domains, though the learning curve is steep and it lacks integrated benchmarking workflows. Elastix was first developed in 2003 and published in IEEE TMI in 2010; SimpleElastix was added in 2015. It is actively maintained with regular releases, moved to GitHub in 2017, and remains the gold standard for medical image registration.
AROSICS (2017–present): Optimized for satellite and multi-sensor remote sensing data with robust subpixel shift detection, efficiently handling global and local misalignments, but poorly suited to histological or medical applications due to different imaging assumptions. Published in Remote Sensing 2017, actively maintained with updates.

BIRL sits among these as the benchmark’s tool from a specific era. While no longer actively developed, it established patterns for rigorous evaluation that these newer tools build upon. Many of these alternatives now offer capabilities — automated pipelines, cloud infrastructure, modern feature detectors — that weren’t priorities or even available when BIRL was designed for the 2019 challenge.

Visualization of the target registration error (TRE)

Why BIRL Still Matters

The repository was archived in 2023, but that doesn’t diminish its value. The code is open-source under a BSD license, ready for forking and adaptation. The architectural patterns it established — landmark-based validation, modular method integration, comprehensive evaluation metrics — remain relevant even as the methods being evaluated evolve. Deep learning approaches to registration are emerging rapidly, and they need benchmarking infrastructure just as much as classical optimization methods did.

What makes BIRL particularly valuable is its approach to controlled experimentation. You can generate synthetic data with known deformations, vary staining appearance, and inject artifacts — all to stress-test your method before running it on precious real data. This complements the landmark-based evaluation on real images, where ground truth is approximated through manual annotation. The framework handles the tedious infrastructure: CSV-based dataset configuration, Docker images for reproducible environments, performance profiling utilities, parallel execution with checkpointing, and automated visualization of alignment quality.
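Because a synthetic deformation is known exactly, warping the landmarks with it gives error-free ground truth. The sketch below uses plain NumPy and assumes an affine warp. A deliberately slightly wrong matrix stands in for the output of a registration method, and target registration error (TRE) scores it as the per-landmark Euclidean distance. BIRL's own data generators also vary stain appearance and inject artifacts, which this sketch omits.

```python
import numpy as np


def apply_affine(points: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) landmarks."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ matrix.T


def tre(warped: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Target registration error: Euclidean distance for each landmark pair."""
    return np.linalg.norm(warped - target, axis=1)


rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 1000, size=(20, 2))
known = np.array([[1.05, 0.10, 12.0], [-0.08, 0.97, -7.5]])  # synthetic ground-truth warp
moved = apply_affine(landmarks, known)  # exact target landmarks

# Pretend a registration method returned a transform that is off by (1.5, -2.0) px.
estimate = known + np.array([[0.0, 0.0, 1.5], [0.0, 0.0, -2.0]])
errors = tre(apply_affine(landmarks, estimate), moved)
```

The error is a pure translation of length 2.5 px, so every landmark's TRE equals 2.5.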

Perhaps most importantly, BIRL encoded the lessons learned from a major community challenge into reusable infrastructure. The ANHIR results, published in IEEE Transactions on Medical Imaging, showed that successful registration requires more than just implementing the latest algorithm. You need robust initialization, appropriate regularization, multi-scale strategies, and careful handling of the specific artifacts your imaging modality introduces. Having a framework that lets you systematically test these design decisions against quantitative metrics changes how you approach the registration problem.

Looking forward

The field keeps moving forward with transformer-based feature detectors, optical flow networks, and physics-informed deformation models. All of these developments benefit from standardized evaluation frameworks that make progress measurable rather than anecdotal. For researchers starting new registration projects, BIRL’s annotation protocols and baseline method implementations provide a solid foundation. You can quickly establish whether your problem is actually solvable — whether there are consistent anatomical features to match across your imaging conditions — before investing in sophisticated solutions. The challenge dataset with its 355 carefully annotated image pairs remains a valuable resource, representing diverse registration scenarios from rigid alignment of serial sections to handling tears and folds in multi-modal correspondence.

A Framework That Balances

Looking at BIRL now, years after the challenge concluded, what stands out is how it balanced generality with specificity. General enough to accommodate diverse registration methods and evaluation scenarios. Specific enough to enforce practices — like minimum landmark counts and standardized error metrics — that make results comparable. This balance is rare in research software, which tends to drift toward either overly abstract frameworks that don’t do anything concrete or overly specific tools that can’t adapt to new problems.
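Much of that comparability comes from normalizing errors and rejecting under-annotated pairs. The sketch below uses relative TRE (rTRE), which is TRE divided by the image diagonal, the normalization used in ANHIR. The minimum of five landmarks is an illustrative threshold, not BIRL's actual value.

```python
import math
import statistics


def relative_tre(warped, target, image_shape, min_landmarks=5):
    """Per-landmark TRE divided by the image diagonal (rTRE).

    `min_landmarks=5` is an illustrative threshold, not a value taken from BIRL.
    """
    if len(warped) != len(target):
        raise ValueError("landmark sets must have the same length")
    if len(target) < min_landmarks:
        raise ValueError(f"need at least {min_landmarks} landmarks, got {len(target)}")
    diagonal = math.hypot(image_shape[0], image_shape[1])  # (height, width) in pixels
    return [math.dist(p, q) / diagonal for p, q in zip(warped, target)]


def summarize(errors):
    """Median and maximum rTRE of one image pair."""
    return {"median": statistics.median(errors), "max": max(errors)}
```

Dividing by the diagonal makes errors from small and large slides comparable.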

The repository at github.com/Borda/BIRL contains everything needed to start benchmarking: sample datasets, integration templates for multiple methods, preprocessing utilities, evaluation scripts, and comprehensive documentation. While archived, it remains fully functional, and its BSD license welcomes derivatives. Whether you’re evaluating a novel registration algorithm, comparing existing methods on your dataset, or teaching computational approaches to medical imaging, BIRL provides tested infrastructure that handles the tedious parts while staying out of your way.

The broader registration ecosystem continues its evolution, but the fundamental question BIRL addresses remains constant: how do we know if one method is actually better than another? Answering that question rigorously, with standardized protocols and quantitative metrics, is what transforms registration from an art into engineering. The framework that enabled 256 teams to compete fairly in ANHIR is ready for whatever registration challenges come next.

If you want to learn more, reach out on LinkedIn

BIRL: Benchmarking Image Registration Through Landmarks was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Build a Custom CI Bot: Integrate GitHub Apps with Lightning for Fast, Scalable MLOps Workflows

イルカ Borovec — Mon, 22 Sep 2025 09:27:46 GMT

AI generated illustration for this post

Build a Custom CI Bot: Integrate GitHub Apps with Lightning for Faster, Scalable Development and MLOps Workflows

In the fast-paced world of software development, especially in machine learning and data-intensive projects, continuous integration (CI) is essential for maintaining code quality and accelerating iteration cycles. While standard CI platforms like GitHub Actions, Jenkins, or CircleCI provide solid foundations, they often have inherent limitations that can restrict advanced workflows. In this post, I introduce a lightweight GitHub App bot that bridges GitHub’s ecosystem with Lightning’s powerful compute platform. We’ll explore why it’s worth trying, the benefits it brings to development and MLOps lifecycles, a brief overview of its architecture and task lifecycle, and three unique design choices that make it stand out.

Limitations of Standard CI Platforms

Traditional CI systems excel at basic testing and deployment, but can struggle with the need for high computational power and flexibility. Common pain points include:

Limited Hardware Options: Most platforms offer only basic CPU runners, with limited access to specialized GPUs like NVIDIA A100, V100, or T4 — critical for ML workloads requiring GPU acceleration for training and inference tests.
Cost Inefficiencies: While some CI platforms offer pay-as-you-go pricing models, these options tend to be significantly costlier than prepaid plans and usually do not provide access to cheap spot instances. Spot instances can reduce costs by up to 90%, but standard CI systems rarely support them or handle their preemption risks effectively.
Scalability Constraints: Parallelism is often capped, making large-scale testing across multiple environments (different OS, Python versions, hardware configurations) cumbersome and costly.
Customization Gaps: Repository-specific rules or dynamic workflows triggered by webhook events often require complex setups, increasing maintenance overhead.

These limitations can slow development, especially in MLOps, where iterative experimentation on diverse compute environments is vital.

How This Custom CI Bot Enhances Development and MLOps

This custom bot connects GitHub Apps directly to Lightning’s distributed compute platform, delivering a more tailored CI experience. Acting as a bridge, it lets you run automated pull request (PR) checks with repository-specific rules powered by scalable cloud resources. Benefits include:

Enhanced Resource Access: Instantly access a broad range of GPU types and configurations on demand. For MLOps, this means seamless incorporation of model training or validation steps into your CI pipeline without owning infrastructure.
Cost Optimization with Spot Instances: Lightning’s support for spot instances enables economical job runs. The bot intelligently handles preemptions by retrying failed tasks, ensuring reliability without overspending — ideal for non-critical tests or large batch jobs.
Faster Iterations and Scalability: Distributed execution across multiple environments speeds feedback loops. In development, this accelerates PR reviews; in MLOps, it supports end-to-end pipelines from data processing to deployment, reducing time from commit to production.
Custom Automation: Workflows are defined via simple YAML files triggered by GitHub webhook events (e.g., PR opens, pushes), promoting best practices like automated code quality checks, multi-environment testing, and secure repository access that streamline team collaboration.
Pay-Per-Use Efficiency: You pay only for what you use, aligning costs with actual workload rather than fixed quotas. This model suits open-source projects and startups scaling MLOps without heavy upfront investments.
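The retry-on-preemption behaviour can be sketched as follows. It assumes a hypothetical `PreemptedError` that the job wrapper raises when the provider reclaims a spot machine; this is not the bot's actual code. Only preemptions are retried, with exponential backoff, so a genuinely failing test still fails fast.

```python
import time


class PreemptedError(RuntimeError):
    """Hypothetical signal that the cloud provider reclaimed a spot machine."""


def run_with_retries(job, max_attempts=3, backoff_s=1.0, sleep=time.sleep):
    """Run `job()` and relaunch it only when it fails because of a preemption."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except PreemptedError:
            if attempt == max_attempts:
                raise  # give up and surface the preemption
            # exponential backoff before requesting a fresh spot instance
            sleep(backoff_s * 2 ** (attempt - 1))
```

Injecting `sleep` keeps the function testable without real waiting.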

Overall, this approach transforms CI from a checkbox task into a strategic enabler that tackles compute bottlenecks and fosters robust development and MLOps practices.

Architecture and Task Lifecycle

The bot’s architecture emphasizes simplicity and scalability, using Python core logic integrated with event-driven components.

High-Level Architecture

GitHub App Integration: The bot registers as a GitHub App with secure, scoped repository access. It listens for webhooks signaling events such as PR creation or updates.
Task Queuing with Redis: Incoming events are queued in Redis, a fast in-memory data store that acts as a message broker. This allows asynchronous processing and load balancing among workers.
Worker Processes: Python-based workers pull tasks from Redis and execute them by interfacing with Lightning’s platform to spin up compute jobs.
Lightning Compute Layer: Jobs run on Lightning’s distributed cloud, supporting multi-node setups, GPU acceleration, and spot instances. Results are collected and reported.
Feedback Loop: Status updates (pass/fail) are posted back to GitHub PRs as comments or checks.
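The decoupling can be sketched in-process, with Python's `queue.Queue` standing in for the Redis list. With redis-py, the webhook side would call `lpush` and each worker `brpop`. The event fields and the job placeholder are hypothetical.

```python
import json
import queue
import threading


def enqueue(tasks: queue.Queue, event: dict) -> None:
    """Webhook side: serialize the event, push it, and return immediately."""
    tasks.put(json.dumps(event))


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Worker side: block until a task arrives, process it, repeat."""
    while True:
        payload = tasks.get()
        if payload is None:  # sentinel: shut this worker down
            break
        event = json.loads(payload)
        # placeholder for provisioning a Lightning job and running the checks
        outcome = f"checks finished for PR #{event['pr']}"
        with lock:
            results.append(outcome)


def run_demo(events: list, n_workers: int = 3) -> list:
    tasks, results, lock = queue.Queue(), [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock)) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for event in events:
        enqueue(tasks, event)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real events
    for thread in threads:
        thread.join()
    return results
```

The webhook handler never waits for a job to finish. Adding throughput means starting more workers, not changing the handler.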

Task Lifecycle

Trigger: A GitHub event (e.g., PR open, push commit) triggers a webhook sent to the bot.
Processing and Queuing: The bot validates the event against repository-specific YAML configurations, then queues the task in Redis with environment specifications and commands.
Execution: A worker dequeues the task, authenticates with Lightning, provisions resources (e.g., GPU spot instances), and runs workflows like testing across Python versions on Ubuntu with CUDA.
Completion and Reporting: Results are aggregated and posted back to the PR. Failures may trigger retries or notifications.
Cleanup: Resources are released efficiently.
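The execution, reporting, and cleanup steps can be sketched with `try/finally`, so a machine is released even when a job crashes. `FakeCompute` and the task fields are hypothetical stand-ins for the Lightning API and the queued payload.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCompute:
    """Hypothetical stand-in for the compute API; it only records calls."""
    log: list = field(default_factory=list)

    def provision(self, spec: str) -> str:
        self.log.append(f"provision {spec}")
        return "machine-1"

    def run(self, machine: str, command: str) -> bool:
        self.log.append(f"run {command} on {machine}")
        return "fail" not in command  # toy rule for a failing command

    def release(self, machine: str) -> None:
        self.log.append(f"release {machine}")


def handle_task(task: dict, compute: FakeCompute, post_status) -> None:
    """Execution -> reporting -> cleanup for one dequeued task."""
    machine = compute.provision(task["machine"])
    try:
        # all() stops at the first failing command, saving paid compute time
        passed = all(compute.run(machine, cmd) for cmd in task["commands"])
        post_status(task["pr"], "success" if passed else "failure")
    except Exception:
        post_status(task["pr"], "error")
        raise
    finally:
        compute.release(machine)  # runs on success, failure, and crash alike
```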

This design ensures low-latency responses and adapts to varying loads, serving both small and large-scale projects effectively.

Demo use-case

Three Design Highlights

Here are three design choices that demonstrate the bot’s practical value:

Event-Driven YAML Workflows: Instead of hardcoding logic, workflows are defined dynamically in repo-specific YAML files. This enables flexible rules (e.g., run GPU tests only for ML-related changes), reducing boilerplate and easing customization without redeployments.
Redis for Decoupled Workers: Redis decouples event handling from execution, enabling horizontal worker scaling and fault tolerance — if a worker crashes, tasks remain queued. This integrates smoothly with Lightning’s job scheduling for resilient MLOps pipelines.
Secure, Pay-Per-Use Lightning Integration: The bot uses GitHub App tokens for authentication and Lightning’s API for on-demand compute. This approach minimizes security risks (no stored credentials) and aligns costs with usage via spot instances, while providing traceability through GitHub checks.
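Path-based rule matching (for example, GPU tests only for ML-related changes) can be sketched as follows. The sketch assumes the repository's YAML has already been parsed, for example with `yaml.safe_load`. The schema (`workflows`, `events`, `paths`, `machine`) is hypothetical, not the bot's real format. The key is `events` rather than `on` because YAML 1.1 parsers such as PyYAML read an unquoted `on` as a boolean.

```python
from fnmatch import fnmatch

# Roughly what yaml.safe_load might return for a hypothetical repo config.
CONFIG = {
    "workflows": [
        {"name": "gpu-tests", "events": ["pull_request"],
         "paths": ["src/models/*", "*.ipynb"], "machine": "A100 (spot)"},
        {"name": "lint", "events": ["pull_request", "push"],
         "paths": ["*"], "machine": "CPU"},
    ]
}


def select_workflows(config: dict, event: str, changed_files: list) -> list:
    """Names of workflows subscribed to `event` whose patterns match a changed file."""
    selected = []
    for workflow in config["workflows"]:
        if event not in workflow["events"]:
            continue
        # note: fnmatch's "*" also matches "/", so patterns are not path-segment aware
        if any(fnmatch(path, pattern) for path in changed_files for pattern in workflow["paths"]):
            selected.append(workflow["name"])
    return selected
```

Because the rules live in data, changing them only requires a commit to the repository, not a redeploy of the bot.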

If you want to supercharge your CI without the complexity of full-fledged platforms, try this bot — clone the repo at https://github.com/Borda/GitHub-app-bot and set it up in minutes. Feedback and contributions are welcome! What custom CI tweaks have you tried in your projects?

Build a Custom CI Bot: Integrate GitHub Apps with Lightning for Fast, Scalable MLOps Workflows was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.

Mastering API Deprecation in Python: The Pain Points and How pyDeprecate Can Help

イルカ Borovec — Sun, 21 Sep 2025 10:32:05 GMT

Illustration generated for this post.

Evolving an API doesn’t have to break your users — forward old calls, remap arguments, and enforce removal dates with a single decorator.

In the ever-evolving world of software development, change is inevitable. Libraries and frameworks must adapt to new requirements, improved architectures, or external dependencies. However, this evolution often comes with a hidden cost: managing deprecated APIs. As a maintainer, you want to push your codebase forward, but you can’t afford to alienate users by breaking backward compatibility overnight. This tension creates real pain points that can slow down progress and increase maintenance overhead. In this post, we’ll unpack these challenges and introduce pyDeprecate, a simple yet powerful Python package that streamlines the deprecation process through elegant wrappers. Whether you’re a library author or a contributor, this approach can change how you handle API transitions.

The Pain Points of API Deprecation: A Side Effect of Software Evolution

Software doesn’t stand still. As projects grow, you might refactor code for better performance, outsource features to specialized packages, or align with industry standards. These improvements are essential for long-term viability, but they introduce deprecation as a necessary side effect. Here’s where the issues arise:

Balancing Evolution and Compatibility: Abruptly removing old functions or classes breaks user code, leading to frustration, bug reports, and lost trust. Yet, retaining everything forever results in bloated codebases, increased technical debt, and harder-to-maintain systems. You’re stuck in a trade-off: innovate or preserve the status quo?
Warning Overload and User Annoyance: Using Python’s built-in warnings module is a start, but it’s rudimentary. Warnings can spam consoles endlessly, overwhelming users and diluting their impact. Customizing messages, limiting frequency, or routing them (e.g., to logs) requires boilerplate code that’s error-prone and time-consuming.
Complex API Migrations: Not all deprecations are simple. Arguments might be renamed, dropped, or reordered in the new API. Manually handling remapping in every deprecated function duplicates effort and invites bugs. For classes or methods, the complexity multiplies, often requiring inheritance hacks or redundant implementations.
Phased Rollouts and Version Management: Deprecations aren’t one-and-done. You need to specify timelines (e.g., deprecated in version X, removed in Y) and sometimes chain multiple levels for gradual changes. Conditional logic, like skipping based on dependencies or versions, adds another layer of manual work.
Maintenance Burden on Developers: Without a dedicated tool, you’re reinventing deprecation logic across projects. This scatters focus from core features to housekeeping, especially in open-source, where user feedback loops are unpredictable.
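For comparison, here is roughly the boilerplate that a hand-written shim needs with only the standard library. It uses a hypothetical `fit` to `train` rename. The message, category, `stacklevel`, and argument renaming are all manual, and nothing limits how often the warning repeats.

```python
import warnings


def train(data: list, learning_rate: float = 0.1) -> float:
    """New API (hypothetical)."""
    return sum(data) * learning_rate


def fit(data: list, lr: float = 0.1) -> float:
    """Old API kept alive by hand: warn, rename the argument, forward."""
    warnings.warn(
        "`fit` is deprecated since 0.1 and will be removed in 0.5; use `train` "
        "(argument `lr` is now `learning_rate`).",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return train(data, learning_rate=lr)
```

Multiply this by every renamed function, method, and argument, and the housekeeping cost becomes clear.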

These pains aren’t just theoretical — they’ve plagued projects like scikit-learn or Django, where API stability is crucial. The result? Delayed releases, frustrated teams, and suboptimal user experiences. But what if there was a way to mitigate these issues without compromising on evolution or compatibility?

Image generated specifically for this blog post.

The full use-cases guide covers real-world patterns across functions, methods, classes, Enums, dataclasses, and constants — including argument mapping and CI deadline enforcement.

Introducing pyDeprecate: A Simple Wrapper for Pain-Free Deprecations

Enter pyDeprecate, a lightweight Python package designed to address these exact challenges. At its heart is a decorator-based wrapper that automates deprecation handling, allowing you to evolve your software while maintaining backward compatibility. Install it via pip install pyDeprecate, and you gain a tool that's Pythonic, extensible, and focused on developer efficiency.

Here’s how pyDeprecate delivers value by directly tackling the pains outlined above:

Effortless Forwarding and Compatibility: Wrap deprecated functions to automatically reroute calls to successors. Your old API remains functional during the transition, preventing breaks while gently guiding users to update. This preserves trust and buys time for adoption.
Intelligent Warning Control: Limit warnings to a set number (e.g., 5 per function, or -1 to always warn) to avoid overload. Customize messages, streams (e.g., logging instead of stderr), and templates for clarity. No more console clutter — users get meaningful nudges without annoyance.
Seamless Argument Remapping: Use built-in mapping to handle API mismatches. Rename or drop parameters on the fly, reducing manual intervention and bugs. The same args_mapping works whether you forward to another callable or remap arguments in place with TargetMode.ARGS_REMAP, and args_extra lets you inject new required arguments the old API never had.
Flexible Phasing and Conditions: Define deprecation timelines and chain multiple wrappers for multi-stage rollouts. Add conditional skips (e.g., based on package versions) to adapt behavior dynamically, making complex migrations straightforward.
Reduced Maintenance Overhead: By centralizing deprecation logic in a reusable decorator, pyDeprecate eliminates boilerplate. It’s lightweight (zero runtime dependencies, Python 3.9+ compatible) and tested rigorously, freeing you to focus on innovation rather than upkeep.

In essence, pyDeprecate turns deprecation from a burden into a streamlined process. It’s not about adding features — it’s about removing friction, ensuring your software evolves sustainably while keeping users happy.

Under the Hood: pyDeprecate’s Architecture

pyDeprecate’s simplicity stems from its core @deprecated decorator, which wraps your code with smart logic:

Configuration: Pass parameters like target (a callable to forward to, TargetMode.ARGS_REMAP to remap arguments in place, or the default TargetMode.NOTIFY for a warn-only notice), versions (deprecated_in, remove_in), args_mapping, args_extra, num_warns, stream, skip_if, and update_docstring to inject the deprecation notice straight into your rendered API docs.
Runtime Behavior: On call, it checks skip_if, issues capped warnings, remaps arguments, and forwards or proceeds accordingly. A thread-safe counter tracks warnings efficiently.
Beyond the decorator: deprecated_class() wraps a class, Enum, or dataclass in a transparent proxy for a clean name change, and deprecated_instance() does the same for module-level constants and objects.
Utilities: void quiets IDE warnings in empty bodies, assert_no_warnings keeps deprecation noise out of your tests, and the audit helpers (validate_deprecation_expiry, validate_deprecation_chains) enforce removal deadlines and catch accidental deprecation chains in CI.

The core stays deliberately small and carries no runtime dependencies — the optional audit extra pulls in packaging only when you want CI deadline checks. Full details are in the GitHub repo.
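To make the runtime flow concrete, here is a toy decorator, `mini_deprecated`. It is hypothetical and is not pyDeprecate's implementation, but it follows the same steps: check `skip_if`, consume a lock-protected warning budget, remap arguments by name, then forward the call. Unlike the real package, it passes only the arguments the caller supplied and ignores version metadata.

```python
import functools
import inspect
import threading
import warnings


def mini_deprecated(target, args_mapping=None, num_warns=1, skip_if=False):
    """Toy deprecation decorator (hypothetical, not pyDeprecate's source)."""
    mapping = args_mapping or {}

    def decorator(source):
        signature = inspect.signature(source)
        lock = threading.Lock()
        warned = [0]  # mutable counter shared by all calls of this function

        @functools.wraps(source)
        def wrapper(*args, **kwargs):
            skip = skip_if() if callable(skip_if) else skip_if
            if skip:
                return source(*args, **kwargs)
            with lock:  # check-and-increment atomically across threads
                emit = num_warns < 0 or warned[0] < num_warns
                if emit:
                    warned[0] += 1
            if emit:
                warnings.warn(
                    f"`{source.__name__}` is deprecated, use `{target.__name__}`",
                    FutureWarning,
                    stacklevel=2,
                )
            # name every argument the caller passed, then rename or drop it
            passed = signature.bind(*args, **kwargs).arguments
            forwarded = {}
            for name, value in passed.items():
                new_name = mapping.get(name, name)
                if new_name is not None:  # None in the mapping means "drop"
                    forwarded[new_name] = value
            return target(**forwarded)

        return wrapper

    return decorator
```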

Real-World Applications: A Few Patterns

To illustrate, here are four scenarios where pyDeprecate shines. The use-cases guide walks through the rest.

pip install pyDeprecate

1. Function Forwarding for Quick Refactors

Move or rename a function without breaking callers — the decorator forwards every call to the successor, so the old body becomes dead code.

from deprecate import deprecated


# NEW/FUTURE API - renamed to be explicit about what it computes
def compute_sum(a: int = 0, b: int = 3) -> int:
    return a + b


# DEPRECATED API - `calculate` was the original name before the rename
@deprecated(target=compute_sum, deprecated_in="0.1", remove_in="0.5")
def calculate(a: int, b: int = 5) -> int:
    pass  # body is not needed - calls are forwarded to compute_sum


print(calculate(1, 2))  # 3, with a one-time FutureWarning

2. Argument Mapping for API Overhauls

Bridge mismatched parameters, e.g., to scikit-learn — ideal for evolving Machine Learning (ML) APIs. Rename arguments, drop them (map to None), and route the notice wherever you like.

import logging

from sklearn.metrics import accuracy_score
from deprecate import deprecated, void


@deprecated(
    target=accuracy_score,
    stream=logging.warning,
    num_warns=5,
    template_mgs="`%(source_name)s` was deprecated, use `%(target_path)s`",
    args_mapping={"preds": "y_pred", "target": "y_true", "blabla": None},
)
def depr_accuracy(preds: list, target: list, blabla: float) -> float:
    return void(preds, target, blabla)


print(depr_accuracy([1, 0, 1, 2], [0, 1, 1, 2], 1.23))  # 0.5

3. Self-Argument Remapping for Signature Changes

When the function stays but its signature changes, use TargetMode.ARGS_REMAP to rename an argument in place. The notice fires only when a caller actually passes the old name, so anyone already migrated sees no noise.

from deprecate import TargetMode, deprecated


@deprecated(
    target=TargetMode.ARGS_REMAP,
    args_mapping={"coef": "new_coef"},
    deprecated_in="0.2",
    remove_in="0.4",
)
def any_pow(base: float, coef: float = 0, new_coef: float = 0) -> float:
    """`coef` is mapped to `new_coef`; the body only uses the new name."""
    return base ** new_coef


print(any_pow(2, 3))  # 8

4. Class and Enum Deprecation

Rename a class, Enum, or dataclass with deprecated_class() — a transparent proxy forwards all access to the replacement, no inheritance gymnastics required.

from enum import Enum

from deprecate import deprecated_class


# NEW/FUTURE API - renamed to be more descriptive
class ThemeColor(Enum):
    RED = 1
    BLUE = 2


# DEPRECATED alias - the proxy forwards all access to ThemeColor
Color = deprecated_class(target=ThemeColor, deprecated_in="1.0", remove_in="2.0")(ThemeColor)
print(Color.RED is ThemeColor.RED)      # True - one-time FutureWarning on this first access
print(Color(1) is ThemeColor.RED)       # True
print(Color["RED"] is ThemeColor.RED)   # True

When you instead need to forward a class’s construction to a successor, decorate its __init__ with @deprecated(target=SuccessorClass) so the notice fires at instantiation time.

Enforcing the Timeline in CI

A deprecation notice is only as good as the removal that eventually follows it. With the audit extra installed, validate_deprecation_expiry() raises in CI once your package reaches a deprecation’s remove_in version, so deprecated code cannot quietly outlive its deadline:

pip install 'pyDeprecate[audit,cli]'
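pyDeprecate’s audit helpers derive this check from the decorated code itself. The stand-alone sketch below only illustrates the underlying version comparison. It assumes a hypothetical, hand-maintained registry and purely numeric versions; real code should compare versions with `packaging.version.Version`.

```python
def parse_version(text: str) -> tuple:
    """Numeric-only parsing padded to 3 parts, so "0.5" compares equal to "0.5.0"."""
    parts = [int(piece) for piece in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Hypothetical registry: deprecated name -> version in which it must be removed.
DEPRECATIONS = {"calculate": "0.5", "depr_accuracy": "1.0"}


def find_expired(deprecations: dict, current_version: str) -> list:
    """Names whose remove_in version has already been reached."""
    current = parse_version(current_version)
    return sorted(name for name, remove_in in deprecations.items() if parse_version(remove_in) <= current)


def check_expiry(deprecations: dict, current_version: str) -> None:
    """Fail the CI job if any deprecated API outlived its deadline."""
    expired = find_expired(deprecations, current_version)
    if expired:
        raise RuntimeError(f"past remove_in, delete them: {', '.join(expired)}")
```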

Conclusion: Evolve Your Code with Confidence

API deprecation is an unavoidable side effect of software growth, but it doesn’t have to be painful. By addressing compatibility tensions, warning chaos, and migration complexity, pyDeprecate lets you evolve your projects sustainably. Its wrapper-based approach delivers clear value: less effort, fewer bugs, and happier users — from a simple rename to proxy-wrapped Enums and multi-hop argument chains. The use-cases guide covers the full set of patterns if you want to go deeper.

Resources

pyDeprecate documentation — full API reference, use-case patterns, CI audit tools
Getting started guide — installation + quick start
Use cases with code — all deprecation patterns with working examples
GitHub repository
PyPI

Mastering API Deprecation in Python: The Pain Points and How pyDeprecate Can Help was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.

Git sub-modules: How to automate updates via GitHub’s Pull Request

イルカ Borovec — Mon, 06 Nov 2023 12:47:01 GMT

There are several ways to define external dependencies in your project — using a package manager if the dependency is installable or just…

Continue reading on CodeX »

Scalable Automation for Retry/Rerun Failed Checks in GitHub Actions

イルカ Borovec — Mon, 25 Sep 2023 19:06:41 GMT

A brief guide on how to automatically rerun (for N times) your failed checks/jobs with GitHub Actions on open Pull Requests.

Continue reading on CodeX »

Kaggle hacking: Validate a simple hypothesis against a hidden dataset

イルカ Borovec — Thu, 05 May 2022 17:38:14 GMT

This post will share how to creatively debug your Kaggle submission, particularly the submission format, and how I ran a simple hypothesis…

Continue reading on TDS Archive »

How to Finetune Yolov5 in a Multi-GPU environment

イルカ Borovec — Tue, 15 Feb 2022 20:15:43 GMT

Illustration Photo by Ivan Babydov from Pexels

How to Fine-Tune YOLOv5 on Multiple GPUs

It is generally known that Deep Learning models are sensitive to hyper-parameter selection. At the same time, when you search for the best configuration, you want to use all the resources available…

Object detection is one of the advanced Computer Vision (CV) tasks that ML/AI has not yet fully mastered. In short, the task consists of localizing and identifying/classifying given objects in an image.

A Gentle Introduction to Object Recognition With Deep Learning - Machine Learning Mastery

Object detection is still quite a hot topic in the research space. There is also relatively high demand for using such AI models in production for many practical applications, such as detecting people in a scene or identifying items on a shop’s shelves. This naturally yields a large collection of model architectures and even more implementations shared publicly as open-source projects. One of them is YOLOv5, which claims one of the best ratios between performance (accuracy/precision) and inference time.

Besides training and inference, this project also offers a hyper-parameter search based on an evolutionary algorithm. In a nutshell, the algorithm proceeds in generations: it runs a few short trainings and selects the best-performing configurations. These are then blended with small random changes and trained again.
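Here is a toy version of that generational loop. A cheap synthetic `fitness` function replaces a short training run. It loosely mirrors the idea of multiplicative random mutations, but it is not Ultralytics’ code, and the hyper-parameter names are only examples.

```python
import random


def fitness(hyp: dict) -> float:
    """Toy stand-in for a short training run; best at lr0=0.01, momentum=0.9."""
    return -1e4 * (hyp["lr0"] - 0.01) ** 2 - (hyp["momentum"] - 0.9) ** 2


def mutate(parent: dict, rng: random.Random, sigma: float = 0.2) -> dict:
    """Multiply every hyper-parameter by a random gain centred on 1."""
    return {name: value * max(0.1, rng.gauss(1.0, sigma)) for name, value in parent.items()}


def evolve(initial: dict, generations: int = 30, children: int = 6, keep: int = 2, seed: int = 0):
    """Each generation mutates the current best `keep` configurations."""
    rng = random.Random(seed)
    history = [(fitness(initial), initial)]
    for _ in range(generations):
        parents = sorted(history, key=lambda item: item[0], reverse=True)[:keep]
        for _ in range(children):
            child = mutate(rng.choice(parents)[1], rng)
            history.append((fitness(child), child))  # record every trial
    return max(history, key=lambda item: item[0])


best_fit, best_hyp = evolve({"lr0": 0.02, "momentum": 0.8})
```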

Simple screen finetuning

The simplest way to search for hyper-parameters is to run the training with evolution enabled via the --evolve argument. But this uses a single GPU at most, so what about the remaining ones?

python train.py \
    --weights yolov5s6.pt \
    --data /home/user/Data/gbr-yolov5-train-0.1/dataset.yaml \
    --hyp data/hyps/hyp.finetune.yaml \
    --epochs 10 \
    --batch-size 4 \
    --imgsz 3000 \
    --device 0 \
    --workers 8 \
    --single-cls \
    --optimizer AdamW \
    --evolve 60

We could run multiple trainings, but how do we make them collaborate? Luckily, they can share a file of dumped training results from which each new population is drawn. Thanks to the randomness in each new generation, this can be seen as exploring a much larger population, as the author states.
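The sharing trick boils down to this: every worker appends its finished trials to one file and reads the whole file back when choosing parents. Below is a sketch with hypothetical column names. It does no file locking, so if two workers wrote the very first row at the same moment, both could emit a header line.

```python
import csv
import os

FIELDS = ["fitness", "lr0", "momentum"]  # hypothetical columns


def record_result(path: str, fitness: float, hyp: dict) -> None:
    """Append one finished trial; workers only ever append, never rewrite."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"fitness": fitness, **hyp})


def load_population(path: str, top_k: int = 5) -> list:
    """Read every worker's trials and return the best `top_k` as parents."""
    if not os.path.exists(path):
        return []
    with open(path, newline="") as fp:
        rows = [{key: float(value) for key, value in row.items()} for row in csv.DictReader(fp)]
    return sorted(rows, key=lambda row: row["fitness"], reverse=True)[:top_k]
```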

COCO Finetuning Evolution · Issue #918 · ultralytics/yolov5

Illustration from Ultralytics tutorial with permission of Glenn Jocher.

What are the options?

You can run multiple training processes, each on a different GPU, by specifying the device in the --device argument. But how do you manage numerous processes and avoid losing them if you log out or your internet connection accidentally drops?

nohup

It is the first and most trivial way to keep your process running after you log out.

nohup python train.py ... > training.log 2>&1 &

Unfortunately, there is no simple way to reconnect to an already dispatched process, so nohup is mainly paired with redirecting the output stream to a file. You can then keep following the file (e.g., with tail -f training.log), but this still does not always play nicely with progress bars that may be stretched over many lines.

screen

This Unix application is convenient in many situations. Here, you can run each process in its own screen session and later switch among them. Inside each session, you have full access to control or kill the particular process.

screen -S training
python train.py ...
# detach with Ctrl+a then d; reattach later with: screen -r training

docker

Another way is to use Docker containers with a shared volume. The advantage is that you can prepare custom Docker images with a fixed environment that can run anywhere, even on another server or cluster…

docker build -t yolov5 .
docker run --detach --ipc=host --gpus all -v ~:/home/user yolov5 \
  python train.py ...
docker ps

The commands above first build a Docker image from the project folder. Next, they start a container and immediately detach from it. The container gets access to all GPUs, and your home directory is mounted at /home/user inside it so that the dataset path passed to --data resolves. The last command lists all running containers.

Spin multiple collaborative dockers

With screen, you would need to create each session separately and start the particular training process inside it. This manual work is easily spoiled by choosing the wrong device or one that is already in use.

An advantage of Docker is that we can quickly write a loop that starts as many containers as there are GPUs. The main limitation is usually RAM, which can be capped per container with the --memory 20g argument. To let all runs share one set of evolution results, you need to fix --project and --name and set the --exist-ok and --resume arguments.

for i in 1 2 3 4 5; do
  sudo docker run -d --ipc=host --gpus all \
    -v ~:/home/jirka \
    -v ~/gbr-yolov5/runs:/usr/src/app/runs \
    yolov5 \
  python /usr/src/app/train.py \
    --weights yolov5s6.pt \
    --data /home/jirka/gbr-yolov5-train-0.1-only_annotations/dataset.yaml \
    --hyp /usr/src/app/data/hyps/hyp.finetune.yaml \
    --epochs 10 \
    --batch-size 4 \
    --imgsz 3000 \
    --workers 8 \
    --device $i \
    --project gbr \
    --name search \
    --exist-ok \
    --resume \
    --evolve 60
done

Later, to reconnect to a running container (for example, to monitor the progress), find its ID and attach to it:

sudo docker ps
sudo docker attach --sig-proxy=false <container-id>

Then press Ctrl+C to detach back to your terminal; with --sig-proxy=false, the signal is not forwarded to the training process.

Finally, if you do not want to let the training finish, or you need to terminate all running containers, call:

sudo docker kill $(sudo docker ps -q)

Stay tuned and follow me to learn more!

🐡Starfish detection: Flash⚡EfficientDet

About the Author

Jirka Borovec holds a Ph.D. in Computer Vision from CTU in Prague. He has been working in Machine Learning and Data Science for a few years in several IT startups and companies. He enjoys exploring interesting world problems, solving them with State-of-the-Art techniques, and developing open-source projects.

How to Finetune Yolov5 in a Multi-GPU environment was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.