A Rails engine that adds production-grade health check endpoints to any Rails app. Goes well beyond the built-in /up endpoint with 11 built-in checks, parallel execution, structured JSON responses, Prometheus metrics, and a clean configuration DSL.
Built-in checks: database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP
Key features:
- Two-tier endpoints: `/live` (liveness, process only) and `/ready` (readiness, all dependencies) prevent cascade failures in Kubernetes and behind load balancers
- Parallel check execution via `Concurrent::Future`: response time is bounded by the slowest check, not the sum
- Result caching (`config.cache_duration`) to absorb high-frequency probe traffic
- Prometheus text exposition at `GET /health/metrics` (always HTTP 200)
- Check groups (`config.group`) expose subsets at `/health/:group`
- Per-environment toggling, boot-time validation, and bearer token / IP / custom auth
- `rails generate rails_health_checks:initializer` scaffolds a fully commented config file
- Drop-in replacement for OkComputer; see MIGRATING_FROM_OKCOMPUTER.md
- Upgrading
- Installation
- Rack Applications
- Endpoints
- Configuration
- Authentication
- Built-in Checks
- Notifications
- Prometheus Metrics
- Result Caching
- Per-Environment Toggling
- Check Groups
- Custom Checks
- Migrating from OkComputer
- Performance
- Contributing
- License
`GET /health/live` no longer runs dependency checks.
Prior to v1.2.0, /live ran all configured checks (database, Redis, etc.) and returned 503 if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.
What changed: /live now returns 200 OK whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.
What to do: If you were relying on `/live` to verify dependencies, switch to the new `/health/ready` endpoint. No configuration changes are required.

```
# Before v1.2.0 this ran dependency checks; it is now liveness only
GET /health/live  → 200 if the process is alive (deps ignored)
# New endpoint for dependency checks
GET /health/ready → 200 if all deps pass, 503 if any fail
```
Add to your Gemfile:

```ruby
gem "rails_health_checks"
```

Then run:

```bash
bundle install
```

Mount the engine in `config/routes.rb`:

```ruby
mount RailsHealthChecks::Engine => "/health"
```

`RailsHealthChecks::Rack::App` is a mountable Rack app that exposes the same endpoints without requiring ActionDispatch or Rails routing. It is opt-in; the Rails engine is unaffected.

Add to your Gemfile (the gem already lists `rails >= 8.0` as a dependency, so `activesupport` and `concurrent-ruby` are available):

```ruby
gem "rails_health_checks"
```

Require and mount the Rack app alongside your existing app:

```ruby
# config.ru
require "rails_health_checks"
require "rails_health_checks/rack/app"

RailsHealthChecks.configure do |config|
  config.checks = [:disk, :memory, :redis]
  config.redis_url = ENV["REDIS_URL"]
end

map "/health" do
  run RailsHealthChecks::Rack::App
end

run MyApp # your existing Rack application
```

With Sinatra, mount both apps from `config.ru` (`Rack::URLMap` is a Rack endpoint, not middleware, so it cannot be passed to `use`):

```ruby
# config.ru
require "rails_health_checks/rack/app"

run Rack::URLMap.new(
  "/health" => RailsHealthChecks::Rack::App,
  "/"       => MyApp # your Sinatra::Base subclass
)
```

With Roda, use the `multi_run` plugin and dispatch from the routing tree:

```ruby
require "rails_health_checks/rack/app"

class MyApp < Roda
  plugin :multi_run
  run "health", RailsHealthChecks::Rack::App # prefix is given without a leading slash

  route do |r|
    r.multi_run # dispatches /health/* to the health app
  end
end
```

The routes are identical to the Rails engine, relative to the mount point:
| Endpoint | Format | Use case |
|---|---|---|
| `GET/HEAD /` | JSON | Full dependency health (monitoring dashboards) |
| `GET/HEAD /live` | Plain text | Liveness probe: process only, no deps |
| `GET/HEAD /ready` | Plain text | Readiness probe: all configured dependency checks |
| `GET /metrics` | Prometheus text | Prometheus scraping |
| `GET /:group` | JSON | Scoped check group |
Checks that depend on Rails internals require those libraries to be present in the stack. Checks that use only stdlib or standalone gems work in any Rack context:
| Check | Works without Rails? |
|---|---|
| `:disk` | Yes |
| `:memory` | Yes |
| `:http` | Yes |
| `:redis` | Yes (requires `redis` gem) |
| `:smtp` | Yes (reads ActionMailer config if available, otherwise requires `config.smtp_address`) |
| `:database` | Requires ActiveRecord |
| `:cache` | Requires `Rails.cache` |
| `:sidekiq` | Requires Sidekiq |
| `:solid_queue` | Requires SolidQueue |
| `:good_job` | Requires GoodJob |
| `:resque` | Requires Resque |
`config.disable :check, in: :env` compares against `Rails.env` in a Rails app. In a non-Rails Rack app it reads `ENV["RACK_ENV"]` instead (defaulting to `"production"` if unset):

```ruby
config.disable :disk, in: :test # compares RACK_ENV when Rails is not defined
```

All three authentication strategies work identically. When using the custom block strategy, the block receives a `Rack::Request` instead of an `ActionDispatch::Request`:

```ruby
RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.env["HTTP_X_INTERNAL"] == "true" }
end
```

Token and IP allowlist strategies are unchanged.
| Endpoint | Runs checks? | Format | Use case |
|---|---|---|---|
| `GET /health/live` | No (process only) | Plain text | Kubernetes `livenessProbe`, load balancer health check |
| `GET /health/ready` | Yes (all configured deps) | Plain text | Kubernetes `readinessProbe`, external uptime monitors |
| `GET /health` | Yes (all configured deps) | JSON | Monitoring dashboards, alerting pipelines |
| `GET /health/metrics` | Yes (all configured deps) | Prometheus text | Prometheus / OpenMetrics scraping |
| `GET /health/:group` | Yes (named subset) | JSON | Scoped group (e.g. `/health/workers`) |
/health/live, /health/ready, and /health also respond to HEAD requests.
HTTP status: `200 OK` when all checks pass, `503 Service Unavailable` when any check fails. The exceptions are `/metrics` and `/live`, which always return 200.
Using a single health endpoint for both load balancer checks and dependency monitoring is a cascade failure footgun. Here is the exact failure chain:
- Your database has a 30-second blip
- All running pods probe `/health/ready` → all return `503`
- The load balancer removes every pod from rotation simultaneously
- Traffic has nowhere to go; the app is fully down
- If the same endpoint drives `livenessProbe`, Kubernetes begins restarting every pod
- Restarting pods reconnect to the still-blipping database, fail again, and restart again
- What was a 30-second DB hiccup is now a multi-minute outage driven by a thundering herd of pod restarts
The fix is to separate the two concerns:
| Endpoint | Question it answers | Correct probe |
|---|---|---|
| `/health/live` | Is the process running and responsive? | `livenessProbe`, LB health check |
| `/health/ready` | Are all dependencies reachable? | `readinessProbe`, uptime monitor |
Liveness (/health/live) — returns 200 OK as long as the Ruby process responds. No dependency checks run. Authentication is skipped so Kubernetes and load balancers work without credentials. When this fails, k8s restarts the pod because the process itself is stuck or crashed.
Readiness (/health/ready) — runs all configured dependency checks. Returns 503 if any check fails. When this fails, k8s stops routing traffic to the pod but leaves it running. The pod rejoins rotation automatically once dependencies recover — no restart, no thundering herd.
Deep JSON (/health) — same dependency checks as /ready, returned as structured JSON with per-check status and latency. Use for monitoring dashboards, alerting, or anywhere you need machine-readable detail. Do not use for liveness or readiness probes.
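To make the liveness/readiness split concrete, here is a framework-free sketch of how a Rack-style app might route the two probes. It is not the gem's implementation: `DEPENDENCY_CHECKS`, `authorized?`, and the hard-coded `secret` token are hypothetical stand-ins. Liveness answers without touching dependencies or auth; readiness authenticates and then runs every check.

```ruby
# Hypothetical two-tier health app (a sketch, NOT the gem's implementation).

# Stand-in dependency checks: each returns true when the dependency is reachable.
DEPENDENCY_CHECKS = {
  database: -> { true },
  redis:    -> { false } # simulate a Redis outage
}.freeze

# Stand-in auth: a hard-coded bearer token (hypothetical).
def authorized?(env)
  env["HTTP_AUTHORIZATION"] == "Bearer secret"
end

def text_response(status, body)
  [status, { "content-type" => "text/plain" }, [body]]
end

HEALTH_APP = lambda do |env|
  case env["PATH_INFO"]
  when "/live"
    # Liveness: no auth and no dependency checks. Responding at all proves
    # the process is alive.
    text_response(200, "OK")
  when "/ready"
    return text_response(401, "Unauthorized") unless authorized?(env)

    # Readiness: run every dependency check; any failure means 503.
    failing = DEPENDENCY_CHECKS.reject { |_name, check| check.call }.keys
    failing.empty? ? text_response(200, "OK") : text_response(503, "FAILED: #{failing.join(', ')}")
  else
    text_response(404, "Not Found")
  end
end
```

Because the lambda follows the Rack calling convention (`call(env)` returning `[status, headers, body]`), it could be served with `run HEALTH_APP` from a `config.ru`. During the simulated Redis outage, `/live` keeps returning 200 while `/ready` returns 503.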
```yaml
containers:
  - name: web
    ports:
      - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /health/live    # process-only: a DB blip does NOT restart this pod
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3     # restarts only if the process stops responding entirely
    readinessProbe:
      httpGet:
        path: /health/ready   # dependency checks: stops traffic but does NOT restart the pod
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 2     # removes from rotation after 2 consecutive dependency failures
    startupProbe:             # optional: give the app time to boot before probing
      httpGet:
        path: /health/live
        port: 3000
      failureThreshold: 30
      periodSeconds: 5
```

Warning: Do not point `livenessProbe` at `/health/ready`. A single dependency failure will cause Kubernetes to restart every pod simultaneously, turning a recoverable dependency outage into a full application restart loop.
Always use the liveness endpoint for load balancer health checks. If you use the readiness endpoint and a dependency blips, the load balancer ejects all nodes at once and traffic has nowhere to go.
AWS ALB / NLB (target group health check)
```
Health check path:   /health/live
Healthy threshold:   2
Unhealthy threshold: 3
Timeout:             5s
Interval:            10s
```
Nginx upstream
```nginx
upstream rails_app {
  server app1:3000;
  server app2:3000;
}

server {
  location /health/live {
    proxy_pass http://rails_app;
  }
}
```

HAProxy

```
backend rails_app
  option httpchk GET /health/live
  server app1 app1:3000 check
  server app2 app2:3000 check
```
Note: Reserve `/health/ready` for Kubernetes `readinessProbe` and external uptime monitors (Pingdom, UptimeRobot, Better Uptime). These are the right tools to alert you when dependencies are down; the load balancer is not.
The readiness path defaults to ready (i.e. /health/ready when the engine is mounted at /health). Override it in your initializer:
```ruby
RailsHealthChecks.configure do |config|
  config.readiness_path = "readyz" # → /health/readyz
end
```

The engine mount point is configurable in `config/routes.rb`:

```ruby
mount RailsHealthChecks::Engine => "/healthz"
# exposes: /healthz/live, /healthz/ready, /healthz, /healthz/metrics
```

Example `GET /health` response:

```json
{
  "status": "ok",
  "timestamp": "2026-06-08T20:00:00Z",
  "checks": {
    "database": { "status": "ok", "latency_ms": 4 },
    "cache": { "status": "ok", "latency_ms": 1 }
  }
}
```

Status values: `ok` | `degraded` | `critical`. Overall status is `critical` if any check is critical, `degraded` if any is degraded, and `ok` otherwise.
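The roll-up rule above amounts to taking the most severe individual status. A minimal sketch, assuming each check result is a hash with a `:status` string; `SEVERITY` and `overall_status` are hypothetical names, and the ranks reuse the 0/1/2 encoding of the Prometheus status gauge.

```ruby
# Severity ranks; the same 0/1/2 encoding the Prometheus status gauge uses.
SEVERITY = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

# Overall status is the most severe individual status ("ok" when there are no checks).
def overall_status(checks)
  statuses = checks.values.map { |result| result.fetch(:status) }
  statuses.max_by { |status| SEVERITY.fetch(status) } || "ok"
end
```

For example, `overall_status({ database: { status: "ok" }, cache: { status: "degraded" } })` yields `"degraded"`.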
Run the initializer generator to create `config/initializers/rails_health_checks.rb` with every option documented as a commented example:

```bash
rails generate rails_health_checks:initializer
```

The generated file (shown below with all options) uses the block-style `configure` API. Every setting has a sensible default; uncomment only what you need:
```ruby
# frozen_string_literal: true

RailsHealthChecks.configure do |config|
  # Checks to run (default: [:database])
  # Available built-ins: :database, :cache, :redis, :smtp, :sidekiq, :solid_queue,
  #                      :good_job, :resque, :disk, :memory, :http
  config.checks = [:database]

  # Global timeout per check in seconds (default: 5)
  config.timeout = 5

  # Cache check results for N seconds to avoid re-running on every request (default: nil, disabled)
  # config.cache_duration = 10

  # ---------------------------------------------------------------------------
  # Authentication: all strategies are mutually exclusive; the default is public
  # ---------------------------------------------------------------------------
  # Bearer token: requests must include Authorization: Bearer <token>
  # config.token = ENV["HEALTH_TOKEN"]

  # IP allowlist: exact IPs or CIDR ranges
  # config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"]

  # Custom block: return truthy to allow the request
  # config.authenticate { |request| request.headers["X-Internal"] == "true" }

  # ---------------------------------------------------------------------------
  # Per-environment toggling
  # ---------------------------------------------------------------------------
  # config.disable :disk, in: :test
  # config.disable :memory, in: [:test, :development]

  # ---------------------------------------------------------------------------
  # Check groups: expose subsets at GET /health/:group
  # ---------------------------------------------------------------------------
  # config.group :system, [:disk, :memory]
  # config.group :workers, [:sidekiq, :good_job]

  # ---------------------------------------------------------------------------
  # Redis check (requires :redis in config.checks and the redis gem)
  # ---------------------------------------------------------------------------
  # config.redis_url = ENV["REDIS_URL"] # default: redis://localhost:6379/0

  # ---------------------------------------------------------------------------
  # SMTP check (requires :smtp in config.checks)
  # Reads ActionMailer::Base.smtp_settings automatically if not set here.
  # ---------------------------------------------------------------------------
  # config.smtp_address = "smtp.example.com" # default: ActionMailer config or localhost
  # config.smtp_port    = 587                # default: ActionMailer config or 25

  # ---------------------------------------------------------------------------
  # Disk check (requires :disk in config.checks)
  # ---------------------------------------------------------------------------
  # config.disk_path               = "/"           # mount point (default: "/")
  # config.disk_warn_threshold     = 2 * 1024**3   # bytes free → degraded
  # config.disk_critical_threshold = 512 * 1024**2 # bytes free → critical

  # ---------------------------------------------------------------------------
  # Memory check (requires :memory in config.checks)
  # ---------------------------------------------------------------------------
  # config.memory_threshold = 512 * 1024**2 # RSS bytes → degraded

  # ---------------------------------------------------------------------------
  # HTTP check (requires :http in config.checks)
  # ---------------------------------------------------------------------------
  # config.http_url             = "https://api.example.com/status"
  # config.http_expected_status = 200 # expected response code (default: 200)
  # config.http_headers         = { "Authorization" => "Bearer #{ENV['API_TOKEN']}" }

  # ---------------------------------------------------------------------------
  # Sidekiq check (requires :sidekiq in config.checks)
  # ---------------------------------------------------------------------------
  # config.sidekiq_queue_size = 1000 # total depth → degraded

  # ---------------------------------------------------------------------------
  # Solid Queue check (requires :solid_queue in config.checks)
  # ---------------------------------------------------------------------------
  # config.solid_queue_job_count = 500 # pending jobs → degraded

  # ---------------------------------------------------------------------------
  # GoodJob check (requires :good_job in config.checks)
  # ---------------------------------------------------------------------------
  # config.good_job_latency = 300 # seconds oldest job waiting → degraded

  # ---------------------------------------------------------------------------
  # Resque check (requires :resque in config.checks)
  # ---------------------------------------------------------------------------
  # config.resque_queue_size = 1000 # total depth → degraded

  # ---------------------------------------------------------------------------
  # Custom checks
  # ---------------------------------------------------------------------------
  # class MyApiCheck < RailsHealthChecks::Check
  #   def call
  #     res = Net::HTTP.get_response(URI("https://api.example.com/status"))
  #     res.code == "200" ? pass : fail_with("API returned #{res.code}")
  #   end
  # end
  #
  # config.register :my_api, MyApiCheck.new
  # config.register :slow_api, MyApiCheck.new, timeout: 10 # per-check timeout override
end
```

Configuration is validated at boot time. An unknown check name, a missing `http_url` for the `:http` check, or a group referencing an undefined check raises `RailsHealthChecks::ConfigurationError` on startup rather than silently failing on the first request.
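A rough sketch of what such boot-time validation could look like. The `validate!` method, its keyword arguments, and the choice to check group members against built-in plus registered names (rather than only the active `checks` list) are assumptions for illustration; the local `ConfigurationError` stands in for `RailsHealthChecks::ConfigurationError`.

```ruby
# Hypothetical boot-time validator (a sketch; the gem raises
# RailsHealthChecks::ConfigurationError, modeled here by a local class).
class ConfigurationError < StandardError; end

BUILT_IN_CHECKS = %i[database cache redis smtp sidekiq solid_queue
                     good_job resque disk memory http].freeze

def validate!(checks:, registered: [], groups: {}, http_url: nil)
  known = BUILT_IN_CHECKS + registered

  # 1. Every active check must be a built-in or a registered custom check.
  unknown = checks - known
  raise ConfigurationError, "unknown check(s): #{unknown.join(', ')}" unless unknown.empty?

  # 2. The :http check cannot run without a target URL.
  if checks.include?(:http) && http_url.to_s.empty?
    raise ConfigurationError, "the :http check requires config.http_url"
  end

  # 3. Groups may only reference checks that exist.
  groups.each do |name, members|
    missing = members - known
    next if missing.empty?

    raise ConfigurationError, "group #{name} references undefined check(s): #{missing.join(', ')}"
  end

  true
end
```

Failing in an initializer like this means a typo such as `:databse` stops the deploy instead of surfacing as a 503 on the first probe.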
| Option | Type | Default | Description |
|---|---|---|---|
| `checks` | Array | `[:database]` | Built-in or custom check names to run |
| `timeout` | Integer | `5` | Global per-check timeout in seconds |
| `cache_duration` | Integer or nil | `nil` | Cache results for N seconds; `nil` disables caching |
| `readiness_path` | String | `"ready"` | Path of the readiness endpoint within the engine (e.g. `"ready"` → `/health/ready`) |
| `token` | String or nil | `nil` | Bearer token for authentication |
| `allowed_ips` | Array or nil | `nil` | IP allowlist; accepts exact IPs and CIDR ranges |
| `redis_url` | String or nil | `nil` | Redis URL for `:redis` check; falls back to `REDIS_URL` env var, then `redis://localhost:6379/0` |
| `smtp_address` | String or nil | `nil` | SMTP host for `:smtp` check; falls back to ActionMailer config, then `localhost` |
| `smtp_port` | Integer or nil | `nil` | SMTP port for `:smtp` check; falls back to ActionMailer config, then `25` |
| `sidekiq_queue_size` | Integer or nil | `nil` | Total Sidekiq queue depth that triggers `degraded` |
| `solid_queue_job_count` | Integer or nil | `nil` | Pending SolidQueue jobs that trigger `degraded` |
| `good_job_latency` | Integer or nil | `nil` | Oldest pending GoodJob age (seconds) that triggers `degraded` |
| `resque_queue_size` | Integer or nil | `nil` | Total Resque queue depth that triggers `degraded` |
| `disk_path` | String | `"/"` | Mount point for `:disk` check |
| `disk_warn_threshold` | Integer or nil | `nil` | Free bytes below which `:disk` reports `degraded` |
| `disk_critical_threshold` | Integer or nil | `nil` | Free bytes below which `:disk` reports `critical` |
| `memory_threshold` | Integer or nil | `nil` | Process RSS bytes above which `:memory` reports `degraded` |
| `http_url` | String or nil | `nil` | Target URL for `:http` check (required when `:http` is active) |
| `http_expected_status` | Integer | `200` | Expected HTTP response code for `:http` check |
| `http_headers` | Hash | `{}` | Request headers sent by `:http` check |
By default health endpoints are public. Use one of the following strategies to restrict access. Unauthenticated requests receive 401 Unauthorized.
Note: `GET /health/live` always bypasses authentication regardless of the configured strategy. Liveness probes are called by Kubernetes and load balancers, which cannot pass credentials, so enforcing auth on this endpoint would break infrastructure probing.

Bearer token:

```ruby
RailsHealthChecks.configure do |config|
  config.token = ENV["HEALTH_TOKEN"]
end
```

Requests must include `Authorization: Bearer <token>`.

IP allowlist:

```ruby
RailsHealthChecks.configure do |config|
  config.allowed_ips = ["127.0.0.1", "10.0.0.0/8"] # exact IPs or CIDR ranges
end
```

Custom block:

```ruby
RailsHealthChecks.configure do |config|
  config.authenticate { |request| request.headers["X-Internal"] == "true" }
end
```

The block receives the request object and must return a truthy value to allow access. In a Rails app this is `ActionDispatch::Request`; in the Rack app it is `Rack::Request`.
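For intuition, a minimal sketch of the bearer token and IP allowlist matching using only the standard library (`digest`, `ipaddr`). The helper names are hypothetical and the gem's own logic may differ. The token comparison hashes both sides to equal-length digests and folds every byte difference together, so it does not short-circuit on the first mismatching character.

```ruby
require "digest"
require "ipaddr"

# Hypothetical helpers illustrating the token and IP allowlist strategies.

# Timing-safe equality: compare equal-length SHA-256 digests and OR together
# every byte difference so the loop never exits early.
def secure_equal?(a, b)
  da = Digest::SHA256.digest(a.to_s)
  db = Digest::SHA256.digest(b.to_s)
  da.bytes.zip(db.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Accepts "Authorization: Bearer <token>" only when a token is configured.
def token_ok?(authorization_header, expected_token)
  return false if expected_token.to_s.empty?

  scheme, token = authorization_header.to_s.split(" ", 2)
  scheme == "Bearer" && !token.to_s.empty? && secure_equal?(token, expected_token)
end

# An exact IP parses as a single-address range, so one include? test covers
# both exact entries and CIDR ranges. Unparseable addresses are rejected.
def ip_allowed?(remote_ip, allowlist)
  ip = IPAddr.new(remote_ip)
  allowlist.any? { |entry| IPAddr.new(entry).include?(ip) }
rescue IPAddr::InvalidAddressError
  false
end
```

With `allowlist = ["127.0.0.1", "10.0.0.0/8"]`, `ip_allowed?("10.20.30.40", allowlist)` is true and `ip_allowed?("192.168.1.5", allowlist)` is false.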
| Check | Requires | Description |
|---|---|---|
| `:database` | None | ActiveRecord `SELECT 1` against the primary connection |
| `:cache` | None | `Rails.cache` read/write probe; works with any cache store |
| `:redis` | `redis` gem | Direct Redis `PING`; `config.redis_url` or `REDIS_URL` env var |
| `:smtp` | None | SMTP connectivity via `Net::SMTP`; reads ActionMailer config automatically |
| `:sidekiq` | `sidekiq` gem | Sidekiq Redis connectivity; optional `config.sidekiq_queue_size` depth threshold |
| `:solid_queue` | `solid_queue` gem | SolidQueue DB connectivity; optional `config.solid_queue_job_count` threshold |
| `:good_job` | `good_job` gem | GoodJob queue latency; optional `config.good_job_latency` (seconds) threshold |
| `:resque` | `resque` gem | Resque Redis connectivity; optional `config.resque_queue_size` depth threshold |
| `:disk` | None | Free disk space via `df`; `config.disk_warn_threshold` / `config.disk_critical_threshold` (bytes) |
| `:memory` | None | Process RSS via `ps`; optional `config.memory_threshold` (bytes) reports degraded when exceeded |
| `:http` | None | HTTP GET to `config.http_url`; `config.http_expected_status` and `config.http_headers` |
All checks run in parallel. Each check times out independently using config.timeout (default: 5s) or a per-check override set via config.register.
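A minimal sketch of parallel execution with per-check timeouts. It swaps in plain Ruby threads for `Concurrent::Future`, and `run_checks` with its `overrides:` hash is a hypothetical stand-in for `config.timeout` plus the `timeout:` option of `config.register`. Every deadline is measured from the same start time, so wall time tracks the slowest check rather than the sum.

```ruby
# Hypothetical sketch: plain threads stand in for Concurrent::Future.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

def run_checks(checks, timeout: 5, overrides: {})
  started = monotonic_now

  # Launch every check at once; an exception becomes a critical result.
  threads = checks.to_h do |name, check|
    thread = Thread.new do
      check.call
    rescue StandardError => e
      { status: "critical", message: e.message }
    end
    [name, thread]
  end

  # Each deadline counts from the shared start, not from when we begin
  # waiting on that thread, so waits never stack up.
  threads.to_h do |name, thread|
    limit = overrides.fetch(name, timeout)
    remaining = [started + limit - monotonic_now, 0].max
    if thread.join(remaining)
      [name, thread.value]
    else
      thread.kill # blunt, but stops the hung check from lingering
      [name, { status: "critical", message: "timed out after #{limit}s" }]
    end
  end
end
```

Three checks that each sleep 0.2 s finish in roughly 0.2 s total with this approach. Killing a timed-out thread is a simplification; a production implementation might prefer to abandon the thread and let it finish in the background.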
Every health check run publishes an ActiveSupport::Notifications event:
```ruby
ActiveSupport::Notifications.subscribe("health_check.rails_health_checks") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Rails.logger.info "Health check: #{event.payload[:status]} (#{event.duration.round}ms)"
  # event.payload[:checks] => { database: { status: "ok", latency_ms: 3 }, ... }
end
```

The payload includes:

| Key | Value |
|---|---|
| `status` | Overall status: `"ok"`, `"degraded"`, or `"critical"` |
| `checks` | Hash of `{ check_name => { status:, latency_ms:, message: } }` |

`duration` on the event covers the entire parallel check run, not individual checks.
GET /health/metrics returns Prometheus text exposition format (text/plain; version=0.0.4). This endpoint always returns HTTP 200 per Prometheus scraping convention — check state is encoded in metric values.
```text
# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)
# TYPE rails_health_check_status gauge
rails_health_check_status{check="database"} 0
rails_health_check_status{check="cache"} 0
# HELP rails_health_check_latency_ms Health check latency in milliseconds
# TYPE rails_health_check_latency_ms gauge
rails_health_check_latency_ms{check="database"} 4
rails_health_check_latency_ms{check="cache"} 2
```

Latency lines are omitted for checks that do not call `measure { }`.
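A minimal sketch of producing this exposition format from a results hash. The input shape (`{ status:, latency_ms: }` per check) follows the notification payload; `prometheus_text` and `escape_label` are hypothetical helpers. Label values are escaped as the text format requires (backslash, double quote, newline), and latency lines are skipped for results without `latency_ms`.

```ruby
# Status encoding from the HELP line of rails_health_check_status.
STATUS_VALUE = { "ok" => 0, "degraded" => 1, "critical" => 2 }.freeze

# Escape a label value per the Prometheus text format: \ " and newline.
def escape_label(value)
  value.to_s.gsub(/[\\"\n]/) { |c| c == "\n" ? "\\n" : "\\#{c}" }
end

def prometheus_text(results)
  lines = [
    "# HELP rails_health_check_status Health check status (0=ok, 1=degraded, 2=critical)",
    "# TYPE rails_health_check_status gauge"
  ]
  results.each do |name, r|
    value = STATUS_VALUE.fetch(r[:status])
    lines << "rails_health_check_status{check=\"#{escape_label(name)}\"} #{value}"
  end

  # Only checks that recorded a latency (i.e. used measure { }) get a latency line.
  timed = results.select { |_name, r| r[:latency_ms] }
  unless timed.empty?
    lines << "# HELP rails_health_check_latency_ms Health check latency in milliseconds"
    lines << "# TYPE rails_health_check_latency_ms gauge"
    timed.each do |name, r|
      lines << "rails_health_check_latency_ms{check=\"#{escape_label(name)}\"} #{r[:latency_ms]}"
    end
  end

  lines.join("\n") + "\n"
end
```

Because failures are encoded as gauge values, the scrape itself can always succeed with HTTP 200, which is the convention the endpoint follows.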
By default every request re-runs all checks. Set cache_duration to serve cached results for N seconds, reducing load on the database, Redis, and other dependencies:
```ruby
RailsHealthChecks.configure do |config|
  config.cache_duration = 10 # seconds
end
```

The cache is keyed per check set: `GET /health` and `GET /health/workers` cache independently. The cache is in-process (not shared across dynos or containers), so each instance maintains its own result window.
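A minimal sketch of such a cache, assuming results are keyed by the list of checks an endpoint runs. `ResultCache` is hypothetical; it uses a monotonic clock so wall-clock adjustments cannot expire or extend entries.

```ruby
# Hypothetical in-process TTL cache keyed by check set.
class ResultCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
    @mutex = Mutex.new
  end

  # key: e.g. [:database, :cache] for /health or [:sidekiq, :good_job] for /health/workers
  def fetch(key)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @mutex.synchronize do
      entry = @entries[key]
      return entry[:value] if entry && now - entry[:at] < @ttl

      value = yield # run the checks only on a miss or an expired entry
      @entries[key] = { value: value, at: now }
      value
    end
  end
end
```

Holding the mutex while the block runs means concurrent probes that hit an expired entry wait for one refresh instead of all re-running the checks; the trade-off is that a slow refresh also delays lookups for other keys.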
Disable specific checks in specific environments:
```ruby
RailsHealthChecks.configure do |config|
  config.checks = [:database, :cache, :disk, :memory]
  config.disable :disk, in: :test
  config.disable :memory, in: [:test, :development]
end
```

A check is removed from the active list only when the current environment matches: `Rails.env` in a Rails app, or `ENV["RACK_ENV"]` in a plain Rack app. The `in:` option accepts a single symbol or an array.
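The filtering rule can be sketched in a few lines. `current_env` mirrors the documented lookup (`Rails.env` inside Rails, otherwise `ENV["RACK_ENV"]` with a `"production"` default); `active_checks` and the `disabled` hash shape are assumptions for illustration.

```ruby
# Documented lookup: Rails.env inside Rails, otherwise RACK_ENV, else "production".
def current_env
  defined?(Rails) ? Rails.env.to_s : ENV.fetch("RACK_ENV", "production")
end

# disabled maps a check name to one environment or an array of them, mirroring
# `config.disable :memory, in: [:test, :development]`.
def active_checks(checks, disabled, env = current_env)
  checks.reject { |check| Array(disabled[check]).map(&:to_s).include?(env.to_s) }
end
```

With the configuration above, the `test` environment runs only `:database` and `:cache`, while `development` also runs `:disk`.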
Group related checks and expose them at a dedicated endpoint:
```ruby
RailsHealthChecks.configure do |config|
  config.group :system, [:disk, :memory]
  config.group :workers, [:sidekiq, :good_job]
end
```

| Endpoint | Runs |
|---|---|
| `GET /health/system` | `:disk`, `:memory` |
| `GET /health/workers` | `:sidekiq`, `:good_job` |
The response shape is identical to GET /health. Unknown group names return 404 Not Found.
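A sketch of group resolution and the 404 fallback, assuming groups are stored as a name-to-checks hash. `resolve_group`, `group_endpoint`, the runner callable, and the 404 body are hypothetical; comparing names as strings avoids converting arbitrary request paths into symbols.

```ruby
GROUPS = {
  system:  %i[disk memory],
  workers: %i[sidekiq good_job]
}.freeze

# Match on strings so request paths are never turned into symbols.
def resolve_group(name, groups = GROUPS)
  groups.each { |group, checks| return checks if group.to_s == name.to_s }
  nil
end

# runner: any callable that runs the given checks and returns a Rack triple,
# so group responses share the same code path (and shape) as GET /health.
def group_endpoint(name, runner)
  checks = resolve_group(name)
  return [404, { "content-type" => "application/json" }, ['{"error":"unknown group"}']] unless checks

  runner.call(checks)
end
```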
Define a class inheriting from RailsHealthChecks::Check, implement call, and register it:
```ruby
require "net/http"

class PaymentGatewayCheck < RailsHealthChecks::Check
  def call
    measure do
      response = Net::HTTP.get_response(URI("https://api.stripe.com/v1/charges"))
      case response.code.to_i
      when 200, 401 # 401 = auth error, but the gateway is reachable
        pass
      when 429
        warn_with("rate limited (429)")
      else
        fail_with("unexpected status #{response.code}")
      end
    end
  rescue StandardError => e
    fail_with(e.message)
  end
end

RailsHealthChecks.configure do |config|
  config.register :payment_gateway, PaymentGatewayCheck.new
  config.register :slow_gateway, PaymentGatewayCheck.new, timeout: 15
end
```

`config.register` appends the check to the active list automatically.
| Method | Status set | Use when |
|---|---|---|
| `pass(message = nil)` | `ok` | Check passed; optional message |
| `warn_with(message)` | `degraded` | Check is functional but degraded |
| `fail_with(message)` | `critical` | Check failed; service is impaired |
| `measure { }` | n/a | Wraps a block and records `latency_ms` |
State contract: call exactly one of pass, warn_with, or fail_with per call invocation. The check instance is dup'd before each run, so instance variables set during one request do not bleed into the next.
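To illustrate the contract, here is a self-contained stand-in base class. `SketchCheck` and `StubbedGatewayCheck` are hypothetical and are not `RailsHealthChecks::Check`; they only show how the three status setters, `measure`, and running a `dup` of a registered instance fit together.

```ruby
# Hypothetical stand-in for a check base class (NOT RailsHealthChecks::Check).
class SketchCheck
  attr_reader :status, :message, :latency_ms

  def pass(message = nil) = record("ok", message)
  def warn_with(message)  = record("degraded", message)
  def fail_with(message)  = record("critical", message)

  # Records how long the block took, even if it raises.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @latency_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
  end

  # Runs a fresh copy so state from one run never leaks into the next.
  def run
    copy = dup
    copy.call
    copy
  end

  private

  def record(status, message)
    @status = status
    @message = message
  end
end

# Hypothetical subclass: the response code is injected instead of fetched.
class StubbedGatewayCheck < SketchCheck
  def initialize(code)
    @code = code
  end

  def call
    measure do
      case @code
      when 200, 401 then pass
      when 429 then warn_with("rate limited (429)")
      else fail_with("unexpected status #{@code}")
      end
    end
  end
end
```

Calling `StubbedGatewayCheck.new(429).run` returns a copy whose `status` is `"degraded"`, while the registered template keeps `status == nil`, so the next run starts clean. The endless-method syntax (`def pass(...) = ...`) requires Ruby 3.0 or later.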
Call the check directly in a unit test — no request stack needed:
```ruby
# stub_request is provided by the webmock gem
RSpec.describe PaymentGatewayCheck do
  subject(:check) { described_class.new }

  context "when the gateway is reachable" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 200)
    end

    it "passes" do
      check.call
      expect(check.status).to eq("ok")
    end
  end

  context "when the gateway is rate-limited" do
    before do
      stub_request(:get, "https://api.stripe.com/v1/charges")
        .to_return(status: 429)
    end

    it "warns" do
      check.call
      expect(check.status).to eq("degraded")
      expect(check.message).to include("rate limited")
    end
  end
end
```

See MIGRATING_FROM_OKCOMPUTER.md for a full mapping of check names, configuration keys, and endpoint differences.
Quick reference:
| OkComputer | rails_health_checks |
|---|---|
| `OkComputer::ActiveRecordCheck` | `:database` |
| `OkComputer::CacheCheck` | `:cache` |
| `OkComputer::RedisCheck` | `:redis` |
| `OkComputer::SidekiqLatencyCheck` | `:sidekiq` + `config.sidekiq_queue_size` |
| `OkComputer::HttpCheck` | `:http` + `config.http_url` |
| `OkComputer::CustomCheck` subclass | Subclass `RailsHealthChecks::Check` |
| `GET /okcomputer` | `GET /health` |
| `GET /okcomputer/all` | `GET /health` |
See BENCHMARKS.md for throughput numbers, parallel execution speedup, and cache effectiveness measurements. To run the suite locally:
```bash
bundle exec rake benchmark
```

To contribute:

- Fork the repository
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Add some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create a new Pull Request
The gem is available as open source under the terms of the MIT License.