Firecrawl Ruby SDK

Ruby SDK for the Firecrawl v2 web scraping API.

Prerequisites

  • Ruby >= 3.0

Installation

Add to your Gemfile:

gem "firecrawl-sdk", "~> 1.5"

Or install directly:

gem install firecrawl-sdk

Quick Start

require "firecrawl"

# Create a client
client = Firecrawl::Client.new(api_key: "fc-your-api-key")

# Or load the API key from the FIRECRAWL_API_KEY environment variable
client = Firecrawl::Client.from_env

# Scrape a single page
doc = client.scrape("https://example.com")
puts doc.markdown

Environment Setup

export FIRECRAWL_API_KEY="fc-your-api-key"
# Optional: custom API URL
export FIRECRAWL_API_URL="http://localhost:3002"
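
The sketch below shows one way an application might read these variables itself and fail early when the key is missing. `firecrawl_settings` and `DEFAULT_API_URL` are hypothetical names, not part of the SDK, and the default URL is an assumption taken from the Configuration example.

```ruby
# Hypothetical helper (not part of the SDK): collects Firecrawl settings
# from an environment-like hash so a missing key fails early.
DEFAULT_API_URL = "https://api.firecrawl.dev" # assumed default

def firecrawl_settings(env = ENV)
  api_key = env["FIRECRAWL_API_KEY"].to_s.strip
  raise ArgumentError, "FIRECRAWL_API_KEY is not set" if api_key.empty?

  # Fall back to the assumed default when no custom URL is given.
  api_url = env["FIRECRAWL_API_URL"].to_s.strip
  api_url = DEFAULT_API_URL if api_url.empty?

  { api_key: api_key, api_url: api_url }
end

# A plain Hash stands in for ENV here so the example runs anywhere.
settings = firecrawl_settings({
  "FIRECRAWL_API_KEY" => "fc-your-api-key",
  "FIRECRAWL_API_URL" => "http://localhost:3002"
})
puts settings[:api_url]
```

The resulting hash uses the same `api_key:` and `api_url:` names that `Firecrawl::Client.new` takes in the Configuration section, so it could be passed along as keyword arguments.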

API Reference

Scrape

# Basic scrape
doc = client.scrape("https://example.com")
puts doc.markdown

# Scrape with options
doc = client.scrape("https://example.com",
  Firecrawl::Models::ScrapeOptions.new(
    formats: ["markdown", "html"],
    only_main_content: true,
    wait_for: 1000
  ))
puts doc.html

Video Extraction

Use the video format on supported video URLs, including YouTube and TikTok. The returned video field is a signed URL to the extracted video file.

doc = client.scrape("https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  Firecrawl::Models::ScrapeOptions.new(formats: ["video"]))

puts doc.video

Product Extraction

Use the product format on product pages to get structured product data (title, brand, category, and per-variant price, availability, and images). It is the deterministic counterpart to the LLM-based json format. The returned product field contains the extracted fields.

doc = client.scrape("https://example.com/products/widget",
  Firecrawl::Models::ScrapeOptions.new(formats: ["product"]))

puts doc.product

Parse

Upload a local file (HTML, PDF, DOCX, etc.) via multipart form data and parse it synchronously. Parse options intentionally exclude browser-only features: the change tracking, screenshot, branding, audio, video, and product formats, and the actions, wait_for, location, and mobile options. The proxy option accepts only "auto" or "basic".

# From disk
file = Firecrawl::Models::ParseFile.from_path("./document.pdf")

# Or from memory
file = Firecrawl::Models::ParseFile.new(
  filename: "upload.html",
  content: "<html>hi</html>",
  content_type: "text/html"
)

doc = client.parse(file,
  Firecrawl::Models::ParseOptions.new(formats: ["markdown"]))
puts doc.markdown
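
The restrictions on parse options can be checked before a request is sent. The following is a minimal sketch, assuming the options are available as a plain Hash; `parse_option_problems` and both constants are hypothetical and not part of the SDK, and the exact format and key spellings (for example `changeTracking`) are assumptions.

```ruby
# Hypothetical pre-flight check (not part of the SDK). The spellings below
# are assumed names for the features the text lists as excluded.
PARSE_EXCLUDED_FORMATS = %w[changeTracking screenshot branding audio video product].freeze
PARSE_EXCLUDED_KEYS = %i[actions wait_for location mobile].freeze
PARSE_PROXY_VALUES = %w[auto basic].freeze

# Returns a list of problems; an empty list means nothing was flagged.
def parse_option_problems(options)
  opts = options.transform_keys(&:to_sym)
  problems = []

  (Array(opts[:formats]).map(&:to_s) & PARSE_EXCLUDED_FORMATS).each do |format|
    problems << "format #{format.inspect} is not available for parse"
  end

  (opts.keys & PARSE_EXCLUDED_KEYS).each do |key|
    problems << "option #{key} is not available for parse"
  end

  proxy = opts[:proxy]
  unless proxy.nil? || PARSE_PROXY_VALUES.include?(proxy.to_s)
    problems << "proxy must be \"auto\" or \"basic\", got #{proxy.inspect}"
  end

  problems
end

puts parse_option_problems({ formats: ["markdown", "video"], wait_for: 1000, proxy: "stealth" })
```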

Crawl

# Crawl with auto-polling (blocks until complete)
job = client.crawl("https://example.com",
  Firecrawl::Models::CrawlOptions.new(limit: 50))
job.data.each { |doc| puts doc.markdown }

# Async crawl
response = client.start_crawl("https://example.com",
  Firecrawl::Models::CrawlOptions.new(limit: 10))
puts response.id

# Check status
status = client.get_crawl_status(response.id)
puts status.status

# Cancel
client.cancel_crawl(response.id)
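
When a crawl is started with `start_crawl`, the caller has to poll for completion. Below is a rough sketch of such a loop. `wait_for_crawl` is a hypothetical helper, not part of the SDK; it only relies on `get_crawl_status` and `status` from the example above. The terminal status names are assumptions, and the stand-in client exists only so the sketch runs without network access.

```ruby
# Hypothetical helper (not part of the SDK). `client` is any object that
# responds to get_crawl_status(id) and returns something with a #status.
# The terminal status names are assumptions; check the API for exact values.
CRAWL_TERMINAL_STATUSES = %w[completed failed cancelled].freeze

def wait_for_crawl(client, job_id, interval: 2, max_wait: 300, sleeper: ->(s) { sleep(s) })
  waited = 0
  loop do
    status = client.get_crawl_status(job_id)
    return status if CRAWL_TERMINAL_STATUSES.include?(status.status.to_s)
    raise "crawl #{job_id} still running after #{max_wait}s" if waited >= max_wait

    sleeper.call(interval)
    waited += interval
  end
end

# Stand-in client for demonstration; a real Firecrawl::Client would go here.
DemoStatus = Struct.new(:status)
class DemoCrawlClient
  def initialize(statuses)
    @statuses = statuses
  end

  # Returns the queued statuses in order, repeating the last one.
  def get_crawl_status(_id)
    DemoStatus.new(@statuses.size > 1 ? @statuses.shift : @statuses.first)
  end
end

demo = DemoCrawlClient.new(%w[scraping scraping completed])
final = wait_for_crawl(demo, "job-123", sleeper: ->(_s) {}) # no real sleeping in the demo
puts final.status
```

Passing the sleeper as an argument keeps the loop testable; in real use the default `sleep` applies and `max_wait` bounds the total waiting time.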

Batch Scrape

urls = ["https://example.com/page1", "https://example.com/page2"]

# Batch scrape with auto-polling
job = client.batch_scrape(urls,
  Firecrawl::Models::BatchScrapeOptions.new(
    options: Firecrawl::Models::ScrapeOptions.new(formats: ["markdown"])
  ))
job.data.each { |doc| puts doc.markdown }

Map

# Discover URLs on a website
result = client.map("https://example.com")
result.links.each { |link| puts link["url"] }

# With options
result = client.map("https://example.com",
  Firecrawl::Models::MapOptions.new(limit: 100, search: "blog"))
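
Discovered links often need filtering before they are scraped. This sketch keeps only the links under a given path. `links_under` is a hypothetical helper, not part of the SDK, and it assumes each link is a Hash with a "url" key, as in the example above.

```ruby
require "uri"

# Hypothetical helper (not part of the SDK): keeps links whose URL path
# starts with the given prefix. Unparseable URLs are dropped.
def links_under(links, prefix)
  links.select do |link|
    URI.parse(link["url"].to_s).path.to_s.start_with?(prefix)
  rescue URI::InvalidURIError
    false
  end
end

# Sample data standing in for result.links.
links = [
  { "url" => "https://example.com/blog/post-1" },
  { "url" => "https://example.com/about" },
  { "url" => "https://example.com/blog/" }
]

links_under(links, "/blog/").each { |link| puts link["url"] }
```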

Search

# Web search
results = client.search("firecrawl web scraping")
results.web&.each { |r| puts r["url"] }

# With options
results = client.search("latest news",
  Firecrawl::Models::SearchOptions.new(limit: 5, location: "US"))

Agent

# Run an AI agent task (blocks until complete)
status = client.agent(
  Firecrawl::Models::AgentOptions.new(
    prompt: "Find the pricing information",
    urls: ["https://example.com"]
  ))
puts status.data

Usage & Metrics

# Check concurrency
concurrency = client.get_concurrency
puts concurrency.concurrency

# Check credit usage
usage = client.get_credit_usage
puts usage.remaining_credits

Configuration

client = Firecrawl::Client.new(
  api_key: "fc-your-api-key",
  api_url: "https://api.firecrawl.dev",  # custom API URL
  timeout: 300,                           # HTTP timeout in seconds
  max_retries: 3,                         # automatic retries
  backoff_factor: 0.5                     # exponential backoff factor
)
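
To get a feel for how `max_retries` and `backoff_factor` interact, the sketch below computes a retry schedule. The formula `backoff_factor * 2**attempt` is an assumption (a common exponential backoff scheme); the SDK's exact schedule is not stated here, and `backoff_delays` is a hypothetical helper.

```ruby
# Assumed formula (not confirmed for this SDK):
#   delay = backoff_factor * 2**attempt, with attempt starting at 0.
def backoff_delays(max_retries:, backoff_factor:)
  (0...max_retries).map { |attempt| backoff_factor * (2**attempt) }
end

delays = backoff_delays(max_retries: 3, backoff_factor: 0.5)
puts delays.inspect # [0.5, 1.0, 2.0]
puts delays.sum     # 3.5
```

Under this assumption, the values shown above would add at most 3.5 seconds of waiting across all retries, on top of the request time itself.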

Error Handling

begin
  doc = client.scrape("https://example.com")
rescue Firecrawl::AuthenticationError => e
  puts "Invalid API key: #{e.message}"
rescue Firecrawl::RateLimitError => e
  puts "Rate limited: #{e.message}"
rescue Firecrawl::JobTimeoutError => e
  puts "Job #{e.job_id} timed out after #{e.timeout_seconds}s"
rescue Firecrawl::FirecrawlError => e
  puts "Error (#{e.status_code}): #{e.message}"
end
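
If a call still fails after the client's own automatic retries, an application may want a retry layer of its own. The following is a minimal sketch: `with_retries` is a hypothetical helper, not part of the SDK, and `DemoRateLimit` is a stand-in error class so the example runs without the gem.

```ruby
# Hypothetical helper (not part of the SDK): retries the block when one of
# the given error classes is raised, and re-raises after the last attempt.
def with_retries(*error_classes, attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *error_classes
    raise if tries >= attempts

    # Wait base_delay, then 2x, 4x, ... before trying again.
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in error class for the demo only.
class DemoRateLimit < StandardError; end

result = with_retries(DemoRateLimit, attempts: 3, sleeper: ->(_s) {}) do |try|
  raise DemoRateLimit, "slow down" if try < 3

  "ok after #{try} tries"
end
puts result
```

In real use, the error class would be something like `Firecrawl::RateLimitError` and the block would wrap a call such as `client.scrape`. Errors not listed pass straight through, so authentication failures are not retried.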

Development

Building from Source

cd apps/ruby-sdk
bundle install

Running Tests

# Unit tests
bundle exec rake test

# With API key for E2E tests
FIRECRAWL_API_KEY=fc-your-key bundle exec rake test

License

MIT License - see LICENSE.