Understand Your Rails App In Production

"That's weird... it shouldn't be doing that..."

Stop guessing and finally understand what your Rails app is actually doing in production.

✅ 6-Week Workshop
✅ Hands On And Practical
✅ No DevOps Experience Needed
✅ Less Than 4 Hours A Week
✅ Designed For Seniors And Leads
✅ Solo Or Team Workshops

  • Fix Bugs Faster

    Observability lets you catch issues the moment they happen. It's like having X-ray vision for your app, so you can fix problems before they mess with your users' experience. No more guessing games—just quick, effective solutions.

  • Find Performance Bottlenecks

    Observability helps you see where you can speed things up and make everything run smoother. Your users will notice the difference, and they'll keep coming back because they trust your app to perform.

  • Fewer Incidents, Less Stress

    Observability gives you the power to set alerts that warn you about potential issues before they blow up. It's like having a crystal ball for your app's health, letting you stay ahead of the game and keep everything running smoothly.

The Problems

Diagnosing Incidents Takes Hours

Ever felt like you're chasing shadows? Debugging incidents without observability is just that. Imagine it's 2 AM - you're squinting at logs trying to figure out why your app just tanked. Not fun, right?

Observability transforms your bug hunt into a guided tour. No more guesswork, just straight answers. And yes, you can actually sleep peacefully at night.

App Health Is a Mystery

You're driving blindfolded. That's you managing a Rails app without observability. System health? A big question mark. CPU spikes, memory leaks, and you’re none the wiser until your app slows to a crawl—or worse, crashes.

Improving Performance Is A Wild Goose Chase

Performance bottlenecks are sneaky, and without the right tools, they're nearly invisible. Trying to connect a profiler to a Rails app in production? Good luck with that.

Even if you can figure out the slow areas of the app, without observability you can't accurately determine why they're slow.

Background Jobs Are A Black Box

Those background jobs you rely on? They can fail silently and spectacularly. Without observability, you won’t know until it’s too late. Emails undelivered, reports half-baked—chaos ensues.

Wasted Engineering Time Reinventing The Wheel

Rails is a "batteries included" framework, but that doesn't seem to apply to observability.

Logging has been a weak point in the ecosystem. Structured logging is coming in Rails 8, but none of the other observability best practices come out of the box.

Every team I've seen has to start from scratch - writing Sidekiq middleware, adding metrics, and solving a ton of low-level, fiddly details.

This is frustrating, wastes hours of engineering time and distracts from the bigger business goals. The end result is often low quality and missing crucial instrumentation.

What Do You Get?

  • Tailored Strategy Blueprint

    Over the 6 weeks you'll build a personalized, practical, step-by-step action plan designed for your observability needs. 

  • Steps To Observable Software

    Master observability in just 4 hours a week — no DevOps experience required! Learn my simple 5-step process to boost software insights with ease.

  • Rails Logging Showdown

    A comparison of the best Rails structured logging libraries, so you can make a solid choice and avoid issues later.

  • Bug Prioritization Worksheet

    Stop wasting time on the wrong bugs! Use this actionable worksheet to identify the most business-critical issue you want to focus on in the workshop.

  • No-Bankruptcy Blueprint

    Maximize observability without breaking the bank. Follow this checklist to boost insights while keeping costs under control.

  • Demystifying 3 Pillars

    Get clarity on observability’s key pillars with this one-page guide. Cut through the vendor BS and finally understand traces, logs, and metrics.

  • Query Like a Pro

    Learn five powerful tips to supercharge your querying skills in any observability tool — become the data detective your team needs.

  • Slack Channel Access

    Ask any observability questions you have in the group Slack. We're here to support each other.

  • Accountability Followups

    Learning observability is fine, but if you don't take action, it's all pointless. I follow up with you every week to help you see real progress against the plan.

  • Real Time Collaboration (Team Only)

    Work directly with me in real time. Solve problems faster, get immediate feedback and move as one. Whether pairing or mobbing, your team will level up their skills and ship quality instrumentation quicker.

  • Pull Requests (Team Only)

    I’ll create Pull Requests asynchronously, so you can easily review them during our weekly calls. This keeps our time together focused. I’ll need access to your repos.

  • Deep Context (Team Only)

    I’ll dive into your business problems and domain, applying all my knowledge of observability to your specific challenges. This ensures solutions we create are aligned with your business goals for maximum impact.

The Dream

What does better observability in your Rails app look like?

Quick Incident Resolutions

Observability transforms your bug hunt into a guided tour. With the right tools, you get straight answers instead of guesswork.

Real-time insights mean you can pinpoint issues immediately and resolve them faster. Say goodbye to sleepless nights and hello to efficient, stress-free debugging.

Complete Health Metrics at Your Fingertips

Observability removes the blindfold. Finally, your team can get a clear view of your app's health. Track background job queue latency, request errors, and other vital metrics in real time.

This proactive approach helps you catch potential issues before they become incidents and before customers even notice.

Answers Within Seconds

Metrics capture the what (e.g. background jobs are failing) but they don't give you the why (e.g. a spammer is DDoSing your site). This workshop embraces ideas from Observability 2.0 to help you capture vital context alongside the metrics.

End result? An app where you can ask any question you can think of and get an answer within seconds.
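
To make "context alongside the metrics" a bit more concrete, here's a minimal sketch of one wide event per job, using only Ruby's standard library. The `WideEvent` class and every field name and value in it (`job_class`, `enqueued_from`, `client_ip` and so on) are hypothetical and made up for illustration. Treat it as one possible shape under those assumptions, not the workshop's exact formula.

```ruby
require "json"
require "time"

# Hypothetical helper: collects context for one unit of work and
# emits it as a single JSON line that a log pipeline can index.
class WideEvent
  def initialize(name, io: $stdout)
    @name = name
    @io = io
    @fields = {}
  end

  # Attach any context that might answer a future "why?" question.
  def add(fields)
    @fields.merge!(fields)
    self
  end

  # Serialise the event with its name and a timestamp, then write it out.
  def emit(now: Time.now.utc)
    line = JSON.generate({ event: @name, timestamp: now.iso8601 }.merge(@fields))
    @io.puts(line)
    line
  end
end

# All values below are invented sample data.
event = WideEvent.new("job.performed")
event.add(queue: "within_5_minutes", job_class: "PasswordResetEmailJob")
event.add(duration_ms: 182, outcome: "failed", error_class: "Net::ReadTimeout")
event.add(enqueued_from: "passwords#create", client_ip: "203.0.113.7")
event.emit
```

Because every field travels on the same event, you can later filter or group by any of them without deploying new logging first.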

Effortless Performance Optimization

Observability makes performance tuning straightforward. With detailed metrics and tracing, you can identify and understand performance issues quickly. Find out exactly where the slowdowns are happening and why. This way, you can optimize effectively and ensure your app runs smoothly.

Reliable Monitoring of Background Jobs

Understand exactly what's going on with your background jobs. Monitor your jobs in real time to ensure they run smoothly. Get alerts the moment something goes wrong, so you can fix issues before they escalate.

No more guessing if your jobs are working as expected - know for sure.
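
As a rough illustration of "get alerts the moment something goes wrong", here's a sketch that compares each queue's latency with an SLA. The queue names, the SLA values and the `sla_breaches` helper are all assumptions of mine. In a Sidekiq app the readings could come from something like `Sidekiq::Queue#latency`, but this sketch takes a plain hash so it runs on its own.

```ruby
# Hypothetical SLAs, in seconds, keyed by queue name.
QUEUE_SLAS = {
  "within_5_minutes" => 5 * 60,
  "within_1_hour" => 60 * 60
}.freeze

# Returns one alert message per queue whose latency exceeds its SLA.
# `latencies` maps queue name => seconds the oldest job has been waiting.
# Queues without a configured SLA are ignored.
def sla_breaches(latencies, slas: QUEUE_SLAS)
  latencies.filter_map do |queue, seconds|
    sla = slas[queue]
    next if sla.nil? || seconds <= sla

    "Queue #{queue} latency is #{seconds.round}s, over its #{sla}s SLA"
  end
end

# Invented readings: the first queue is 30 minutes behind.
readings = { "within_5_minutes" => 1800.0, "within_1_hour" => 42.5 }
sla_breaches(readings).each { |message| puts message }
```

Run a check like this on a schedule and post the messages to Slack, and you hear about a slow queue before your customers do.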

Designed For Ruby On Rails

I've worked for years to instrument Rails apps. It's been a long journey but I now have a repeatable formula you can implement for the essentials of observability.

Once the basics are in place, I coach you beyond them, into the domain of your Rails app and your company, and we customise the instrumentation for your use case.

Pricing

Choose a plan that suits you.

  • Team

    Personalised Support To Fix Bugs 20x Faster For Your Team.
    $2000
    • ✓ Team Coaching
    • ✓ Weekly Live Sessions: designed around your team
    • ✓ Personalised Action Plan: a personalised roadmap to focus on areas most relevant to your app's needs
    • ✓ Full Workshop Recordings
    • ✓ Downloadable Resources
    • ✓ Slack Channel Access
    • ✓ Priority Email Support
  • Solo

    Learn How To Fix Bugs 20x Faster with Group Coaching.
    $150
    • ✗ Team Coaching
    • ✓ Weekly Live Group Sessions
    • ✓ Personalised Action Plan
    • ✓ Early Access to Materials
    • ✓ Everything In Standard

😭 Poor Observability

Your Rails app has poor observability. So what?

I've seen Rails apps where an incident looks a bit like this...

  • 1

    Customers Report Problems

    It all starts on a Friday afternoon. Ping.

    You get a Slack notification. You're on call this week.

    Then another. And another. Seems like customers can't get their password reset emails.

    Uh oh. You haven't worked on that part of the system.

    No worries - that's why you use Datadog.

  • 2

    Everything Seems Fine

    Your team doesn't find the logs useful.

    Sure enough - they aren't useful this time either.

    All requests to the app are successful.

    Nothing concerning there.

    There are a handful of poorly structured traces.

    They show nothing of any concern.

  • 3

    Drop Everything To Fix Defect

    The product owner needs this fixed immediately.

    Time to context switch.

    So you drop the current feature, which was due next sprint.

    Make a ticket, move it into In Progress, create a branch.

    You look over the password reset code for problems.

    Check the configuration for the email provider.

    Everything looks fine.

    Pretty quickly, you're stuck.

  • 4

    Pressure Builds

    The CEO is asking your product owner what's going on. 

    The support channel lights up like a Christmas tree.

    Lots of customers are complaining.

    They can't sign into the app and they can't reset their password.

    They're locked out.

    Many take to Twitter to complain.

  • 5

    Deploy Extra Logging

    You and a senior colleague pair on the issue.

    He agrees about the lack of data - you need more information.

    He suggests adding some extra logs.

    But... where?

    You pair for 45 minutes and hack some logging into a Sidekiq middleware.

    It takes 20 minutes until the logging is live.

    15 minutes for the tests to run and another 5 for the app to deploy.

    While you wait for the app to deploy, the issue disappears.

  • 6

    Issue Resolves Itself

    Password reset emails are working again.

    Very weird.

    The product owner spends 2 hours on the cleanup.

    There's no way of knowing the true scale of the issue.

    All they have is 34 Slack threads.

    So they email out an apology to all 34 customers.

    It's time consuming and embarrassing.

  • 7

    Fix It Later

    There's a huge pressure to deliver features.

    You put a ticket on the backlog to investigate the issue.

    Everyone agrees - we'll fix it later.

    This isn't an issue that's affecting customers any more.

    The ticket is pushed back sprint after sprint.

    After all, that was just a one time issue.

    Probably won't happen again. Right? Right.

  • 8

    2 Months Later

    Groundhog Day.

    The same issue comes back.

    There's one customer who was impacted by the first incident too.

    They post on Facebook that they'll never use your app again.

    They delete their account and ask for a refund.

    Customers are annoyed. So is the CTO.

    But remember - you added the extra logging!

    Good job you did that.

    Now you can diagnose the issue.

  • 9

    Diagnose with Extra Logging

    You search in Datadog for "Email Failed".

    This was the log message you added when email sending failed.

    Nothing relevant comes back.

    So... the emails aren't failing?

    But why aren't they getting sent?

    An hour has passed since the start of this second incident.

    The CEO is not happy.

    The product owner is not happy.

    Customers are definitely not happy.

    You feel utterly helpless. There's still not enough data.

    The problem passes. You add more random logging, guessing at what might be wrong.

  • 10

    Vicious Cycle

    Every incident you can't fix hurts team morale.

    The customers are more and more annoyed.

    The root cause of the issue is never diagnosed.

    More interruptions from planned feature work.

    More wasted engineer time.

    Product Owners are puzzled why the issue can't just be fixed.

    Engineers become cynical, burned out.

    Eventually they're sick of the chaos and find another job.

    Ultimately, no-one wins.


🤩 Great Observability

Here's the same root cause.

This time the Rails app has great observability.

  • 1

    Alert in Slack

    It starts on a Thursday evening. Ping.

    Queues in Sidekiq are set up with an SLA.

    The queue that sends password reset emails has an SLA that all jobs will complete within 5 minutes.

    This queue is taking up to 30 minutes for jobs to complete.

    That's weird.

    This started happening 7 minutes ago, so customers won't be complaining just yet.

    You have time to fix the issue before anyone notices.

  • 2

    Check Out Queue Latency

    The app code instruments queue latency for every job performed.

    It also instruments dimensions such as queue and job class.

    It was a lot of work to set that up, but it gives you lots of querying options in situations like this.

    You query Datadog for the P95 latency of the "within 5 minutes" queue.

    Woah - it's rising rapidly.

    Let's figure out which jobs are taking the longest.

  • 3

    Group By Job Class

    In a couple of clicks you group job duration by job class.

    There is a flood of jobs with the class "ElasticSearchIndexingJob".

    Hmmm. Weird.

    Too many jobs? Or are the jobs too slow?

    Grouping the number of jobs in the queue by job class...

    ...shows that it's a volume issue.

    What's enqueuing all these jobs?

  • 4

    Drill Into Requests

    Every job that's performed is tagged with the controller and action that enqueued it.

    Grouping by controller and action for the ElasticSearchIndexingJob...

    ...masses of jobs are enqueued from profiles#show.

    Query the number of requests to this controller...

    ...and you see a massive spike.

    This spike lines up with the spike in job queue latency.

    Conclusion?

    Requests to profiles#show are causing the queue to be flooded.

  • 5

    Group by IP

    Filter requests by profiles#show.

    Group by user agent.

    And 89% are Firefox version 111.

    Strange...

    ...let's group by IP address.

    Ah - all those requests are coming from the same IP.

    Conclusion? 

    The root cause is a scraper impersonating a real browser.

  • 6

    Block Scraper

    Go into Cloudflare...

    ...create an intelligent rule...

    ...to block IPs making more than 10 requests a second...

    ...save.

    Let's go back to Datadog.

  • 7

    Validate Fix

    The graphs almost instantly change.

    The volume of requests drops like a stone.

    Back to normal levels.

    You look at the queue latency...

    ...yup, coming back under 5 minutes.

    Great. All fixed.

  • 8

    Quick, Easy, Fun, Fixed

    Total time? 15 minutes.

    Minimal interruption to feature work.

    No deployment of code.

    No extra tickets.

    No customer impact.

    One Cloudflare rule.

    Tweak and watch the results on the app in real time.

    This is the power of observability.

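
If you're wondering what the instrumentation behind that walkthrough might look like, here's a minimal sketch. It follows the shape of a Sidekiq server middleware (`call` with the job instance, the payload and the queue name, then `yield`) but doesn't load Sidekiq, so it runs on its own. I'm assuming `enqueued_at` is epoch seconds as a Float, which matches older Sidekiq payloads, so check what your version stores. The `enqueued_from` tag, the `PasswordResetEmailJob` class and the `percentile` and `p95_by` helpers are hypothetical names of mine.

```ruby
# Sketch of a Sidekiq-style server middleware that records queue latency
# together with the dimensions you might want to group by later.
class QueueLatencyMiddleware
  attr_reader :samples

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @samples = []
  end

  def call(_job_instance, payload, queue)
    # Assumption: "enqueued_at" is epoch seconds (Float).
    latency = @clock.call - payload["enqueued_at"]
    @samples << {
      queue: queue,
      job_class: payload["class"],
      enqueued_from: payload["enqueued_from"], # hypothetical tag added at enqueue time
      latency_s: latency
    }
    yield
  end
end

# Nearest-rank percentile over an array of numbers.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# P95 latency per value of one dimension, e.g. :job_class or :enqueued_from.
def p95_by(samples, dimension)
  samples.group_by { |sample| sample[dimension] }
         .transform_values { |group| percentile(group.map { |sample| sample[:latency_s] }, 95) }
end

# Invented payloads, with a fixed clock so the numbers are repeatable.
middleware = QueueLatencyMiddleware.new(clock: -> { 1_000.0 })
jobs = [
  { "class" => "ElasticSearchIndexingJob", "enqueued_at" => 100.0, "enqueued_from" => "profiles#show" },
  { "class" => "ElasticSearchIndexingJob", "enqueued_at" => 400.0, "enqueued_from" => "profiles#show" },
  { "class" => "PasswordResetEmailJob", "enqueued_at" => 700.0, "enqueued_from" => "passwords#create" }
]
jobs.each { |payload| middleware.call(nil, payload, "within_5_minutes") { :performed } }

p p95_by(middleware.samples, :job_class)
p p95_by(middleware.samples, :enqueued_from)
```

In a real app you'd send each sample to your observability tool rather than keep it in memory, and let the tool do the grouping and percentiles. The point is that the dimensions are captured once, up front, so questions like "which job class?" and "which controller enqueued it?" are a query away.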

Frequently Asked Questions

Anxious about why this might not work for you or any of the details? Here are answers to questions you may have.

Meet Your Instructor

  • John Gallagher
    Joyful Observability Coach

    I've been working with Ruby and Rails since Rails 2 and have experienced the pain of poor observability.

    I'm passionate about helping every Rails engineer understand their app in production.

    I'll not list my hobbies because I know you don't care.

Contact Me

Have more questions? Get in touch with me.
