Understand Your Rails App In Production

"That's weird... it shouldn't be doing that..."

Stop guessing and finally understand what your Rails app is actually doing in production.

✅ 6-Week Workshop
✅ Hands On And Practical
✅ No DevOps Experience Needed
✅ Less Than 4 Hours A Week
✅ Designed For Seniors And Leads
✅ Solo Or Team Workshops

Fix Bugs Faster

Observability lets you catch issues the moment they happen. It's like having X-ray vision for your app, so you can fix problems before they mess with your users' experience. No more guessing games—just quick, effective solutions.
Find Performance Bottlenecks

Observability helps you see where you can speed things up and make everything run smoother. Your users will notice the difference, and they'll keep coming back because they trust your app to perform.
Fewer Incidents, Less Stress

Observability gives you the power to set alerts that warn you about potential issues before they blow up. It's like having a crystal ball for your app's health, letting you stay ahead of the game and keep everything running smoothly.

The Problems

Diagnosing Incidents Takes Hours

Ever felt like you're chasing shadows? Debugging incidents without observability is just that. Imagine it's 2 AM - you're squinting at logs trying to figure out why your app just tanked. Not fun, right?

Observability transforms your bug hunt into a guided tour. No more guesswork, just straight answers. And yes, you can actually sleep peacefully at night.

App Health Is a Mystery

You're driving blindfolded. That's you managing a Rails app without observability. System health? A big question mark. CPU spikes, memory leaks, and you’re none the wiser until your app slows to a crawl—or worse, crashes.

Improving Performance Is A Wild Goose Chase

Performance bottlenecks are sneaky, and without the right tools, they're nearly invisible. Trying to connect a profiler to a Rails app in production? Good luck with that.

Even if you can figure out the slow areas of the app, without observability you can't accurately determine why they're slow.

Background Jobs Are A Black Box

Those background jobs you rely on? They can fail silently and spectacularly. Without observability, you won’t know until it’s too late. Emails undelivered, reports half-baked—chaos ensues.

Wasted Engineering Time Reinventing The Wheel

Rails is a "batteries included" framework, but that doesn't seem to apply to observability.

Logging has been a weak point in the ecosystem. Structured logging is coming in Rails 8, but none of the other observability best practices come out of the box.

Every team I've seen has to start from scratch - writing Sidekiq middleware, adding metrics, and solving a ton of low level fiddly details.

This is frustrating, wastes hours of engineering time and distracts from the bigger business goals. The end result is often low quality and missing crucial instrumentation.

Enroll To Save Hours (Solo - $150)

What Do You Get?

Tailored Strategy Blueprint

Over the 6 weeks you'll build a personalized, practical, step-by-step action plan designed for your observability needs.
Steps To Observable Software

Master observability in just 4 hours a week — no DevOps experience required! Learn my simple 5-step process to boost software insights with ease.
Rails Logging Showdown

A comparison of the best Rails structured logging libraries. See how they stack up so you can make a solid choice and avoid issues later.
Bug Prioritization Worksheet

Stop wasting time on the wrong bugs! Use this actionable worksheet to identify the most business-critical issue you want to focus on in the workshop.
No-Bankruptcy Blueprint

Maximize observability without breaking the bank. Follow this checklist to boost insights while keeping costs under control.
Demystifying 3 Pillars

Get clarity on observability’s key pillars with this one-page guide. Cut through the vendor BS and finally understand traces, logs, and metrics.
Query Like a Pro

Learn five powerful tips to supercharge your querying skills in any observability tool — become the data detective your team needs.
Slack Channel Access

Ask any observability questions you have in the group Slack. We're here to support each other.
Accountability Followups

Learning observability is fine, but if you don't take action, it's all pointless. I follow up with you every week to help you see real progress against the plan.
Real Time Collaboration

Work directly with me in real time. Solve problems faster, get immediate feedback and move as one. Whether pairing or mobbing, your team will level up their skills and ship quality instrumentation quicker.
Pull Requests

I’ll create Pull Requests asynchronously, so you can easily review them during our weekly calls. This ensures we use our time together effectively. I'll need access to your repos for this.
Deep Context

I’ll dive into your business problems and domain, applying all my knowledge of observability to your specific challenges. This ensures solutions we create are aligned with your business goals for maximum impact.

The Dream

What does better observability in your Rails app look like?

Quick Incident Resolutions

Observability transforms your bug hunt into a guided tour. With the right tools, you get straight answers instead of guesswork.

Real-time insights mean you can pinpoint issues immediately and resolve them faster. Say goodbye to sleepless nights and hello to efficient, stress-free debugging.

Complete Health Metrics at Your Fingertips

Observability removes the blindfold. Finally, your team can get a clear view of your app's health. Track background job queue latency, request errors, and other vital metrics in real time.

This proactive approach helps you catch potential issues before they become incidents and before customers even notice.

Answers Within Seconds

Metrics capture the what (e.g. background jobs are failing) but they don't give you the why (e.g. a spammer is DDoSing your site). This workshop embraces ideas from Observability 2.0 to help you capture vital context alongside the metrics.

End result? An app where you can ask any question you can think of and get an answer within seconds.
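To make "context alongside the metrics" a bit more concrete, here's a minimal sketch of one way to do it: emit a single structured event per unit of work, carrying every field you might want to query later. The `WideEvent` class and all the field names are hypothetical, made up for illustration. This is not taken from the workshop materials.

```ruby
require "json"
require "logger"

# Collects context for one unit of work (a request or a job) and emits it
# as a single JSON log line when the work finishes.
class WideEvent
  def initialize(logger, name)
    @logger = logger
    @fields = { "event" => name }
  end

  # Attach any context you might want to group or filter by later.
  def add(fields)
    @fields.merge!(fields.transform_keys(&:to_s))
    self
  end

  # Runs the block, records duration and outcome, then emits one line.
  def record
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield self
    @fields["outcome"] = "success"
    result
  rescue StandardError => e
    @fields["outcome"] = "error"
    @fields["error_class"] = e.class.name
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @fields["duration_ms"] = (elapsed * 1000).round(2)
    @logger.info(JSON.generate(@fields))
  end
end

# Example usage with made-up field values.
logger = Logger.new($stdout)
logger.formatter = proc { |_severity, _time, _progname, message| "#{message}\n" }

WideEvent.new(logger, "password_reset_email").record do |event|
  event.add(user_id: 42, queue: "within_5_minutes", mail_provider: "example")
end
```

Because everything lands on one line, questions like "which users were affected?" become a filter or a group-by rather than a new deploy.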

Effortless Performance Optimization

Observability makes performance tuning straightforward. With detailed metrics and tracing, you can identify and understand performance issues quickly. Find out exactly where the slowdowns are happening and why. This way, you can optimize effectively and ensure your app runs smoothly.

Reliable Monitoring of Background Jobs

Understand exactly what's going on with your background jobs. Monitor your jobs in real time to ensure they run smoothly. Get alerts the moment something goes wrong, so you can fix issues before they escalate.

No more guessing if your jobs are working as expected - know for sure.

Designed For Ruby On Rails

I've worked for years to instrument Rails apps. It's been a long journey but I now have a repeatable formula you can implement for the essentials of observability.

Once the basics are in place, I coach you beyond them, into the domain of your own Rails app, and we customise the instrumentation for your use case.

Reserve Your Place Today for $150

Pricing

Choose a plan that suits you.

Team

Personalised Support To Fix Bugs 20x Faster For Your Team.

$2000

Buy Team
- Team Coaching
- Weekly Live Sessions
  Designed around your team
- Personalised Action Plan
  Personalized roadmap to focus on areas most relevant to your app's needs.
- Full Workshop Recordings
- Downloadable Resources
- Slack Channel Access
- Priority Email Support
Solo

Learn How To Fix Bugs 20x Faster with Group Coaching.

$150

Buy Solo
- Team Coaching
- Weekly Live Group Sessions
- Personalised Action Plan
- Early Access to Materials
- Everything In Standard

😭 Poor Observability

Your Rails app has poor observability. So what?

I've seen Rails apps where an incident looks a bit like this...

1. Customers Report Problems

It all starts on a Friday afternoon. Ping.

You get a Slack notification. You're on call this week.

Then another. And another. Seems like customers can't get their password reset emails.

Uh oh. You haven't worked on that part of the system.

No worries - that's why you use Datadog.
2. Everything Seems Fine

Your team doesn't find the logs useful.

Sure enough - they aren't useful this time either.

All requests to the app are successful.

Nothing concerning there.

There are a handful of poorly structured traces.

They show nothing of any concern.
3. Drop Everything To Fix Defect

The product owner needs this fixed immediately.

Time to context switch.

So you drop the current feature, which was due next sprint.

Make a ticket, move it into In Progress, create a branch.

You look over the password reset code for problems.

Check the configuration for the email provider.

Everything looks fine.

Pretty quickly, you're stuck.
4. Pressure Builds

The CEO is asking your product owner what's going on.

The support channel lights up like a Christmas tree.

Lots of customers are complaining.

They can't sign into the app and they can't reset their password.

They're locked out.

Many take to Twitter to complain.
5. Deploy Extra Logging

You and a senior colleague pair on the issue.

He agrees about the lack of data - you need more information.

He suggests adding some extra logs.

But... where?

You pair for 45 minutes and hack some logging into a Sidekiq middleware.

It takes 20 minutes until the logging is live.

15 minutes for the tests to run and another 5 for the app to deploy.

Whilst waiting for the app to deploy, the issue disappears.
6. Issue Resolves Itself

Password reset emails are working again.

Very weird.

The product owner spends 2 hours on the cleanup.

There's no way of knowing the true scale of the issue.

All they have is 34 Slack threads.

So they email out an apology to all 34 customers.

It's time consuming and embarrassing.
7. Fix It Later

There's a huge pressure to deliver features.

You put a ticket on the backlog to investigate the issue.

Everyone agrees - we'll fix it later.

This isn't an issue that's affecting customers any more.

The ticket is pushed back sprint after sprint.

After all, that was just a one time issue.

Probably won't happen again. Right? Right.
8. 2 Months Later

Groundhog Day.

The same issue comes back.

There's one customer who was impacted by the first incident too.

They post on Facebook that they'll never use your app again.

They delete their account and ask for a refund.

Customers are annoyed. So is the CTO.

But remember - you added the extra logging!

Good job you did that.

Now you can diagnose the issue.
9. Diagnose with Extra Logging

You search in Datadog for "Email Failed".

This was the log message you added when email sending failed.

Nothing relevant comes back.

So... the emails aren't failing?

But why aren't they getting sent?

An hour has passed since the start of this second incident.

The CEO is not happy.

The product owner is not happy.

Customers are definitely not happy.

You feel utterly helpless. There's still not enough data.

The problem passes. You add more random logging, guessing at what might be wrong.
10. Vicious Cycle

Every incident you can't fix hurts team morale.

The customers are more and more annoyed.

The root cause of the issue is never diagnosed.

More interruptions from planned feature work.

More wasted engineer time.

Product Owners are puzzled why the issue can't just be fixed.

Engineers become cynical, burned out.

Eventually they're sick of the chaos and find another job.

Ultimately, no-one wins.

Avoid These Issues - Enroll Today - $150

🤩 Great Observability

Here's the same root cause.

This time the Rails app has great observability.

1. Alert in Slack

It starts on a Thursday evening. Ping.

Queues in Sidekiq are set up with an SLA.

The queue that sends password reset emails has an SLA that all jobs will complete within 5 minutes.

This queue is taking up to 30 minutes for jobs to complete.

That's weird.

This started happening 7 minutes ago, so customers won't be complaining just yet.

You have time to fix the issue before anyone notices.
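Here's a rough sketch of what a queue SLA check could look like. The queue names and the `sla_breaches` helper are hypothetical, and the observed latencies are hard-coded. In a real app they might come from something like `Sidekiq::Queue#latency`, with the alert itself living in your monitoring tool rather than in app code.

```ruby
# Hypothetical SLA table: queue name => maximum acceptable latency in seconds.
QUEUE_SLAS = {
  "within_5_minutes" => 5 * 60,
  "within_1_hour" => 60 * 60
}.freeze

# Returns the queues whose observed latency exceeds their SLA.
# `observed` maps queue name => current latency in seconds.
# Queues without an SLA entry are ignored.
def sla_breaches(observed, slas = QUEUE_SLAS)
  observed.filter_map do |queue, latency|
    limit = slas[queue]
    next if limit.nil? || latency <= limit

    { queue: queue, latency_seconds: latency, sla_seconds: limit }
  end
end

# Made-up readings: one queue is running 30 minutes behind.
breaches = sla_breaches({ "within_5_minutes" => 30 * 60, "within_1_hour" => 120 })
breaches.each do |breach|
  puts "#{breach[:queue]} is at #{breach[:latency_seconds] / 60} min " \
       "(SLA #{breach[:sla_seconds] / 60} min)"
end
```

Naming queues after their SLA keeps the promise visible to whoever enqueues the job, which is one reasonable design choice among several.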
2. Check Out Queue Latency

The app code instruments queue latency for every job performed.

It also instruments dimensions such as queue and job class.

It was a lot of work to set that up, but it gives you lots of querying options in situations like this.

You query Datadog for the P95 latency of the "within 5 minutes" queue.

Woah - it's rising rapidly.

Let's figure out which jobs are taking the longest.
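As a sketch of the kind of instrumentation described above, here's a plain Ruby class shaped like a Sidekiq server middleware. It records queue latency and duration for every job, tagged by queue and job class. The metric names, the `InMemoryMetrics` recorder and its `distribution` method are assumptions for illustration. It also assumes `enqueued_at` is epoch seconds, and the unit can differ between Sidekiq versions, so check yours before relying on this.

```ruby
# Stand-in for a real metrics client, so the sketch runs without any gems.
class InMemoryMetrics
  attr_reader :points

  def initialize
    @points = []
  end

  def distribution(name, value, tags: [])
    @points << { name: name, value: value, tags: tags }
  end
end

# Shaped like a Sidekiq server middleware: it receives the job payload
# and queue name, and wraps the job's execution in a block.
class QueueLatencyMiddleware
  def initialize(metrics, clock: -> { Time.now.to_f })
    @metrics = metrics
    @clock = clock
  end

  def call(_job_instance, job_payload, queue)
    tags = ["queue:#{queue}", "job_class:#{job_payload['class']}"]
    started = @clock.call
    # Assumption: "enqueued_at" is epoch seconds.
    enqueued_at = job_payload["enqueued_at"]
    @metrics.distribution("jobs.queue_latency", started - enqueued_at, tags: tags) if enqueued_at
    yield
  ensure
    # Duration is recorded even when the job raises.
    @metrics.distribution("jobs.duration", @clock.call - started, tags: tags) if started
  end
end

# Example: a hypothetical job that waited about 90 seconds in the queue.
metrics = InMemoryMetrics.new
payload = { "class" => "PasswordResetEmailJob", "enqueued_at" => Time.now.to_f - 90 }
QueueLatencyMiddleware.new(metrics).call(nil, payload, "within_5_minutes") { :job_ran }
metrics.points.each { |point| puts point }
```

Sending raw values with tags, rather than pre-aggregated numbers, is what lets the monitoring tool compute percentiles such as P95 per queue or per job class.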
3. Group By Job Class

In a couple of clicks you group job duration by job class.

There's a flood of jobs with the class "ElasticSearchIndexingJob".

Hmmm. Weird.

Too many jobs? Or are the jobs too slow?

Grouping the number of jobs in the queue by job class...

...shows that it's a volume issue.

What's enqueuing all these jobs?
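Your observability tool does this grouping for you, but it can help to see what "group by job class" boils down to. Below is a small sketch over made-up job events. The helper names are hypothetical, and the percentile uses the simple nearest-rank method, which may not match how your vendor calculates P95.

```ruby
# Nearest-rank percentile over a list of numbers.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Groups job events by one dimension and summarises each group,
# busiest group first.
def summarise_by(events, dimension)
  events.group_by { |event| event[dimension] }.map do |key, group|
    durations = group.map { |event| event[:duration] }
    { dimension => key, count: group.size, p95_duration: percentile(durations, 95) }
  end.sort_by { |row| -row[:count] }
end

# Made-up events: lots of fast indexing jobs, one email job.
events = [
  { job_class: "ElasticSearchIndexingJob", duration: 0.4 },
  { job_class: "ElasticSearchIndexingJob", duration: 0.5 },
  { job_class: "ElasticSearchIndexingJob", duration: 0.6 },
  { job_class: "PasswordResetEmailJob", duration: 0.3 }
]

summarise_by(events, :job_class).each { |row| puts row }
```

A high count with an unremarkable P95 duration points at volume, and a low count with a high P95 points at slow jobs. That's the "too many or too slow?" question from this step.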
4. Drill Into Requests

Every job that's performed is tagged with the controller and action that enqueued it.

Grouping by controller and action for the ElasticSearchIndexingJob...

...masses of jobs are enqueued from profiles#show.

Query the number of requests to this controller...

...and you see a massive spike.

This spike marries up to the spike in job queue latency.

Conclusion?

Requests to profiles#show are causing the queue to be flooded.
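One possible way to tag jobs with the request that enqueued them is sketched below, as a plain Ruby class shaped like a Sidekiq client middleware. The `RequestContext` module and the `enqueued_from` key are hypothetical names. In a Rails app the context holder is often an `ActiveSupport::CurrentAttributes` class set in a controller callback instead.

```ruby
# Holds the controller and action for the request currently being handled.
module RequestContext
  def self.set(controller:, action:)
    Thread.current[:request_context] = { "controller" => controller, "action" => action }
  end

  def self.current
    Thread.current[:request_context]
  end

  def self.clear
    Thread.current[:request_context] = nil
  end
end

# Shaped like a Sidekiq client middleware: it runs when a job is enqueued
# and can add keys to the job payload before the job is stored.
class EnqueuedFromMiddleware
  def call(_job_class, job_payload, _queue, _redis_pool)
    context = RequestContext.current
    job_payload["enqueued_from"] = "#{context['controller']}##{context['action']}" if context
    yield
  end
end

# Example: a job enqueued while handling profiles#show.
RequestContext.set(controller: "profiles", action: "show")
payload = { "class" => "ElasticSearchIndexingJob", "args" => [123] }
EnqueuedFromMiddleware.new.call("ElasticSearchIndexingJob", payload, "within_5_minutes", nil) { payload }
puts payload["enqueued_from"]
RequestContext.clear
```

When the job later runs, the server side can read `enqueued_from` off the payload and attach it to every metric, log line and trace for that job.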
5. Group by IP

Filter requests by profiles#show.

Group by user agent.

And 89% are Firefox version 111.

Strange...

...let's group by IP address.

Ah - all those requests are coming from the same IP.

Conclusion?

The root cause is a scraper impersonating a real browser.
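The "89% are Firefox" style of answer is just a share calculation over one dimension of your request events. Here's a tiny sketch with made-up requests. The `share_by` helper is hypothetical and the IPs come from the reserved documentation ranges.

```ruby
# Returns each value of `dimension` with its percentage share of requests,
# largest share first.
def share_by(requests, dimension)
  total = requests.size.to_f
  requests.map { |request| request[dimension] }
          .tally
          .map { |value, count| [value, (count / total * 100).round(1)] }
          .sort_by { |_value, share| -share }
end

# Made-up request events for profiles#show.
requests = [
  { ip: "203.0.113.7", user_agent: "Firefox/111.0" },
  { ip: "203.0.113.7", user_agent: "Firefox/111.0" },
  { ip: "203.0.113.7", user_agent: "Firefox/111.0" },
  { ip: "198.51.100.20", user_agent: "Safari/17.0" }
]

share_by(requests, :user_agent).each { |value, share| puts "#{value}: #{share}%" }
share_by(requests, :ip).each { |value, share| puts "#{value}: #{share}%" }
```

This only works if user agent and IP were captured on every request event in the first place, which is the point of instrumenting with rich context.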
6. Block Scraper

Go into Cloudflare...

...create an intelligent rule...

...to block IPs making more than 10 requests a second...

...save.

Let's go back to Datadog.
7. Validate Fix

The graphs almost instantly change.

The volume of requests drops like a stone.

Back to normal levels.

You look at the queue latency...

...yup, coming back under 5 minutes.

Great. All fixed.
8. Quick, Easy, Fun, Fixed

Total time? 15 minutes.

Minimal interruption to feature work.

No deployment of code.

No extra tickets.

No customer impact.

One Cloudflare rule.

Tweak and watch the results on the app in real time.

This is the power of observability.

Improve Observability Of Your Rails App - $150

Frequently Asked Questions

Anxious about why this might not work for you or any of the details? Here are answers to questions you may have.

Enroll Today to Fix Bugs 20x Faster

Meet Your Instructor

John Gallagher

Joyful Observability Coach

I've been working with Ruby and Rails since Rails 2 and have experienced the pain of poor observability.

I'm passionate about helping every Rails engineer understand their app in production.

I'll not list my hobbies because I know you don't care.

Contact Me

Have more questions? Get in touch with me.
