The Open Problem Detection (and Resolution) Community



Newsletter

Detect #13: OpenAI Error Rates, Lies Programmers Believe About Memory, SREcon ....

We dig into recent failure patterns from OpenAI's 20% error spike, unpack lessons on debugging career-impacting outages, and challenge persistent myths developers hold about memory usage.

Latest articles

Detect #12: Outages at Slack, Cloudflare, PlayStation, Go Profiling Tricks, Understanding Kubernetes Evictions
Newsletter

Head-scratching issues and major incidents in Detect #12.

Community

March 4, 2025
Detect #11: Capital One and GitHub outages, new profiling tools, and the incident severity debate
Newsletter

DeepSeek arrives. OpenAI, GitHub, and Capital One outages, and more, in Detect #11.

Community

February 4, 2025
Detect #10: OpenAI and Canva outages, Kubernetes failures, debugging Rust and more...

OpenAI and Canva outages, Kubernetes failures, debugging Rust, and more in Detect #10.


January 9, 2025
Detect #9: Reddit Outage, Debugging in Go, Lessons from Kubernetes Problem Detection, and Zero-Downtime Migrations ...
Newsletter

Reddit outage analysis, Go debugging, Kubernetes lessons, problem detection strategies, and zero-downtime migrations—discover insights in Detect #9!

Community

December 19, 2024
Detect #8: Troubleshooting at Netflix; Incidents at Google and Mailchimp; Debugging Go; Lessons from Early YouTube SRE; Adidas' Platform Engineering Journey...
Newsletter

Community

November 14, 2024
Detect #7: 🧯 Incidents at Docker, Bank of America ($), PlayStation; 🚨 OTel Data Loss; 🛠️ Detecting Noisy Neighbors with eBPF; 🌩️ Remote Memory Profiling...
Newsletter

In this issue: OTel Data Loss; Detecting Noisy Neighbors with eBPF; Remote Memory Profiling; Incidents at Docker, Bank of America ($), Playstation ... 

Community

October 3, 2024
Detect #6: 🚨 Kafka data loss; 🛠️ Real debugging stories; 🌩️ All things eBPF; 🧯 Incidents at OpenAI, Anthropic, HubSpot, Google...
Newsletter

Welcome to the latest edition of the only newsletter focused on the art and science of problem detection. Featuring real problem detection & troubleshooting stories, best practices, notable incidents, and architecture tips.

Community

September 3, 2024
Putting a Meaningful Dent in Your Error Backlog (By Dan Slimmon)
Tips

“Let’s track our production errors,” they said. “We’ll harvest insights,” they said. And 3 years later, all we have to show for it is an error tracking dashboard so bloated with junk that it makes us sick to look at.

Community

August 23, 2024
Detect Newsletter #5: 🚨 Hidden Bug of the Month; 🧯 Root Cause of CrowdStrike's $5B outage; 🌩️ Incidents at Cloudflare and GitHub; 🛠️ Memory profiling, Kafka monitoring and more
Newsletter

Explore the latest edition of the Detect Problem Detection Newsletter, where we uncover the hidden bug of the month, examine the root cause behind CrowdStrike’s $5B outage, and analyze incidents at Cloudflare and GitHub. Plus, gain insights into memory profiling, Kafka monitoring, and other software engineering topics, with expert analysis and actionable tips.

Community

August 9, 2024
June 2024: 🔥 Virtual Meetup with Xata 🚨 Incidents at Microsoft, Apple, Github; 🛠️ Learn How to Survive Massive Traffic Surges...and more
Newsletter

Welcome to the latest edition of the only newsletter focused on the art and science of problem detection and troubleshooting. In this issue: 🔥 Virtual Meetup Tomorrow (6/3) 🚨 Incidents at Microsoft, Apple, Github; 🛠️ Learn How to Survive Massive Traffic Surges ....and more. 

Community

June 5, 2024
July 2024: 🔥 Surviving on-call with an ex-Meta SRE; 🚨 Incidents at Google, Cloudflare, GitHub, and OpenAI; 🛠️ Get a handle on Flaky alerts...and more
Newsletter

Welcome to the latest edition of the only newsletter focused on the art and science of problem detection and troubleshooting.

Community

July 3, 2024
Deterministic vs. Probabilistic Problem Detection
Guides

Explore deterministic vs probabilistic problem detection in SRE. Learn how these methods impact SLOs and system reliability. Perfect for Site Reliability Engineers.

Community

May 28, 2024
Stop Wasting Everyone's Time. Step Up Your Operational Review Meetings With Problem Detection
Tips

Operational review meetings play an important role in Site Reliability Engineering. However, the effectiveness of these meetings across organizations varies significantly.

Community

May 28, 2024
May 2024: New debugging tools, our first live community event, behind the scenes at Cloudflare, and more 🛠️🚨
Newsletter

In this issue: our first live community event, Cloudflare's approach to autonomous diagnostics, notable incidents at Braze and Honeycomb, new Kubernetes and Node.js debugging tools, and more.

Community

May 5, 2024
April 2024: The First Newsletter Dedicated to Problem Detection & Troubleshooting 🛠️
Newsletter

In this issue: a major hacking plot uncovered by performance analysis, an SREcon 2024 Americas recap, notable incidents at Notion and Cloudflare, and a Linux tool roundup.

Community

April 5, 2024
What is Problem Detection?
Resources

A primer on the role of problem detection in modern software applications

Community

March 18, 2024
How to Assess Your Problem Detection Approach: The Detect Maturity Model (DMM)
Guides

The Detect Maturity Model (DMM) is a structured framework that helps engineering teams improve their problem detection capabilities.

Community

March 30, 2024
10 Problem Detection Pitfalls to Avoid
Tips

Uncover the top 10 pitfalls hindering problem detection progress in site reliability engineering (SRE), an area where precision can prevent costly incidents and engineer burnout. This post dives into common traps and showcases how leveraging a community like detect.sh can transform problem detection from a challenge into a strength, enhancing efficiency and innovation.

Community

March 18, 2024

An adventure through the vibrant world of problem detection, where every post is a mix of expert insights, community wisdom, and tips, designed to turbocharge your expertise.




Subscribe to the Detect Newsletter

Stay informed. Each issue is packed with news, insight, and resources from the Detect community.



Supported by Prequel.dev