Detect #7: 🧯Incidents at Docker, Bank of America ($), Playstation; 🚨 OTel Data Loss; 🛠️ Detecting Noisy Neighbors with eBPF; 🌩️ Remote Memory Profiling...
November 5, 2024
An adventure through the vibrant world of problem detection, where every post is a mix of expert insights, community wisdom, and tips, designed to turbocharge your expertise.
Welcome to the latest edition of the only newsletter focused on the art and science of problem detection. Detect is brought to you by Prequel (prequel.dev), the team bringing detection engineering to reliability.
In this issue: OTel Data Loss; Detecting Noisy Neighbors with eBPF; Remote Memory Profiling; Incidents at Docker, Bank of America ($), Playstation ...
Hidden Bug of the Month 🐜 🚨 (Presented by Prequel)
The AWS CloudWatch OTel receiver has a sneaky log-handling bug that is getting attention because it causes data loss. When a log group is removed, the receiver panics and crashes. The OTel community plans to address the issue in a future release. If you use this receiver, monitor for the failure and upgrade when a fix is available. (First reported by Alex Burnett)
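One rough way to watch for this kind of crash is to scan collector output for Go's panic signature. The sketch below is an illustration, not the receiver's own tooling. It relies only on the fact that a crashing Go program prints a line starting with `panic: ` followed by a goroutine trace. The helper names (`find_panics`, `mentions_receiver`) and the `awscloudwatch` hint string are assumptions made for this example, so adjust them to whatever your collector's stack traces actually contain.

```python
import re
from typing import Iterable, List

# A crashing Go program prints "panic: <message>" and then a goroutine trace.
PANIC_RE = re.compile(r"^panic: ")

# Assumption: the receiver's package name shows up somewhere in the stack trace.
RECEIVER_HINT = "awscloudwatch"


def find_panics(lines: Iterable[str], context: int = 5) -> List[List[str]]:
    """Return each panic line together with up to `context` following lines."""
    buf = [line.rstrip("\n") for line in lines]
    hits = []
    for i, line in enumerate(buf):
        if PANIC_RE.match(line):
            hits.append(buf[i : i + 1 + context])
    return hits


def mentions_receiver(block: List[str], hint: str = RECEIVER_HINT) -> bool:
    """True if any line of a panic block contains the (hypothetical) hint."""
    return any(hint in line.lower() for line in block)
```

To use it, pass the collector's log lines (for example, an open file handle) to `find_panics` and alert on any block for which `mentions_receiver` returns true. A panic that does not mention the receiver is still worth a look, so treat the hint as a way to prioritize rather than as a filter.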
See how Prequel helps teams detect a wide range of failures, powered by global reliability intelligence.
Upcoming Events 🗓️
Boost Reliability with Problem Detection and Management (October 23rd): Join an upcoming webinar with Niall Murphy, co-author of Site Reliability Engineering: How Google Runs Production Systems. Explore barriers to problem management and how emerging problem detection and analysis techniques can help overcome them. Register here. 👈🏼
SREcon EMEA 2024 (29–31 October): Join fellow SREs in Dublin for SREcon24 EMEA where industry leaders will discuss the latest in reliability engineering and incident management. (usenix.org)
And now, here's a digest of what happened last month in the world of problem detection.
Real Problem Detection & Troubleshooting Stories 📖
Sharpen your technical skills with these deep dives:
Detect Noisy Neighbors with eBPF: Netflix engineers used eBPF to uncover noisy neighbors that are hogging resources. (Netflix)
Kubernetes Lessons: A walk through observability lessons learned in Kubernetes land. (DZone)
Debugging Philosophy: Musing on the art of debugging—it’s as much mindset as method. (CatSkull)
Anti-Debugging Tactics: A deep dive into anti-debugging and how to detect fork-based tricks. (Tony Gorez)
TCPdump Saves the Day: Using TCPdump to solve an IPv6 bug. (Checkly)
Tools 🛠️
Stay sharp with the latest tools:
Remote Memory Profiling: Memray for remote profiling - the unsung hero you didn’t know you needed? (Textual)
No More Action Items in Incident Reviews?: Lorin Hochstein argues that incident reviews need less focus on action items and more on analysis. (Surfing Complexity)
Don't Provide Incident Resolution Estimates?: Robert Ross suggests that everyone stop giving incident resolution estimates to customers. (FireHydrant)
Anatomy of Jank: A deep dive into how Chromium defines jank. (Chromium)
Who Handles Monitoring?: A reflection on team design options and their tradeoffs. (SREPath)
Lessons from Sev0: Insights from a brand-new conference dedicated to incidents. (Amin Astaneh, CertoModo)
Architecting for Reliability 📐👷‍♀️
Explore how other teams are architecting their applications to reach new heights:
Saving Compute at Cloudflare: The Pingora team at Cloudflare reduces compute use by 1%. (Cloudflare)
Cache Me Not: Cache decisions can be costly—here’s when to avoid them. (hazelweakly)
Netflix’s Pushy WebSockets: Optimizing WebSocket connections at scale. (InfoQ)
Payments Engineering Pitfalls: The Payments Engineer Playbook explains why certain structures cause trouble. (Alvaro Duran)
Indexes Under the Hood: A technical deep dive into how indexes function. (DZone)
Duolingo's High-Scale Notification System: Learn how Duolingo manages millions of notifications. (InfoQ)
Whether you're on call, looking to optimize performance, or simply keeping up with the latest tips & tricks, we’re happy to be a part of your day.
Follow our brand-new account on X (formerly Twitter): @detect_sh
Did someone forward you this email? Join our mailing list so you'll be the first to know.