Postmortems
Why Write a Postmortem
Postmortems help us understand how failures happen, so that we can prevent them from recurring through education and process changes.
When To Write a Postmortem
A postmortem is expected for any tree closure lasting longer than 4 hours. It should be written within 72 hours of the outage.
Who Should Write the Postmortem
The postmortem should be written by someone involved in detecting and correcting the issue, preferably someone who can take responsibility for the follow-up work.
What to Include
Please use the postmortem template found here (in Google Docs, choose File > Make a copy).
Your postmortem should include the following sections:
- Title
- Summary of the event
- Full timeline
- Root cause(s)
- What worked and what didn't (a.k.a. lessons learned)
- Action items (follow-up bugs assigned to specific people)
Where to Put It
Whenever possible, postmortems should be accessible to the entire Chromium community. If you are a Google employee and your postmortem contains internal details, see the internal infrastructure team's postmortem site instead.
- Using your chromium.org account, write it in a Google Doc and set the sharing permissions to “Anyone who has the link can comment”.
- Add it to the list below.
- Send the link to chromium-dev@chromium.org or infra-dev@chromium.org, as relevant.
See also:
Title | Date |
---|---|
SPDY/QUIC Connection Pooling Bug Postmortem | |
chromium.perf tests on android userdebug builds failing for 7 days | 2015-01-30 |
No data from some android bots on chromium.perf for 2 weeks | 2014-12-22 |
Grit Compile Errors Require Clobber | 2014-07-25 |
Swarming Postmortem: 2014-12-04 | 2014-12-04 |
.dbconfig files loss on master3 | 2014-09-19 |
Postmortem: 15 hour tree closure by a file named "about" | 2014-05-09 |
Swarming Postmortem: Undeletable directories | 2015-11-22 |