Incident Procedure
Seal prioritizes proactive monitoring and swift response to ensure continuous, secure operations. To uphold high standards of service availability and resilience, Seal maintains a robust array of monitoring channels designed to detect potential security incidents, including downtime, service disruptions, and unusual system activity.
This page briefly outlines Seal's Incident Procedure for responding to, documenting, and resolving software downtime incidents. It is an overview rather than a comprehensive guide.
Detection
Seal's commitment to 24/7 monitoring enables it to provide a seamless user experience and to uphold the security and integrity of the platform.
Seal maintains a variety of channels to monitor for security incidents such as downtime or service disruptions.
Automated alerts are configured to trigger in response to specific thresholds. These alerts are customisable to align with Seal’s risk management criteria, addressing a range of scenarios, from routine service inconsistencies to critical security threats.
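Threshold-based alerting of the kind described above can be pictured with the minimal sketch below. It is illustrative only: the metric names (`error_rate_percent`, `p95_latency_ms`), the threshold values, and the two-tier warning/critical scheme are hypothetical assumptions, not Seal's actual configuration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict


class AlertLevel(IntEnum):
    """Hypothetical alert levels, ordered from least to most severe."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Threshold:
    """A hypothetical threshold rule for one monitored metric."""
    metric: str
    warning: float   # reading at or above this fires a warning alert
    critical: float  # reading at or above this fires a critical alert


def evaluate(threshold: Threshold, value: float) -> AlertLevel:
    """Map a single metric reading to an alert level."""
    if value >= threshold.critical:
        return AlertLevel.CRITICAL
    if value >= threshold.warning:
        return AlertLevel.WARNING
    return AlertLevel.OK


# Hypothetical rules; real values would follow Seal's risk management criteria.
RULES = [
    Threshold(metric="error_rate_percent", warning=1.0, critical=5.0),
    Threshold(metric="p95_latency_ms", warning=500.0, critical=2000.0),
]


def check(readings: Dict[str, float]) -> Dict[str, AlertLevel]:
    """Evaluate every rule whose metric appears in the current readings."""
    return {
        rule.metric: evaluate(rule, readings[rule.metric])
        for rule in RULES
        if rule.metric in readings
    }


if __name__ == "__main__":
    print(check({"error_rate_percent": 2.3, "p95_latency_ms": 180.0}))
```

Keeping rules as data rather than hard-coded conditions is one way to make thresholds "customisable": adjusting sensitivity means editing a rule, not the evaluation logic.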
Regular system audits and validation checks are conducted on all monitoring and alerting protocols. These checks ensure that each channel's sensitivity, accuracy, and responsiveness remain effective over time. Additionally, incidents are simulated periodically to assess alert accuracy and improve incident response times. These simulations allow the Seal Team to practise responding in real time.
Communications
Clear communication is crucial for a rapid response, a reliable flow of information, and effective resolution.
Communication during an incident involves multiple stages: the first responder's initial actions, team notifications, the creation of dedicated communication channels, and continuous status updates. Each step is designed to keep all relevant stakeholders, both internal and external, informed and coordinated throughout the incident lifecycle.
These communication channels allow Seal's response teams to share critical information quickly, streamline decision-making, and maintain a transparent, coordinated approach to incident resolution.
Assessment and Debugging
The first responder, in collaboration with relevant engineers and team leads, assesses the incident's severity based on criteria including, but not limited to, the following:
User impact: Determine the scope of affected users or services - higher severity is assigned if a significant portion of users or core functions is impacted.
Service dependency: Assess which critical systems or components are affected and any downstream dependencies. Issues affecting core services, such as database security, are prioritised.
Recovery time: Consider the estimated time needed to resolve the issue.
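One way to combine these criteria into a single severity level is sketched below. The severity names, the scoring weights, and the cut-off values (for example, treating half of all users or a four-hour recovery estimate as significant) are hypothetical assumptions made for illustration; the procedure above does not prescribe a formula, and in practice the assessment remains a judgement by the responders.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels; higher values mean more urgent."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Assessment:
    """Inputs mirroring the three criteria above (field names are illustrative)."""
    affected_user_fraction: float    # user impact, from 0.0 to 1.0
    core_service_affected: bool      # service dependency, e.g. database security
    estimated_recovery_hours: float  # recovery time estimate


def classify(a: Assessment) -> Severity:
    """Score each criterion and map the total onto a severity level.

    All cut-offs and weights here are assumptions, not Seal policy.
    """
    score = 0
    # User impact: a larger share of affected users raises severity.
    if a.affected_user_fraction >= 0.5:
        score += 2
    elif a.affected_user_fraction >= 0.1:
        score += 1
    # Service dependency: core services are prioritised.
    if a.core_service_affected:
        score += 2
    # Recovery time: a long estimated outage raises severity.
    if a.estimated_recovery_hours >= 4:
        score += 1
    # Clamp the raw score (0 to 5) into the defined severity range.
    return Severity(min(max(score, Severity.LOW), Severity.CRITICAL))


if __name__ == "__main__":
    print(classify(Assessment(0.6, True, 6)).name)
```

An additive score keeps each criterion independently visible, which makes it easier to explain afterwards why an incident received the severity it did.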
Once severity is assessed, the technical team begins systematic debugging. This may involve steps including, but not limited to:
Information gathering: The team retrieves relevant logs, metrics, and alerts from monitoring systems to identify anomalies or error patterns.
Reproducibility: Where possible, the team attempts to replicate the issue in a controlled environment, to understand the conditions triggering the incident.
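Root cause identification: Using the gathered information and any reproduction results, the team identifies the underlying cause of the incident.
All information and findings are documented in the Incident Report.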
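The kind of information an Incident Report collects can be modelled as a simple record, as in the sketch below. The class name `IncidentReport`, its fields, and the rule that a report counts as complete once both a root cause and a resolution are recorded are hypothetical assumptions for illustration, not Seal's actual report template.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    """One timestamped finding or action recorded during the incident."""
    timestamp: datetime
    note: str


@dataclass
class IncidentReport:
    """Hypothetical Incident Report structure; field names are illustrative."""
    incident_id: str
    detected_at: datetime
    severity: str
    affected_services: List[str]
    timeline: List[TimelineEntry] = field(default_factory=list)
    root_cause: Optional[str] = None  # filled in once identified
    resolution: Optional[str] = None  # filled in after the fix is verified

    def log(self, note: str, when: Optional[datetime] = None) -> None:
        """Append a finding to the timeline, defaulting to the current UTC time."""
        self.timeline.append(TimelineEntry(when or datetime.now(timezone.utc), note))

    def is_closed(self) -> bool:
        """Treat the report as complete once root cause and resolution are recorded."""
        return self.root_cause is not None and self.resolution is not None

    def to_text(self) -> str:
        """Render the report as plain text for sharing with stakeholders."""
        lines = [
            f"Incident {self.incident_id} ({self.severity})",
            f"Detected: {self.detected_at.isoformat()}",
            f"Affected services: {', '.join(self.affected_services)}",
            "Timeline:",
        ]
        lines += [f"  {e.timestamp.isoformat()}  {e.note}" for e in self.timeline]
        lines.append(f"Root cause: {self.root_cause or 'under investigation'}")
        lines.append(f"Resolution: {self.resolution or 'pending'}")
        return "\n".join(lines)
```

Recording findings as a timestamped timeline while debugging is under way means the report can be assembled as the incident unfolds rather than reconstructed from memory afterwards.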
Resolution
Once the issue has been clearly identified, the Seal team implements a fix that not only restores functionality but also reduces the likelihood of recurrence. After the fix is implemented, the team performs thorough testing, such as end-to-end testing, to confirm that the issue is resolved.
Relevant documentation, tests, and monitoring channels are then updated to reflect the findings and lessons learned from the incident.