OBS Group Inc. - Degraded Performance for OBS Pay and OBS Pay APIs & Partial Outage for OBS SecuRA Runtime Environment and Encryption APIs – Incident details

Degraded Performance for OBS Pay and OBS Pay APIs & Partial Outage for OBS SecuRA Runtime Environment and Encryption APIs

Resolved
Degraded performance
Started 3 months ago
Lasted about 3 hours

Affected

OBS Pay

Degraded performance from 2:16 PM to 5:31 PM, Under maintenance from 5:31 PM to 5:34 PM

OBS Pay APIs

Partial outage from 2:16 PM to 5:31 PM, Under maintenance from 5:31 PM to 5:34 PM

OBS MIRD Research Cloud

Partial outage from 2:16 PM to 5:31 PM, Under maintenance from 5:31 PM to 5:34 PM

OBS SecuRA Runtime Environment

Degraded performance from 2:16 PM to 5:31 PM, Under maintenance from 5:31 PM to 5:34 PM

OBS SecuRA Encryption APIs

Partial outage from 2:16 PM to 5:31 PM, Under maintenance from 5:31 PM to 5:34 PM

Updates
  • Resolved

    Resolution of Issues Affecting OBS Pay, OBS Pay APIs, OBS SecuRA Runtime Environment, and Encryption APIs


    Summary of Resolution

    OBS Pay and OBS Pay APIs

    1. Issues Identified:

      • Degraded performance, including transaction delays and API timeouts.

    2. Resolution Steps:

      • Scaled up database and application server resources.

      • Applied temporary traffic throttling to stabilize performance.

      • Implemented enhanced monitoring to detect future spikes in real time.

    3. Status:

      • Services restored and stable as of 11:00 AM UTC.
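The temporary traffic throttling mentioned in the resolution steps can be illustrated with a token bucket. This is only a sketch of one common throttling technique, not a description of the mechanism OBS actually deployed; the class name `TokenBucket` and the `rate`, `capacity`, and `clock` parameters are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper: the incident report does not say how throttling
    was implemented.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so tests can control time
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, False if it should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `allow()` before forwarding each request and reject or queue the request when it returns False. Passing a fake `clock` makes the behaviour deterministic in tests.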

    OBS SecuRA Runtime Environment and Encryption APIs

    1. Issues Identified:

      • Partial outage with approximately 40% of encryption and decryption requests failing.

    2. Resolution Steps:

      • Replaced and restarted the failed service node.

      • Rerouted traffic to healthy nodes.

      • Increased monitoring granularity for critical nodes.

    3. Status:

      • Full service restoration achieved at 1:00 PM UTC.
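Rerouting traffic to healthy nodes, as listed in the resolution steps, can be sketched as a round-robin selector that skips nodes marked as down. The class `HealthAwareRouter`, its method names, and the node names in the usage note are hypothetical; the report does not describe the SecuRA routing layer.

```python
from itertools import cycle


class HealthAwareRouter:
    """Round-robin over nodes, skipping any node currently marked down.

    Illustrative only; not the actual SecuRA routing implementation.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        """Stop sending traffic to `node`."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Resume sending traffic to a known node."""
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        """Return the next healthy node, or raise if none is available."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # At most len(nodes) steps are needed to reach a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

For example, with nodes `node-a`, `node-b`, and `node-c`, calling `mark_down("node-b")` causes subsequent `next_node()` calls to alternate between the two remaining nodes.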


    Impact Summary

    1. OBS Pay and APIs:

      • Approximately 25,000 transactions delayed or failed globally during the incident.

      • No data loss or security impact.

    2. OBS SecuRA Services:

      • Affected approximately 40% of encryption requests for a subset of users.

      • No data compromise occurred; the issue was limited to availability.


    Next Steps

    1. Conduct a detailed post-incident review and publish findings by January 26, 2025.

    2. Enhance auto-scaling and failover mechanisms across all affected systems.

    3. Perform stress testing to ensure systems handle peak loads without service degradation.

    4. Roll out a robust communication plan to inform users about service improvements.


    Acknowledgment

    We sincerely apologize for the inconvenience caused during this incident and thank you for your patience and understanding as we worked towards a resolution.

    For any further concerns or support, please contact the Incident Response Team at incident_response@engineering.obsgroup.tech

  • Identified

    Issues Identified in OBS Pay, OBS Pay APIs, OBS SecuRA Runtime Environment, and Encryption APIs


    Summary of Issues Identified

    OBS Pay and OBS Pay APIs

    1. Degraded Performance:

      • Significant delays in transaction processing.

      • API requests experiencing timeouts and intermittent failures.

    2. Root Cause Identified:

      • High database contention caused by an unexpected surge in transaction volume.

      • Inadequate auto-scaling thresholds for managing peak loads.
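To make the auto-scaling threshold finding concrete, the sketch below applies a proportional scaling rule of the same form documented for the Kubernetes Horizontal Pod Autoscaler. Whether OBS Pay uses that autoscaler is not stated in this report; the function name, parameters, and default bounds are assumptions.

```python
import math


def desired_replicas(current_replicas, observed_load, target_load,
                     min_replicas=1, max_replicas=20):
    """Proportional scaling: grow or shrink so per-replica load nears the target.

    Hypothetical sketch; the load metric and bounds are illustrative only.
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    raw = math.ceil(current_replicas * observed_load / target_load)
    # Clamp to the configured bounds. A max_replicas set too low is one way
    # scaling limits can prove inadequate during a surge.
    return max(min_replicas, min(max_replicas, raw))
```

With 4 replicas observing a load of 90 against a target of 60, the rule asks for 6 replicas. If the computed value exceeds `max_replicas`, the clamp keeps capacity below what the surge requires.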

    OBS SecuRA Runtime Environment and Encryption APIs

    1. Partial Outage:

      • Approximately 40% of encryption and decryption requests failed.

      • Service node connectivity issues disrupted secure operations.

    2. Root Cause Identified:

      • Failure in a service node connecting to the central orchestration layer.

      • Insufficient failover readiness for the affected node.


    Current Status

    1. OBS Pay and APIs:

      • Mitigations applied, and performance has improved as of 11:00 AM UTC.

      • Monitoring continues to ensure stability.

    2. OBS SecuRA Services:

      • Partial restoration achieved by 10:30 AM UTC.

      • Full resolution is expected by 1:00 PM UTC.


    Next Steps

    1. Enhance auto-scaling and load-balancing mechanisms for OBS Pay systems.

    2. Implement improved failover and recovery mechanisms for OBS SecuRA nodes.

    3. Conduct a post-incident analysis to address root causes and prevent recurrence.


    We apologize for the inconvenience caused and appreciate your patience as we work towards a full resolution. For updates, please contact the Incident Response Team at incident_response@engineering.obsgroup.tech

  • Investigating

    Incident Report

    Date: January 24, 2025

    Time: 11:30 AM UTC

    Reported By: Incident Response Team

    Incident ID: IR-20250124-001

    ---

    Degraded Performance for OBS Pay and OBS Pay APIs & Partial Outage for OBS SecuRA Runtime Environment and Encryption APIs

    ---

    Incident Timeline

    Detection: January 24, 2025, 9:00 AM UTC

    First User Report: January 24, 2025, 9:15 AM UTC

    Mitigation Initiated: January 24, 2025, 9:30 AM UTC

    Partial Restoration: January 24, 2025, 10:30 AM UTC

    Full Resolution (Estimated): January 24, 2025, 1:00 PM UTC
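The intervals implied by this timeline can be computed directly from the timestamps above. The snippet is a small sketch using only the Python standard library; the helper names `parse_utc` and `minutes_since_detection` are introduced here for illustration, and parsing the month name and AM/PM marker assumes an English (C) locale.

```python
from datetime import datetime, timezone

FMT = "%B %d, %Y, %I:%M %p"  # e.g. "January 24, 2025, 9:00 AM"

# Timestamps copied from the incident timeline (all UTC).
timeline = {
    "Detection": "January 24, 2025, 9:00 AM",
    "First User Report": "January 24, 2025, 9:15 AM",
    "Mitigation Initiated": "January 24, 2025, 9:30 AM",
    "Partial Restoration": "January 24, 2025, 10:30 AM",
    "Full Resolution (Estimated)": "January 24, 2025, 1:00 PM",
}


def parse_utc(text):
    """Parse a timeline timestamp and attach the UTC timezone."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def minutes_since_detection(events):
    """Return minutes elapsed from Detection to each event."""
    start = parse_utc(events["Detection"])
    return {
        name: int((parse_utc(stamp) - start).total_seconds() // 60)
        for name, stamp in events.items()
    }
```

Applied to this timeline, mitigation began 30 minutes after detection, partial restoration came at 90 minutes, and the estimated full resolution falls at 240 minutes.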

    ---

    Affected Services

    1. Degraded Performance:

    OBS Pay

    OBS Pay APIs

    2. Partial Outage:

    OBS SecuRA Runtime Environment

    OBS SecuRA Encryption APIs

    ---

    Incident Description

    OBS Pay and OBS Pay APIs:

    Between 9:00 AM and 11:00 AM UTC, users experienced significant delays in payment processing and intermittent API failures. Payment-processing latency increased by more than 67.3%, and some API requests timed out. Initial diagnostics pointed to high CPU utilization on database servers caused by an unexpected spike in transaction volume.

    OBS SecuRA Runtime Environment and Encryption APIs:

    A partial outage was detected for the SecuRA Runtime Environment and associated encryption APIs. Approximately 40% of API requests failed due to a service node experiencing connectivity issues with the central orchestration layer. This impacted encryption and decryption processes critical to secure transactions.

    ---

    Impact Assessment

    1. OBS Pay and APIs:

    Transactions Affected: An estimated 25,000 transactions were delayed or failed globally.

    Severity: Medium

    Financial Impact: Pending calculation based on transaction delays.

    2. OBS SecuRA Services:

    Scope: Approx. 40% of encryption requests affected for users relying on SecuRA APIs.

    Severity: High

    Security Impact: No evidence of data compromise; issue limited to service availability.

    ---

    Root Cause Analysis

    OBS Pay and APIs:

    Primary Cause: Increased transaction volume caused database contention, leading to slow query responses and API timeouts.

    Contributing Factors: Insufficient auto-scaling thresholds for peak load management.

    OBS SecuRA Services:

    Primary Cause: A failed service node caused connectivity disruptions with the orchestration layer.

    Contributing Factors: Lack of failover readiness for the affected node and delayed health-check responses.
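The contributing factor of delayed health-check responses suggests treating a slow reply the same as a failed one. The sketch below shows that idea with a consecutive-failure threshold. The class `HealthTracker` and its `timeout_s` and `failure_threshold` parameters are hypothetical and do not reflect the actual SecuRA health-check configuration.

```python
class HealthTracker:
    """Track one node's health from successive health-check results.

    Illustrative assumption: a node is unhealthy after `failure_threshold`
    consecutive checks that either failed or took longer than `timeout_s`.
    """

    def __init__(self, timeout_s=1.0, failure_threshold=3):
        self.timeout_s = timeout_s
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ok, latency_s):
        """Record one check result and return the node's current health."""
        # A reply slower than the timeout counts as a failure, so a node that
        # answers late does not keep receiving traffic.
        if ok and latency_s <= self.timeout_s:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.is_healthy()

    def is_healthy(self):
        """True while the failure streak is below the threshold."""
        return self.consecutive_failures < self.failure_threshold
```

A lower threshold removes a failing node sooner but raises the risk of removing a node after a transient slowdown, so the values would need tuning against real traffic.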