Overview#
When a threat intelligence team discovers that its overnight MISP synchronisation job silently failed three days ago, the indicator set has gone three days without a refresh and may be stale. By the time anyone notices, analysts have been working from outdated data without knowing it. Apache Airflow addresses this class of problem: it defines data workflows as code, enforces execution schedules, retries failures, and raises alerts when something goes wrong. Each Directed Acyclic Graph (DAG) is a Python script that specifies tasks, dependencies, retry policies, and success criteria in a form that is version-controlled and auditable.
Argus integrates with Airflow to synchronise DAG inventory and execution status, giving operators and engineers centralised visibility over the data pipelines that feed intelligence into and out of the platform. That includes threat feed ingestion schedules, STIX bundle generation jobs, report distribution tasks, and cross-system data synchronisation workflows. For intelligence agencies, financial crime units, and critical infrastructure operators, knowing that a pipeline is running on schedule is as important as the data the pipeline delivers.
Key Features#
DAG Inventory Synchronisation#
Sync individual DAG records from an Airflow instance via syncAirflowDag. Each persisted record captures: DAG ID, DAG name, schedule interval, current status, last run, next run, and aggregate success/failure counts, all scoped to organisation and clearance level.
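As a rough illustration of the record described above, the persisted fields could be modelled as follows. This is a sketch only: the class name, field names, types, and the failure_rate helper are assumptions made for illustration and are not taken from the Argus schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AirflowDagRecord:
    """Hypothetical shape of one synced DAG record (names are illustrative)."""

    dag_id: str
    dag_name: str
    schedule_interval: Optional[str]  # cron expression or Airflow preset
    status: str                       # "active", "paused", or "failed"
    last_run: Optional[datetime]
    next_run: Optional[datetime]
    success_count: int
    failure_count: int
    organisation_id: str              # scoping field, assumed name
    clearance_level: str              # scoping field, assumed name

    def failure_rate(self) -> float:
        """Share of recorded runs that failed; 0.0 when nothing has run yet."""
        total = self.success_count + self.failure_count
        return self.failure_count / total if total else 0.0


record = AirflowDagRecord(
    dag_id="misp_sync",
    dag_name="MISP overnight sync",
    schedule_interval="0 2 * * *",
    status="active",
    last_run=datetime(2026, 3, 17, 2, 0, tzinfo=timezone.utc),
    next_run=datetime(2026, 3, 18, 2, 0, tzinfo=timezone.utc),
    success_count=3,
    failure_count=1,
    organisation_id="org-example",
    clearance_level="example-level",
)
print(record.failure_rate())
```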
DAG Status Monitoring#
DAG status tracks whether a pipeline is actively scheduled (active), administratively paused (paused), or has entered a failure state (failed). The GraphQL surface exposes current state alongside aggregate success and failure counters, letting operators identify unhealthy pipelines without logging into the Airflow UI directly.
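The triage this enables can be sketched in a few lines. The example assumes the synced records have already been fetched and are available as plain dictionaries; the key names and the 20% failure-rate threshold are illustrative choices, not platform defaults.

```python
from typing import Dict, Iterable, List


def needs_attention(dags: Iterable[Dict], max_failure_rate: float = 0.2) -> List[str]:
    """Return IDs of DAGs that are failed or whose failure rate exceeds the threshold."""
    flagged = []
    for dag in dags:
        runs = dag["success_count"] + dag["failure_count"]
        rate = dag["failure_count"] / runs if runs else 0.0
        if dag["status"] == "failed" or rate > max_failure_rate:
            flagged.append(dag["dag_id"])
    return flagged


# Sample data; DAG IDs and counters are invented for the example.
dags = [
    {"dag_id": "misp_sync", "status": "failed", "success_count": 40, "failure_count": 3},
    {"dag_id": "taxii_export", "status": "active", "success_count": 9, "failure_count": 1},
    {"dag_id": "report_distribution", "status": "active", "success_count": 5, "failure_count": 5},
    {"dag_id": "exercise_prep", "status": "paused", "success_count": 0, "failure_count": 0},
]
print(needs_attention(dags))
```

Paused DAGs are deliberately not flagged here, since a pause is an administrative decision rather than a fault; a stricter policy could treat long pauses as a warning.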
Execution History#
The airflowDags surface exposes run summary fields including last run, next run, success count, failure count, and current status for each synced DAG, giving a quick health picture without needing per-run drill-down.
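One way a consumer might use the last-run field is a simple freshness check. The sketch below assumes last run is available as a timezone-aware datetime and that the caller chooses the acceptable maximum age; the function name and the 26-hour tolerance are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_run: Optional[datetime], max_age: timedelta, now: datetime) -> bool:
    """True when the DAG has never run or its last run is older than max_age."""
    if last_run is None:
        return True
    return now - last_run > max_age


now = datetime(2026, 3, 18, 7, 0, tzinfo=timezone.utc)
tolerance = timedelta(hours=26)  # daily job plus some slack, an illustrative value

three_days_ago = datetime(2026, 3, 15, 2, 0, tzinfo=timezone.utc)
last_night = datetime(2026, 3, 18, 2, 0, tzinfo=timezone.utc)

print(is_stale(three_days_ago, tolerance, now))
print(is_stale(last_night, tolerance, now))
```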
Schedule Transparency#
The schedule interval (cron expression or Airflow preset) is captured for each DAG. This makes it possible to verify that critical ingestion pipelines run at the expected frequency. A threat feed scheduled hourly but accidentally changed to weekly is detectable immediately, before it affects operational readiness.
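A schedule check of this kind could look like the sketch below. It assumes the operator maintains a mapping of expected schedules per DAG (a hypothetical input, not something the integration is stated to provide) and compares it with the captured values. Presets are expanded to the cron equivalents documented by Airflow so that "@daily" and "0 0 * * *" compare as equal.

```python
from typing import Dict, Optional, Tuple

# Cron equivalents of common Airflow schedule presets.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def normalise(schedule: Optional[str]) -> Optional[str]:
    """Collapse whitespace and expand known presets to cron form."""
    if schedule is None:
        return None
    compact = " ".join(schedule.split())
    return PRESETS.get(compact, compact)


def schedule_drift(
    expected: Dict[str, str], observed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each drifting DAG ID to its (expected, observed) schedule pair."""
    drift = {}
    for dag_id, want in expected.items():
        got = observed.get(dag_id)
        if normalise(want) != normalise(got):
            drift[dag_id] = (want, got)
    return drift


expected = {"misp_sync": "@hourly", "taxii_export": "0 0 * * *"}
observed = {"misp_sync": "@weekly", "taxii_export": "@daily"}
print(schedule_drift(expected, observed))
```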
Operational Event Emission#
DAG sync uses the platform's operational event emitter to publish relevant state changes to the Argus event stream. A DAG transitioning from active to failed emits an operational alert, consistent with how other Argus integrations surface infrastructure health degradation.
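The transition logic can be pictured roughly as follows. The emitter interface, event type string, and payload keys are all hypothetical stand-ins; the actual emitter API is not described here.

```python
from typing import Callable, Dict, List, Optional


def emit_on_transition(
    dag_id: str,
    previous: Optional[str],
    current: str,
    emit: Callable[[Dict], None],
) -> bool:
    """Emit an event when the status changed; return True if one was emitted."""
    if previous == current:
        return False
    emit(
        {
            "type": "airflow.dag.status_changed",  # hypothetical event type
            "dag_id": dag_id,
            "from": previous,
            "to": current,
            # A move into "failed" is raised as an alert; other changes are informational.
            "severity": "alert" if current == "failed" else "info",
        }
    )
    return True


events: List[Dict] = []  # stand-in for the event stream
emit_on_transition("misp_sync", "active", "failed", events.append)
emit_on_transition("misp_sync", "failed", "failed", events.append)  # no change, no event
print(events)
```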
Tag-Based Organisation#
airflowDags supports status-based filtering, allowing operators to focus on active, paused, or failed workflows relevant to a specific mission area or programme.
Use Cases#
- Threat Feed Freshness Monitoring: MISP synchronisation, STIX/TAXII polling, and MWDB ingestion all run as Airflow DAGs. Argus tracks their execution status. If the overnight MISP sync DAG failed, analysts know before the morning briefing that their indicator database may not be current.
- Report Distribution Pipeline Oversight: Automated intelligence report generation and distribution run as Airflow DAGs. Argus tracks delivery pipeline health, confirming that scheduled reports reached their distribution lists on time.
- Cross-System Synchronisation Health: Argus data synchronisation jobs, including pushing enriched incidents to partner SIEMs, syncing case data to TheHive, and pushing STIX to TAXII servers, all run as Airflow DAGs. The integration gives operators a consolidated view of which cross-system sync jobs are healthy.
- Exercise Preparation Automation: Before cyber exercises, preparation pipelines covering synthetic intelligence data loading, exercise account provisioning, and forensic artefact pre-positioning run as Airflow DAGs. Argus tracks pipeline completion as part of go/no-go readiness checks.
Integration#
Available via GraphQL: airflowDags, airflowStats (queries); syncAirflowDag (mutation). All operations require authentication and organisation scoping. Data is stored in PostgreSQL under organisation and clearance-level constraints.
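A request against the airflowDags query might be assembled as in the sketch below. Only the operation name comes from this page; the status argument, its type, and every selected field name are assumptions for illustration, and the real schema should be checked before use. The example builds the JSON request body without sending it, so no endpoint or credentials are needed.

```python
import json
from typing import Optional

# Field and argument names below are hypothetical; consult the actual schema.
QUERY = """
query AirflowDags($status: String) {
  airflowDags(status: $status) {
    dagId
    status
    lastRun
    nextRun
    successCount
    failureCount
  }
}
"""


def build_request_body(status: Optional[str] = None) -> str:
    """Serialise a standard GraphQL-over-HTTP body (query plus variables)."""
    return json.dumps({"query": QUERY, "variables": {"status": status}})


body = build_request_body("failed")
print(body)
```

The body would then be sent as an authenticated POST to the platform's GraphQL endpoint, with organisation scoping applied on the server side.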
Works alongside MISP (ingestion DAGs), STIX/TAXII (export DAGs), TheHive (case sync DAGs), SCIM Provisioning (identity lifecycle DAGs), and Sigma Rules (SIEM rule distribution DAGs). Airflow is the orchestration layer; Argus is the monitoring and visibility layer.
Open Standards#
- Apache Airflow REST API (v1): DAG metadata, scheduling state, and run statistics are retrieved by calling the Airflow stable REST API (/api/v1/dags/{dag_id}), making the integration portable to any compliant Airflow deployment.
- GraphQL (June 2018 specification): All consumer-facing operations (airflowDags, airflowStats, syncAirflowDag) are exposed as typed GraphQL queries and mutations, consistent with the platform-wide API surface.
- OAuth 2.0 Bearer Token (RFC 6750): When an API token is supplied, it is transmitted as an Authorization: Bearer header to the Airflow REST endpoint, following the standard Bearer Token usage profile.
- Cron Expression Syntax (IEEE Std 1003.1 / POSIX): The schedule interval captured for each DAG is stored and surfaced as a cron expression or Airflow preset string, enabling human-readable schedule verification against the POSIX cron specification.
- STIX 2.1 / TAXII 2.1 (OASIS): STIX bundle generation and TAXII polling jobs are among the DAG types tracked; the integration provides pipeline health visibility for these threat-intelligence interchange workflows.
- JSON (RFC 8259): All data exchanged with the Airflow REST API is in JSON; the client parses application/json responses and the GraphQL layer serialises DAG records as JSON for API consumers.
- SQL (ISO/IEC 9075): DAG inventory and execution-state records are persisted and queried in PostgreSQL using standard SQL with named parameters and organisation-scoped WHERE clauses.
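The REST call and Bearer header described in the list above can be sketched with the Python standard library. The base URL, DAG ID, and token are placeholders, and the helper name is hypothetical; the request is only constructed, not sent. The sample response is a trimmed, hand-written stand-in for what the Airflow endpoint returns, not captured output.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_dag_request(base_url: str, dag_id: str, api_token: Optional[str] = None) -> Request:
    """Build a GET request for /api/v1/dags/{dag_id}, adding a Bearer token if given."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}"
    headers = {"Accept": "application/json"}
    if api_token:
        headers["Authorization"] = f"Bearer {api_token}"
    return Request(url, headers=headers, method="GET")


request = build_dag_request("https://airflow.example.org", "misp_sync", "example-token")
print(request.full_url)

# Parsing a JSON response body; the keys shown are a small illustrative subset.
sample_body = '{"dag_id": "misp_sync", "is_paused": false}'
dag = json.loads(sample_body)
print(dag["dag_id"], dag["is_paused"])
```

Passing the built request to urllib.request.urlopen would perform the call; a production client would also add timeouts and error handling.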
Last Reviewed: 2026-03-18 Last Updated: 2026-04-14