{"id":"causal-discovery-intelligence","slug":"causal-discovery-intelligence","title":"Causal Discovery for Intelligence Analysis","description":"Correlation analysis tells an analyst that two event types tend to occur together. Causal structure discovery tells them which one tends to produce the other. In intelligence work that distinction matters: knowing that credential theft consistently precedes lateral movement, rather than merely co-occurring with it, changes how analysts prioritise detection, response, and pre-emption.","category":"intelligence","tags":["intelligence","geospatial"],"lastModified":"2026-04-14","source_ref":"content/modules/causal-discovery-intelligence.md","url":"/developers/causal-discovery-intelligence","htmlPath":"/developers/causal-discovery-intelligence","jsonPath":"/api/docs/modules/causal-discovery-intelligence","markdownPath":"/api/docs/modules/causal-discovery-intelligence?format=markdown","checksum":"217b006ecc2d9e20214d82b78c8475f168b199246f01ccd167ec7705b1ad18fb","headings":[{"id":"overview","text":"Overview","level":2},{"id":"key-features","text":"Key Features","level":2},{"id":"use-cases","text":"Use Cases","level":2},{"id":"integration","text":"Integration","level":2},{"id":"open-standards","text":"Open Standards","level":2}],"markdown":"# Causal Discovery for Intelligence Analysis\n\n## Overview\n\nCorrelation analysis tells an analyst that two event types tend to occur together. Causal structure discovery tells them which one tends to produce the other. In intelligence work that distinction matters: knowing that credential theft and lateral movement co-occur is useful, but knowing that credential theft is a consistent precursor to lateral movement rather than a consequence of it changes how analysts prioritise detection, response, and pre-emption. Causal structure is the layer of understanding that elevates pattern recognition into actionable operational intelligence.\n\nThe Causal Discovery module uses constraint-based and score-based algorithms to learn directed acyclic graphs (DAGs) from observational event co-occurrence data within investigations. 
The PC algorithm (Peter-Clark, Spirtes et al. 1993) identifies conditional independence relationships between event types and constructs a partially directed graph, while GES (Greedy Equivalence Search, Chickering 2002) uses a score-based approach. Both algorithms are implemented via the causal-learn library (Zheng et al. 2024, JMLR). The resulting graph reveals which event types tend to cause others, which tend to be caused, and which relationships remain statistically uncertain, requiring analyst review before being presented as findings.\n\n```mermaid\ngraph LR\n    A[Event Co-occurrence Matrix] --> B[PC / GES Algorithm]\n    B --> C[DAG Structure]\n    C --> D[Certain Edges]\n    C --> E[Uncertain Edges]\n    D --> F[Causal Graph Display]\n    E --> G[HITL Review]\n    G --> F\n```\n\n**Last Reviewed:** 2026-04-14\n**Last Updated:** 2026-04-14\n\n## Key Features\n\n- **PC Algorithm with Fisher's Z Test**: The constraint-based PC algorithm uses Fisher's Z conditional independence test at a 0.05 significance level to determine which event type pairs are causally linked versus conditionally independent. The resulting graph encodes directed causal relationships where statistical evidence supports direction, and undirected edges where direction cannot be determined from the data.\n\n- **GES Score-Based Discovery**: The Greedy Equivalence Search algorithm uses a Bayesian Information Criterion-based score to search over the space of DAG equivalence classes, offering an alternative to constraint-based methods for datasets with different statistical properties.\n\n- **Edge Certainty Classification**: Each edge in the output graph carries a p-value from the underlying statistical test. Edges with p at or below 0.05 are classified as certain and displayed as solid arrows. 
Edges with p above 0.05 are classified as uncertain and displayed as dashed arrows, with a HITL badge indicating they are pending analyst review.\n\n- **HITL Integration for Uncertain Edges**: Uncertain edges are automatically submitted to the human-in-the-loop review queue before appearing in analyst-facing outputs. Edges with very high baseline confidence (surrogate confidence above 95 percent) are auto-approved. This ensures that the causal graph presented to analysts contains only edges with adequate statistical backing or explicit human endorsement.\n\n- **Graceful Correlation Fallback**: If the causal-learn library is not available in the deployment environment or if the dataset is too small for reliable independence testing, the module automatically falls back to a Pearson correlation-based undirected graph. This ensures analysts always receive a structure estimate rather than an empty result, with the method provenance clearly recorded.\n\n- **Degree Centrality**: Each node in the output graph carries in-degree and out-degree counts, identifying which event types are primarily causal (high out-degree), primarily consequential (high in-degree), or intermediary in the discovered structure.\n\n## Use Cases\n\n- **Attack Chain Reconstruction**: Discover which alert types and event categories tend to precede or follow each other within an investigation, building an evidence-grounded causal narrative of how an operation unfolded.\n- **Pre-emption Prioritisation**: Identify which observable event types are consistent causal precursors to high-severity incidents, enabling analysts to focus collection and monitoring on upstream indicators rather than downstream consequences.\n- **Indicator Relationship Mapping**: Map the causal relationships between network indicators, behavioural events, and geospatial triggers across an investigation, distinguishing correlated coincidence from causal sequence.\n- **Threat Pattern Validation**: Validate analyst hypotheses about 
causal relationships by comparing proposed causal chains against the statistically discovered DAG structure from observational event data.\n\n## Integration\n\n- **Investigation Engine**: Causal graphs are computed per-investigation and stored with the investigation record, providing a persistent structural model of event causality for each case.\n- **HITL Review Queue**: Uncertain causal edges enter the analyst review queue for expert judgement before being surfaced in investigation findings.\n- **Evidence Chain**: Discovered causal relationships contribute to the evidence chain record for an investigation, providing a documented provenance trail for causal claims.\n- **Alert Triage**: Causal structure from past investigations informs alert triage priority by identifying which upstream indicators are most predictive of high-severity sequences.\n\n## Open Standards\n\n- Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. MIT Press. PC algorithm (Peter-Clark), no patent encumbrance.\n- Zheng, X., Aragam, B., Ravikumar, P., and Xing, E.P. (2018). DAGs with NO TEARS: Continuous Optimization for Structure Learning. NeurIPS 2018. arXiv:1803.01422.\n- Chickering, D.M. (2002). Optimal Structure Identification With Greedy Search. JMLR 3(3), 507-554. GES algorithm.\n- Zheng, Y. et al. (2024). Causal-Learn: Causal Discovery in Python. JMLR 25(60), 1-8. MIT license. https://github.com/py-why/causal-learn\n"}