GraphRAG Cross-Investigation Intelligence

Overview

An intelligence analyst working across dozens of simultaneous investigations cannot easily answer questions like "which entities appear in multiple unrelated cases?" or "what connects this suspect to the financial network flagged in a separate operation?" Each investigation is a silo, and connecting the dots between them requires reading every case individually. GraphRAG closes that gap by building a community-level knowledge graph across all investigations in an organisation and using an LLM to answer cross-investigation questions with full provenance.

The GraphRAG Cross-Investigation Intelligence module detects entity communities in the organisation knowledge graph using the Louvain algorithm, generates a concise LLM summary for each community, and then answers analyst questions using two query patterns. Global search uses a Map-Reduce approach: each community summary is scored for relevance to the question, and the top-scoring summaries are synthesised into a final answer. Local search anchors the query to a set of specific entities, loads their 2-hop subgraph from the graph relationships table, identifies which communities overlap with those entities, and asks the LLM to answer using the combined subgraph and community context. Community summaries are cached for 24 hours per organisation; the analyst or a platform administrator can trigger a rebuild on demand.
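The global search flow described above can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: the names `CommunitySummary`, `global_search`, `score`, `synthesise`, and `top_k` are hypothetical, the two LLM calls are replaced by toy offline stand-ins, and the confidence score returned by the reduce phase is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CommunitySummary:
    community_id: str
    summary: str


def global_search(
    question: str,
    summaries: list[CommunitySummary],
    score: Callable[[str, str], float],            # hypothetical LLM relevance call (0.0-1.0)
    synthesise: Callable[[str, list[str]], str],   # hypothetical LLM reduce call
    top_k: int = 3,                                # assumed cut-off; the text gives no number
) -> dict:
    # Map phase: score every community summary against the question.
    scored = [(score(question, s.summary), s) for s in summaries]
    # Keep the highest-scoring communities (ties broken by community ID).
    scored.sort(key=lambda pair: (-pair[0], pair[1].community_id))
    top = scored[:top_k]
    # Reduce phase: synthesise one answer from the selected summaries.
    answer = synthesise(question, [s.summary for _, s in top])
    return {
        "answer": answer,
        "contributing_communities": [
            {"community_id": s.community_id, "relevance": r} for r, s in top
        ],
    }


# Toy stand-ins for the two LLM calls so the sketch runs offline.
def keyword_score(question: str, summary: str) -> float:
    q = set(question.lower().split())
    s = set(summary.lower().split())
    return len(q & s) / len(q) if q else 0.0


def join_summaries(question: str, selected: list[str]) -> str:
    return " ".join(selected)


summaries = [
    CommunitySummary("org1:c0", "shell companies moving funds through logistics firms"),
    CommunitySummary("org1:c1", "phone numbers linked to a courier network"),
    CommunitySummary("org1:c2", "funds routed via offshore accounts"),
]
result = global_search(
    "which companies moved funds", summaries, keyword_score, join_summaries, top_k=2
)
```

Returning the contributing community IDs alongside their relevance scores mirrors the provenance list that the response exposes for analyst review.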

Last Reviewed: 2026-04-14 Last Updated: 2026-04-14

Key Features

Louvain Community Detection: The Argus knowledge graph for each organisation is built from the graph relationships store, the platform record store that serves as the source of truth for all entity relationships. NetworkX reads these edges and runs the Louvain algorithm, seeded for determinism, on the undirected projection of the directed graph. Each resulting community is a cluster of entities that are more densely connected to each other than to the rest of the graph.
LLM Community Summaries: For each community with two or more entities, Claude generates a 2-3 sentence summary covering which entity types form the cluster, which relationship types connect them, and what investigative significance the cluster represents. Summaries are stored in the dedicated community intelligence cache with a 24-hour TTL and are strictly scoped to the owning organisation.
Global Search (Map-Reduce): Analyst questions that require organisation-wide reasoning are answered using the Map-Reduce pattern from Edge et al. (2024). In the map phase, Claude scores each community summary for relevance to the question on a 0.0-1.0 scale. The top-scoring communities are passed to the reduce phase, where Claude synthesises a final answer with a confidence score. The response includes the full list of contributing communities and their relevance scores for analyst review.
Local Search (Subgraph + Context): For questions anchored to specific entities, the local search mode loads the 2-hop subgraph from the graph relationships store (all edges within two hops of the seed entities, strictly org-scoped), identifies which cached communities overlap with the subgraph, and presents Claude with both the raw graph edges and the relevant community summaries. This grounds the answer in specific relationship-level evidence rather than solely in community-level abstractions.
24-Hour Community Index Caching: Community detection and LLM summarisation are computationally expensive for large graphs. The module caches the community index in the dedicated community intelligence cache (platform record store) and marks the build as complete in the platform distributed cache with a 24-hour TTL. Subsequent queries within the TTL window read from the cache without re-running Louvain or LLM summarisation. Administrators can force a rebuild at any time via the Build Index control.
Strict Organisation Scoping: Every database query in the GraphRAG pipeline includes the organisation scope in its WHERE clause. The dedicated community intelligence cache has an index on (organisation scope, expires_at). Community IDs are namespaced per organisation. Cross-tenant data leakage is structurally impossible.
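As a rough illustration of the community detection step, the sketch below runs seeded Louvain on the undirected projection of a small directed graph and namespaces the resulting community IDs per organisation. It assumes a NetworkX release that ships `louvain_communities`; the function name `detect_communities`, the ID format, the seed value, and the sample entities are all hypothetical.

```python
from itertools import combinations

import networkx as nx


def detect_communities(org_id, edges, seed=42):
    """Return {community_id: sorted member list} for one organisation's edges."""
    directed = nx.DiGraph()
    directed.add_edges_from(edges)
    # Louvain runs on the undirected projection of the directed graph.
    undirected = directed.to_undirected()
    # A fixed seed makes the partition reproducible between builds.
    partitions = nx.community.louvain_communities(undirected, seed=seed)
    # Order communities deterministically, then namespace the IDs per organisation.
    ordered = sorted(sorted(members) for members in partitions)
    return {f"{org_id}:community-{i}": members for i, members in enumerate(ordered)}


# Hypothetical sample: two densely connected groups joined by a single edge.
group_a = ["acct-1", "acct-2", "acct-3", "acct-4"]
group_b = ["phone-1", "phone-2", "phone-3", "phone-4"]
edges = (
    list(combinations(group_a, 2))
    + list(combinations(group_b, 2))
    + [("acct-4", "phone-1")]
)
communities = detect_communities("org-1", edges)
```

In this toy graph each group is fully connected internally, so the partition separates the accounts from the phone numbers despite the bridging edge.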

Use Cases

Cross-Investigation Entity Correlation: An analyst asks "which entities appear across multiple unrelated investigations?" The global search maps all community summaries for clusters that span multiple investigation contexts and synthesises a ranked answer identifying the highest-value cross-investigation nodes.
Suspect Network Mapping: An analyst investigating a financial fraud case asks "what connects suspect X to the financial network flagged in operation Y?" Local search loads the 2-hop subgraph around suspect X's entity ID, finds which communities overlap, and explains the connection paths using community and edge context.
Threat Actor Attribution: Analysts ask "do any entities in our current investigations match the profile of known threat actor group Z?" Global search maps community summaries for clusters with matching characteristics and returns a synthesised attribution assessment.
Investigation Triage: A duty analyst receiving a new tip asks "is this entity already known to our investigations?" Local search on the entity ID surfaces all graph connections within two hops and the communities those connections belong to, enabling rapid triage without manually searching each investigation.
Pattern of Life Analysis: An analyst asks "which organisations appear repeatedly in logistics-related investigations?" Global search identifies communities with high concentrations of logistics and organisation entity types and summarises the cross-investigation patterns.
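The local search pattern used in several of these cases (load the 2-hop subgraph around a seed entity, then find overlapping communities) can be sketched as below. This is an in-memory illustration under stated assumptions: rows are hypothetical `(organisation, source, target)` tuples, edges are traversed in both directions, and the function names and sample data are invented for the example.

```python
def two_hop_subgraph(rows, org_id, seeds, max_hops=2):
    """Return (entities, edges) within max_hops of the seeds, for one organisation."""
    # Organisation scoping comes first: other tenants' rows are never traversed.
    scoped = [(src, dst) for (org, src, dst) in rows if org == org_id]
    reached, frontier, kept = set(seeds), set(seeds), set()
    for _ in range(max_hops):
        next_frontier = set()
        for src, dst in scoped:
            # Follow an edge if either endpoint is on the current frontier.
            if src in frontier or dst in frontier:
                kept.add((src, dst))
                next_frontier.update({src, dst} - reached)
        reached |= next_frontier
        frontier = next_frontier
    return reached, kept


def overlapping_communities(communities, entities):
    """Return IDs of cached communities sharing at least one entity with the subgraph."""
    return sorted(cid for cid, members in communities.items() if entities & set(members))


# Hypothetical rows and cached communities.
rows = [
    ("org-1", "suspect-x", "acct-1"),
    ("org-1", "acct-1", "shell-co"),
    ("org-1", "shell-co", "bank-9"),         # three hops from the seed
    ("org-2", "suspect-x", "other-tenant"),  # different organisation
]
communities = {
    "org-1:community-0": ["suspect-x", "acct-1"],
    "org-1:community-1": ["shell-co", "bank-9"],
    "org-1:community-2": ["vessel-3"],
}
entities, subgraph_edges = two_hop_subgraph(rows, "org-1", {"suspect-x"})
relevant = overlapping_communities(communities, entities)
```

The edges and the matching community summaries would then be passed together to the LLM, so the answer can cite relationship-level evidence as well as community context.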

Open Standards

STIX 2.1 (OASIS): The entity and relationship types used in the knowledge graph align with the Structured Threat Information Expression object model, enabling interoperability with external threat intelligence feeds and tools.
W3C PROV-DM (Provenance Data Model): Answer provenance, including contributing community identifiers and relevance scores, follows the PROV-DM model for recording the origins and derivation of information products.
JSON Schema (IETF, draft-07 and later): Structured LLM responses for relevance scores, confidence values, and community summaries are validated against JSON Schema definitions to ensure consistent, machine-readable output.
OpenAPI 3.x (Linux Foundation / OpenAPI Initiative): All GraphRAG query and index management endpoints are described using OpenAPI 3.x specifications, enabling standard tooling for client generation, documentation, and API governance.
ISO/IEC 27001: Organisation-scoping of all graph data and community summaries, combined with strict access controls and audit logging, is designed to support compliance with the ISO/IEC 27001 information security management framework.
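To make the structured-output validation concrete, the sketch below shows what a schema for a map-phase relevance response might look like, together with a hand-written check of the same constraints. The field name `relevance` and the function `parse_relevance` are assumptions, and the manual check is only a stand-in: a deployment would more likely hand the schema to a JSON Schema validator library.

```python
import json

# Assumed shape of a relevance response, written with draft-07 keywords.
RELEVANCE_SCHEMA = {
    "type": "object",
    "properties": {"relevance": {"type": "number", "minimum": 0, "maximum": 1}},
    "required": ["relevance"],
}


def parse_relevance(raw: str) -> float:
    """Parse an LLM response and enforce the constraints in RELEVANCE_SCHEMA by hand."""
    payload = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(payload, dict) or "relevance" not in payload:
        raise ValueError("response must be an object with a 'relevance' field")
    value = payload["relevance"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("'relevance' must be a number")
    if not 0.0 <= value <= 1.0:
        raise ValueError("'relevance' must be between 0.0 and 1.0")
    return float(value)


score = parse_relevance('{"relevance": 0.8}')
```

Rejecting out-of-range or non-numeric scores at parse time keeps malformed model output from reaching the ranking step.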

Integration

Graph Relationships: Reads from the graph relationships store (platform record store), which is the source of truth for all entity relationships in the Argus platform. GraphRAG does not write to or modify the knowledge graph.
Claude Service: Uses the existing platform LLM service for all LLM calls. The GraphRAG service passes structured prompts and parses structured JSON responses for relevance scores and confidence values.
StateClient: Uses the platform distributed state service for 24-hour cache invalidation, preventing duplicate rebuilds across service instances.
Investigation Workspace: A dedicated cross-investigation panel integrates into the investigation workspace, accessible from the cross-investigation intelligence toolbar.
Permission System: Community index rebuild requires the graph intelligence administration permission. Global and local search require the investigation read permission. All workflow handlers scope their queries to the current user's organisation.
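The interplay between the 24-hour cache and the rebuild permission can be illustrated with a small in-process sketch. Everything named here is hypothetical: the class `CommunityIndexCache`, the permission string, and the use of a plain dictionary in place of the platform record store and distributed state service.

```python
import time

TTL_SECONDS = 24 * 60 * 60                       # 24-hour TTL
REBUILD_PERMISSION = "graph_intelligence.admin"  # hypothetical permission name


class CommunityIndexCache:
    def __init__(self, build, clock=time.time):
        self._build = build    # callable: org_id -> community index
        self._clock = clock    # injectable so expiry can be exercised in tests
        self._entries = {}     # org_id -> (expires_at, index)

    def get(self, org_id):
        entry = self._entries.get(org_id)
        if entry is None or entry[0] <= self._clock():
            return self.rebuild(org_id)  # missing or expired: build again
        return entry[1]                  # fresh: skip detection and summarisation

    def rebuild(self, org_id):
        index = self._build(org_id)
        self._entries[org_id] = (self._clock() + TTL_SECONDS, index)
        return index


def force_rebuild(cache, user_org_id, user_permissions):
    """Rebuild on demand, but only for callers holding the rebuild permission."""
    if REBUILD_PERMISSION not in user_permissions:
        raise PermissionError("community index rebuild requires the administration permission")
    # The organisation comes from the current user, never from request input.
    return cache.rebuild(user_org_id)


# Usage with a fake builder and a controllable clock.
builds = []


def fake_build(org_id):
    builds.append(org_id)
    return {"org": org_id, "build_number": len(builds)}


now = [0.0]
cache = CommunityIndexCache(fake_build, clock=lambda: now[0])
first = cache.get("org-1")    # cold cache: builds
second = cache.get("org-1")   # within the TTL: served from cache
now[0] = TTL_SECONDS + 1
third = cache.get("org-1")    # TTL elapsed: rebuilds
```

Keying every entry by organisation means one tenant's rebuild never touches another tenant's cached index.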