Analyst-Driven OSINT Collection

Overview

Intelligence collection from open sources and structured threat feeds is most useful when it is purposeful, documented, and tied to an investigative question. Ad hoc scraping of the public web or dark web without a human in the loop produces noise, consumes resources, and creates legal exposure out of proportion to its analytical value. The Argus OSINT architecture starts from a different premise: every collection act begins with an analyst, and the system enforces this invariant at the database layer.

All Argus OSINT collection is analyst-initiated. An analyst must specify a target query, a collection scope, a trigger type, and a justification before collection begins. For structured threat-intelligence feeds, analysts configure explicit TAXII 2.1 subscriptions using the OASIS open standard; automated polling occurs only for feeds an analyst has deliberately opted into. There is no background crawler, no autonomous discovery loop, and no system-initiated collection path.
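The four required inputs can be illustrated with a small application-level sketch. This is not the Argus model itself: the field names other than initiated_by, target_query, trigger_type, and justification, the validation rules, and the choice of a frozen dataclass are assumptions made for illustration. The enum values are the ones named elsewhere in this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import UUID, uuid4


class TriggerType(str, Enum):
    ANALYST_MANUAL = "ANALYST_MANUAL"
    ANALYST_SCHEDULED = "ANALYST_SCHEDULED"
    TAXII_FEED_SUBSCRIPTION = "TAXII_FEED_SUBSCRIPTION"


class CollectionScope(str, Enum):
    OPEN_WEB = "OPEN_WEB"
    DARK_WEB = "DARK_WEB"
    TAXII_FEED = "TAXII_FEED"
    SPECIFIC_SOURCE = "SPECIFIC_SOURCE"


@dataclass(frozen=True)
class OsintCollectionTask:
    """Illustrative stand-in for the real model; the layout is assumed."""

    initiated_by: UUID
    target_query: str
    collection_scope: CollectionScope
    trigger_type: TriggerType
    justification: str
    # task_id and created_at are hypothetical bookkeeping fields.
    task_id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Refuse to build a task that has no owning analyst.
        if not isinstance(self.initiated_by, UUID):
            raise ValueError("initiated_by must be an analyst UUID")
        if not self.target_query.strip():
            raise ValueError("target_query must not be empty")
        if not self.justification.strip():
            raise ValueError("justification must not be empty")
        # Enum lookup raises ValueError for any value outside the allowed set,
        # and also lets callers pass plain strings such as "OPEN_WEB".
        object.__setattr__(
            self, "collection_scope", CollectionScope(self.collection_scope)
        )
        object.__setattr__(
            self, "trigger_type", TriggerType(self.trigger_type)
        )
```

A check like this only gives early feedback to the caller. The invariant described in this document is enforced in the database, so application-level validation of this kind would be a convenience rather than the guarantee.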

Diagram

graph LR
    A[Analyst Initiates Collection] --> B[OsintCollectionTask\nanalyst UUID required]
    B --> C{trigger_type}
    C -->|ANALYST_MANUAL| D[Ad-hoc Collection Executes]
    C -->|ANALYST_SCHEDULED| E[Scheduled Collection Executes]
    C -->|TAXII_FEED_SUBSCRIPTION| F[TAXII 2.1 Poll]
    D --> G[STIX 2.1 Objects]
    E --> G
    F --> G
    G --> H[Entity Ingestion Pipeline]
    H --> I[Intelligence Graph]
    J[Analyst Configures TAXII Feed] --> K[TaxiiFeedConfig\ncreated_by analyst UUID]
    K --> L[Scheduled Poll]
    L --> F

Last Reviewed: 2026-04-14 | Last Updated: 2026-04-14

Key Features

Analyst-Owned Collection Tasks: Every OSINT collection task carries a mandatory initiated_by UUID identifying the analyst who authorised collection. This is enforced at the PostgreSQL layer as a NOT NULL constraint; no collection record can exist without an owning analyst. The OsintCollectionTask model additionally captures a free-text justification field so analysts can document why a particular collection was necessary.
Explicit Trigger Types: Collection tasks declare one of three trigger types: ANALYST_MANUAL (analyst requested collection ad hoc), ANALYST_SCHEDULED (analyst configured a repeating schedule), or TAXII_FEED_SUBSCRIPTION (analyst explicitly subscribed to a structured feed). A database CHECK constraint prevents any other value, ensuring no silent autonomous trigger path can be introduced.
TAXII 2.1 Feed Subscriptions: Analysts can subscribe to structured threat-intelligence feeds from ISAC partners, government CERTs, and commercial threat-sharing communities using the OASIS TAXII 2.1 open standard. The TaxiiFeedConfig dataclass carries the created_by analyst UUID. Polling is automated on the configured interval, but the subscription itself is always an explicit analyst action, not autonomous system behaviour.
STIX 2.1 Intelligence Objects: Feed subscriptions and manual collection ingest STIX 2.1 objects (indicators, threat actors, attack patterns, malware, campaigns) from OASIS-compliant sources. STIX objects are normalised into the Argus entity and intelligence graph with provenance tracing back to the source collection task.
Collection Scope Classification: Tasks are tagged with a collection scope (OPEN_WEB, DARK_WEB, TAXII_FEED, or SPECIFIC_SOURCE), enabling downstream audit review to understand where information originated and apply appropriate handling rules.
Audit Trail Integration: Every collection task is recorded in the Argus audit trail with organization_id, initiated_by, trigger_type, target_query, and timestamps, satisfying EDF/PESCO compliance requirements for intelligence collection logging.
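The database-level guarantees described in the features above (a NOT NULL initiated_by column and a CHECK constraint on trigger_type) can be demonstrated with a minimal sketch. Argus is described as using PostgreSQL; the example below uses Python's built-in sqlite3 module purely as a stand-in so it runs without a server. The table name, column types, the CHECK on collection_scope, and the helper functions are all assumptions, not the actual Argus schema.

```python
import sqlite3
import uuid

# Hypothetical schema: only the NOT NULL on initiated_by and the CHECK on
# trigger_type are taken from the text; everything else is illustrative.
SCHEMA = """
CREATE TABLE osint_collection_task (
    id               TEXT PRIMARY KEY,
    organization_id  TEXT NOT NULL,
    initiated_by     TEXT NOT NULL,
    trigger_type     TEXT NOT NULL CHECK (trigger_type IN (
        'ANALYST_MANUAL', 'ANALYST_SCHEDULED', 'TAXII_FEED_SUBSCRIPTION')),
    collection_scope TEXT NOT NULL CHECK (collection_scope IN (
        'OPEN_WEB', 'DARK_WEB', 'TAXII_FEED', 'SPECIFIC_SOURCE')),
    target_query     TEXT NOT NULL,
    justification    TEXT,
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database containing the illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def insert_task(conn, *, organization_id, initiated_by, trigger_type,
                collection_scope, target_query, justification=None) -> bool:
    """Try to insert a task; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO osint_collection_task "
                "(id, organization_id, initiated_by, trigger_type, "
                " collection_scope, target_query, justification) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (str(uuid.uuid4()), organization_id, initiated_by,
                 trigger_type, collection_scope, target_query, justification),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for both NOT NULL and CHECK violations.
        return False
```

With this sketch, an insert that omits the analyst (initiated_by=None) or uses an unknown trigger type such as 'AUTONOMOUS_CRAWL' is rejected by the database itself, regardless of what the calling code intended.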

Open Standards

Standard	Organisation	Role in Argus
TAXII 2.1	OASIS STIX/TAXII TC	Automated feed subscription and polling protocol
STIX 2.1	OASIS STIX/TAXII TC	Structured threat intelligence object format
MISP Sharing Standard	CIRCL / MISP Project	Open sharing protocol for MISP feed compatibility

All collection is analyst-initiated or analyst-configured. This architecture is distinct from autonomous continuous web collection systems: the initiated_by NOT NULL database constraint is the architectural boundary between the two approaches.

Use Cases

Analyst-Driven OSINT Collection is used across defence intelligence, financial crime investigation, and national cybersecurity sectors.

Targeted Investigation: An analyst investigating a threat actor enters a target query and initiates a manual collection task across configured OSINT providers, with the justification documented in the task record.
Structured Feed Monitoring: An analyst subscribes to a government CERT TAXII 2.1 feed. The platform polls the feed on the configured interval and ingests new STIX indicators automatically, with the analyst's UUID attached to every resulting collection task.
Dark Web Monitoring: An analyst configures a dark web collection scope for a specific organisation or credential type, with justification recorded. Collection executes against configured dark web sources and results feed into the intelligence graph.
ISAC Participation: The platform consumes TAXII 2.1 feeds from sector ISACs, normalising STIX 2.1 indicators into actionable intelligence for the organisation's threat model.
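The feed-monitoring use cases above can be sketched as follows: a subscription created by an analyst, and a poll step that stamps the subscribing analyst's UUID on each resulting task. Only the TaxiiFeedConfig name and its created_by field come from this document. The other fields, the poll_feed function, the dictionary shape of the task, and the injected fetch callable are hypothetical. The fetcher is passed in so the sketch runs without a network; a real one would call the TAXII 2.1 Get Objects endpoint, which accepts an added_after filter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional
from uuid import UUID, uuid4

# Hypothetical fetcher signature: (collection_url, added_after) -> STIX objects.
Fetcher = Callable[[str, Optional[datetime]], Iterable[dict]]


@dataclass
class TaxiiFeedConfig:
    """Illustrative subscription record; fields besides created_by are assumed."""

    created_by: UUID            # analyst who opted into this feed
    collection_url: str         # TAXII 2.1 collection objects URL
    poll_interval: timedelta
    last_polled_at: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """A feed is due if it was never polled or the interval has elapsed."""
        if self.last_polled_at is None:
            return True
        return now - self.last_polled_at >= self.poll_interval


def poll_feed(feed: TaxiiFeedConfig, fetch: Fetcher,
              now: datetime) -> Optional[dict]:
    """Poll one subscribed feed and return a task record, or None if not due."""
    if not feed.is_due(now):
        return None
    # Ask only for objects added since the previous poll.
    objects = list(fetch(feed.collection_url, feed.last_polled_at))
    feed.last_polled_at = now
    return {
        "task_id": str(uuid4()),
        # The subscribing analyst owns every task the poll produces.
        "initiated_by": str(feed.created_by),
        "trigger_type": "TAXII_FEED_SUBSCRIPTION",
        "collection_scope": "TAXII_FEED",
        "target_query": feed.collection_url,
        "stix_objects": objects,
    }
```

Because poll_feed takes a TaxiiFeedConfig as its only source of work, there is nothing for it to poll unless an analyst has created a subscription first, which mirrors the opt-in behaviour described above.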

Integration

Intelligence Graph: STIX objects ingested from collection tasks are converted to graph nodes and edges in the Neo4j intelligence graph, maintaining provenance links to the source OsintCollectionTask.
Threat Intel Domain: Collection results flow into the threat intelligence pipeline for correlation and enrichment.
Alerting: New STIX indicators from TAXII feeds can trigger alerts against existing monitored entities.
MISP Integration: MISP sharing events are ingested via the MISP connector. Where the MISP feed was analyst-configured, the connector creates OsintCollectionTask records with trigger_type = TAXII_FEED_SUBSCRIPTION.
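The provenance link between graph elements and their source collection task can be sketched as a simple conversion step. This is an assumption-laden illustration rather than the Argus ingestion pipeline: the function name, the node and edge dictionary shapes, and the source_task_id property are hypothetical, and the output is plain Python data rather than Neo4j writes. The STIX field names used (type, id, source_ref, target_ref, relationship_type) are those defined by STIX 2.1.

```python
from typing import Any


def stix_bundle_to_graph(
    bundle: dict[str, Any], task_id: str
) -> tuple[list[dict], list[dict]]:
    """Split a STIX 2.1 bundle into node and edge records tagged with task_id."""
    nodes: list[dict] = []
    edges: list[dict] = []
    for obj in bundle.get("objects", []):
        if obj.get("type") == "relationship":
            # STIX relationship objects become edges between two object ids.
            edges.append({
                "source": obj["source_ref"],
                "target": obj["target_ref"],
                "type": obj["relationship_type"],
                "source_task_id": task_id,
            })
        else:
            # Every other object (indicator, malware, ...) becomes a node.
            # Sightings are not given special handling in this sketch.
            nodes.append({
                "id": obj["id"],
                "label": obj["type"],
                "name": obj.get("name"),
                "source_task_id": task_id,
            })
    return nodes, edges
```

Carrying the task identifier on every node and edge means any element in the graph can be traced back to the task, and therefore to the analyst and justification, that produced it.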
