# Analyst-Driven OSINT Collection

## Overview

Intelligence collection from open-source and structured threat feeds is most useful when it is purposeful, documented, and tied to an investigative question. Ad hoc scraping of the public web or dark web without a human in the loop produces noise, consumes resources, and creates legal exposure without producing proportionate analytical value. The Argus OSINT architecture starts from a different premise: every collection act begins with an analyst, and the system enforces this invariant at the database layer.

All Argus OSINT collection is analyst-initiated. An analyst must specify a target query, a collection scope, a trigger type, and a justification before collection begins. For structured threat-intelligence feeds, analysts configure explicit TAXII 2.1 subscriptions using the OASIS open standard; automated polling occurs only for feeds an analyst has deliberately opted into. There is no background crawler, no autonomous discovery loop, and no system-initiated collection path.
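
As a rough illustration of the four required inputs, the sketch below models a collection task as a Python dataclass that refuses to construct without an analyst UUID, a target query, a scope, a trigger type, and a justification. The field names `initiated_by`, `target_query`, `trigger_type`, and `justification` and the enum values come from this document; `collection_scope`, `task_id`, and the specific validation rules are assumptions made for the example, not the actual Argus model.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import UUID, uuid4


class TriggerType(Enum):
    ANALYST_MANUAL = "ANALYST_MANUAL"
    ANALYST_SCHEDULED = "ANALYST_SCHEDULED"
    TAXII_FEED_SUBSCRIPTION = "TAXII_FEED_SUBSCRIPTION"


class CollectionScope(Enum):
    OPEN_WEB = "OPEN_WEB"
    DARK_WEB = "DARK_WEB"
    TAXII_FEED = "TAXII_FEED"
    SPECIFIC_SOURCE = "SPECIFIC_SOURCE"


@dataclass(frozen=True)
class OsintCollectionTask:
    """Hypothetical application-level mirror of the collection task record."""

    initiated_by: UUID                 # analyst who authorised collection
    target_query: str
    collection_scope: CollectionScope  # field name is an assumption
    trigger_type: TriggerType
    justification: str
    task_id: UUID = field(default_factory=uuid4)  # assumption: generated locally

    def __post_init__(self) -> None:
        # Reject tasks that lack an owning analyst or any other required input.
        if not isinstance(self.initiated_by, UUID):
            raise ValueError("initiated_by must be the initiating analyst's UUID")
        if not isinstance(self.collection_scope, CollectionScope):
            raise ValueError("collection_scope must be a CollectionScope")
        if not isinstance(self.trigger_type, TriggerType):
            raise ValueError("trigger_type must be a TriggerType")
        if not self.target_query.strip():
            raise ValueError("target_query must not be empty")
        if not self.justification.strip():
            raise ValueError("justification must not be empty")


task = OsintCollectionTask(
    initiated_by=uuid4(),
    target_query="example-threat-actor",
    collection_scope=CollectionScope.OPEN_WEB,
    trigger_type=TriggerType("ANALYST_MANUAL"),
    justification="Illustrative justification text",
)
```

Constructing the enum from a string, as in `TriggerType("ANALYST_MANUAL")`, raises `ValueError` for any value outside the three declared trigger types, which gives the application layer the same closed set that the database enforces.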

```mermaid
graph LR
    A[Analyst Initiates Collection] --> B[OsintCollectionTask\nanalyst UUID required]
    B --> C{trigger_type}
    C -->|ANALYST_MANUAL| D[Ad-hoc Collection Executes]
    C -->|ANALYST_SCHEDULED| E[Scheduled Collection Executes]
    C -->|TAXII_FEED_SUBSCRIPTION| F[TAXII 2.1 Poll]
    D --> G[STIX 2.1 Objects]
    E --> G
    F --> G
    G --> H[Entity Ingestion Pipeline]
    H --> I[Intelligence Graph]
    J[Analyst Configures TAXII Feed] --> K[TaxiiFeedConfig\ncreated_by analyst UUID]
    K --> L[Scheduled Poll]
    L --> F
```

**Last Reviewed:** 2026-04-14
**Last Updated:** 2026-04-14

## Key Features

- **Analyst-Owned Collection Tasks**: Every OSINT collection task carries a mandatory `initiated_by` UUID identifying the analyst who authorised collection. This is enforced at the PostgreSQL layer as a `NOT NULL` constraint; no collection record can exist without an owning analyst. The `OsintCollectionTask` model additionally captures a free-text `justification` field so analysts can document why a particular collection was necessary.

- **Explicit Trigger Types**: Each collection task declares exactly one of three trigger types: `ANALYST_MANUAL` (an analyst requested collection ad hoc), `ANALYST_SCHEDULED` (an analyst configured a repeating schedule), or `TAXII_FEED_SUBSCRIPTION` (an analyst explicitly subscribed to a structured feed). A database `CHECK` constraint rejects any other value, so no additional trigger path can be introduced without an explicit schema change.

- **TAXII 2.1 Feed Subscriptions**: Analysts can subscribe to structured threat-intelligence feeds from ISAC partners, government CERTs, and commercial threat-sharing communities using the OASIS TAXII 2.1 open standard. The `TaxiiFeedConfig` dataclass carries the `created_by` analyst UUID. Polling is automated on the configured interval, but the subscription itself is always an explicit analyst action, not autonomous system behaviour.

- **STIX 2.1 Intelligence Objects**: Feed subscriptions and manual collection ingest STIX 2.1 objects (indicators, threat actors, attack patterns, malware, campaigns) from OASIS-compliant sources. STIX objects are normalised into the Argus entity and intelligence graph with provenance tracing back to the source collection task.

- **Collection Scope Classification**: Tasks are tagged with a collection scope (`OPEN_WEB`, `DARK_WEB`, `TAXII_FEED`, or `SPECIFIC_SOURCE`), enabling downstream audit review to understand where information originated and apply appropriate handling rules.

- **Audit Trail Integration**: Every collection task is recorded in the Argus audit trail with `organization_id`, `initiated_by`, `trigger_type`, `target_query`, and timestamps, satisfying European Defence Fund (EDF) and Permanent Structured Cooperation (PESCO) compliance requirements for intelligence collection logging.
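
The two database-level guarantees described in this list (a `NOT NULL` owner and a `CHECK`-restricted trigger type) can be demonstrated with a small runnable sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example runs without a server; Argus itself is described as using PostgreSQL, and the table name, column types, and the `insert_task` helper here are hypothetical.

```python
import sqlite3
import uuid

# Hypothetical schema: only the NOT NULL on initiated_by and the CHECK on
# trigger_type are taken from the text; everything else is illustrative.
SCHEMA = """
CREATE TABLE osint_collection_task (
    id               TEXT PRIMARY KEY,
    organization_id  TEXT NOT NULL,
    initiated_by     TEXT NOT NULL,
    trigger_type     TEXT NOT NULL CHECK (trigger_type IN (
        'ANALYST_MANUAL', 'ANALYST_SCHEDULED', 'TAXII_FEED_SUBSCRIPTION')),
    collection_scope TEXT NOT NULL,
    target_query     TEXT NOT NULL,
    justification    TEXT
);
"""


def insert_task(conn, initiated_by, trigger_type):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO osint_collection_task "
                "(id, organization_id, initiated_by, trigger_type, "
                " collection_scope, target_query, justification) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (str(uuid.uuid4()), "org-example", initiated_by, trigger_type,
                 "OPEN_WEB", "example query", "example justification"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
analyst_id = str(uuid.uuid4())

accepted = insert_task(conn, analyst_id, "ANALYST_MANUAL")     # valid task
no_owner = insert_task(conn, None, "ANALYST_MANUAL")           # violates NOT NULL
bad_trigger = insert_task(conn, analyst_id, "SYSTEM_CRAWLER")  # violates CHECK
print(accepted, no_owner, bad_trigger)  # prints: True False False
```

Because the rejection happens inside the database, it applies to every writer, including code paths that bypass application-level validation.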

## Open Standards

| Standard | Organisation | Role in Argus |
|---|---|---|
| TAXII 2.1 | OASIS Cyber Threat Intelligence (CTI) TC | Automated feed subscription and polling protocol |
| STIX 2.1 | OASIS Cyber Threat Intelligence (CTI) TC | Structured threat intelligence object format |
| MISP Sharing Standard | CIRCL / MISP Project | Open sharing protocol for MISP feed compatibility |

All collection is analyst-initiated or analyst-configured. This distinguishes Argus from autonomous, continuous web-collection systems: the `NOT NULL` constraint on `initiated_by` is the architectural boundary between the two approaches.

## Use Cases

Analyst-Driven OSINT Collection is used across defence intelligence, financial crime investigation, and national cybersecurity sectors.

- **Targeted Investigation**: An analyst investigating a threat actor enters a target query and initiates a manual collection task across configured OSINT providers, with the justification documented in the task record.
- **Structured Feed Monitoring**: An analyst subscribes to a government CERT TAXII 2.1 feed. The platform polls the feed on the configured interval and ingests new STIX indicators automatically, with the analyst's UUID attached to every resulting collection task.
- **Dark Web Monitoring**: An analyst configures a dark web collection scope for a specific organisation or credential type, with justification recorded. Collection executes against configured dark web sources and results feed into the intelligence graph.
- **ISAC Participation**: An analyst subscribes to TAXII 2.1 feeds from sector ISACs; the platform polls them and normalises the STIX 2.1 indicators into actionable intelligence for the organisation's threat model.
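
The feed-monitoring flow above can be sketched as follows. The example builds (but does not send) a TAXII 2.1 "get objects" request for an analyst-configured feed and derives the task record that a poll would produce, carrying the subscribing analyst's UUID. The `created_by` field comes from this document; the other `TaxiiFeedConfig` fields, both helper functions, the task dictionary layout, and the example server URL are assumptions. The endpoint path, `added_after` filter, and `Accept` media type follow the TAXII 2.1 specification.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request
from uuid import UUID, uuid4

TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"


@dataclass(frozen=True)
class TaxiiFeedConfig:
    created_by: UUID                   # analyst who subscribed to the feed
    api_root: str                      # hypothetical field
    collection_id: str                 # hypothetical field
    poll_interval_seconds: int = 3600  # hypothetical field and default


def build_poll_request(feed: TaxiiFeedConfig,
                       added_after: Optional[str] = None) -> Request:
    """Build a GET request for objects added since the last poll."""
    url = f"{feed.api_root.rstrip('/')}/collections/{feed.collection_id}/objects/"
    if added_after:
        url += "?" + urlencode({"added_after": added_after})
    return Request(url, headers={"Accept": TAXII_MEDIA_TYPE}, method="GET")


def task_for_poll(feed: TaxiiFeedConfig) -> dict:
    """Describe the collection task a poll creates, owned by the subscriber."""
    return {
        "initiated_by": feed.created_by,
        "trigger_type": "TAXII_FEED_SUBSCRIPTION",
        "collection_scope": "TAXII_FEED",
        "target_query": f"taxii:{feed.collection_id}",  # illustrative format
    }


feed = TaxiiFeedConfig(
    created_by=uuid4(),
    api_root="https://taxii.example.org/api1",  # placeholder server
    collection_id="example-collection",
)
request = build_poll_request(feed, added_after="2026-04-01T00:00:00Z")
task = task_for_poll(feed)
```

Passing the timestamp of the previous poll as `added_after` keeps each poll incremental, and copying `created_by` into `initiated_by` is one way to keep automated polls attributable to the analyst who created the subscription.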

## Integration

- **Intelligence Graph**: STIX objects ingested from collection tasks are converted to graph nodes and edges in the Neo4j intelligence graph, maintaining provenance links to the source `OsintCollectionTask`.
- **Threat Intel Domain**: Collection results flow into the threat intelligence pipeline for correlation and enrichment.
- **Alerting**: New STIX indicators from TAXII feeds can trigger alerts against existing monitored entities.
- **MISP Integration**: The MISP connector ingests MISP sharing events from analyst-configured MISP feeds and creates `OsintCollectionTask` records with `trigger_type = TAXII_FEED_SUBSCRIPTION`.
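
To illustrate the provenance link described for the intelligence graph, the sketch below converts a STIX 2.1 bundle into plain node and edge dictionaries, stamping each with the ID of the collection task that produced it. It does not use a Neo4j driver, and the function name, the output dictionary layout, and the `source_task_id` property are assumptions for the example. The `source_ref`, `target_ref`, and `relationship_type` properties are standard on STIX 2.1 relationship objects.

```python
from typing import Any


def stix_bundle_to_graph(bundle: dict[str, Any], task_id: str):
    """Split STIX objects into graph nodes and edges with task provenance."""
    nodes, edges = [], []
    for obj in bundle.get("objects", []):
        if obj.get("type") == "relationship":
            # STIX relationship objects become edges between two object IDs.
            edges.append({
                "source": obj["source_ref"],
                "target": obj["target_ref"],
                "type": obj["relationship_type"],
                "source_task_id": task_id,
            })
        else:
            # Every other object becomes a node (sightings are not
            # treated specially in this simplified sketch).
            nodes.append({
                "id": obj["id"],
                "label": obj["type"],
                "name": obj.get("name"),
                "source_task_id": task_id,
            })
    return nodes, edges


# Illustrative bundle with placeholder identifiers.
example_bundle = {
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000001",
    "objects": [
        {"type": "indicator",
         "id": "indicator--00000000-0000-4000-8000-000000000002",
         "name": "Example indicator"},
        {"type": "threat-actor",
         "id": "threat-actor--00000000-0000-4000-8000-000000000003",
         "name": "Example actor"},
        {"type": "relationship",
         "id": "relationship--00000000-0000-4000-8000-000000000004",
         "relationship_type": "indicates",
         "source_ref": "indicator--00000000-0000-4000-8000-000000000002",
         "target_ref": "threat-actor--00000000-0000-4000-8000-000000000003"},
    ],
}

nodes, edges = stix_bundle_to_graph(example_bundle, task_id="task-example-1")
```

With the task ID on every node and edge, a graph query can trace any piece of intelligence back to the analyst-initiated task that collected it.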
