{"id":"analyst-driven-osint","slug":"analyst-driven-osint","title":"Analyst-Driven OSINT Collection","description":"Intelligence collection from open-source and structured threat feeds is most useful when it is purposeful, documented, and tied to an investigative question. Ad hoc scraping of the public web or dark web without a human ","category":"intelligence","tags":["intelligence","compliance"],"lastModified":"2026-04-14","source_ref":"content/modules/analyst-driven-osint.md","url":"/developers/analyst-driven-osint","htmlPath":"/developers/analyst-driven-osint","jsonPath":"/api/docs/modules/analyst-driven-osint","markdownPath":"/api/docs/modules/analyst-driven-osint?format=markdown","checksum":"6acb6082d2c4c9654205c9a892fcfd977293ba59698264882b6bbaf12fb09817","headings":[{"id":"overview","text":"Overview","level":2},{"id":"key-features","text":"Key Features","level":2},{"id":"open-standards","text":"Open Standards","level":2},{"id":"use-cases","text":"Use Cases","level":2},{"id":"integration","text":"Integration","level":2}],"markdown":"# Analyst-Driven OSINT Collection\n\n## Overview\n\nIntelligence collection from open-source and structured threat feeds is most useful when it is purposeful, documented, and tied to an investigative question. Ad hoc scraping of the public web or dark web without a human in the loop produces noise, consumes resources, and creates legal exposure out of proportion to its analytical value. The Argus OSINT architecture starts from a different premise: every collection act begins with an analyst, and the system enforces this invariant at the database layer.\n\nAll Argus OSINT collection is analyst-initiated. An analyst must specify a target query, a collection scope, a trigger type, and a justification before collection begins. 
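As a rough illustration of those four required inputs, the following minimal Python sketch models a collection-task record whose constructor rejects a missing analyst UUID or an unrecognised trigger type, mirroring in application code the `NOT NULL` and `CHECK` constraints Argus applies in PostgreSQL. The enum members are taken from this page; the field types, the `created_at` default, and the non-empty checks on `target_query` and `justification` are assumptions for illustration, not the actual Argus `OsintCollectionTask` model. Creating a task with `initiated_by=None` or an unlisted trigger such as the hypothetical `'AUTONOMOUS_CRAWL'` raises `ValueError`; checks like these only surface errors earlier, and the database constraints remain the authoritative boundary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional, Union
from uuid import UUID


class TriggerType(Enum):
    # The only three values the trigger_type CHECK constraint admits.
    ANALYST_MANUAL = 'ANALYST_MANUAL'
    ANALYST_SCHEDULED = 'ANALYST_SCHEDULED'
    TAXII_FEED_SUBSCRIPTION = 'TAXII_FEED_SUBSCRIPTION'


class CollectionScope(Enum):
    OPEN_WEB = 'OPEN_WEB'
    DARK_WEB = 'DARK_WEB'
    TAXII_FEED = 'TAXII_FEED'
    SPECIFIC_SOURCE = 'SPECIFIC_SOURCE'


@dataclass
class OsintCollectionTask:
    # Hypothetical application-side model; field types are assumptions.
    organization_id: UUID
    initiated_by: Optional[UUID]  # NOT NULL in PostgreSQL; None is rejected below
    trigger_type: Union[TriggerType, str]
    collection_scope: Union[CollectionScope, str]
    target_query: str
    justification: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Mirror of the NOT NULL constraint: no task without an owning analyst.
        if self.initiated_by is None:
            raise ValueError('initiated_by is required: collection must be analyst-initiated')
        # Mirror of the CHECK constraint: Enum lookup raises ValueError for unknown values.
        self.trigger_type = TriggerType(self.trigger_type)
        self.collection_scope = CollectionScope(self.collection_scope)
        # Assumed requirement: query and justification must be non-empty text.
        for name in ('target_query', 'justification'):
            if not getattr(self, name).strip():
                raise ValueError(f'{name} must not be empty')
```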
For structured threat-intelligence feeds, analysts configure explicit TAXII 2.1 subscriptions using the OASIS open standard; automated polling occurs only for feeds an analyst has deliberately opted into. There is no background crawler, no autonomous discovery loop, and no system-initiated collection path.\n\n```mermaid\ngraph LR\n    A[Analyst Initiates Collection] --> B[OsintCollectionTask\\nanalyst UUID required]\n    B --> C{trigger_type}\n    C -->|ANALYST_MANUAL| D[Ad-hoc Collection Executes]\n    C -->|ANALYST_SCHEDULED| E[Scheduled Collection Executes]\n    C -->|TAXII_FEED_SUBSCRIPTION| F[TAXII 2.1 Poll]\n    D --> G[STIX 2.1 Objects]\n    E --> G\n    F --> G\n    G --> H[Entity Ingestion Pipeline]\n    H --> I[Intelligence Graph]\n    J[Analyst Configures TAXII Feed] --> K[TaxiiFeedConfig\\ncreated_by analyst UUID]\n    K --> L[Scheduled Poll]\n    L --> F\n```\n\n**Last Reviewed:** 2026-04-14\n\n**Last Updated:** 2026-04-14\n\n## Key Features\n\n- **Analyst-Owned Collection Tasks**: Every OSINT collection task carries a mandatory `initiated_by` UUID identifying the analyst who authorised collection. This is enforced at the PostgreSQL layer as a `NOT NULL` constraint; no collection record can exist without an owning analyst. The `OsintCollectionTask` model additionally captures a free-text `justification` field so analysts can document why a particular collection was necessary.\n\n- **Explicit Trigger Types**: Collection tasks declare one of three trigger types: `ANALYST_MANUAL` (analyst requested collection ad hoc), `ANALYST_SCHEDULED` (analyst configured a repeating schedule), or `TAXII_FEED_SUBSCRIPTION` (analyst explicitly subscribed to a structured feed). 
A database `CHECK` constraint rejects any other value, so a new trigger path, autonomous or otherwise, cannot be introduced silently: adding one would require an explicit schema change.\n\n- **TAXII 2.1 Feed Subscriptions**: Analysts can subscribe to structured threat-intelligence feeds from ISAC partners, government CERTs, and commercial threat-sharing communities using the OASIS TAXII 2.1 open standard. The `TaxiiFeedConfig` dataclass carries the `created_by` analyst UUID. Polling is automated on the configured interval, but the subscription itself is always an explicit analyst action, not autonomous system behaviour.\n\n- **STIX 2.1 Intelligence Objects**: Feed subscriptions and manual collection ingest STIX 2.1 objects (indicators, threat actors, attack patterns, malware, campaigns) from OASIS-compliant sources. STIX objects are normalised into the Argus entity and intelligence graph with provenance tracing back to the source collection task.\n\n- **Collection Scope Classification**: Tasks are tagged with a collection scope (`OPEN_WEB`, `DARK_WEB`, `TAXII_FEED`, or `SPECIFIC_SOURCE`), enabling downstream audit review to understand where information originated and apply appropriate handling rules.\n\n- **Audit Trail Integration**: Every collection task is recorded in the Argus audit trail with `organization_id`, `initiated_by`, `trigger_type`, `target_query`, and timestamps, satisfying EDF/PESCO compliance requirements for intelligence collection logging.\n\n## Open Standards\n\n| Standard | Organisation | Role in Argus |\n|---|---|---|\n| TAXII 2.1 | OASIS STIX/TAXII TC | Automated feed subscription and polling protocol |\n| STIX 2.1 | OASIS STIX/TAXII TC | Structured threat intelligence object format |\n| MISP Sharing Standard | CIRCL / MISP Project | Open sharing protocol for MISP feed compatibility |\n\nAll collection is analyst-initiated or analyst-configured. 
This architecture is distinct from autonomous continuous web collection systems: the `initiated_by` `NOT NULL` database constraint is the architectural boundary between the two approaches.\n\n## Use Cases\n\nAnalyst-Driven OSINT Collection is used across defence intelligence, financial crime investigation, and national cybersecurity sectors.\n\n- **Targeted Investigation**: An analyst investigating a threat actor enters a target query and initiates a manual collection task across configured OSINT providers, with the justification documented in the task record.\n- **Structured Feed Monitoring**: An analyst subscribes to a government CERT TAXII 2.1 feed. The platform polls the feed on the configured interval and ingests new STIX indicators automatically, with the analyst's UUID attached to every resulting collection task.\n- **Dark Web Monitoring**: An analyst configures a dark web collection scope for a specific organisation or credential type, with justification recorded. Collection executes against configured dark web sources and its results feed into the intelligence graph.\n- **ISAC Participation**: Through analyst-configured subscriptions, the platform consumes TAXII 2.1 feeds from sector ISACs, normalising STIX 2.1 indicators into actionable intelligence for the organisation's threat model.\n\n## Integration\n\n- **Intelligence Graph**: STIX objects ingested from collection tasks are converted to graph nodes and edges in the Neo4j intelligence graph, maintaining provenance links to the source `OsintCollectionTask`.\n- **Threat Intel Domain**: Collection results flow into the threat intelligence pipeline for correlation and enrichment.\n- **Alerting**: New STIX indicators from TAXII feeds can trigger alerts against existing monitored entities.\n- **MISP Integration**: MISP events are ingested via the MISP connector; when the MISP feed was configured by an analyst, the connector creates `OsintCollectionTask` records with `trigger_type = TAXII_FEED_SUBSCRIPTION`.\n"}