AI Model Adversarial Robustness Evaluation (IBM ART)

Benchmark any deployed model against known adversarial attack families and prove, with a single robustness score, whether your defences actually hold before an adversary tests them for you.

Category: AI
Last Updated: May 26, 2026

Overview

Argus connects to an IBM Adversarial Robustness Toolbox (ART) evaluation service so that security and defence organisations can measure how their machine learning models behave under attack. An operator names a model, selects an adversarial attack family such as FGSM or PGD, and chooses a defence such as adversarial training. The platform runs the evaluation against the ART service, records the accuracy before and after the attack, computes a robustness score, and stores the full result in a tenant-scoped record. Every result is filterable, classification-gated, and rolled up into organisation-level statistics, giving teams a quantitative, repeatable assurance metric rather than a one-off lab experiment.
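The text does not define how the robustness score is calculated, so the following is only a minimal sketch under stated assumptions: the accuracy delta is taken as accuracy before minus accuracy after, and the robustness score as the fraction of clean accuracy retained under attack. The names `EvaluationScore` and `score_evaluation` are hypothetical and are not part of the Argus or ART APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationScore:
    accuracy_before: float   # accuracy on clean inputs, in [0, 1]
    accuracy_after: float    # accuracy on adversarial inputs, in [0, 1]
    accuracy_delta: float    # assumed sign: accuracy_before - accuracy_after
    robustness_score: float  # assumed: fraction of clean accuracy retained


def score_evaluation(accuracy_before: float, accuracy_after: float) -> EvaluationScore:
    """Derive an accuracy delta and an illustrative robustness score."""
    for name, value in (("accuracy_before", accuracy_before),
                        ("accuracy_after", accuracy_after)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")

    delta = accuracy_before - accuracy_after
    # Assumption: robustness is the retained-accuracy ratio, capped at 1.0.
    # A model with zero clean accuracy has nothing to retain, so it scores 0.
    if accuracy_before == 0.0:
        robustness = 0.0
    else:
        robustness = min(accuracy_after / accuracy_before, 1.0)
    return EvaluationScore(accuracy_before, accuracy_after, delta, robustness)
```

Under these assumptions, a model that falls from 0.8 to 0.4 accuracy under attack has a delta of about 0.4 and a score of 0.5. A ratio keeps scores comparable between models with different clean accuracy; a production scoring rule may well differ.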

Because every evaluation is persisted per tenant, analysts build a historical record they can trend over time. When a model is retrained or a defence is updated, the before-and-after accuracy delta and robustness score make any regression immediately visible. Model hardening becomes a measurable, auditable engineering discipline, which is directly relevant to NATO AI assurance expectations and EU AI Act high-risk system conformance.

Key Features

  • Attack Family Coverage: Evaluate models against established adversarial attack types including FGSM and PGD, alongside the wider set of attacks supported by the ART framework, so a single capability spans the threat families most relevant to deployed vision and classification models.
  • Defence Benchmarking: Apply a named defence such as adversarial training and capture its effect directly. The before-and-after accuracy figures quantify how much protection a given defence delivers against a chosen attack.
  • Robustness Scoring: Every evaluation yields a single robustness score plus an accuracy delta, converting a complex adversarial assessment into an assurance metric that non-specialists, auditors, and programme leads can read at a glance.
  • Tenant-Scoped Evaluation History: All results are stored against the requesting organisation only. Analysts retain a complete, filterable record of every evaluation run, enabling trend analysis as models evolve.
  • Status Filtering: Retrieve evaluations filtered by status so teams can separate completed assessments from those still in progress and focus reporting on finalised results.
  • Organisation-Level Statistics: Aggregate views report total evaluations, completed evaluations, average robustness score, and average accuracy delta across the organisation, supporting fleet-wide model assurance reporting.
  • Clearance-Aware Access: Evaluation records carry classification labels and are gated by the requesting user's clearance level, so sensitive model assessments remain visible only to appropriately cleared analysts.
  • Full Audit Trail: Every submission is written to the platform audit trail and emits an operational entity event, producing an immutable, timestamped record of who evaluated which model against which attack.
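The status filter and the organisation-level statistics described above can be sketched over in-memory records as follows. This is an illustration rather than the platform's implementation: the dictionary keys, the `COMPLETED` status value, and the choice to average over completed runs only are all assumptions.

```python
from statistics import fmean
from typing import Iterable, Mapping, Optional

COMPLETED = "COMPLETED"  # hypothetical status value


def filter_by_status(evaluations: Iterable[Mapping],
                     status: Optional[str] = None) -> list:
    """Return every evaluation, or only those whose status matches."""
    return [e for e in evaluations if status is None or e["status"] == status]


def organisation_stats(evaluations: Iterable[Mapping]) -> dict:
    """Aggregate one tenant's evaluations into the four headline figures."""
    rows = list(evaluations)
    done = filter_by_status(rows, COMPLETED)
    # Assumption: averages cover completed runs only, because unfinished
    # runs have no final accuracy figures. None signals "no data yet".
    scores = [e["robustness_score"] for e in done]
    deltas = [e["accuracy_before"] - e["accuracy_after"] for e in done]
    return {
        "total_evaluations": len(rows),
        "completed_evaluations": len(done),
        "average_robustness_score": fmean(scores) if scores else None,
        "average_accuracy_delta": fmean(deltas) if deltas else None,
    }
```

Returning `None` rather than zero for an empty tenant avoids reporting a misleading "average robustness of 0" before any evaluation has finished.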

Use Cases

Defence and National Security AI Programmes

Organisations fielding AI models in targeting, detection, or decision-support roles run continuous adversarial evaluation to satisfy NATO AI assurance and robustness expectations, producing the quantitative evidence that an assurance case requires before a model is approved for operational use.

EU AI Act High-Risk System Operators

Operators of high-risk AI systems use the before-and-after accuracy delta and robustness score as conformance evidence, demonstrating that models have been tested against adversarial manipulation and that defences have a measured, documented effect.

Model Assurance and MLOps Teams

Machine learning engineering teams fold adversarial evaluation into their release process. When a model is retrained or a defence is updated, the historical record surfaces any regression in robustness immediately, preventing a hardened model from silently degrading across versions.
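A release gate built on this history could look roughly like the sketch below. It assumes the history has already been fetched and reduced to `(version, robustness_score)` pairs in chronological order. The function name `find_regressions` and the 0.02 tolerance are illustrative choices, not platform behaviour.

```python
from typing import List, Sequence, Tuple


def find_regressions(history: Sequence[Tuple[str, float]],
                     tolerance: float = 0.02) -> List[Tuple[str, str, float]]:
    """Flag consecutive versions whose robustness score dropped.

    history: (version_label, robustness_score) pairs, oldest first.
    tolerance: drops at or below this size are treated as noise.
    Returns (previous_version, current_version, drop) for each regression.
    """
    regressions = []
    for (prev_version, prev_score), (version, score) in zip(history, history[1:]):
        drop = prev_score - score
        if drop > tolerance:
            regressions.append((prev_version, version, drop))
    return regressions


# Usage: block a release when the newest version regressed.
history = [("v1", 0.70), ("v2", 0.72), ("v3", 0.55)]
if find_regressions(history[-2:]):
    print("robustness regression detected in latest version")
```

Comparing each version only with its immediate predecessor keeps the check simple. A team that cares about slow drift might compare against the best score seen so far instead.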

Security Operations and Red Teams

Red teams and security analysts benchmark candidate models against a consistent set of attack families, comparing robustness scores across model variants to inform which model is fielded and which requires further hardening.

Integration

The operator-facing capability is exposed over a typed GraphQL API secured with OAuth 2.0 and JWT bearer tokens, with every operation scoped to the requesting organisation's tenant. Three operations are provided: a submission mutation named submitArtEvaluation, a listing field named artEvaluations that accepts an optional status filter, and an aggregate statistics field named artStats. A customer plugs in the address of their own ART evaluation service, optionally with an API token, and receives normalised result records without having to build their own evaluation harness or storage layer.
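As a rough sketch of how a client might call the three operations, the example below builds authenticated GraphQL requests with the Python standard library. Only the operation names `submitArtEvaluation`, `artEvaluations`, and `artStats` come from the description above. The argument names, the `ArtEvaluationInput` type, every field in the selection sets, and the endpoint URL are hypothetical placeholders, so consult the published schema for the real shapes.

```python
import json
import urllib.request

# Operation names are from the text; arguments, input types, and selected
# fields are hypothetical placeholders.
SUBMIT_EVALUATION = """
mutation Submit($input: ArtEvaluationInput!) {
  submitArtEvaluation(input: $input) { id status robustnessScore }
}
"""

LIST_EVALUATIONS = """
query List($status: String) {
  artEvaluations(status: $status) { id modelName attackType status }
}
"""

ORG_STATS = """
query Stats {
  artStats { totalEvaluations completedEvaluations averageRobustnessScore }
}
"""


def build_graphql_request(endpoint, token, query, variables=None):
    """Build (but do not send) a POST request carrying a GraphQL operation."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # JWT bearer token obtained through the OAuth 2.0 flow.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

A caller would pass the returned object to `urllib.request.urlopen` and decode the JSON response. Building the request separately from sending it keeps the sketch testable without network access.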

Behind the operator surface, Argus communicates with the ART evaluation service over standard REST messaging via HTTPS, then normalises the response into a consistent result model covering model name, attack type, defence type, accuracy before, accuracy after, robustness score, and status. Results are persisted in tenant-scoped PostgreSQL, so the same record is available to historical reporting, statistics, and downstream entity correlation. The audit and operational entity events are written automatically, meaning a customer's existing audit and reporting pipelines see adversarial evaluations alongside every other platform activity with no additional wiring.
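The normalisation step might be sketched as follows. The seven fields of `ArtEvaluationResult` mirror the result model listed above, but the raw response keys (`model`, `attack`, `defence`, `clean_accuracy`, `adversarial_accuracy`, `robustness`, `state`) are assumptions about what an ART evaluation service could return, and the fallback score formula is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ArtEvaluationResult:
    model_name: str
    attack_type: str
    defence_type: str
    accuracy_before: float
    accuracy_after: float
    robustness_score: float
    status: str


def normalise_response(raw: Mapping[str, Any]) -> ArtEvaluationResult:
    """Map a raw service payload (hypothetical keys) onto the result model."""
    before = float(raw["clean_accuracy"])
    after = float(raw["adversarial_accuracy"])

    # Prefer a score supplied by the service. Otherwise fall back to the
    # retained-accuracy ratio, which is an assumed definition.
    score = raw.get("robustness")
    if score is None:
        score = after / before if before > 0 else 0.0

    return ArtEvaluationResult(
        model_name=str(raw["model"]),
        attack_type=str(raw["attack"]).upper(),        # e.g. "fgsm" -> "FGSM"
        defence_type=str(raw.get("defence") or "none"),
        accuracy_before=before,
        accuracy_after=after,
        robustness_score=float(score),
        status=str(raw.get("state", "UNKNOWN")).upper(),
    )
```

Coercing types and casing at this boundary means everything downstream (storage, statistics, filtering) can rely on one consistent record shape regardless of how a particular service formats its output.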

Open Standards

  • IBM Adversarial Robustness Toolbox (ART): The open-source machine learning security framework that defines the attack and defence families this capability evaluates. It is a de facto open standard for adversarial machine learning assessment and has no formal RFC or ISO number.
  • OAuth 2.0 (RFC 6749) and JWT (RFC 7519): All platform operations, including evaluation submission and retrieval, require bearer tokens issued through the standard OAuth 2.0 authorisation flow.
  • GraphQL (June 2018 specification): The operator-facing capability is exposed over a typed GraphQL API, enabling precise field selection across evaluation history and aggregate statistics.
  • HTTP and JSON over TLS: Communication with the external ART evaluation service uses standard HTTPS requests carrying JSON payloads, so any conformant ART service endpoint can be connected.
  • NATO STANAG classification markings: Evaluation records carry classification labels aligned with NATO STANAG marking conventions, enabling clearance-gated access in multi-level security environments.

Security & Compliance

Every evaluation record is organisation-scoped and access-controlled at the API layer, so a user can retrieve only evaluations belonging to their own tenant. Classification labels on individual records enforce need-to-know access, and clearance-level gating keeps sensitive model assessments visible only to appropriately cleared analysts. Each submission is written to the platform audit trail, producing an immutable, timestamped record that supports the evidence requirements of NATO AI assurance and EU AI Act high-risk system conformance. Neither credentials for the external evaluation service nor raw results are exposed outside the tenant boundary.
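The combined tenant and clearance check can be illustrated with a small filter. This is a sketch only: the ordered level names in `CLEARANCE_ORDER`, the record keys, and the function name are assumptions, and the platform's actual marking scheme and enforcement point are not described here.

```python
from typing import Iterable, List, Mapping

# Illustrative ordering, lowest to highest. The real marking scheme is an
# assumption and may use different labels.
CLEARANCE_ORDER = ("UNCLASSIFIED", "RESTRICTED", "CONFIDENTIAL", "SECRET")
_RANK = {label: rank for rank, label in enumerate(CLEARANCE_ORDER)}


def visible_records(records: Iterable[Mapping], tenant_id: str,
                    clearance: str) -> List[Mapping]:
    """Return records in the caller's tenant at or below their clearance."""
    if clearance not in _RANK:
        raise ValueError(f"unknown clearance level: {clearance!r}")
    user_rank = _RANK[clearance]

    visible = []
    for record in records:
        if record["tenant_id"] != tenant_id:
            continue  # tenant isolation: never cross the organisation boundary
        record_rank = _RANK.get(record["classification"])
        # Fail closed: a record with an unrecognised label is hidden.
        if record_rank is not None and record_rank <= user_rank:
            visible.append(record)
    return visible
```

Hiding records with unrecognised labels, rather than showing them, errs on the side of need-to-know if a new marking is introduced before the ordering is updated.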

Last Reviewed: 2026-05-26 / Last Updated: 2026-05-26
