API Reference

This document provides a comprehensive reference for the NASBench-101/201/301 APIs, covering architecture representations, method signatures, return types, and benchmark-specific details.

Benchmarks Overview

| Benchmark | Datasets | Available Splits | Primary Metrics | Training Epochs |
|---|---|---|---|---|
| NASBench-101 | CIFAR-10 | train, val, test | train/val/test accuracy | 4, 12, 36, 108 |
| NASBench-201 | CIFAR-10, CIFAR-100, ImageNet16-120 | train, val, test | train/val/test accuracy, losses | 0-199 (200 epochs total) |
| NASBench-301 | CIFAR-10, CIFAR-100 | val, test | surrogate val/test accuracy | N/A (surrogate-based) |

Architecture Representations

Each benchmark uses a different architecture representation:

NASBench-101 (Arch101):

  • Dataclass with two fields:

    • adjacency: list[list[int]] — 7×7 adjacency matrix

    • operations: list[str] — one label per node (7 in total), each drawn from ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3', 'output']

  • Example:

Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0],
               [0, 0, 0, 1, 1, 0, 0],
               ...],
    operations=['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', ..., 'output']
)
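To make the two fields concrete, here is a minimal sketch of a structural check on an Arch101-like object. `Arch101Sketch` and `check_arch101` are hypothetical names for illustration and are not part of nasbenchapi; the upper-triangular rule is an assumption based on the example matrix above, where edges only run from lower-numbered to higher-numbered nodes. The sketch accepts any node count rather than enforcing 7.

```python
from dataclasses import dataclass

# The three operations an interior node may carry (from the list above).
NODE_OPS = {"conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3"}


@dataclass
class Arch101Sketch:
    """Hypothetical stand-in for Arch101; not the library's class."""
    adjacency: list[list[int]]
    operations: list[str]


def check_arch101(arch: Arch101Sketch) -> list[str]:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    n = len(arch.operations)
    if len(arch.adjacency) != n or any(len(row) != n for row in arch.adjacency):
        problems.append("adjacency must be square and match len(operations)")
        return problems
    if arch.operations[0] != "input" or arch.operations[-1] != "output":
        problems.append("first op must be 'input' and last op must be 'output'")
    if any(op not in NODE_OPS for op in arch.operations[1:-1]):
        problems.append("interior ops must come from the three cell operations")
    # Assumed rule: no entries on or below the diagonal (edges go low -> high).
    if any(arch.adjacency[i][j] for i in range(n) for j in range(i + 1)):
        problems.append("adjacency must be strictly upper-triangular")
    return problems
```

An empty list means the object passed these checks; it does not guarantee that the architecture is present in the loaded benchmark data.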

NASBench-201 (String):

  • Architecture string format: |op~0|+|op~0|op~1|+|op~0|op~1|op~2|

  • 6 edges connecting 4 nodes in a cell

  • 5 operations per edge: ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']

  • Total search space: 5^6 = 15,625 unique architectures

  • Each architecture maps to a canonical index (0-15624)

  • Example:

'|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'
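The 5^6 count can be checked by generating every string in the format shown above. This is a standalone sketch: `build_arch_str` and `enumerate_search_space` are illustrative helpers, not library functions, and the enumeration order here should not be assumed to match the benchmark's canonical index order.

```python
from itertools import product

OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]


def build_arch_str(ops):
    """Format six operation names as an NB201-style architecture string."""
    if len(ops) != 6:
        raise ValueError("expected exactly 6 operations, one per edge")
    a, b, c, d, e, f = ops
    return f"|{a}~0|+|{b}~0|{c}~1|+|{d}~0|{e}~1|{f}~2|"


def enumerate_search_space():
    """Yield every architecture string in the 5^6 search space."""
    for ops in product(OPS, repeat=6):
        yield build_arch_str(ops)


all_archs = list(enumerate_search_space())
print(len(all_archs), len(set(all_archs)))  # 15625 15625
```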

NASBench-301 (Dict):

  • DARTS-style architecture with normal and reduction cells

  • Dictionary with ‘normal’ and ‘reduce’ keys

  • Each cell: list of (operation, predecessor_node) tuples

  • 8 operations: ['max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5', 'none']

  • 4 intermediate nodes per cell, each with 2 input edges

  • Example:

{
    'normal': [('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ...],
    'reduce': [('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), ...]
}

Common API Surface

All benchmarks expose the following core methods.

Initialization

from nasbenchapi import NASBench101, NASBench201, NASBench301

# Using explicit path
api = NASBench201('/path/to/nb201.pkl', verbose=True)

# Using environment variable
api = NASBench201(verbose=True)  # Reads from NASBENCH201_PATH

Constructor Args:

  • data_path: Optional[str] — path to pickled benchmark data; if None, reads from environment variable

  • verbose: bool — enable/disable all logging output (default: True)

Environment Variables:

  • NASBENCH101_PATH — path to NB101 pickle file

  • NASBENCH201_PATH — path to NB201 pickle file

  • NASBENCH301_PATH — path to NB301 pickle file
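The explicit-path-or-environment rule described above can be sketched as follows. `resolve_data_path` is a hypothetical helper, not the library's loader, and the choice to raise FileNotFoundError when the variable is unset follows the Error Handling section of this document rather than inspected library code.

```python
import os
from typing import Optional


def resolve_data_path(data_path: Optional[str], env_var: str) -> str:
    """Pick the explicit path if given, otherwise fall back to env_var."""
    path = data_path if data_path is not None else os.environ.get(env_var)
    if not path:
        raise FileNotFoundError(f"no data path given and {env_var} is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"benchmark file not found: {path}")
    return path
```

Calling `resolve_data_path(None, "NASBENCH201_PATH")` before constructing the API is one way to fail early with a clear message instead of partway through loading.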

get_statistics

Get statistics about the loaded benchmark data.

stats = api.get_statistics()

Returns: dict — benchmark statistics

Return Format by Benchmark:

  • NB101: {'benchmark': 'nasbench101', 'architectures': int, 'records': int}

  • NB201: {'benchmark': 'nasbench201', 'entries': int}

  • NB301: {'benchmark': 'nasbench301', 'files': int}

random_sample

Sample random architectures from the benchmark search space.

samples = api.random_sample(n=5, seed=123)

Args:

  • n: int — number of samples (default: 1)

  • seed: Optional[int] — RNG seed for reproducibility

Returns:

  • NB101: list[Arch101] — list of Arch101 dataclass objects

  • NB201: list[str] — list of architecture strings

  • NB301: list[int] — indices for entries in the loaded dataset (falls back to synthetic architecture dicts if raw entries are unavailable)

iter_all

Iterate over all available architectures in the loaded data.

for arch in api.iter_all():
    result = api.query(arch, dataset='cifar10', split='val')

Returns:

  • NB101: Iterator[Arch101]

  • NB201: Iterator[str] — architecture strings

  • NB301: Iterator[int] — indices in loaded data

get_index

Get an identifier or index for an architecture.

# NB201: Convert arch string to numeric index
idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)

# NB101: Get hash identifier
hash_id = api.get_index(arch_obj)
# Returns: 'a3f5b2...' (SHA256 hash string)

# NB301: Find index in loaded data
idx = api.get_index(arch_dict)
# Returns: 42 or None

Args:

  • arch: Architecture representation (type depends on benchmark)

    • NB101: Arch101 object

    • NB201: str (architecture string)

    • NB301: dict (architecture dict)

Returns:

  • NB101: str — stable SHA256 hash identifier

  • NB201: int — canonical index (0-15624)

  • NB301: Optional[int] — index in loaded data, or None if not found

available_budgets

List available training budgets (epochs) for a dataset/split combination.

budgets = api.available_budgets(dataset='cifar10', split='val')
# Returns e.g. [199, 200] for NB201 validation

Args:

  • dataset: Optional[str] — target dataset (defaults to all datasets)

  • split: Optional[str] — target split (defaults to all splits)

Returns: Optional[list] — sorted list of budgets if tracked; None when budgets are not defined for the benchmark.

  • NB101: returns None (budgets not tracked)

  • NB201: list of available epochs per dataset/split based on original training logs

  • NB301: epochs derived from per-entry learning curves (validation) or final declared budget (test)

exists

Validate whether a combination of dataset, split, budget, and architecture is supported without issuing a full query.

api.exists(dataset='cifar10', split='val', budget=199)  # -> True

Args:

  • dataset: Optional[str]

  • split: Optional[str]

  • budget: Optional[Any]

  • arch: Optional[Any] — architecture representation

Returns: bool — True if every provided component is supported, False otherwise.

query

Query performance metrics for an architecture from loaded data.

# NB201 example
result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,
    budget=199
)
print(f"Validation accuracy: {result['metric']:.2f}%")
print(f"Training time: {result['cost']:.2f}s")

Args:

  • arch: Architecture representation (depends on benchmark)

    • NB101: Arch101 object

    • NB201: str (architecture string)

    • NB301: Any (dict or index)

  • dataset: str — dataset name

    • NB101: ‘cifar10’

    • NB201: ‘cifar10’, ‘cifar100’, ‘ImageNet16-120’

    • NB301: ‘cifar10’, ‘cifar100’

  • split: str — data split

    • NB101: ‘train’, ‘val’, ‘test’

    • NB201: ‘train’, ‘val’, ‘test’

    • NB301: ‘val’, ‘test’

  • seed: Optional[int] — random seed (default varies by benchmark)

    • NB201: default 777 (official NB201 seed)

    • NB101/NB301: unused

  • budget: Optional[Any] — training budget

    • NB101: unused (returns final recorded metrics)

    • NB201: epoch number 0-199 (default: 199 for final epoch)

    • NB301: epoch index for validation curves (defaults to final epoch); test split always reports the declared final budget

Returns: dict with the following keys (NB201 and NB301 by default; NB101 only when summary=True, see Return Value Details):

{
    'metric': Optional[float],      # Primary metric (e.g., accuracy %)
    'metric_name': str,              # Name of metric (e.g., 'val_acc')
    'cost': Optional[float],         # Training time in seconds
    'std': Optional[float],          # Standard deviation (if available)
    'info': dict                     # Additional metadata and raw data
}

Return Value Details:

  • NB101: Returns a tuple (info_dict, metrics_by_budget) by default.

    • info_dict contains module_adjacency, module_operations, module_hash, and aggregate training metadata.

    • metrics_by_budget is a dict keyed by epoch budgets (4/12/36/108), where each value is a list of up to three run dictionaries. Each run dictionary mirrors the native NASBench metrics: halfway_* and final_* keys as well as training times.

    • average=True collapses each budget to a single averaged metrics dictionary.

    • summary=True restores the condensed dict (metric, metric_name, cost, std, info) for backwards compatibility.

  • NB201 / NB301: Return a dictionary with keys:

    • metric: Accuracy percentage (e.g., 94.5) or None if not available

    • metric_name: Describes the metric, typically {split}_acc

    • cost: Training/evaluation time in seconds, or None

    • std: Standard deviation of the metric across multiple runs (rarely used)

    • info: Dictionary containing additional information

      • NB201: arch_index, dataset, split, seed, epoch, arch_str, params, flop

      • NB301: Entry metadata (index, dataset, epochs available/used, declared budget, optimizer tag, JSON path)
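Because metric, cost, and std may each be None, formatting code should not assume a float. The sketch below types the condensed dict and renders it defensively; `QueryResult` and `describe` are illustrative names, not types or functions exported by the library.

```python
from typing import Any, Optional, TypedDict


class QueryResult(TypedDict):
    """Shape of the condensed result dict described above."""
    metric: Optional[float]
    metric_name: str
    cost: Optional[float]
    std: Optional[float]
    info: dict[str, Any]


def describe(result: QueryResult) -> str:
    """Render a result on one line, tolerating missing metric/cost/std."""
    metric = "n/a" if result["metric"] is None else f"{result['metric']:.2f}%"
    parts = [f"{result['metric_name']}={metric}"]
    if result["std"] is not None:
        parts.append(f"std={result['std']:.2f}")
    if result["cost"] is not None:
        parts.append(f"cost={result['cost']:.1f}s")
    return ", ".join(parts)
```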

NASBench-101 Specifics

Import and Initialization

from nasbenchapi import NASBench101

api = NASBench101('/path/to/nasbench_only108.pkl', verbose=True)
# Or use environment variable
api = NASBench101(verbose=True)

Dataset and Splits

  • Single dataset: CIFAR-10 only

  • Splits: train, val, test

  • Training epochs: 4, 12, 36, 108 (the final epoch, 108, is the usual choice)

Architecture Type (Arch101)

from nasbenchapi import Arch101

arch = Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0], ...],  # 7×7 matrix
    operations=['input', 'conv3x3-bn-relu', ..., 'output']  # 7 ops
)

Operations

Available operations (from op_set()):

  • ‘input’ (fixed at node 0)

  • ‘conv3x3-bn-relu’

  • ‘conv1x1-bn-relu’

  • ‘maxpool3x3’

  • ‘output’ (fixed at node 6)

encode / decode / id

# Encode Arch101 to native strings
encoding = api.encode(arch)
# Returns: {'adjacency_str': '0110000...', 'operations_str': 'input,conv3x3-bn-relu,...'}

# Decode encoding to Arch101
arch = api.decode(encoding)

# Get stable hash ID
arch_id = api.id(arch)
# Returns: 'a3f5b2c8...' (SHA256 hash)
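For intuition, the round trip can be sketched without the library. Everything here is an assumption made for illustration: that adjacency_str is the matrix flattened row by row (consistent with the '0110000...' prefix above), that operations_str is a comma-joined list, and that the identifier hashes those two strings. The library's actual hashing input may differ, so these digests should not be expected to match api.id().

```python
import hashlib


def encode_sketch(adjacency, operations):
    """Flatten the matrix row by row and join the op names with commas."""
    return {
        "adjacency_str": "".join(str(v) for row in adjacency for v in row),
        "operations_str": ",".join(operations),
    }


def decode_sketch(encoding):
    """Invert encode_sketch; the matrix size is inferred from the op count."""
    operations = encoding["operations_str"].split(",")
    n = len(operations)
    bits = encoding["adjacency_str"]
    if len(bits) != n * n:
        raise ValueError("adjacency_str length does not match operation count")
    adjacency = [[int(b) for b in bits[i * n:(i + 1) * n]] for i in range(n)]
    return adjacency, operations


def id_sketch(adjacency, operations):
    """SHA256 hex digest of the encoded form (hash input is an assumption)."""
    enc = encode_sketch(adjacency, operations)
    payload = f"{enc['adjacency_str']}|{enc['operations_str']}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```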

get_index

# Returns the same as id() for consistency
hash_id = api.get_index(arch)
# Returns: 'a3f5b2c8...'

random_sample

archs = api.random_sample(n=10, seed=42)
# Returns: list of 10 Arch101 objects sampled from loaded data

iter_all

for arch in api.iter_all():
    # summary=True returns the condensed dict; the default return is a tuple
    result = api.query(arch, dataset='cifar10', split='test', summary=True)
    print(f"Test acc: {result['metric']:.2f}%")

query

info, metrics = api.query(arch, dataset='cifar10', split='val')
# metrics -> {4: [run_dict, ...], 12: [...], 36: [...], 108: [...]}
averaged = api.query(arch, dataset='cifar10', split='val', average=True)[1]
summary = api.query(arch, dataset='cifar10', split='val', summary=True)

Args:

  • arch: Arch101 — architecture object

  • dataset: str — ‘cifar10’ (only dataset available)

  • split: str — ‘train’, ‘val’, or ‘test’

  • seed: Optional[int] — unused

  • budget: Optional[Any] — unused (all budgets available in metrics)

  • average: Optional[bool] — return averaged metrics per budget when True

  • summary: Optional[bool] — return condensed dict (legacy shape) when True

Returns:

  • Tuple (info_dict, metrics_by_budget) when summary=False (default)

  • Condensed dict when summary=True
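When working with the default tuple return, a common next step is to reduce the per-run lists to one number per budget. The sketch below does this on a plain dict shaped like metrics_by_budget. The key name 'final_validation_accuracy' is an assumption taken from the native NASBench-101 naming that the run dictionaries are said to mirror; check the actual keys in your data before relying on it.

```python
from statistics import mean


def average_by_budget(metrics_by_budget, key="final_validation_accuracy"):
    """Average one per-run metric for each budget, skipping runs that lack it."""
    averaged = {}
    for budget, runs in sorted(metrics_by_budget.items()):
        values = [run[key] for run in runs if run.get(key) is not None]
        averaged[budget] = mean(values) if values else None
    return averaged


# Hand-written stand-in data, not real benchmark values.
fake_metrics = {
    108: [{"final_validation_accuracy": 0.5}, {"final_validation_accuracy": 0.75}],
    4: [{}],
}
print(average_by_budget(fake_metrics))  # {4: None, 108: 0.625}
```

Passing average=True to query, as described above, performs the averaging inside the library; this helper is only useful when the raw runs are also needed.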

train_time

Get training time for an architecture.

time_sec = api.train_time(arch, dataset='cifar10')
# Returns: float (seconds) or None

mutate

Apply a mutation to an architecture.

import random
rng = random.Random(42)
mutated = api.mutate(arch, rng=rng, kind='edge_toggle')

Mutation kinds:

  • ‘edge_toggle’ — flip an edge in the adjacency matrix

  • ‘op_swap’ — swap two operations
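As an illustration of what an edge toggle does to the adjacency matrix, here is a standalone sketch. `edge_toggle_sketch` is a hypothetical helper, not the library's mutate; it assumes edges live only above the diagonal and does not check that the mutated graph is still a valid or connected cell.

```python
import copy
import random


def edge_toggle_sketch(adjacency, rng):
    """Return a copy of adjacency with one upper-triangular entry flipped."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("need at least two nodes to toggle an edge")
    # Candidate edges always point from a lower to a higher node index.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = rng.choice(pairs)
    mutated = copy.deepcopy(adjacency)
    mutated[i][j] = 1 - mutated[i][j]
    return mutated


rng = random.Random(42)
base = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
mutated = edge_toggle_sketch(base, rng)
```

Passing an explicit random.Random instance, as the real mutate call does with its rng argument, keeps a search run reproducible without touching the global random state.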

NASBench-201 Specifics

Import and Initialization

from nasbenchapi import NASBench201

api = NASBench201('/path/to/NASBench-201-v1_1-096897.pth', verbose=True)
# Or use environment variable
api = NASBench201(verbose=True)

Dataset and Splits

  • Datasets: CIFAR-10, CIFAR-100, ImageNet16-120

  • Splits: train, val, test

  • Training epochs: 0-199 (200 epochs total)

  • Common budget values: 12 (early), 199 (final epoch)

  • Default seed: 777 (official NB201 seed)

Architecture Representation

NB201 uses architecture strings as the primary representation:

arch_str = '|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'

Format details:

  • Cell with 4 nodes (node 0 is the input, nodes 1-2 are intermediate, node 3 is the output)

  • 6 edges: (1←0), (2←0), (2←1), (3←0), (3←1), (3←2)

  • Each edge has one operation from: ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']

  • String format: |op~src|+|op~src|op~src|+|op~src|op~src|op~src|

Index mapping:

  • Each architecture has a canonical integer index: 0 to 15,624

  • Use get_index(arch_str) to convert string → index
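The string format above can be unpacked into explicit edges with a small parser. This is a sketch only: `parse_arch_str` is a hypothetical helper, it validates structure and operation names, and it does not compute the canonical index, which comes from the benchmark data via get_index.

```python
VALID_OPS = {"none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"}


def parse_arch_str(arch_str):
    """Return {(dst, src): op} for an NB201-style architecture string."""
    edges = {}
    # Groups are separated by '+'; group k holds the inputs of node k.
    for dst, group in enumerate(arch_str.split("+"), start=1):
        tokens = [t for t in group.split("|") if t]
        if len(tokens) != dst:
            raise ValueError(f"node {dst} should have {dst} incoming edges")
        for expected_src, token in enumerate(tokens):
            op, sep, src = token.partition("~")
            if not sep or not src.isdigit() or int(src) != expected_src:
                raise ValueError(f"malformed edge token: {token!r}")
            if op not in VALID_OPS:
                raise ValueError(f"unknown operation: {op!r}")
            edges[(dst, expected_src)] = op
    if len(edges) != 6:
        raise ValueError("expected 3 node groups with 6 edges in total")
    return edges


example = "|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|"
edges = parse_arch_str(example)
print(edges[(3, 2)])  # skip_connect
```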

random_sample

arch_strs = api.random_sample(n=5, seed=42)
# Returns: ['|none~0|+|...', '|skip_connect~0|+|...', ...]

Returns: list[str] — architecture strings

iter_all

for arch_str in api.iter_all():
    idx = api.get_index(arch_str)
    print(f"Architecture {idx}: {arch_str}")

Returns: Iterator[str] — yields architecture strings

get_index

Convert an architecture string to its canonical integer index.

idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)

Args:

  • arch: str — NB201 architecture string

Returns: int — index (0-15624)

Raises: ValueError if architecture string is invalid

query

result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,      # Default seed
    budget=199     # Final epoch
)

Args:

  • arch: str — NB201 architecture string

  • dataset: str — ‘cifar10’, ‘cifar100’, or ‘ImageNet16-120’

  • split: str — ‘train’, ‘val’, or ‘test’

  • seed: Optional[int] — data seed (default: 777)

  • budget: Optional[int] — epoch number 0-199 (default: 199)

Returns: dict with keys:

  • metric: accuracy percentage (e.g., 91.23)

  • metric_name: ‘{split}_acc’

  • cost: training/eval time in seconds

  • std: None (not used)

  • info: dict with arch_index, dataset, split, seed, epoch, arch_str, params, flop

Split-specific behavior:

  • ‘train’: Returns training accuracy at specified epoch

  • ‘val’: Returns validation accuracy (uses ‘x-valid@epoch’ keys in data)

  • ‘test’: Returns test accuracy (uses ‘ori-test@epoch’ keys, falls back to validation)

NASBench-301 Specifics

Import and Initialization

from nasbenchapi import NASBench301

api = NASBench301('/path/to/nb301_data.pkl', verbose=True)
# Or use environment variable
api = NASBench301(verbose=True)

Dataset and Splits

  • Datasets: CIFAR-10, CIFAR-100

  • Splits: val, test (no train split for surrogates)

  • Training epochs: Validation learning curves provide per-epoch accuracies; the test split reports metrics at the declared final budget for each entry.

Architecture Representation

NB301 uses DARTS-style architecture dictionaries:

arch = {
    'normal': [
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # inputs to node 2 (first intermediate)
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # inputs to node 3
        ('sep_conv_3x3', 1), ('skip_connect', 0),  # inputs to node 4
        ('skip_connect', 0), ('dil_conv_3x3', 2)   # inputs to node 5 (last intermediate)
    ],
    'reduce': [
        ('max_pool_3x3', 0), ('max_pool_3x3', 1),
        ('skip_connect', 2), ('max_pool_3x3', 0),
        ('max_pool_3x3', 0), ('skip_connect', 2),
        ('skip_connect', 2), ('max_pool_3x3', 1)
    ]
}

Format details:

  • Two cells: ‘normal’ and ‘reduce’ (reduction cell)

  • Each cell has 4 intermediate nodes

  • Each intermediate node takes 2 inputs, each an operation applied to an earlier node (including the input nodes 0 and 1)

  • 8 operations: ['max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5', 'none']

  • Each entry is a tuple: (operation_name, predecessor_node_index)
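The format rules above translate into a short structural check. This sketch assumes the usual DARTS numbering, in which the two cell inputs are nodes 0 and 1 and the four intermediate nodes are 2 through 5, so that the pair at positions 2k and 2k+1 feeds node 2+k. `check_cell` and `check_arch` are hypothetical helpers, not library functions.

```python
OPS_301 = {
    "max_pool_3x3", "avg_pool_3x3", "skip_connect", "sep_conv_3x3",
    "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5", "none",
}


def check_cell(cell):
    """Raise ValueError unless cell is 8 valid (op, predecessor) tuples."""
    if len(cell) != 8:
        raise ValueError("expected 8 entries: 2 inputs for each of 4 nodes")
    for pos, (op, pred) in enumerate(cell):
        node = 2 + pos // 2  # assumed numbering: intermediate nodes are 2..5
        if op not in OPS_301:
            raise ValueError(f"unknown operation: {op!r}")
        if not 0 <= pred < node:
            raise ValueError(f"node {node} cannot read from node {pred}")


def check_arch(arch):
    """Validate both cells of a DARTS-style architecture dict."""
    for key in ("normal", "reduce"):
        check_cell(arch[key])
```

The example dictionary shown earlier in this section passes this check; a cell whose first entry reads from node 2 would be rejected, since node 2 cannot be its own predecessor.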

random_sample

indices = api.random_sample(n=3, seed=42)
# Returns a list of indices, e.g. [102, 4096, 7123]

Returns: list[int] — dataset entry indices (falls back to architecture dict samples if raw entries are unavailable)

iter_all

for idx in api.iter_all():
    print(f"Architecture index: {idx}")

Returns: Iterator[int] — yields indices in loaded data

get_index

Find the index of an architecture in loaded data.

idx = api.get_index(arch_dict)
# Returns: 42 (int) or None if not found

Args:

  • arch: Any — architecture dict, dataset index, or entry path string

Returns: Optional[int] — index in loaded data, or None if not found

query

result = api.query(
    arch=0,           # dataset index
    dataset='cifar10',
    split='val',
    budget=50,        # epoch index
)

Args:

  • arch: Any — dataset index (int), entry path (str), or architecture dict with ‘normal’/‘reduce’ keys

  • dataset: str — ‘cifar10’ or ‘cifar100’

  • split: str — ‘val’ or ‘test’

  • seed: Optional[int] — unused

  • budget: Optional[int] — epoch index for validation curves (defaults to final epoch); ignored for test split

Returns: dict with keys metric, metric_name, cost (runtime in seconds), std, and info (dataset metadata, epochs available/used, declared budget, optimizer tag, and JSON path)

Split behavior:

  • val: accuracy from the per-entry validation learning curve; budgets beyond the recorded length fall back to the final epoch.

  • test: reported test accuracy at the declared final budget (the budget argument is ignored).
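The validation fallback rule can be expressed in a few lines. This is a sketch of the described behavior on a plain list, not the library's code; it assumes the budget is a 0-based index into the learning curve, and `val_accuracy_at` is a hypothetical name.

```python
def val_accuracy_at(curve, budget=None):
    """Return (accuracy, epoch_used) from a per-epoch validation curve."""
    if not curve:
        raise ValueError("empty learning curve")
    if budget is not None and budget < 0:
        raise ValueError("budget must be non-negative")
    last = len(curve) - 1
    # No budget, or a budget past the recorded length, means the final epoch.
    epoch = last if budget is None or budget > last else budget
    return curve[epoch], epoch


curve = [10.0, 50.0, 80.0]           # made-up accuracies for three epochs
print(val_accuracy_at(curve, 1))     # (50.0, 1)
print(val_accuracy_at(curve, 50))    # (80.0, 2)
```

Returning the epoch actually used alongside the accuracy mirrors the "epochs available/used" metadata that the info dict is described as carrying, and makes a silent fallback visible to the caller.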

Complete Usage Examples

NASBench-101 Example

from nasbenchapi import NASBench101

# Initialize
api = NASBench101(verbose=True)
stats = api.get_statistics()
print(f"Loaded {stats['architectures']} architectures")

# Sample architectures
archs = api.random_sample(n=5, seed=42)

# Query performance (summary=True returns the condensed dict instead of the default tuple)
for arch in archs:
    result = api.query(arch, dataset='cifar10', split='test', summary=True)
    print(f"Test accuracy: {result['metric']:.2f}%")
    print(f"Training time: {result['cost']:.2f}s")

NASBench-201 Example

from nasbenchapi import NASBench201

# Initialize
api = NASBench201(verbose=True)

# Sample architecture strings
arch_strs = api.random_sample(n=3, seed=777)

# Query on multiple datasets
for arch_str in arch_strs:
    idx = api.get_index(arch_str)
    print(f"\nArchitecture {idx}:")

    for dataset in ['cifar10', 'cifar100', 'ImageNet16-120']:
        result = api.query(
            arch=arch_str,
            dataset=dataset,
            split='test',
            seed=777,
            budget=199
        )
        print(f"  {dataset} test acc: {result['metric']:.2f}%")

# Iterate all architectures
count = 0
for arch_str in api.iter_all():
    count += 1
    if count > 5:
        break
    result = api.query(arch_str, dataset='cifar10', split='val')
    print(f"Arch {count}: val_acc = {result['metric']:.2f}%")

NASBench-301 Example

from nasbenchapi import NASBench301

# Initialize
api = NASBench301(verbose=True)

# Sample dataset indices
arch_indices = api.random_sample(n=2, seed=42)

# Query performance at multiple epochs
for idx in arch_indices:
    final_val = api.query(idx, dataset='cifar10', split='val')
    mid_val = api.query(idx, dataset='cifar10', split='val', budget=50)
    print(f"Index {idx}: final={final_val['metric']:.2f}% | mid@50={mid_val['metric']:.2f}%")

Error Handling

Common Exceptions

ValueError:

  • Invalid architecture string format (NB201)

  • Architecture index out of range

  • Invalid dataset or split name

FileNotFoundError:

  • Pickle file not found at specified path

  • Environment variable not set

KeyError:

  • Data format mismatch (e.g., missing expected keys in pickle)

Example Error Handling

from nasbenchapi import NASBench201

try:
    api = NASBench201('/path/to/data.pkl', verbose=True)
except FileNotFoundError:
    print("Data file not found. Please set NASBENCH201_PATH or provide valid path.")
    raise SystemExit(1)

try:
    result = api.query(
        arch='|invalid~format|',
        dataset='cifar10',
        split='val'
    )
except ValueError as e:
    print(f"Invalid architecture: {e}")
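In a long search loop it is often more convenient to record a failure and continue than to wrap every call in try/except. The wrapper below is one possible pattern, assuming the query-time exception types listed in this section; `safe_query` is a hypothetical helper and takes the query function as an argument so it works with any of the three APIs.

```python
def safe_query(query_fn, *args, **kwargs):
    """Call query_fn and return (result, error_message).

    ValueError and KeyError are converted to an error string; any other
    exception still propagates to the caller.
    """
    try:
        return query_fn(*args, **kwargs), None
    except (ValueError, KeyError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


# Stand-in for api.query so the sketch runs without benchmark data.
def fake_query(arch, dataset, split):
    if "~" not in arch:
        raise ValueError("bad architecture string")
    return {"metric": 90.0, "metric_name": f"{split}_acc"}


result, err = safe_query(fake_query, "|none~0|", dataset="cifar10", split="val")
failed, msg = safe_query(fake_query, "garbage", dataset="cifar10", split="val")
```

With a real API object the call would look like `safe_query(api.query, arch=arch_str, dataset='cifar10', split='val')`.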

Verbose Logging Control

All benchmarks support a verbose parameter to control logging output:

# Enable all logging (default)
api = NASBench201(verbose=True)
# Outputs:
# Loading NB201 from /path/to/file.pkl (2.1 GB)
# Reading: 100%|██████████| 2.1G/2.1G [00:15<00:00]
# Unpickling data...
# Unpickling complete.
# [NB201] Loaded 15625 architectures (source=arch2infos)

# Disable all logging (silent mode)
api = NASBench201(verbose=False)
# No output

Logging includes:

  • File loading progress bars (via tqdm)

  • Unpickling status messages

  • Data summary and statistics

  • Warning messages (e.g., mapping failures)