API Reference¶

This document provides comprehensive reference for NASBench-101/201/301 APIs, including architecture representations, method signatures, return types, and benchmark-specific details.

Benchmarks Overview¶

Benchmark	Datasets	Available Splits	Primary Metrics	Training Epochs
NASBench-101	CIFAR-10	train, val, test	train/val/test accuracy	4, 12, 36, 108
NASBench-201	CIFAR-10, CIFAR-100, ImageNet16-120	train, val, test	train/val/test accuracy, losses	0-199 (200 epochs total)
NASBench-301	CIFAR-10, CIFAR-100	val, test	surrogate val/test accuracy	N/A (surrogate-based)

Architecture Representations¶

Each benchmark uses a different architecture representation:

NASBench-101 (Arch101):

Dataclass with two fields:
- adjacency: list[list[int]] — 7×7 adjacency matrix
- operations: list[str] — 7 operations from [‘input’, ‘conv3x3-bn-relu’, ‘conv1x1-bn-relu’, ‘maxpool3x3’, ‘output’]
Example:

Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0],
               [0, 0, 0, 1, 1, 0, 0],
               ...],
    operations=['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', ..., 'output']
)

NASBench-201 (String):

Architecture string format: |op~0|+|op~0|op~1|+|op~0|op~1|op~2|
6 edges connecting 4 nodes in a cell
5 operations per edge: [‘none’, ‘skip_connect’, ‘nor_conv_1x1’, ‘nor_conv_3x3’, ‘avg_pool_3x3’]
Total search space: 5^6 = 15,625 unique architectures
Each architecture maps to a canonical index (0-15624)
Example:

'|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'

NASBench-301 (Dict):

DARTS-style architecture with normal and reduction cells
Dictionary with ‘normal’ and ‘reduce’ keys
Each cell: list of (operation, predecessor_node) tuples
8 operations: [‘max_pool_3x3’, ‘avg_pool_3x3’, ‘skip_connect’, ‘sep_conv_3x3’, ‘sep_conv_5x5’, ‘dil_conv_3x3’, ‘dil_conv_5x5’, ‘none’]
4 intermediate nodes per cell, each with 2 input edges
Example:

{
    'normal': [('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ...],
    'reduce': [('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), ...]
}

Common API Surface¶

All benchmarks expose the following core methods.

Initialization¶

from nasbenchapi import NASBench101, NASBench201, NASBench301

# Using explicit path
api = NASBench201('/path/to/nb201.pkl', verbose=True)

# Using environment variable
api = NASBench201(verbose=True)  # Reads from NASBENCH201_PATH

Constructor Args:

data_path: Optional[str] — path to pickled benchmark data; if None, reads from environment variable
verbose: bool — enable/disable all logging output (default: True)

Environment Variables:

NASBENCH101_PATH — path to NB101 pickle file
NASBENCH201_PATH — path to NB201 pickle file
NASBENCH301_PATH — path to NB301 pickle file

get_statistics¶

Get statistics about the loaded benchmark data.

stats = api.get_statistics()

Returns: dict — benchmark statistics

Return Format by Benchmark:

NB101: {'benchmark': 'nasbench101', 'architectures': int, 'records': int}
NB201: {'benchmark': 'nasbench201', 'entries': int}
NB301: {'benchmark': 'nasbench301', 'files': int}

random_sample¶

Sample random architectures from the benchmark search space.

samples = api.random_sample(n=5, seed=123)

Args:

n: int — number of samples (default: 1)
seed: Optional[int] — RNG seed for reproducibility

Returns:

NB101: list[Arch101] — list of Arch101 dataclass objects
NB201: list[str] — list of architecture strings
NB301: list[int] — indices for entries in the loaded dataset (falls back to synthetic architecture dicts if raw entries are unavailable)

iter_all¶

Iterate over all available architectures in the loaded data.

for arch in api.iter_all():
    result = api.query(arch, dataset='cifar10', split='val')

Returns:

NB101: Iterator[Arch101]
NB201: Iterator[str] — architecture strings
NB301: Iterator[int] — indices in loaded data

get_index¶

Get an identifier or index for an architecture.

# NB201: Convert arch string to numeric index
idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)

# NB101: Get hash identifier
hash_id = api.get_index(arch_obj)
# Returns: 'a3f5b2...' (SHA256 hash string)

# NB301: Find index in loaded data
idx = api.get_index(arch_dict)
# Returns: 42 or None

Args:

arch: Architecture representation (type depends on benchmark)
- NB101: Arch101 object
- NB201: str (architecture string)
- NB301: dict (architecture dict)

Returns:

NB101: str — stable SHA256 hash identifier
NB201: int — canonical index (0-15624)
NB301: Optional[int] — index in loaded data, or None if not found

available_budgets¶

List available training budgets (epochs) for a dataset/split combination.

budgets = api.available_budgets(dataset='cifar10', split='val')
# Returns e.g. [199, 200] for NB201 validation

Args:

dataset: Optional[str] — target dataset (defaults to all datasets)
split: Optional[str] — target split (defaults to all splits)

Returns: Optional[list] — sorted list of budgets if tracked; None when budgets are not defined for the benchmark.

NB101: returns None (budgets not tracked)
NB201: list of available epochs per dataset/split based on original training logs
NB301: epochs derived from per-entry learning curves (validation) or final declared budget (test)

exists¶

Validate whether a combination of dataset, split, budget, and architecture is supported without issuing a full query.

api.exists(dataset='cifar10', split='val', budget=199)  # -> True

Args:

dataset: Optional[str]
split: Optional[str]
budget: Optional[Any]
arch: Optional[Any] — architecture representation

Returns: bool — True if every provided component is supported, False otherwise.

query¶

Query performance metrics for an architecture from loaded data.

# NB201 example
result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,
    budget=199
)
print(f"Validation accuracy: {result['metric']:.2f}%")
print(f"Training time: {result['cost']:.2f}s")

Args:

arch: Architecture representation (depends on benchmark)
- NB101: Arch101 object
- NB201: str (architecture string)
- NB301: Any (dict or index)
dataset: str — dataset name
- NB101: ‘cifar10’
- NB201: ‘cifar10’, ‘cifar100’, ‘ImageNet16-120’
- NB301: ‘cifar10’, ‘cifar100’
split: str — data split
- NB101: ‘train’, ‘val’, ‘test’
- NB201: ‘train’, ‘val’, ‘test’
- NB301: ‘val’, ‘test’
seed: Optional[int] — random seed (default varies by benchmark)
- NB201: default 777 (official NB201 seed)
- NB101/NB301: unused
budget: Optional[Any] — training budget
- NB101: unused (returns final recorded metrics)
- NB201: epoch number 0-199 (default: 199 for final epoch)
- NB301: epoch index for validation curves (defaults to final epoch); test split always reports the declared final budget

Returns: dict with the following keys:

{
    'metric': Optional[float],      # Primary metric (e.g., accuracy %)
    'metric_name': str,              # Name of metric (e.g., 'val_acc')
    'cost': Optional[float],         # Training time in seconds
    'std': Optional[float],          # Standard deviation (if available)
    'info': dict                     # Additional metadata and raw data
}

Return Value Details:

NB101: Returns a tuple (info_dict, metrics_by_budget) by default.
- info_dict contains module_adjacency, module_operations, module_hash, and aggregate training metadata.
- metrics_by_budget is a dict keyed by epoch budgets (4/12/36/108), where each value is a list of up to three run dictionaries. Each run dictionary mirrors the native NASBench metrics: halfway_* and final_* keys as well as training times.
- average=True collapses each budget to a single averaged metrics dictionary.
- summary=True restores the condensed dict (metric, metric_name, cost, std, info) for backwards compatibility.
NB201 / NB301: Return a dictionary with keys:
- metric: Accuracy percentage (e.g., 94.5) or None if not available
- metric_name: Describes the metric, typically {split}_acc
- cost: Training/evaluation time in seconds, or None
- std: Standard deviation of the metric across multiple runs (rarely used)
- info: Dictionary containing additional information
  - NB201: arch_index, dataset, split, seed, epoch, arch_str, params, flop
  - NB301: Entry metadata (index, dataset, epochs available/used, declared budget, optimizer tag, JSON path)

NASBench-101 Specifics¶

Import and Initialization¶

from nasbenchapi import NASBench101

api = NASBench101('/path/to/nasbench_only108.pkl', verbose=True)
# Or use environment variable
api = NASBench101(verbose=True)

Dataset and Splits¶

Single dataset: CIFAR-10 only
Splits: train, val, test
Training epochs: 4, 12, 36, 108 (typically query final epoch 108)

Architecture Type (Arch101)¶

from nasbenchapi import Arch101

arch = Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0], ...],  # 7×7 matrix
    operations=['input', 'conv3x3-bn-relu', ..., 'output']  # 7 ops
)

Operations¶

Available operations (from op_set()):

‘input’ (fixed at node 0)
‘conv3x3-bn-relu’
‘conv1x1-bn-relu’
‘maxpool3x3’
‘output’ (fixed at node 6)

encode / decode / id¶

# Encode Arch101 to native strings
encoding = api.encode(arch)
# Returns: {'adjacency_str': '0110000...', 'operations_str': 'input,conv3x3-bn-relu,...'}

# Decode encoding to Arch101
arch = api.decode(encoding)

# Get stable hash ID
arch_id = api.id(arch)
# Returns: 'a3f5b2c8...' (SHA256 hash)

get_index¶

# Returns the same as id() for consistency
hash_id = api.get_index(arch)
# Returns: 'a3f5b2c8...'

random_sample¶

archs = api.random_sample(n=10, seed=42)
# Returns: list of 10 Arch101 objects sampled from loaded data

iter_all¶

for arch in api.iter_all():
    result = api.query(arch, dataset='cifar10', split='test')
    print(f"Test acc: {result['metric']:.2f}%")

query¶

info, metrics = api.query(arch, dataset='cifar10', split='val')
# metrics -> {4: [run_dict, ...], 12: [...], 36: [...], 108: [...]}
averaged = api.query(arch, dataset='cifar10', split='val', average=True)[1]
summary = api.query(arch, dataset='cifar10', split='val', summary=True)

Args:

arch: Arch101 — architecture object
dataset: str — ‘cifar10’ (only dataset available)
split: str — ‘train’, ‘val’, or ‘test’
seed: Optional[int] — unused
budget: Optional[Any] — unused (all budgets available in metrics)
average: Optional[bool] — return averaged metrics per budget when True
summary: Optional[bool] — return condensed dict (legacy shape) when True

Returns:

Tuple (info_dict, metrics_by_budget) when summary=False (default)
Condensed dict when summary=True

train_time¶

Get training time for an architecture.

time_sec = api.train_time(arch, dataset='cifar10')
# Returns: float (seconds) or None

mutate¶

Apply a mutation to an architecture.

import random
rng = random.Random(42)
mutated = api.mutate(arch, rng=rng, kind='edge_toggle')

Mutation kinds:

‘edge_toggle’ — flip an edge in the adjacency matrix
‘op_swap’ — swap two operations

NASBench-201 Specifics¶

Import and Initialization¶

from nasbenchapi import NASBench201

api = NASBench201('/path/to/NASBench-201-v1_1-096897.pth', verbose=True)
# Or use environment variable
api = NASBench201(verbose=True)

Dataset and Splits¶

Datasets: CIFAR-10, CIFAR-100, ImageNet16-120
Splits: train, val, test
Training epochs: 0-199 (200 epochs total)
Common budget values: 12 (early), 199 (final epoch)
Default seed: 777 (official NB201 seed)

Architecture Representation¶

NB201 uses architecture strings as the primary representation:

arch_str = '|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'

Format details:

Cell with 4 nodes (node 0 is input, nodes 1-3 are intermediate, node 4 is output)
6 edges: (1←0), (2←0), (2←1), (3←0), (3←1), (3←2)
Each edge has one operation from: [‘none’, ‘skip_connect’, ‘nor_conv_1x1’, ‘nor_conv_3x3’, ‘avg_pool_3x3’]
String format: |op~src|+|op~src|op~src|+|op~src|op~src|op~src|

Index mapping:

Each architecture has a canonical integer index: 0 to 15,624
Use get_index(arch_str) to convert string → index

random_sample¶

arch_strs = api.random_sample(n=5, seed=42)
# Returns: ['|none~0|+|...', '|skip_connect~0|+|...', ...]

Returns: list[str] — architecture strings

iter_all¶

for arch_str in api.iter_all():
    idx = api.get_index(arch_str)
    print(f"Architecture {idx}: {arch_str}")

Returns: Iterator[str] — yields architecture strings

get_index¶

Convert an architecture string to its canonical integer index.

idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)

Args:

arch: str — NB201 architecture string

Returns: int — index (0-15624)

Raises: ValueError if architecture string is invalid

query¶

result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,      # Default seed
    budget=199     # Final epoch
)

Args:

arch: str — NB201 architecture string
dataset: str — ‘cifar10’, ‘cifar100’, or ‘ImageNet16-120’
split: str — ‘train’, ‘val’, or ‘test’
seed: Optional[int] — data seed (default: 777)
budget: Optional[int] — epoch number 0-199 (default: 199)

Returns: dict with keys:

metric: accuracy percentage (e.g., 91.23)
metric_name: ‘{split}_acc’
cost: training/eval time in seconds
std: None (not used)
info: dict with arch_index, dataset, split, seed, epoch, arch_str, params, flop

Split-specific behavior:

‘train’: Returns training accuracy at specified epoch
‘val’: Returns validation accuracy (uses ‘x-valid@epoch’ keys in data)
‘test’: Returns test accuracy (uses ‘ori-test@epoch’ keys, falls back to validation)

NASBench-301 Specifics¶

Import and Initialization¶

from nasbenchapi import NASBench301

api = NASBench301('/path/to/nb301_data.pkl', verbose=True)
# Or use environment variable
api = NASBench301(verbose=True)

Dataset and Splits¶

Datasets: CIFAR-10, CIFAR-100
Splits: val, test (no train split for surrogates)
Training epochs: Validation learning curves provide per-epoch accuracies; the test split reports metrics at the declared final budget for each entry.

Architecture Representation¶

NB301 uses DARTS-style architecture dictionaries:

arch = {
    'normal': [
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # Node 1 inputs
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # Node 2 inputs
        ('sep_conv_3x3', 1), ('skip_connect', 0),  # Node 3 inputs
        ('skip_connect', 0), ('dil_conv_3x3', 2)   # Node 4 inputs
    ],
    'reduce': [
        ('max_pool_3x3', 0), ('max_pool_3x3', 1),
        ('skip_connect', 2), ('max_pool_3x3', 0),
        ('max_pool_3x3', 0), ('skip_connect', 2),
        ('skip_connect', 2), ('max_pool_3x3', 1)
    ]
}

Format details:

Two cells: ‘normal’ and ‘reduce’ (reduction cell)
Each cell has 4 intermediate nodes
Each node selects 2 operations from previous nodes (including input nodes 0 and 1)
8 operations: [‘max_pool_3x3’, ‘avg_pool_3x3’, ‘skip_connect’, ‘sep_conv_3x3’, ‘sep_conv_5x5’, ‘dil_conv_3x3’, ‘dil_conv_5x5’, ‘none’]
Each entry is a tuple: (operation_name, predecessor_node_index)

random_sample¶

indices = api.random_sample(n=3, seed=42)
# Returns: [102, 4096, 7123]

Returns: list[int] — dataset entry indices (falls back to architecture dict samples if raw entries are unavailable)

iter_all¶

for idx in api.iter_all():
    print(f"Architecture index: {idx}")

Returns: Iterator[int] — yields indices in loaded data

get_index¶

Find the index of an architecture in loaded data.

idx = api.get_index(arch_dict)
# Returns: 42 (int) or None if not found

Args:

arch: Any — architecture dict, dataset index, or entry path string

Returns: Optional[int] — index in loaded data, or None if not found

query¶

result = api.query(
    arch=0,           # dataset index
    dataset='cifar10',
    split='val',
    budget=50,        # epoch index
)

Args:

arch: Any — dataset index (int), entry path (str), or architecture dict with ‘normal’/’reduce’ keys
dataset: str — ‘cifar10’ or ‘cifar100’
split: str — ‘val’ or ‘test’
seed: Optional[int] — unused
budget: Optional[int] — epoch index for validation curves (defaults to final epoch); ignored for test split

Returns: dict with keys: metric, metric_name, cost, std, info (runtime in seconds, dataset metadata, epochs available/used, declared budget, optimizer tag, and JSON path)

Split behavior:

val: accuracy from the per-entry validation learning curve; budgets beyond the recorded length fall back to the final epoch.
test: reported test accuracy at the declared final budget (the budget argument is ignored).

Complete Usage Examples¶

NASBench-101 Example¶

from nasbenchapi import NASBench101

# Initialize
api = NASBench101(verbose=True)
stats = api.get_statistics()
print(f"Loaded {stats['architectures']} architectures")

# Sample architectures
archs = api.random_sample(n=5, seed=42)

# Query performance
for arch in archs:
    result = api.query(arch, dataset='cifar10', split='test')
    print(f"Test accuracy: {result['metric']:.2f}%")
    print(f"Training time: {result['cost']:.2f}s")

NASBench-201 Example¶

from nasbenchapi import NASBench201

# Initialize
api = NASBench201(verbose=True)

# Sample architecture strings
arch_strs = api.random_sample(n=3, seed=777)

# Query on multiple datasets
for arch_str in arch_strs:
    idx = api.get_index(arch_str)
    print(f"\nArchitecture {idx}:")

    for dataset in ['cifar10', 'cifar100', 'ImageNet16-120']:
        result = api.query(
            arch=arch_str,
            dataset=dataset,
            split='test',
            seed=777,
            budget=199
        )
        print(f"  {dataset} test acc: {result['metric']:.2f}%")

# Iterate all architectures
count = 0
for arch_str in api.iter_all():
    count += 1
    if count > 5:
        break
    result = api.query(arch_str, dataset='cifar10', split='val')
    print(f"Arch {count}: val_acc = {result['metric']:.2f}%")

NASBench-301 Example¶

from nasbenchapi import NASBench301

# Initialize
api = NASBench301(verbose=True)

# Sample dataset indices
arch_indices = api.random_sample(n=2, seed=42)

# Query performance at multiple epochs
for idx in arch_indices:
    final_val = api.query(idx, dataset='cifar10', split='val')
    mid_val = api.query(idx, dataset='cifar10', split='val', budget=50)
    print(f"Index {idx}: final={final_val['metric']:.2f}% | mid@50={mid_val['metric']:.2f}%")

Error Handling¶

Common Exceptions¶

ValueError:

Invalid architecture string format (NB201)
Architecture index out of range
Invalid dataset or split name

FileNotFoundError:

Pickle file not found at specified path
Environment variable not set

KeyError:

Data format mismatch (e.g., missing expected keys in pickle)

Example Error Handling¶

from nasbenchapi import NASBench201

try:
    api = NASBench201('/path/to/data.pkl', verbose=True)
except FileNotFoundError:
    print("Data file not found. Please set NASBENCH201_PATH or provide valid path.")
    exit(1)

try:
    result = api.query(
        arch='|invalid~format|',
        dataset='cifar10',
        split='val'
    )
except ValueError as e:
    print(f"Invalid architecture: {e}")

Verbose Logging Control¶

All benchmarks support a verbose parameter to control logging output:

# Enable all logging (default)
api = NASBench201(verbose=True)
# Outputs:
# Loading NB201 from /path/to/file.pkl (2.1 GB)
# Reading: 100%|██████████| 2.1G/2.1G [00:15<00:00]
# Unpickling data...
# Unpickling complete.
# [NB201] Loaded 15625 architectures (source=arch2infos)

# Disable all logging (silent mode)
api = NASBench201(verbose=False)
# No output

Logging includes:

File loading progress bars (via tqdm)
Unpickling status messages
Data summary and statistics
Warning messages (e.g., mapping failures)