API Reference
This document provides a comprehensive reference for the NASBench-101/201/301 APIs, including architecture representations, method signatures, return types, and benchmark-specific details.
Benchmarks Overview
| Benchmark | Datasets | Available Splits | Primary Metrics | Training Epochs |
|---|---|---|---|---|
| NASBench-101 | CIFAR-10 | train, val, test | train/val/test accuracy | 4, 12, 36, 108 |
| NASBench-201 | CIFAR-10, CIFAR-100, ImageNet16-120 | train, val, test | train/val/test accuracy, losses | 0-199 (200 epochs total) |
| NASBench-301 | CIFAR-10, CIFAR-100 | val, test | surrogate val/test accuracy | N/A (surrogate-based) |
Architecture Representations
Each benchmark uses a different architecture representation:
NASBench-101 (Arch101):
Dataclass with two fields:
adjacency: list[list[int]] — 7×7 adjacency matrix
operations: list[str] — 7 operations from ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3', 'output']
Example:
Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0],
               [0, 0, 0, 1, 1, 0, 0],
               ...],
    operations=['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', ..., 'output']
)
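The NASBench-101 search space only admits cells with at most 7 vertices and at most 9 edges, connected as a DAG from the input node to the output node. The sketch below checks the structural part of those rules on plain lists, before you construct an Arch101. `is_valid_nb101_cell` is a hypothetical helper and is not part of nasbenchapi. It does not prune unreachable vertices, and it does not check whether the cell exists in the loaded data. The example cell is made up.

```python
# Sketch only: hypothetical helper, not part of nasbenchapi.
ALLOWED_OPS = {'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3'}
MAX_VERTICES = 7
MAX_EDGES = 9


def is_valid_nb101_cell(adjacency: list[list[int]], operations: list[str]) -> bool:
    """Check the basic NASBench-101 cell constraints on raw lists."""
    n = len(adjacency)
    # Square matrix, one operation per vertex, between 2 and 7 vertices.
    if not 2 <= n <= MAX_VERTICES or len(operations) != n:
        return False
    if any(len(row) != n for row in adjacency):
        return False
    # Fixed endpoints: the first node is the input, the last node is the output.
    if operations[0] != 'input' or operations[-1] != 'output':
        return False
    if any(op not in ALLOWED_OPS for op in operations[1:-1]):
        return False
    # Entries must be 0/1 and strictly upper-triangular, so edges only go forward.
    for i in range(n):
        for j in range(n):
            if adjacency[i][j] not in (0, 1) or (j <= i and adjacency[i][j]):
                return False
    return sum(map(sum, adjacency)) <= MAX_EDGES


# Made-up example cell with 8 edges.
adj = [[0, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]]
ops = ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3',
       'conv3x3-bn-relu', 'conv1x1-bn-relu', 'output']
print(is_valid_nb101_cell(adj, ops))  # True
```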
NASBench-201 (String):
Architecture string format:
|op~0|+|op~0|op~1|+|op~0|op~1|op~2|
6 edges connecting 4 nodes in a cell
5 operations per edge: ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']
Total search space: 5^6 = 15,625 unique architectures
Each architecture maps to a canonical index (0-15624)
Example:
'|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'
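Each of the 6 edges independently takes one of 5 operations, so the whole NB201 space can be enumerated with `itertools.product`. This sketch builds every string in the format above and confirms the 5^6 = 15,625 count. `all_nb201_strings` is a hypothetical helper. Its ordering is not the benchmark's canonical index order, so use get_index to obtain real indices.

```python
from itertools import product

# Sketch only: hypothetical helper, not part of nasbenchapi.
OPS = ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']
# Source nodes of the incoming edges for destination nodes 1, 2 and 3.
EDGE_SOURCES = [[0], [0, 1], [0, 1, 2]]


def all_nb201_strings() -> list[str]:
    """Build every NB201 cell string: one operation on each of the 6 edges."""
    strings = []
    for ops in product(OPS, repeat=6):
        it = iter(ops)
        nodes = []
        for sources in EDGE_SOURCES:
            edges = ''.join(f'{next(it)}~{src}|' for src in sources)
            nodes.append('|' + edges)
        strings.append('+'.join(nodes))
    return strings


archs = all_nb201_strings()
print(len(archs))  # 15625
```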
NASBench-301 (Dict):
DARTS-style architecture with normal and reduction cells
Dictionary with 'normal' and 'reduce' keys
Each cell: list of (operation, predecessor_node) tuples
8 operations: ['max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5', 'none']
4 intermediate nodes per cell, each with 2 input edges
Example:
{
    'normal': [('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ...],
    'reduce': [('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), ...]
}
Common API Surface
All benchmarks expose the following core methods.
Initialization
from nasbenchapi import NASBench101, NASBench201, NASBench301
# Using explicit path
api = NASBench201('/path/to/nb201.pkl', verbose=True)
# Using environment variable
api = NASBench201(verbose=True) # Reads from NASBENCH201_PATH
Constructor Args:
data_path: Optional[str] — path to the pickled benchmark data; if None, the path is read from the benchmark's environment variable
verbose: bool — enable or disable all logging output (default: True)
Environment Variables:
NASBENCH101_PATH — path to the NB101 pickle file
NASBENCH201_PATH — path to the NB201 pickle file
NASBENCH301_PATH — path to the NB301 pickle file
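The data_path fallback described above can be sketched as follows. `resolve_data_path` and its benchmark keys ('nb101', 'nb201', 'nb301') are hypothetical and exist only for illustration. nasbenchapi's own lookup and error messages may differ. The helper raises FileNotFoundError both when no path is available and when the file is missing, matching the cases listed under Error Handling.

```python
import os
from typing import Optional

# Sketch only: hypothetical helper, not nasbenchapi's implementation.
ENV_VARS = {
    'nb101': 'NASBENCH101_PATH',
    'nb201': 'NASBENCH201_PATH',
    'nb301': 'NASBENCH301_PATH',
}


def resolve_data_path(benchmark: str, data_path: Optional[str] = None) -> str:
    """Prefer an explicit path; otherwise read the benchmark's environment variable."""
    path = data_path or os.environ.get(ENV_VARS[benchmark])
    if not path:
        raise FileNotFoundError(f'No data_path given and {ENV_VARS[benchmark]} is not set')
    if not os.path.isfile(path):
        raise FileNotFoundError(f'Benchmark file not found: {path}')
    return path
```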
get_statistics
Get statistics about the loaded benchmark data.
stats = api.get_statistics()
Returns: dict — benchmark statistics
Return Format by Benchmark:
NB101: {'benchmark': 'nasbench101', 'architectures': int, 'records': int}
NB201: {'benchmark': 'nasbench201', 'entries': int}
NB301: {'benchmark': 'nasbench301', 'files': int}
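Each benchmark reports its size under a different key, so code that handles all three can use a small adapter. `archive_size` is a hypothetical helper that relies only on the key names listed above.

```python
# Sketch only: hypothetical adapter, not part of nasbenchapi.
# Key holding the main count for each benchmark, per the formats above.
SIZE_KEYS = {
    'nasbench101': 'architectures',
    'nasbench201': 'entries',
    'nasbench301': 'files',
}


def archive_size(stats: dict) -> int:
    """Return the main size count from a get_statistics() result."""
    return stats[SIZE_KEYS[stats['benchmark']]]


print(archive_size({'benchmark': 'nasbench201', 'entries': 15625}))  # 15625
```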
random_sample
Sample random architectures from the benchmark search space.
samples = api.random_sample(n=5, seed=123)
Args:
n: int — number of samples (default: 1)
seed: Optional[int] — RNG seed for reproducibility
Returns:
NB101: list[Arch101] — list of Arch101 dataclass objects
NB201: list[str] — list of architecture strings
NB301: list[int] — indices for entries in the loaded dataset (falls back to synthetic architecture dicts if raw entries are unavailable)
iter_all
Iterate over all available architectures in the loaded data.
for arch in api.iter_all():
    result = api.query(arch, dataset='cifar10', split='val')
Returns:
NB101: Iterator[Arch101]
NB201: Iterator[str] — architecture strings
NB301: Iterator[int] — indices in loaded data
get_index
Get an identifier or index for an architecture.
# NB201: Convert arch string to numeric index
idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)
# NB101: Get hash identifier
hash_id = api.get_index(arch_obj)
# Returns: 'a3f5b2...' (SHA256 hash string)
# NB301: Find index in loaded data
idx = api.get_index(arch_dict)
# Returns: 42 or None
Args:
arch: Architecture representation (type depends on benchmark)
  NB101: Arch101 object
  NB201: str (architecture string)
  NB301: dict (architecture dict)
Returns:
NB101: str — stable SHA256 hash identifier
NB201: int — canonical index (0-15624)
NB301: Optional[int] — index in loaded data, or None if not found
available_budgets
List available training budgets (epochs) for a dataset/split combination.
budgets = api.available_budgets(dataset='cifar10', split='val')
# Returns e.g. [199, 200] for NB201 validation
Args:
dataset: Optional[str] — target dataset (defaults to all datasets)
split: Optional[str] — target split (defaults to all splits)
Returns: Optional[list] — sorted list of budgets if tracked; None when budgets are not defined for the benchmark.
NB101: returns None (budgets not tracked)
NB201: list of available epochs per dataset/split, based on the original training logs
NB301: epochs derived from per-entry learning curves (validation) or from the final declared budget (test)
exists
Validate whether a combination of dataset, split, budget, and architecture is supported without issuing a full query.
api.exists(dataset='cifar10', split='val', budget=199) # -> True
Args:
dataset: Optional[str]
split: Optional[str]
budget: Optional[Any]
arch: Optional[Any] — architecture representation
Returns: bool — True if every provided component is supported, False otherwise.
query
Query performance metrics for an architecture from loaded data.
# NB201 example
result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,
    budget=199
)
print(f"Validation accuracy: {result['metric']:.2f}%")
print(f"Training time: {result['cost']:.2f}s")
Args:
arch: Architecture representation (depends on benchmark)
  NB101: Arch101 object
  NB201: str (architecture string)
  NB301: Any (architecture dict, dataset index, or entry path)
dataset: str — dataset name
  NB101: 'cifar10'
  NB201: 'cifar10', 'cifar100', 'ImageNet16-120'
  NB301: 'cifar10', 'cifar100'
split: str — data split
  NB101: 'train', 'val', 'test'
  NB201: 'train', 'val', 'test'
  NB301: 'val', 'test'
seed: Optional[int] — random seed (default varies by benchmark)
  NB201: default 777 (official NB201 seed)
  NB101/NB301: unused
budget: Optional[Any] — training budget
  NB101: unused (metrics for every recorded budget are returned)
  NB201: epoch number 0-199 (default: 199, the final epoch)
  NB301: epoch index for validation curves (defaults to the final epoch); the test split always reports the declared final budget
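The dataset and split names above fit in a lookup table, which lets a script catch typos before it issues a query. This is a hypothetical client-side check (`SUPPORTED`, `check_combo`) that mirrors the lists in this section. The library's own exists() method remains the authoritative test.

```python
# Sketch only: hypothetical pre-query check mirroring the lists above.
SUPPORTED = {
    'nb101': {'datasets': {'cifar10'}, 'splits': {'train', 'val', 'test'}},
    'nb201': {'datasets': {'cifar10', 'cifar100', 'ImageNet16-120'},
              'splits': {'train', 'val', 'test'}},
    'nb301': {'datasets': {'cifar10', 'cifar100'}, 'splits': {'val', 'test'}},
}


def check_combo(benchmark: str, dataset: str, split: str) -> None:
    """Raise ValueError for a dataset or split the benchmark does not offer."""
    spec = SUPPORTED[benchmark]
    if dataset not in spec['datasets']:
        raise ValueError(f'{benchmark}: unknown dataset {dataset!r}')
    if split not in spec['splits']:
        raise ValueError(f'{benchmark}: unknown split {split!r}')


check_combo('nb201', 'ImageNet16-120', 'test')  # passes silently
```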
Returns: dict with the following keys (NB201 and NB301; NB101 returns this shape only when summary=True, see Return Value Details):
{
    'metric': Optional[float],   # Primary metric (e.g., accuracy %)
    'metric_name': str,          # Name of metric (e.g., 'val_acc')
    'cost': Optional[float],     # Training time in seconds
    'std': Optional[float],      # Standard deviation (if available)
    'info': dict                 # Additional metadata and raw data
}
Return Value Details:
NB101: Returns a tuple (info_dict, metrics_by_budget) by default.
  info_dict contains module_adjacency, module_operations, module_hash, and aggregate training metadata.
  metrics_by_budget is a dict keyed by epoch budget (4/12/36/108). Each value is a list of up to three run dictionaries, and each run dictionary mirrors the native NASBench metrics: halfway_* and final_* keys, as well as training times.
  average=True collapses each budget to a single averaged metrics dictionary.
  summary=True restores the condensed dict (metric, metric_name, cost, std, info) for backwards compatibility.
NB201 / NB301: Return a dictionary with keys:
  metric: Accuracy percentage (e.g., 94.5), or None if not available
  metric_name: Describes the metric, typically {split}_acc
  cost: Training/evaluation time in seconds, or None
  std: Standard deviation of the metric across multiple runs (rarely used)
  info: Dictionary containing additional information
    NB201: arch_index, dataset, split, seed, epoch, arch_str, params, flop
    NB301: Entry metadata (index, dataset, epochs available/used, declared budget, optimizer tag, JSON path)
NASBench-101 Specifics
Import and Initialization
from nasbenchapi import NASBench101
api = NASBench101('/path/to/nasbench_only108.pkl', verbose=True)
# Or use environment variable
api = NASBench101(verbose=True)
Dataset and Splits
Single dataset: CIFAR-10 only
Splits: train, val, test
Training epochs: 4, 12, 36, 108 (the final budget, 108, is the one usually queried)
Architecture Type (Arch101)
from nasbenchapi import Arch101
arch = Arch101(
    adjacency=[[0, 1, 1, 0, 0, 0, 0], ...],  # 7×7 matrix
    operations=['input', 'conv3x3-bn-relu', ..., 'output']  # 7 ops
)
Operations
Available operations (from op_set()):
'input' (fixed at node 0)
'conv3x3-bn-relu'
'conv1x1-bn-relu'
'maxpool3x3'
'output' (fixed at node 6)
encode / decode / id
# Encode Arch101 to native strings
encoding = api.encode(arch)
# Returns: {'adjacency_str': '0110000...', 'operations_str': 'input,conv3x3-bn-relu,...'}
# Decode encoding to Arch101
arch = api.decode(encoding)
# Get stable hash ID
arch_id = api.id(arch)
# Returns: 'a3f5b2c8...' (SHA256 hash)
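The example values suggest that the encoding flattens the adjacency matrix row by row into a digit string and joins the operations with commas. Assuming that layout (inferred from the example, not confirmed by the library), a round trip looks like the sketch below. The `encode`, `decode`, and `stable_id` functions are hypothetical stand-ins. In particular, `stable_id` only shows how a SHA256 identifier can be derived from such strings. It is not expected to reproduce the IDs returned by api.id().

```python
import hashlib
import math

# Sketch only: hypothetical stand-ins, not nasbenchapi's encode/decode/id.


def encode(adjacency: list[list[int]], operations: list[str]) -> dict:
    """Flatten the matrix row-major into digits and join the ops with commas."""
    return {
        'adjacency_str': ''.join(str(v) for row in adjacency for v in row),
        'operations_str': ','.join(operations),
    }


def decode(encoding: dict) -> tuple[list[list[int]], list[str]]:
    """Invert encode(): rebuild the square matrix and the operation list."""
    digits = [int(c) for c in encoding['adjacency_str']]
    n = math.isqrt(len(digits))
    adjacency = [digits[i * n:(i + 1) * n] for i in range(n)]
    return adjacency, encoding['operations_str'].split(',')


def stable_id(encoding: dict) -> str:
    """Illustrative SHA256 over both strings (not the library's exact scheme)."""
    payload = encoding['adjacency_str'] + '|' + encoding['operations_str']
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


small_adj = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
small_ops = ['input', 'conv3x3-bn-relu', 'output']
enc = encode(small_adj, small_ops)
print(enc['adjacency_str'])  # 011001000
```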
get_index
# Returns the same as id() for consistency
hash_id = api.get_index(arch)
# Returns: 'a3f5b2c8...'
random_sample
archs = api.random_sample(n=10, seed=42)
# Returns: list of 10 Arch101 objects sampled from loaded data
iter_all
for arch in api.iter_all():
    # summary=True returns the condensed dict; the default is an (info, metrics) tuple
    result = api.query(arch, dataset='cifar10', split='test', summary=True)
    print(f"Test acc: {result['metric']:.2f}%")
query
info, metrics = api.query(arch, dataset='cifar10', split='val')
# metrics -> {4: [run_dict, ...], 12: [...], 36: [...], 108: [...]}
averaged = api.query(arch, dataset='cifar10', split='val', average=True)[1]
summary = api.query(arch, dataset='cifar10', split='val', summary=True)
Args:
arch: Arch101 — architecture object
dataset: str — 'cifar10' (only dataset available)
split: str — 'train', 'val', or 'test'
seed: Optional[int] — unused
budget: Optional[Any] — unused (all budgets are available in metrics)
average: Optional[bool] — return averaged metrics per budget when True
summary: Optional[bool] — return condensed dict (legacy shape) when True
Returns:
Tuple (info_dict, metrics_by_budget) when summary=False (default)
Condensed dict when summary=True
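With the default tuple return, per-budget figures come from the lists of run dictionaries. The sketch below averages one native key across runs. It is a manual alternative to average=True, whose exact behavior may differ. `mean_by_budget` and the toy dictionary are assumptions for illustration. Only the key name final_validation_accuracy comes from the native NASBench-101 metrics.

```python
from statistics import mean

# Sketch only: hypothetical helper; the toy data below is made up.


def mean_by_budget(metrics_by_budget: dict, key: str = 'final_validation_accuracy') -> dict:
    """Average one metric across the runs recorded at each budget."""
    return {
        budget: mean(run[key] for run in runs)
        for budget, runs in sorted(metrics_by_budget.items())
        if runs  # skip budgets with no recorded runs
    }


# Toy data shaped like metrics_by_budget (values are illustrative only).
toy = {
    108: [{'final_validation_accuracy': 0.90}, {'final_validation_accuracy': 0.92},
          {'final_validation_accuracy': 0.94}],
    12: [{'final_validation_accuracy': 0.80}, {'final_validation_accuracy': 0.82}],
}
print(mean_by_budget(toy))
```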
train_time
Get training time for an architecture.
time_sec = api.train_time(arch, dataset='cifar10')
# Returns: float (seconds) or None
mutate
Apply a mutation to an architecture.
import random
rng = random.Random(42)
mutated = api.mutate(arch, rng=rng, kind='edge_toggle')
Mutation kinds:
'edge_toggle' — flip an edge in the adjacency matrix
'op_swap' — swap two operations
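An edge_toggle mutation can be pictured as flipping one randomly chosen entry above the diagonal, which keeps the graph acyclic. This is an illustrative reimplementation, not nasbenchapi's code. The library may also reject mutants that exceed the 9-edge limit or that are absent from the loaded data, and it may choose edges differently.

```python
import random

# Sketch only: hypothetical reimplementation of an edge-toggle mutation.


def edge_toggle(adjacency: list[list[int]], rng: random.Random) -> list[list[int]]:
    """Return a copy of the matrix with one upper-triangular entry flipped."""
    n = len(adjacency)
    mutated = [row[:] for row in adjacency]  # never modify the caller's matrix
    # Only entries with i < j are candidates, which keeps the graph acyclic.
    i, j = rng.choice([(i, j) for i in range(n) for j in range(i + 1, n)])
    mutated[i][j] = 1 - mutated[i][j]
    return mutated


base = [[0] * 7 for _ in range(7)]
base[0][6] = 1  # a single input -> output edge
child = edge_toggle(base, random.Random(42))
```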
NASBench-201 Specifics
Import and Initialization
from nasbenchapi import NASBench201
api = NASBench201('/path/to/NASBench-201-v1_1-096897.pth', verbose=True)
# Or use environment variable
api = NASBench201(verbose=True)
Dataset and Splits
Datasets: CIFAR-10, CIFAR-100, ImageNet16-120
Splits: train, val, test
Training epochs: 0-199 (200 epochs total)
Common budget values: 12 (early), 199 (final epoch)
Default seed: 777 (official NB201 seed)
Architecture Representation
NB201 uses architecture strings as the primary representation:
arch_str = '|none~0|+|skip_connect~0|nor_conv_1x1~1|+|nor_conv_3x3~0|avg_pool_3x3~1|skip_connect~2|'
Format details:
Cell with 4 nodes (node 0 is the input, nodes 1 and 2 are intermediate, node 3 is the output)
6 edges: (1←0), (2←0), (2←1), (3←0), (3←1), (3←2)
Each edge has one operation from: ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']
String format:
|op~src|+|op~src|op~src|+|op~src|op~src|op~src|
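A string in this format splits into its six (destination, source, operation) edges with plain string handling. `parse_nb201` is a hypothetical sketch. nasbenchapi's own validation inside get_index may be stricter or may report errors differently.

```python
# Sketch only: hypothetical parser, not part of nasbenchapi.
NB201_OPS = {'none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3'}


def parse_nb201(arch_str: str) -> list[tuple[int, int, str]]:
    """Split an NB201 string into (dst, src, op) edges, validating as it goes."""
    edges = []
    # Groups are separated by '+'; group k (0-based) lists the inputs of node k + 1.
    for dst, group in enumerate(arch_str.split('+'), start=1):
        tokens = [t for t in group.split('|') if t]
        if len(tokens) != dst:  # node d has exactly d incoming edges
            raise ValueError(f'node {dst}: expected {dst} edges, got {len(tokens)}')
        for expected_src, token in enumerate(tokens):
            op, _, src = token.partition('~')
            if op not in NB201_OPS or src != str(expected_src):
                raise ValueError(f'bad edge {token!r} into node {dst}')
            edges.append((dst, expected_src, op))
    if len(edges) != 6:
        raise ValueError(f'expected 6 edges, got {len(edges)}')
    return edges
```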
Index mapping:
Each architecture has a canonical integer index: 0 to 15,624
Use get_index(arch_str) to convert a string to its index
random_sample
arch_strs = api.random_sample(n=5, seed=42)
# Returns: ['|none~0|+|...', '|skip_connect~0|+|...', ...]
Returns: list[str] — architecture strings
iter_all
for arch_str in api.iter_all():
    idx = api.get_index(arch_str)
    print(f"Architecture {idx}: {arch_str}")
Returns: Iterator[str] — yields architecture strings
get_index
Convert an architecture string to its canonical integer index.
idx = api.get_index('|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...')
# Returns: 12345 (int in range 0-15624)
Args:
arch: str — NB201 architecture string
Returns: int — index (0-15624)
Raises: ValueError if architecture string is invalid
query
result = api.query(
    arch='|none~0|+|skip_connect~0|nor_conv_1x1~1|+|...',
    dataset='cifar10',
    split='val',
    seed=777,  # Default seed
    budget=199  # Final epoch
)
Args:
arch: str — NB201 architecture string
dataset: str — 'cifar10', 'cifar100', or 'ImageNet16-120'
split: str — 'train', 'val', or 'test'
seed: Optional[int] — data seed (default: 777)
budget: Optional[int] — epoch number 0-199 (default: 199)
Returns: dict with keys:
metric: accuracy percentage (e.g., 91.23)
metric_name: '{split}_acc'
cost: training/eval time in seconds
std: None (not used)
info: dict with arch_index, dataset, split, seed, epoch, arch_str, params, flop
Split-specific behavior:
'train': Returns training accuracy at the specified epoch
'val': Returns validation accuracy (uses 'x-valid@epoch' keys in the data)
'test': Returns test accuracy (uses 'ori-test@epoch' keys; falls back to validation accuracy when no test entry exists)
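The key selection described above can be sketched as a dictionary lookup. `lookup_nb201_acc` and the flat toy record are assumptions for illustration. Only the 'x-valid@epoch' and 'ori-test@epoch' key patterns come from this section, and real NB201 records may nest these keys inside larger structures. The 'train' split is left out for brevity.

```python
from typing import Optional

# Sketch only: hypothetical helper; the toy record below is made up.


def lookup_nb201_acc(eval_acc: dict, split: str, epoch: int = 199) -> Optional[float]:
    """Pick the accuracy key for a split, falling back from test to validation."""
    valid_key = f'x-valid@{epoch}'
    if split == 'val':
        return eval_acc.get(valid_key)
    if split == 'test':
        # Prefer the test key; use validation accuracy only if it is missing.
        return eval_acc.get(f'ori-test@{epoch}', eval_acc.get(valid_key))
    raise ValueError(f'unsupported split: {split!r}')


# Toy record: no test entry at epoch 12.
acc = {'x-valid@199': 90.1, 'ori-test@199': 93.5, 'x-valid@12': 70.0}
print(lookup_nb201_acc(acc, 'test'))            # 93.5
print(lookup_nb201_acc(acc, 'test', epoch=12))  # 70.0 (validation fallback)
```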
NASBench-301 Specifics
Import and Initialization
from nasbenchapi import NASBench301
api = NASBench301('/path/to/nb301_data.pkl', verbose=True)
# Or use environment variable
api = NASBench301(verbose=True)
Dataset and Splits
Datasets: CIFAR-10, CIFAR-100
Splits: val, test (no train split for surrogates)
Training epochs: Validation learning curves provide per-epoch accuracies; the test split reports metrics at the declared final budget for each entry.
Architecture Representation
NB301 uses DARTS-style architecture dictionaries:
# Nodes 0 and 1 are the cell inputs; intermediate nodes are 2-5
arch = {
    'normal': [
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # Node 2 inputs
        ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),  # Node 3 inputs
        ('sep_conv_3x3', 1), ('skip_connect', 0),  # Node 4 inputs
        ('skip_connect', 0), ('dil_conv_3x3', 2)   # Node 5 inputs
    ],
    'reduce': [
        ('max_pool_3x3', 0), ('max_pool_3x3', 1),
        ('skip_connect', 2), ('max_pool_3x3', 0),
        ('max_pool_3x3', 0), ('skip_connect', 2),
        ('skip_connect', 2), ('max_pool_3x3', 1)
    ]
}
Format details:
Two cells: 'normal' and 'reduce' (reduction cell)
Each cell has 4 intermediate nodes
Each intermediate node takes 2 inputs, each from an earlier node (the cell inputs 0 and 1, or a preceding intermediate node)
8 operations: ['max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5', 'none']
Each entry is a tuple: (operation_name, predecessor_node_index)
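The format rules above translate into a short structural check. `validate_darts_cell` and `validate_darts_arch` are hypothetical helpers. They assume the DARTS convention implied here: nodes 0 and 1 are the cell inputs, so the k-th intermediate node (k = 0..3) may read from nodes 0 through k + 1.

```python
# Sketch only: hypothetical validators, not part of nasbenchapi.
DARTS_OPS = {'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3',
             'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5', 'none'}


def validate_darts_cell(cell: list[tuple[str, int]]) -> None:
    """Raise ValueError unless the cell has 4 nodes x 2 valid (op, predecessor) pairs."""
    if len(cell) != 8:
        raise ValueError(f'expected 8 (op, predecessor) pairs, got {len(cell)}')
    for position, (op, pred) in enumerate(cell):
        k = position // 2  # index of the intermediate node this edge feeds
        if op not in DARTS_OPS:
            raise ValueError(f'unknown operation {op!r}')
        if not 0 <= pred <= k + 1:
            raise ValueError(f'intermediate node {k} cannot read from node {pred}')


def validate_darts_arch(arch: dict) -> None:
    """Check both the normal and the reduction cell."""
    for name in ('normal', 'reduce'):
        validate_darts_cell(arch[name])
```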
random_sample
indices = api.random_sample(n=3, seed=42)
# Returns: [102, 4096, 7123]
Returns: list[int] — dataset entry indices (falls back to architecture dict samples if raw entries are unavailable)
iter_all
for idx in api.iter_all():
    print(f"Architecture index: {idx}")
Returns: Iterator[int] — yields indices in loaded data
get_index
Find the index of an architecture in loaded data.
idx = api.get_index(arch_dict)
# Returns: 42 (int) or None if not found
Args:
arch: Any — architecture dict, dataset index, or entry path string
Returns: Optional[int] — index in loaded data, or None if not found
query
result = api.query(
    arch=0,  # dataset index
    dataset='cifar10',
    split='val',
    budget=50,  # epoch index
)
Args:
arch: Any — dataset index (int), entry path (str), or architecture dict with 'normal'/'reduce' keys
dataset: str — 'cifar10' or 'cifar100'
split: str — 'val' or 'test'
seed: Optional[int] — unused
budget: Optional[int] — epoch index for validation curves (defaults to the final epoch); ignored for the test split
Returns: dict with keys metric, metric_name, cost, std, and info; info holds the runtime in seconds, dataset metadata, epochs available/used, declared budget, optimizer tag, and JSON path
Split behavior:
val: accuracy from the per-entry validation learning curve; budgets beyond the recorded length fall back to the final epoch.
test: reported test accuracy at the declared final budget (the budget argument is ignored).
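The validation-split behavior can be sketched as a list lookup: a missing budget defaults to the final epoch, and a budget past the end of the curve also falls back to the final epoch. `val_metric` is a hypothetical helper. It assumes the curve is a list with one accuracy per epoch and that budget is a 0-based index into it, which this section implies but does not state.

```python
from typing import Optional

# Sketch only: hypothetical helper; the toy curve below is made up.


def val_metric(curve: list[float], budget: Optional[int] = None) -> Optional[float]:
    """Read a validation curve at `budget`, defaulting and clamping to the last epoch."""
    if not curve:
        return None  # no recorded learning curve
    if budget is None or budget >= len(curve):
        return curve[-1]  # default and out-of-range budgets use the final epoch
    if budget < 0:
        raise ValueError('budget must be non-negative')
    return curve[budget]


curve = [40.0, 60.0, 75.0, 82.5]     # toy 4-epoch curve
print(val_metric(curve))             # 82.5
print(val_metric(curve, budget=1))   # 60.0
print(val_metric(curve, budget=50))  # 82.5 (clamped)
```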
Complete Usage Examples
NASBench-101 Example
from nasbenchapi import NASBench101
# Initialize
api = NASBench101(verbose=True)
stats = api.get_statistics()
print(f"Loaded {stats['architectures']} architectures")
# Sample architectures
archs = api.random_sample(n=5, seed=42)
# Query performance (summary=True returns the condensed dict instead of the default tuple)
for arch in archs:
    result = api.query(arch, dataset='cifar10', split='test', summary=True)
    print(f"Test accuracy: {result['metric']:.2f}%")
    print(f"Training time: {result['cost']:.2f}s")
NASBench-201 Example
from nasbenchapi import NASBench201
# Initialize
api = NASBench201(verbose=True)
# Sample architecture strings
arch_strs = api.random_sample(n=3, seed=777)
# Query on multiple datasets
for arch_str in arch_strs:
    idx = api.get_index(arch_str)
    print(f"\nArchitecture {idx}:")
    for dataset in ['cifar10', 'cifar100', 'ImageNet16-120']:
        result = api.query(
            arch=arch_str,
            dataset=dataset,
            split='test',
            seed=777,
            budget=199
        )
        print(f"  {dataset} test acc: {result['metric']:.2f}%")
# Iterate all architectures
count = 0
for arch_str in api.iter_all():
    count += 1
    if count > 5:
        break
    result = api.query(arch_str, dataset='cifar10', split='val')
    print(f"Arch {count}: val_acc = {result['metric']:.2f}%")
NASBench-301 Example
from nasbenchapi import NASBench301
# Initialize
api = NASBench301(verbose=True)
# Sample dataset indices
arch_indices = api.random_sample(n=2, seed=42)
# Query performance at multiple epochs
for idx in arch_indices:
    final_val = api.query(idx, dataset='cifar10', split='val')
    mid_val = api.query(idx, dataset='cifar10', split='val', budget=50)
    print(f"Index {idx}: final={final_val['metric']:.2f}% | mid@50={mid_val['metric']:.2f}%")
Error Handling
Common Exceptions
ValueError:
Invalid architecture string format (NB201)
Architecture index out of range
Invalid dataset or split name
FileNotFoundError:
Pickle file not found at specified path
Environment variable not set
KeyError:
Data format mismatch (e.g., missing expected keys in pickle)
Example Error Handling
import sys
from nasbenchapi import NASBench201
try:
    api = NASBench201('/path/to/data.pkl', verbose=True)
except FileNotFoundError:
    print("Data file not found. Please set NASBENCH201_PATH or provide a valid path.")
    sys.exit(1)
try:
    result = api.query(
        arch='|invalid~format|',
        dataset='cifar10',
        split='val'
    )
except ValueError as e:
    print(f"Invalid architecture: {e}")
Verbose Logging Control
All benchmarks support a verbose parameter to control logging output:
# Enable all logging (default)
api = NASBench201(verbose=True)
# Outputs:
# Loading NB201 from /path/to/file.pkl (2.1 GB)
# Reading: 100%|██████████| 2.1G/2.1G [00:15<00:00]
# Unpickling data...
# Unpickling complete.
# [NB201] Loaded 15625 architectures (source=arch2infos)
# Disable all logging (silent mode)
api = NASBench201(verbose=False)
# No output
Logging includes:
File loading progress bars (via tqdm)
Unpickling status messages
Data summary and statistics
Warning messages (e.g., mapping failures)