Cost Safety & Troubleshooting
Cost safety
Billing starts on launch (POST /instances) and stops only on
delete/terminate/destroy. The SDK is designed around that fact:
- Launch is never auto-retried. A transient 5xx on launch is surfaced to you rather than retried — retrying a launch is how you end up paying for two instances.
- `apply` requires a safety net. A manifest must include `budget_limit_usd` (or you must pass `--no-safety-net` / `require_safety_net=False`). The budget is an audit tag only; it does not auto-terminate an instance. Billing stops when you run `destroy`/`terminate`.
- The CLI shows estimated daily/weekly spend and asks you to confirm before every interactive instance launch.
- If `wait_until_active` times out, it raises `WaitTimeoutError` reporting the instance id, the timeout, and the last observed status. The instance may still be running and billing, so tear it down with `substratecloud instance terminate <id-or-name>`.
- Always tear down when finished: `substratecloud destroy <name>` or `substratecloud instance terminate <id-or-name>` (`-y` skips the confirm prompt).
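The launch-once and tear-down-on-timeout rules above can be sketched as a small wrapper. This is a minimal illustration, not the SDK's implementation: the client methods `launch`, `wait_until_active(instance_id, timeout_s=...)`, and `terminate` are hypothetical names, `WaitTimeoutError` is redefined locally as a stand-in, and `FakeClient` exists only so the sketch runs without the SDK or network access.

```python
class WaitTimeoutError(Exception):
    """Stand-in for the SDK's WaitTimeoutError, defined here for illustration."""


def launch_and_wait(client, spec, timeout_s=600):
    """Launch once, wait for the instance, and terminate it if the wait times out.

    The client method names used here are hypothetical, not the SDK's API.
    """
    # Exactly one launch call and no retry loop, so a transient failure
    # cannot leave two billing instances behind.
    instance_id = client.launch(spec)
    try:
        client.wait_until_active(instance_id, timeout_s=timeout_s)
    except WaitTimeoutError:
        # The instance may still be running and billing, so tear it down
        # before passing the error on to the caller.
        client.terminate(instance_id)
        raise
    return instance_id


class FakeClient:
    """In-memory fake that records launches and terminations."""

    def __init__(self, becomes_active):
        self.becomes_active = becomes_active
        self.launched = []
        self.terminated = []

    def launch(self, spec):
        instance_id = f"inst-{len(self.launched) + 1}"
        self.launched.append(instance_id)
        return instance_id

    def wait_until_active(self, instance_id, timeout_s):
        if not self.becomes_active:
            raise WaitTimeoutError(f"{instance_id} not active after {timeout_s}s")

    def terminate(self, instance_id):
        self.terminated.append(instance_id)
```

With a real client, the same shape applies: the `except` branch is the only place that terminates automatically, and every other failure is left for you to inspect before deciding whether an instance exists.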
Client-side spend reporting:
substratecloud cost --tag team:platform
Billing and wallet funding for the platform itself are covered in Billing & Wallet.
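For a rough sense of the daily/weekly figures shown before a launch, the arithmetic can be sketched as below. How the CLI actually derives its estimate is not documented here; this sketch assumes the instance runs continuously at a fixed hourly price, and `estimate_spend` is a hypothetical helper, not part of the SDK.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 24 * 7


def estimate_spend(price_per_hour, instance_count=1):
    """Return (daily, weekly) cost for instances that run without interruption."""
    hourly_total = price_per_hour * instance_count
    return hourly_total * HOURS_PER_DAY, hourly_total * HOURS_PER_WEEK


daily, weekly = estimate_spend(3.0)
print(f"daily: {daily:.2f}, weekly: {weekly:.2f}")  # daily: 72.00, weekly: 504.00
```

Because `budget_limit_usd` is an audit tag and does not stop an instance, a figure like this is only a planning aid; actual spend ends when the instance is terminated.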
Errors
All SDK exceptions subclass SubstrateCloudError:
| Exception | Meaning |
|---|---|
| `AuthError` | Bad/expired token or wrong base URL. Re-run config init or check. |
| `NoCapacityError` | No inventory matched the request. Relax `--gpu` / `--max-price` / region. |
| `NotFoundError` | Instance / SSH key / name not found. |
| `ValidationError` | Bad request payload (e.g. malformed manifest). |
| `QuotaError` | Org limit hit (e.g. the 3-token cap). |
| `ServerError` | 5xx from the API. Launches are not auto-retried; see above. |
| `TransportError` | Network/connection failure. |
| `WaitTimeoutError` | Instance didn't reach active before the timeout (it may still be billing). |
| `WorkloadTimeoutError` | Workload didn't become healthy in time. |
from substratecloud import SubstrateCloud, AuthError, NoCapacityError

try:
    client = SubstrateCloud()
    # Look for the cheapest H100 offer with a price cap of 3.0.
    item = client.inventory.find_cheapest(gpu_type="H100", max_price=3.0)
except AuthError:
    # Bad or expired token, or the wrong base URL.
    print("Run `substratecloud config init` or check your token.")
except NoCapacityError:
    # Nothing in inventory matched the GPU type and price cap.
    print("No H100 under that price; try a higher --max-price or another region.")
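Because every SDK exception subclasses `SubstrateCloudError`, specific handlers can be followed by a single catch-all. The sketch below uses locally defined stand-in classes that mirror names from the table, so it runs without the SDK; `classify` and its hint strings are hypothetical and only illustrate the ordering of `except` clauses.

```python
class SubstrateCloudError(Exception):
    """Stand-in for the SDK base class."""


class NoCapacityError(SubstrateCloudError):
    """Stand-in: no inventory matched the request."""


class ServerError(SubstrateCloudError):
    """Stand-in: 5xx from the API."""


def classify(action):
    """Run action() and return "ok" or a short hint describing the failure."""
    try:
        action()
    except NoCapacityError:
        return "no capacity: relax the GPU, price, or region filter"
    except ServerError:
        return "server error: do not blindly repeat a launch"
    except SubstrateCloudError as exc:
        # Catch-all for any other SDK error; it must come after the
        # specific subclasses or it would shadow them.
        return f"sdk error: {exc}"
    return "ok"


def raise_no_capacity():
    raise NoCapacityError("no H100 available")


print(classify(raise_no_capacity))
```

Exceptions that are not SDK errors, such as a `KeyError` from your own code, still propagate, which keeps programming mistakes visible.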
Common setup issues
| Symptom | Fix |
|---|---|
| `substratecloud: command not found` | Add `~/.local/bin` to your `PATH`. |
| `python3-venv` is missing | `sudo apt install -y python3-venv`. |
| Token doesn't start with `mcp_` | Copy an MCP key from Resources → MCP Keys, not another credential. |
| Auth check fails | Verify the API base URL matches your org's ondemand-mcp-manager endpoint. |
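Two of the checks in the table are easy to script before filing an issue. The helpers below are a hypothetical preflight sketch, not part of the SDK or CLI: one checks the `mcp_` token prefix, the other checks whether `.local/bin` under a given home directory appears in a `PATH` value.

```python
import os
import shutil


def token_looks_like_mcp_key(token):
    """True when the token carries the `mcp_` prefix expected of an MCP key."""
    return token.strip().startswith("mcp_")


def local_bin_on_path(path_value, home):
    """True when <home>/.local/bin is one of the entries in path_value."""
    target = os.path.normpath(os.path.join(home, ".local", "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


# Report on the current environment; shutil.which returns None when the
# command cannot be found on PATH.
print("CLI found:", shutil.which("substratecloud") is not None)
print("~/.local/bin on PATH:",
      local_bin_on_path(os.environ.get("PATH", ""), os.path.expanduser("~")))
```

A passing prefix check only rules out pasting the wrong kind of credential; it says nothing about whether the key is valid or unexpired.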
Known issues (alpha)
- Currency symbol is inconsistent across commands. `inventory` and `instance` commands display the per-hour price with `€`, while `show-gpus`, `cost`, and `budget_limit_usd` use `$`/USD. The underlying number is the same; only the symbol differs. Tracked for a future release.
- Boot-script `launch_configuration` shape is not finalized. Boot-script and non-Docker workloads are previews and may change. `workload render` for boot-script YAML is intentionally disabled for now; compose boot scripts in Python instead.
- No webhooks/streaming yet. Status is polling-only (`wait_until_active`).
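Since status is polling-only, a wait amounts to a loop with a deadline. The following is a generic sketch of that pattern rather than the SDK's `wait_until_active`: `poll_until_active`, `PollTimeout`, and the `get_status` callable are hypothetical, and the only detail taken from this page is the `active` status name. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


class PollTimeout(Exception):
    """Hypothetical error for this sketch; not the SDK's WaitTimeoutError."""

    def __init__(self, timeout_s, last_status):
        super().__init__(f"not active after {timeout_s}s (last status: {last_status})")
        self.timeout_s = timeout_s
        self.last_status = last_status


def poll_until_active(get_status, timeout_s=600.0, interval_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call get_status() until it returns "active" or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        last_status = get_status()
        if last_status == "active":
            return last_status
        if clock() >= deadline:
            # Report the last observed status so the caller can decide
            # whether to keep waiting or tear the instance down.
            raise PollTimeout(timeout_s, last_status)
        sleep(interval_s)
```

A timeout here only means the loop stopped watching; as with `WaitTimeoutError`, the instance itself may still be running and billing.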
Further reading
- Overview & Install
- On-Demand MCP — the API the SDK talks to.
- SDK source & issues: substrate-cloud/probable-dollop.