Cost Safety & Troubleshooting

Cost safety

Billing starts on launch (POST /instances) and stops only on delete/terminate/destroy. The SDK is designed around that fact:

  • Launch is never auto-retried. A transient 5xx on launch is surfaced to you rather than retried — retrying a launch is how you end up paying for two instances.
  • apply requires a safety net. A manifest must include budget_limit_usd (or you must pass --no-safety-net / require_safety_net=False). The budget is an audit tag only — it does not auto-terminate an instance. Billing stops when you run destroy / terminate.
  • The CLI shows estimated daily/weekly spend and asks you to confirm before every interactive instance launch.
  • If wait_until_active times out, it raises WaitTimeoutError reporting the instance id, the timeout, and the last observed status. The instance may still be running and billing — tear it down with substratecloud instance terminate <id-or-name>.
  • Always tear down when finished: substratecloud destroy <name> or substratecloud instance terminate <id-or-name> (-y skips the confirm prompt).
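The arithmetic behind a spend estimate is simple enough to check by hand. The following is a minimal sketch, not the CLI's actual implementation: `estimate_spend` and `hours_until_budget` are hypothetical helper names, and the sketch assumes a flat hourly price, a 24-hour day, and a 7-day week.

```python
def estimate_spend(hourly_usd: float) -> dict:
    """Return estimated daily and weekly spend for a flat hourly price."""
    if hourly_usd < 0:
        raise ValueError("hourly_usd must be non-negative")
    daily = hourly_usd * 24
    return {"daily": round(daily, 2), "weekly": round(daily * 7, 2)}


def hours_until_budget(hourly_usd: float, budget_limit_usd: float) -> float:
    """Hours of runtime before spend reaches budget_limit_usd.

    The budget is only an audit tag, so nothing stops the instance at that
    point; this is just the moment by which you should have torn it down.
    """
    if hourly_usd <= 0:
        raise ValueError("hourly_usd must be positive")
    return budget_limit_usd / hourly_usd
```

For example, at $3.00/hour the estimate is $72 per day and $504 per week, and a budget_limit_usd of 150 is reached after 50 hours of runtime.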

Client-side spend reporting:

substratecloud cost --tag team:platform

Billing and wallet funding for the platform itself are covered in Billing & Wallet.

Errors

All SDK exceptions subclass SubstrateCloudError:

| Exception | Meaning |
| --- | --- |
| AuthError | Bad/expired token or wrong base URL. Re-run config init or check. |
| NoCapacityError | No inventory matched the request. Relax --gpu / --max-price / region. |
| NotFoundError | Instance / SSH key / name not found. |
| ValidationError | Bad request payload (e.g. malformed manifest). |
| QuotaError | Org limit hit (e.g. the 3-token cap). |
| ServerError | 5xx from the API. Launches are not auto-retried (see above). |
| TransportError | Network/connection failure. |
| WaitTimeoutError | Instance didn't reach active before the timeout (it may still be billing). |
| WorkloadTimeoutError | Workload didn't become healthy in time. |

from substratecloud import SubstrateCloud, AuthError, NoCapacityError

try:
    client = SubstrateCloud()
    item = client.inventory.find_cheapest(gpu_type="H100", max_price=3.0)
except AuthError:
    print("Run `substratecloud config init` or check your token.")
except NoCapacityError:
    print("No H100 under that price; try a higher --max-price or another region.")
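The no-retry rule for launches can be mirrored in your own wrapper code. The sketch below is illustrative only: `TransientError`, `call_with_retry`, and the `idempotent` flag are hypothetical names, not part of the SDK. The stand-in exception takes the place of ServerError / TransportError so the block runs on its own.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a 5xx or a dropped connection."""


def call_with_retry(fn, *, idempotent, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); retry transient failures only when the call is idempotent.

    A launch is not idempotent: the first request may have created (and
    started billing) an instance even though the response was an error,
    so it gets exactly one attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    max_attempts = attempts if idempotent else 1
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Read-only calls such as an inventory lookup would pass idempotent=True. A launch would pass idempotent=False, so a 5xx reaches the caller, who can check whether an instance was actually created before trying again.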

Common setup issues

| Symptom | Fix |
| --- | --- |
| substratecloud: command not found | Add ~/.local/bin to your PATH. |
| python3-venv is missing | sudo apt install -y python3-venv. |
| Token doesn't start with mcp_ | Copy an MCP key from Resources → MCP Keys, not another credential. |
| Auth check fails | Verify the API base URL matches your org's ondemand-mcp-manager endpoint. |
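Two of these checks are easy to script before a first run. This is a rough sketch under stated assumptions: `preflight` is a hypothetical helper, not an SDK function. It relies only on what the table states, namely that the CLI binary is named substratecloud and that a valid token starts with mcp_.

```python
import shutil


def preflight(token, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # shutil.which returns None when the executable is not found on PATH.
    if which("substratecloud") is None:
        problems.append("substratecloud not on PATH (try adding ~/.local/bin)")
    if not token.startswith("mcp_"):
        problems.append("token does not start with mcp_ (copy an MCP key)")
    return problems
```

The `which` parameter exists only so the PATH lookup can be swapped out in tests; in normal use, call preflight(token) and print whatever it returns.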

Known issues (alpha)

  • Currency symbol is inconsistent across commands. inventory and instance commands display the per-hour price with a different currency symbol, while show-gpus, cost, and budget_limit_usd use $/USD. The underlying number is the same; only the symbol differs. Tracked for a future release.
  • Boot-script launch_configuration shape is not finalized. Boot-script and non-Docker workloads are previews and may change. workload render for boot-script YAML is intentionally disabled for now — compose boot scripts in Python instead.
  • No webhooks/streaming yet. Status is polling-only (wait_until_active).
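Because status is polling-only, waiting for an instance comes down to a loop with a deadline. The following is a minimal sketch of that pattern, not the SDK's wait_until_active implementation: `poll_until_active`, `PollTimeout`, and the `get_status` callable are hypothetical, and the "active" status string is assumed from the WaitTimeoutError description.

```python
import time


class PollTimeout(Exception):
    """Raised when the instance is not active before the deadline."""

    def __init__(self, instance_id, timeout, last_status):
        super().__init__(
            f"instance {instance_id} not active after {timeout}s "
            f"(last status: {last_status})"
        )
        self.instance_id = instance_id
        self.timeout = timeout
        self.last_status = last_status


def poll_until_active(get_status, instance_id, timeout=600.0, interval=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll get_status(instance_id) until it returns "active" or time runs out."""
    deadline = clock() + timeout
    while True:
        last_status = get_status(instance_id)
        if last_status == "active":
            return last_status
        if clock() >= deadline:
            # The instance may still be running and billing at this point.
            raise PollTimeout(instance_id, timeout, last_status)
        sleep(interval)
```

A caller that catches the timeout should still tear the instance down, for example with substratecloud instance terminate <id-or-name>, since a timed-out wait does not stop billing.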

Further reading