CloudThinker Glossary

Definitions for key concepts in CloudThinker and AI-powered cloud operations.

VibeOps

VibeOps is CloudThinker's philosophy and operational model for AI-driven cloud operations. It represents a closed-loop, self-healing operations architecture where human experts encode their knowledge as reusable automation primitives (Skills), and AI agents execute, learn, and improve from every resolution. VibeOps goes beyond traditional AIOps — which focuses on alert correlation and anomaly detection — by enabling autonomous end-to-end operations across the full software development and delivery lifecycle. The term reflects the shift from reactive, human-driven operations to a proactive, AI-native operating model where engineers set the rules and AI agents run the work.

AIOps

AIOps (Artificial Intelligence for IT Operations) refers to the use of machine learning and data analytics to automate and enhance IT operations. Traditional AIOps platforms focus primarily on log analysis, metric correlation, and anomaly detection, surfacing insights for human operators to act on. CloudThinker extends the AIOps model with autonomous AI agents that go beyond detecting and alerting: they investigate root cause, execute remediation runbooks, and resolve incidents end-to-end. CloudThinker calls this approach VibeOps to distinguish fully autonomous operations from traditional AIOps tooling.

Graduated Autonomy

Graduated Autonomy is CloudThinker's four-level framework for incrementally expanding what AI agents are allowed to do independently. Level 1 (Notify): agents surface findings and recommendations for human review, taking no automated actions. Level 2 (Suggest): agents draft remediation plans and propose specific actions for engineer approval. Level 3 (Act with Approval): agents execute actions after an authorized team member approves the proposed change. Level 4 (Autonomous): agents operate fully independently within defined guardrails, with a complete audit trail of every decision and outcome. Teams configure autonomy levels per agent, per environment, and per team role using RBAC-gated governance.
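
The four levels above can be read as a simple decision gate. The following Python sketch is illustrative only and is not CloudThinker's API: the names AutonomyLevel, decide, POLICY, and level_for, the returned action strings, and the "escalate" fallback when guardrails are exceeded are all assumptions made for this example.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The four levels named in the glossary entry (identifiers are hypothetical)."""
    NOTIFY = 1
    SUGGEST = 2
    ACT_WITH_APPROVAL = 3
    AUTONOMOUS = 4


def decide(level, approved=False, within_guardrails=True):
    """Return what a hypothetical agent may do with a proposed action."""
    if level == AutonomyLevel.NOTIFY:
        return "report_findings"      # no automated action
    if level == AutonomyLevel.SUGGEST:
        return "propose_plan"         # draft a plan for engineer approval
    if level == AutonomyLevel.ACT_WITH_APPROVAL:
        return "execute" if approved else "await_approval"
    # AUTONOMOUS: act independently, but only inside the guardrails.
    # Escalating to a human otherwise is an assumption of this sketch.
    return "execute" if within_guardrails else "escalate"


# Hypothetical per-agent, per-environment configuration.
POLICY = {
    ("incident-response", "staging"): AutonomyLevel.AUTONOMOUS,
    ("incident-response", "production"): AutonomyLevel.ACT_WITH_APPROVAL,
}


def level_for(agent, environment):
    """Look up the configured level, defaulting to the most conservative one."""
    return POLICY.get((agent, environment), AutonomyLevel.NOTIFY)
```

Defaulting unknown agent and environment pairs to Level 1 is a design choice of the sketch: anything not explicitly granted more autonomy only reports.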

Skills Framework

The CloudThinker Skills Framework is the modular system that defines what AI agents can do and how they do it. A Skill is an instruction set that is given to an AI agent once and encodes your team's processes, output formats, trigger conditions, and guardrails. Skills are organized into three categories: Public Skills (built-in universal capabilities like document processing), Connection Skills (purpose-built integrations for AWS, GitHub, Datadog, Jira, Slack, and 50+ other tools), and Custom Workspace Skills (team-specific skills encoding incident runbooks, code review standards, reporting formats, and any other repeatable process). The /create-skill command lets teams build new skills in natural language.
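
One way to picture what a Skill encodes is as a small record. The Python sketch below is a hypothetical data model, not CloudThinker's actual schema: the class names, field names, and the skills_for_trigger helper are assumptions chosen to mirror the four things the entry says a Skill captures (process, output format, trigger conditions, guardrails) and the three categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SkillCategory(Enum):
    """The three categories named in the glossary entry."""
    PUBLIC = "public"
    CONNECTION = "connection"
    CUSTOM_WORKSPACE = "custom_workspace"


@dataclass(frozen=True)
class Skill:
    """Hypothetical shape of a Skill; every field name is an assumption."""
    name: str
    category: SkillCategory
    instructions: str                 # the team's process, in natural language
    output_format: str
    triggers: Tuple[str, ...] = ()    # events that should activate the skill
    guardrails: Tuple[str, ...] = ()  # limits the agent must respect


def skills_for_trigger(skills, event):
    """Return the names of skills whose trigger list contains the event."""
    return [s.name for s in skills if event in s.triggers]


# Example data, invented for illustration.
SKILLS = [
    Skill("pr-review", SkillCategory.CUSTOM_WORKSPACE,
          "Review pull requests against the team checklist.",
          "markdown comment", triggers=("pull_request.opened",),
          guardrails=("never approve automatically",)),
    Skill("summarize-doc", SkillCategory.PUBLIC,
          "Summarize an uploaded document.", "plain text"),
]
```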

Sandbox Isolation

CloudThinker's Sandbox Isolation is a three-tier security model that confines every AI agent action to an isolated scope before it executes. Tier 1 (Organization sandbox): isolates agent workloads at the account level. Tier 2 (Workspace sandbox): scopes actions to specific team environments. Tier 3 (Ephemeral sandbox): creates short-lived, isolated microVMs for executing potentially destructive operations like infrastructure changes or script execution. Every ephemeral sandbox is destroyed after use and produces a complete audit trail. This model is designed to let teams run fully autonomous operations while containing unintended side effects.
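
The create, use, destroy, and audit lifecycle of the ephemeral tier can be sketched with a Python context manager. This is an analogy under stated assumptions, not how CloudThinker implements it: a temporary directory stands in for a microVM (it provides no real isolation), and ephemeral_sandbox and AUDIT_LOG are hypothetical names.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical in-memory audit trail; a real system would persist this.
AUDIT_LOG = []


@contextmanager
def ephemeral_sandbox(action):
    """Create a throwaway workspace, record what happens, then destroy it."""
    # Stand-in for a microVM: a temp directory, which is NOT real isolation.
    workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
    AUDIT_LOG.append(("created", action))
    try:
        yield workdir
        AUDIT_LOG.append(("succeeded", action))
    except Exception:
        AUDIT_LOG.append(("failed", action))
        raise
    finally:
        # Teardown runs whether the action succeeded or failed.
        shutil.rmtree(workdir, ignore_errors=True)
        AUDIT_LOG.append(("destroyed", action))


with ephemeral_sandbox("apply-config-change") as box:
    (box / "plan.txt").write_text("dry run only")
    sandbox_path = box  # kept only to show the path is gone afterwards
```

Putting the teardown in the finally clause is what makes the sandbox ephemeral in this sketch: the workspace is removed and the removal is logged even when the action raises.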

Dynamic Topology

Dynamic Topology is CloudThinker's real-time infrastructure dependency mapping capability. Using the /create-topo command, CloudThinker agents automatically discover and map the relationships between services, databases, APIs, queues, and infrastructure components across your multi-cloud environment. The resulting topology graph is continuously updated as your architecture changes, and is used by the Incident Response Agent to understand blast radius, identify cascading failure paths, and prioritize remediation steps based on service criticality.
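
Blast radius on a dependency graph can be illustrated with a breadth-first traversal over reversed edges. The Python sketch below assumes a hand-written topology rather than one discovered by /create-topo; the service names, the DEPENDS_ON mapping, and the blast_radius function are hypothetical and do not describe CloudThinker's internal algorithm.

```python
from collections import deque

# Hypothetical topology: each service maps to the components it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["orders-db", "queue"],
    "worker": ["queue"],
    "orders-db": [],
    "queue": [],
}


def blast_radius(depends_on, failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    affected = set()
    frontier = deque([failed])
    while frontier:
        node = frontier.popleft()
        for parent in dependents.get(node, []):
            if parent not in affected:
                affected.add(parent)
                frontier.append(parent)

    affected.discard(failed)  # only matters if the graph contains a cycle
    return affected
```

With this sample graph, a failure of "queue" affects "api" and "worker" directly and "web" through "api", which is the cascading path the entry refers to.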

MemGraph

MemGraph is CloudThinker's agent memory architecture that combines short-term working memory, long-term episodic memory, semantic retrieval, and file memory into a unified knowledge graph. When an AI agent resolves an incident, the resolution (including root cause, steps taken, and outcome) is stored in MemGraph. Future agents can retrieve this context to resolve similar incidents faster, avoid previously failed approaches, and surface recurring patterns for permanent engineering fixes. Because each stored resolution adds to this shared context, CloudThinker's agents improve as they handle more incidents.
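
The store-then-retrieve loop for incident resolutions can be sketched in a few lines of Python. This is a toy model under stated assumptions: word-overlap (Jaccard) similarity stands in for semantic retrieval, a plain list stands in for the knowledge graph, and Resolution, EpisodicMemory, and similarity are hypothetical names, not MemGraph's interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resolution:
    """What the entry says is stored: root cause, steps taken, and outcome."""
    symptom: str
    root_cause: str
    steps: List[str]
    outcome: str  # e.g. "resolved" or "failed"


def similarity(a, b):
    """Jaccard overlap of lowercase words; a crude stand-in for semantic search."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class EpisodicMemory:
    """Hypothetical long-term store of past incident resolutions."""

    def __init__(self):
        self._episodes = []

    def store(self, resolution):
        self._episodes.append(resolution)

    def recall(self, symptom, limit=3):
        """Return up to `limit` past resolutions, most similar symptom first."""
        scored = [(similarity(symptom, r.symptom), r) for r in self._episodes]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:limit]]
```

Storing the outcome alongside the steps is what would let a later agent skip approaches that previously failed, as the entry describes.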

Runbook

In CloudThinker, a Runbook is a structured, executable operations playbook that defines the steps an AI agent takes to diagnose and resolve a specific class of infrastructure issue. CloudThinker ships with 325+ pre-built runbooks covering common failure modes across AWS, Kubernetes, databases, networking, and application services. Runbooks can be triggered manually, on a schedule, or automatically by the Incident Response Agent when a matching alert pattern is detected. Custom runbooks can be created with the /create-task command using natural language descriptions.
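
Automatic triggering on a matching alert pattern can be illustrated as a lookup from alert text to runbook. The Python sketch below is an assumption-laden example, not CloudThinker's runbook format: representing patterns as regular expressions, the Runbook fields, the two sample runbooks, and match_runbook are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Runbook:
    """Hypothetical runbook record: a name, an alert pattern, ordered steps."""
    name: str
    alert_pattern: str        # regular expression matched against alert text
    steps: Tuple[str, ...]


# Sample runbooks invented for illustration.
RUNBOOKS = [
    Runbook("restart-crashlooping-pod", r"CrashLoopBackOff",
            ("describe pod", "check recent logs", "roll back deployment")),
    Runbook("free-disk-space", r"disk usage (above|over) \d+%",
            ("find large files", "rotate logs")),
]


def match_runbook(alert_text, runbooks=RUNBOOKS):
    """Return the first runbook whose pattern matches the alert, else None."""
    for runbook in runbooks:
        if re.search(runbook.alert_pattern, alert_text, re.IGNORECASE):
            return runbook
    return None
```

Returning None when nothing matches leaves room for the other two trigger modes the entry mentions, manual and scheduled, which this sketch does not model.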