If you can't add engineers,

Build Engineering Assistants

100x faster than traditional automation. 1/5th the budget of low-cost outsourcing. Build an Engineering Assistant in minutes to handle work in dev, test and prod that you would automate if you had time.

Need Automation... Now?

Sync a Kubernetes, GCP, AWS or Azure account to start the discovery process. Get thousands of AI-ready, automated tasks from our experts in an hour, covering cloud infrastructure, Kubernetes, popular OSS packages and programming frameworks. Wrap CLI, SQL, REST, Ansible, Python, etc. to add your own.

Workflows -> Goals

Give your Engineering Assistant a goal. It runs tasks to make progress towards that goal: troubleshooting for developers, triaging alerts, broad diagnostics, collecting app/platform/infra info from different systems... even basic remediation and FinOps like adjusting CPU/Memory/Storage.

Executive Insights

Built-in dashboards for operational readiness in test and staging environments, SLO and error budget burn-down charts, automation coverage and time saved, etc. -- the insight you need to run a high-end automation program.

Check logs for errors -> if errors then collect stack traces and env vars -> paste to ticket -> if deployment then do a rolling restart.
If a CI/CD job has an error then find the tests that it referenced -> find the deployments referenced by the test -> collect env vars, manifest and stack traces from deployment -> if existing ticket then paste to ticket -> if no ticket then create a new ticket.
Untangling the test environment again.
Collect logs -> grep for stack traces -> file ticket -> restart VM (AWS).
Check for high error rate nginx paths -> find deployment -> check deployment resource health -> copy logs to a ticket -> restart deployment.
Health check /login for http 200s -> check auth microservice logs for errors.
Check postgres write-ahead log storage utilization -> add emergency capacity and escalate immediately.
Helping developers with repetitive troubleshooting.
Check Kubernetes Error events for Deployment -> check logs for application errors -> check CPU/mem/IO metrics -> check node for noisy neighbors -> paste all info into a ticket.
Check databricks for failing job references -> if databricks job failed then check node health under Deployment -> if node health is OK then check databricks dependent deployment for Error Events.
If developer says service is down -> help the developer run a liveness probe check, collect recent logs, and notify or find the service owner.
Collect env status and pod logs -> paste into new ticket -> rolling restart deployment.
Triage noisy alerts.
Run test env pre-flight check -> check all Deployments are in ready state -> check transaction table has at least 1 row.
Check transaction queue is <100 items deep -> if not, collect env info, deployment logs and file a ticket.
Collect StatefulSet manifest -> paste to ticket.
Manual health checks.
Check certificate is valid -> if not, rotate certificate.
Check Ingress for Warning Events.
Check Ingress log for error messages.
Increase cpu/memory capacity for Azure Web App.
Check Ingress for paths with high rates of 500 errors.
Read/write test key to Redis.
Read test row from postgres -> restart VM if query returns no rows.
Check kafka client latency.
Restart kafka client to rebalance.
Search logs and paste results to ticket.
Add env vars to ServiceNow ticket.
Confirm no root account logins in last 30 days.
Check volume for utilization.
Add emergency 10Gi storage capacity to volume.
Add emergency 500 millicores CPU capacity.
Compare deployment manifest to Vertical Pod Autoscaler CPU/Mem recommendations -> if misaligned then prepare manifest to align them in a PR -> email service owner.
Check manifest for readiness probe configurations -> if missing then notify service owner -> if incorrect then prepare a PR with a fix and file a ticket.
Check manifest for non-standard open ports -> if non-standard ports then check exception list -> if not on exception list then file a ServiceNow ticket.
Check oauth login latency -> if latency is slow then restart VM -> email service owner.
Check queue is less than 60% of capacity -> if queue is beyond basic capacity then check CPU / memory -> if CPU/memory is high then copy recent logs to a ticket and emergency restart process.
If test body mentions vault error then do test read/write in vault test path with pod credentials -> if vault test read/write fails then try with default read only credentials -> if that fails then notify service owner.
Check test DB is running and volume is not full and login string matches and no key tables locked and test user is entered in user table -> if any fail, stop running tests and notify test owner.
Check CPU is not >80% for the last 5 minutes.
Check memory is not >80% for the last 5 minutes.
If resource utilization is over limits, open a PR for capacity increase.
Check Azure metrics for http 500 rate overnight.
Check logs for errors after deployment scale-up.
Check for high CPU/mem after deployment scale-to-one.
Run test env pre-flight check that all Deployments are in ready state and transact

Goals instead of workflows

Traditional automation tries to replicate an expert's workflow exactly. The results are brittle. These workflows are typically only 3-5 steps long and save 10-20 minutes.

Engineering Assistants respond to a goal by running AI-ready "tasks" from our libraries and yours. They typically run 30-50 tasks as they make progress, saving 2-3 hours.

They build up reports as they go, and escalate when needed.
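To make the contrast concrete, here is a minimal Python sketch of a goal-driven session, under stated assumptions. It is not RunWhen's implementation: the `Task`, `Session` and `run_session` names, the `needs_human` flag and the sample tasks are all hypothetical. Instead of following a fixed step order, the loop picks whichever task is relevant given what earlier tasks found, appends each result to a report, and stops to escalate when a task asks for a human.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    relevant: Callable[[dict], bool]  # does this task apply, given findings so far?
    run: Callable[[dict], dict]       # returns new findings to merge into the context


@dataclass
class Session:
    goal: str
    context: dict = field(default_factory=dict)
    report: list = field(default_factory=list)
    escalated: bool = False


def run_session(goal, tasks, max_tasks=50):
    """Run relevant tasks one at a time until none apply, the cap is hit, or we escalate."""
    session = Session(goal=goal)
    pending = list(tasks)
    while pending and len(session.report) < max_tasks:
        # Choose the first pending task that applies to what is known so far.
        task = next((t for t in pending if t.relevant(session.context)), None)
        if task is None:
            break
        pending.remove(task)
        findings = task.run(session.context)
        session.context.update(findings)
        session.report.append((task.name, findings))
        if findings.get("needs_human"):
            session.escalated = True
            break
    return session


# Hypothetical tasks with canned results, listed in a deliberately "wrong" order.
tasks = [
    Task("collect_stack_traces",
         relevant=lambda ctx: ctx.get("log_errors", 0) > 0,
         run=lambda ctx: {"stack_traces": ["Traceback (sample)"]}),
    Task("rotate_certificate",
         relevant=lambda ctx: bool(ctx.get("cert_expired")),
         run=lambda ctx: {"cert_rotated": True}),
    Task("check_logs",
         relevant=lambda ctx: True,
         run=lambda ctx: {"log_errors": 2}),
    Task("escalate_to_owner",
         relevant=lambda ctx: "stack_traces" in ctx,
         run=lambda ctx: {"needs_human": True}),
]

session = run_session("find out why the service is failing", tasks)
for name, findings in session.report:
    print(name, findings)
print("escalated:", session.escalated)
```

In this sketch the certificate task never runs because nothing in the findings makes it relevant, which is the behavior a fixed workflow cannot easily express.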


Less context switching with VSCode integration

Give Engineering Assistants to your developers where they work most.

Developers can run a single AI-ready task, ask Assistants to run longer sessions, or escalate to an expert, all directly from VSCode.

3,432 Automated Tasks In The Library
46,124 Tasks Suggested To Devs, Platform Engineers and SREs
15,562 Engineering Hours Saved In Dev, Test and Production

Did you say about an hour?

Our goal is to provide 98%+ of the automated tasks your Engineering Assistants need.

When you sync a cloud account or Kubernetes cluster, RunWhen matches resources you have with automation libraries you trust.

Teams typically import several thousand AI-ready tasks in the first hour to get started, and augment these over time with tasks wrapping their existing CLI commands, Bash scripts, Python, SQL queries, Ansible, etc.


Best-in-class engineering experience

More powerful than giving everyone dashboards. More secure than giving everyone credentials. More than just developer experience: create a great engineering experience across Dev, QA, DevOps, Platform, SRE, ...


Collaboration increases coverage

Our platform is designed for you to import AI-ready tasks from our community, but also for anyone across your teams to add their own. A CLI command? A SQL query? A REST call, or a shell script? Engineering Assistants (with appropriate access) recommend them and use them in real time, extending their capabilities without ever changing configuration.
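As an illustration of what wrapping a CLI command could look like, here is a short Python sketch. It makes no claim about the platform's actual task format: `cli_task` and `TaskResult` are hypothetical names, and the wrapped commands are stand-ins that run anywhere Python is installed. The idea is only that a command becomes a named, reusable unit with a pass/fail status and captured output.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str
    ok: bool
    output: str


def cli_task(name, command, timeout=30):
    """Wrap a CLI command as a zero-argument callable returning a TaskResult."""
    # Accept either a shell-style string or a ready-made argument list.
    argv = shlex.split(command) if isinstance(command, str) else list(command)

    def run():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung command is reported as a failed task.
            return TaskResult(name, False, str(exc))
        return TaskResult(name, proc.returncode == 0, (proc.stdout + proc.stderr).strip())

    return run


# Stand-in commands; a real task might wrap something like "kubectl get pods".
ready_check = cli_task("ready-check", [sys.executable, "-c", "print('ready')"])
failing_check = cli_task("failing-check", [sys.executable, "-c", "raise SystemExit(3)"])

ready_result = ready_check()
failing_result = failing_check()
print(ready_result)
print(failing_result.ok)
```

Returning a result object instead of raising keeps a failed command from aborting a longer session; the caller decides whether a failure is worth a ticket or an escalation.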


Interactive demos in our sandbox

Want to try an Assistant in our sandbox? We have a Kubernetes cluster loaded with applications so you can see what they do.

Where to next?

The default Assistants that come out of the box are designed for Platform/SRE teams to give to developers for Kubernetes troubleshooting. However, it doesn't stop there...

Connect To Slack

Connect AI Assistants to Slack so anyone on the team can ask an AI Digital Assistant for root cause or remediation help 24/7.

Connect To Alerts

Connect Digital Assistants to alerts so they can run autonomous troubleshooting sessions and report back with a root cause, severity, suggested next steps and a full diagnostic report with output from all automation they ran.

Add No-Code Steps

Add No-Code "Generics," simple application troubleshooting steps like checking a REST API, running a SQL query or running a pre-canned log search. These require only a few lines of configuration to be Digital Assistant-ready.

Connect To CI/CD Pipelines

Connect to CI/CD pipelines and use Digital Assistants to run thousands of troubleshooting tasks. They report back on issues found and severity, creating metrics for operational readiness.

Distribute The VSCode Plugin

When you are ready to give your developers the gift of self-serve troubleshooting, consider distributing our VSCode plugin.

Chaos Engineering?

Connect to your chaos engineering stack or use our lightweight fault-injection scripts to see how Digital Assistants respond to incidents in staging before going to production.

Manage to SLOs

RunWhen's defaults include automation to generate fine-grained SLIs, SLOs and Monthly Error Budgets based on community benchmarks that are useful in dev, staging and production.
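For readers new to error budgets, the arithmetic behind them is small enough to sketch in Python. This is the standard SLO calculation, not RunWhen's code; the function names, the 99.9% target and the request counts are illustrative assumptions.

```python
def monthly_error_budget_minutes(slo_target, days=30):
    """Minutes of allowed unavailability in a window of `days` days."""
    return days * 24 * 60 * (1 - slo_target)


def budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget still unspent (negative once overspent)."""
    allowed_bad = total_events * (1 - slo_target)
    if allowed_bad == 0:
        return 0.0
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


# Illustrative numbers: a 99.9% SLO and one million requests, 500 of them failed.
budget = monthly_error_budget_minutes(0.999)
remaining = budget_remaining(0.999, good_events=999_500, total_events=1_000_000)
print(f"{budget:.1f} minutes of budget per 30 days")
print(f"{remaining:.0%} of the budget remaining")
```

A 99.9% target over 30 days allows roughly 43 minutes of unavailability, and 500 failures out of a million requests spends about half of the matching request-based budget.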


Integrate with your existing tools

Our community has contributed integrations with numerous tools, in addition to troubleshooting for applications written on popular code frameworks, platform components and cloud infrastructure.

Running a lean team means you need the best engineers you can find...

Do you really want them spending time on work that you can offload to AI? Some teams are using us to replace low-value, bloated outsourced operations teams with high-value, in-house experts. Others are building, augmenting or replacing their Internal Developer Portals with an AI-first strategy.


Ready to get started?

Our private beta is ready for you. Let’s take your team to the next level.
