Plans capture the individual steps of operational tasks. A plan organizes these tasks into steps, and each step references the tasks to run for that step. Organizing steps into phases makes it possible to capture complex service behavior. Plans and phases have a `strategy` that indicates whether steps should run in parallel or in serial.

    ...
    tasks:
      - name: deploy-master
        kind: Apply
        spec:
          resources:
            - master-service.yaml
            - master.yaml
      - name: deploy-agent
        kind: Apply
        spec:
          resources:
            - agent-service.yaml
            - agent.yaml
    plans:
      deploy:
        strategy: parallel
        phases:
          - name: deploy-master
            strategy: serial
            steps:
              - name: deploy-master
                tasks:
                  - deploy-master
          - name: deploy-agent
            strategy: serial
            steps:
              - name: deploy-agent
                tasks:
                  - deploy-agent

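To make the effect of `strategy` concrete, the following is a minimal Python sketch of how a plan like the one above could be walked. It is not KUDO's implementation: `run_group` and `run_plan` are hypothetical helpers, the plan is modeled as a plain dict, tasks within a step are assumed to run in listed order, and error handling is reduced to letting exceptions propagate.

```python
from concurrent.futures import ThreadPoolExecutor


def run_group(items, strategy, run_item):
    """Run items one after another ("serial") or concurrently ("parallel")."""
    if strategy == "parallel":
        with ThreadPoolExecutor() as pool:
            # list() consumes the results, so a failed item raises here.
            list(pool.map(run_item, items))
    else:
        for item in items:
            run_item(item)


def run_plan(plan, run_task):
    """Walk plan -> phases -> steps -> tasks, honoring each strategy."""
    def run_step(step):
        # Assumption: tasks within a step run in the order they are listed.
        for task_name in step["tasks"]:
            run_task(task_name)

    def run_phase(phase):
        run_group(phase["steps"], phase["strategy"], run_step)

    run_group(plan["phases"], plan["strategy"], run_phase)


# The `deploy` plan from the example above, as a plain dict.
deploy_plan = {
    "strategy": "parallel",
    "phases": [
        {"name": "deploy-master", "strategy": "serial",
         "steps": [{"name": "deploy-master", "tasks": ["deploy-master"]}]},
        {"name": "deploy-agent", "strategy": "serial",
         "steps": [{"name": "deploy-agent", "tasks": ["deploy-agent"]}]},
    ],
}

executed = []
run_plan(deploy_plan, executed.append)
print(sorted(executed))
```

Because the plan strategy is `parallel`, the two phases may finish in either order, which is why the usage sorts the result. Switching the plan strategy to `serial` would make the order deterministic.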
The KUDO controller deploys the plans of an operator. Only a single plan can be in active deployment at a time. KUDO has different approaches to decide which plan to deploy.
# Deploy and update plans
By default, KUDO will start the `deploy` plan if no other plan has been deployed yet. In case of an instance update, KUDO will start an `update` plan if it exists. If none of these plans exist, KUDO will start the `deploy` plan.

In the example below, the `deploy` plan creates a service defined in `service.yaml`. In case of an update, the service's cache needs to be updated. `update-cache.yaml` provides the necessary resources to do that and thus is part of the `update` plan.

    ...
    tasks:
      - name: app
        kind: Apply
        spec:
          resources:
            - service.yaml
      - name: update
        kind: Apply
        spec:
          resources:
            - service.yaml
            - update-cache.yaml
    plans:
      deploy:
        strategy: serial
        phases:
          - name: deploy-service
            strategy: serial
            steps:
              - name: deploy
                tasks:
                  - app
      update:
        strategy: serial
        phases:
          - name: update-service
            strategy: serial
            steps:
              - name: update
                tasks:
                  - update

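The selection rule described in this section can be summarized in a few lines of Python. This is only a sketch of the rule as stated here, not KUDO's code: `select_plan` and the `event` values `"install"` and `"update"` are hypothetical names, and the fallback to `deploy` is an assumption.

```python
def select_plan(available_plans, event):
    """Pick the plan to run for an event ("install" or "update").

    available_plans: names of the plans the operator defines.
    """
    if event == "update" and "update" in available_plans:
        return "update"
    # Assumption: `deploy` is both the initial plan and the fallback.
    return "deploy"


# The operator above defines both plans, so an update runs `update`.
print(select_plan({"deploy", "update"}, "update"))
# An operator that only defines `deploy` falls back to it on update.
print(select_plan({"deploy"}, "update"))
```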
# Cleanup plans
If an optional `cleanup` plan is part of an operator, this plan will run as part of the deletion of an Instance. Once this plan completes or fails, the Instance will be deleted.

This plan (if it exists) will be triggered by the KUDO manager automatically once the Instance is being deleted. Trying to trigger it during the normal life cycle of the Instance will lead to an error. Furthermore, expect that the steps of this plan may fail. For example, users may want to delete an Instance because its `deploy` plan is stuck. In that case, resources that the `cleanup` plan tries to remove might not exist on the cluster. The `cleanup` plan will start even if other plans are in progress.

    ...
    tasks:
      - name: database
        kind: Apply
        spec:
          resources:
            - database.yaml
      - name: cleanup
        kind: Apply
        spec:
          resources:
            - cleanup-job.yaml
    plans:
      deploy:
        strategy: serial
        phases:
          - name: deploy-database
            strategy: serial
            steps:
              - name: deploy
                tasks:
                  - database
      cleanup:
        strategy: serial
        phases:
          - name: cleanup-database
            strategy: serial
            steps:
              - name: cleanup
                tasks:
                  - cleanup

Note that it is not necessary to remove the resources defined in `database.yaml`, as all resources belonging to the Instance will be removed automatically when the Instance is deleted. However, complex applications sometimes create state that cannot be captured by Kubernetes resources. As this state may have to be removed when removing the respective operator, a `cleanup` plan can take care of that. The example above bundles the tasks needed to remove this state in a Job provided by `cleanup-job.yaml`. In general, a `cleanup` plan is like any other plan except that it is called before the Instance is deleted.

The `cleanup` plan is implemented using finalizers. The Instance's `metadata.finalizers` contains the value `"kudo.dev.instance.cleanup"` while the `cleanup` plan is in progress.
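Since the finalizer is visible on the Instance object, tooling can use it to tell whether cleanup is still running. The sketch below assumes the Instance has already been parsed into a Python dict (for example from its JSON or YAML representation). `cleanup_in_progress` is a hypothetical helper, not part of KUDO.

```python
CLEANUP_FINALIZER = "kudo.dev.instance.cleanup"


def cleanup_in_progress(instance):
    """Return True if the Instance still carries the cleanup finalizer."""
    # `finalizers` may be absent or null on objects without finalizers.
    finalizers = instance.get("metadata", {}).get("finalizers") or []
    return CLEANUP_FINALIZER in finalizers


deleting = {"metadata": {"name": "my-instance",
                         "finalizers": ["kudo.dev.instance.cleanup"]}}
idle = {"metadata": {"name": "my-instance"}}
print(cleanup_in_progress(deleting), cleanup_in_progress(idle))
```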
# Parameter triggers
Parameters can have optional triggers. A trigger references a plan that will run if the parameter is updated.
The operations required to update a running application can vary depending on which parameter is being updated. For instance, updating `REPLICAS` may require a simple update of the deployment, whereas `APPLICATION_MEMORY` may require rolling out a new version via a canary or blue/green deployment.

    ...
    parameters:
      - name: REPLICAS
        trigger: deploy
      - name: APPLICATION_MEMORY
        trigger: canary

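The mapping from changed parameters to plans can be sketched as a simple lookup. The Python below is illustrative only: `plans_to_trigger` is a hypothetical helper, and parameters without a `trigger` are simply ignored here, which may differ from what KUDO does for them.

```python
def plans_to_trigger(parameters, changed):
    """Return the set of plans referenced by the changed parameters.

    parameters: list of dicts with "name" and an optional "trigger".
    changed: names of the parameters whose values were updated.
    """
    triggers = {p["name"]: p["trigger"] for p in parameters if "trigger" in p}
    return {triggers[name] for name in changed if name in triggers}


# The parameter definitions from the example above.
parameters = [
    {"name": "REPLICAS", "trigger": "deploy"},
    {"name": "APPLICATION_MEMORY", "trigger": "canary"},
]

print(plans_to_trigger(parameters, ["REPLICAS"]))
print(sorted(plans_to_trigger(parameters, ["REPLICAS", "APPLICATION_MEMORY"])))
```

Returning a set makes it easy to see when one update would start more than one distinct plan.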
# Executing Plans
In general, the KUDO manager will automatically execute a plan when the corresponding parameter changes. However, sometimes this is not enough, and you need to trigger a plan manually, e.g. to create a periodic backup or to restore data in case of data corruption. Such plans typically do not need a corresponding parameter. KUDO v0.11.0 introduced a new feature: manual plan execution. In a nutshell, you can now run:

    $ kubectl kudo plan trigger --name deploy --instance my-instance

which will trigger the `deploy` plan and execute it on `my-instance`. While this looks relatively easy on the surface, the devil is in the details, so let's take a closer look.
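For plans that are triggered from a script, the same command can be assembled programmatically. The following Python sketch only builds the argument list shown above; `plan_trigger_command` is a hypothetical helper, and it uses no flags beyond `--name` and `--instance`.

```python
import shlex


def plan_trigger_command(plan, instance):
    """Build the argv for triggering `plan` on `instance`."""
    return ["kubectl", "kudo", "plan", "trigger",
            "--name", plan, "--instance", instance]


argv = plan_trigger_command("deploy", "my-instance")
# Print the shell-quoted form for logging or copy/paste.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` to execute the command.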
# Plan Life Cycle
Having the ability to trigger multiple plans on demand raises the question: what happens if two plans run concurrently? The answer is: it depends. If two plans are completely independent of each other (e.g. `deploy` deploys the services and `monitor` deploys the monitoring pods), both can be executed in parallel. But what if one of the two plans in question is `migrate`? Some plans are incompatible with others. A few may not even be reentrant. While it is probably OK to restart a running `deploy` plan, a `restore` plan might not be reentrant because of possible data corruption.
We're planning to explore the realm of plan compatibility further in the future. Annotating plan affinity and anti-affinity, reentrant vs non-reentrant plans, plan cancellation and transient plan parameters are some of the topics we're examining. All contributions and feedback are highly welcome.
With all this in mind, how can we ensure correct plan execution? Meet Kubernetes admission controllers.
# Admission Controllers
In a nutshell, Kubernetes admission controllers are plugins that govern and enforce how the cluster is used. They can be thought of as a gatekeeper that intercepts (authenticated) API requests and may change the request object or deny the request altogether. Kubernetes already comes with a bunch of these pre-installed which govern everything from user authorization to the namespace life cycle.
KUDO manager employs an Instance admission controller that governs changes to Instances, making sure that plans do not interfere. Schematically this looks like the following:
The Instance admission controller governs any update to the Instance, whether through manual plan execution or Instance parameter updates. The general rule of thumb is the following: all plans should be terminal (either successfully with status `COMPLETE` or unsuccessfully with `FATAL_ERROR`) before a new plan is allowed to start. A single plan can be restarted, so we assume all plans to be reentrant, at least for now. While this might not be true for all plans, we think that it covers the 80/20 case, e.g. when a `deploy` plan is stuck and must be restarted with less memory per node. In case a request is rejected, the Instance controller returns an error explaining why exactly the update was denied.
The admission controller also rejects parameter updates that would trigger multiple distinct plans. There are a few exceptions too: for example, a `cleanup` plan is special and is executed when an Instance is deleted. `cleanup` cannot be triggered manually and is allowed to override any existing plan (since the Instance is being deleted anyway).
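The rules above can be condensed into a small decision function. This Python sketch is a simplified model, not the actual webhook: `admit` is a hypothetical helper, `statuses` is assumed to hold only plans that have been started, and `IN_PROGRESS` stands in for any non-terminal status.

```python
TERMINAL = {"COMPLETE", "FATAL_ERROR"}


def admit(statuses, requested, instance_deleting=False):
    """Decide whether an Instance update may start the requested plans.

    statuses: plan name -> last status, for plans that have been started.
    requested: set of plan names the update would start.
    Returns (allowed, reason).
    """
    if instance_deleting:
        # Deletion runs cleanup, which may override any running plan.
        return True, "cleanup may override any existing plan"
    if "cleanup" in requested:
        return False, "cleanup cannot be triggered manually"
    if len(requested) > 1:
        return False, "update would trigger multiple distinct plans"
    running = {name for name, status in statuses.items()
               if status not in TERMINAL}
    # Restarting the plan that is already running is allowed.
    blocking = running - requested
    if blocking:
        return False, "plans still in progress: " + ", ".join(sorted(blocking))
    return True, "ok"


print(admit({"deploy": "COMPLETE"}, {"update"}))
print(admit({"deploy": "IN_PROGRESS"}, {"update"}))
print(admit({"deploy": "IN_PROGRESS"}, {"deploy"}))
```

Returning a reason alongside the decision mirrors the behavior described above, where a rejected request comes with an explanation of why it was denied.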
As of KUDO v0.11.0, the Instance admission controller is optional, though we're planning to make it mandatory in the near future. See the `kudo init` documentation for more details.