# Advanced configuration options

## Spark Operator high-availability (HA) mode

Spark Operator supports a high-availability (HA) deployment mode in which more than one replica of the operator pod is deployed and leader election is enabled.

In this mode, only one replica (the leader) of the operator deployment is active and handles updates of resources of kind SparkApplication, which represent Spark jobs submitted to the cluster, while the other replicas stay idle.

A leader is selected via the leader election process, which is initiated at operator startup and runs continuously in the background. During that process, all candidates race to acquire a lock resource. The candidate that first acquires the lock becomes the leader and then continually sends "heartbeat" requests to renew its position, while the other candidates periodically make new attempts to become the leader.

Note: the leader election process happens within a single installation of KUDO Spark Operator, among the controllers/pods belonging to that KUDO instance, not across all installed operators.

If the leader replica fails (due to a node failure, for example), the leader election process is initiated again to select a new leader from the available replicas.

Leader election (LE) configuration parameters:

| Parameter Name | Description | Default value |
| --- | --- | --- |
| enableLeaderElection | Enable/disable leader election. | false |
| leaderElectionLockName | Name of the lock resource (ConfigMap) used for LE. | spark-operator-lock |
| leaderElectionLockNamespace | Namespace of the lock resource used for LE. | spark-operator |
| leaderElectionLeaseDuration | The duration that non-leader candidates will wait before force-acquiring leadership. This is measured against the time of the last observed ack. | 15s |
| leaderElectionRenewDeadline | The duration the acting master will retry refreshing leadership before giving up. | 14s |
| leaderElectionRetryPeriod | The duration the clients should wait between retries of actions. | 4s |

To enable HA mode, the following parameters must be passed to the kudo install command:

$ kubectl kudo install spark \
    --namespace=spark-operator \
    -p replicas=3 \
    -p enableLeaderElection=true
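
The remaining leader election parameters listed in the table above can be overridden the same way with additional -p flags; the lock name and timings below are purely illustrative, not recommended values:

$ kubectl kudo install spark \
    --namespace=spark-operator \
    -p replicas=3 \
    -p enableLeaderElection=true \
    -p leaderElectionLockName=my-spark-operator-lock \
    -p leaderElectionLeaseDuration=30s \
    -p leaderElectionRenewDeadline=28s \
    -p leaderElectionRetryPeriod=8s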

Verify that the deployment succeeded and the specified number of replicas is running:

$ kubectl get pods -n spark-operator
NAME                              READY   STATUS      RESTARTS   AGE
spark-instance-5c8b54d7fd-gq4xh   1/1     Running     0          15m
spark-instance-5c8b54d7fd-t7fbq   1/1     Running     0          15m
spark-instance-5c8b54d7fd-wcm66   1/1     Running     0          15m
spark-instance-init-6nwcx         0/1     Completed   0          15m
spark-pi-driver                   0/1     Completed   0          90s

Information about the current leader instance is stored in a ConfigMap resource:

$ kubectl get configmaps spark-operator-lock -o yaml -n spark-operator
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"spark-instance-5c8b54d7fd-t7fbq","leaseDurationSeconds":15,"acquireTime":"2019-12-16T13:13:26Z","renewTime":"2019-12-17T08:38:03Z","leaderTransitions":0}'
  creationTimestamp: "2019-12-16T12:57:26Z"
  name: spark-operator-lock
  namespace: spark-operator
  resourceVersion: "598116"
  selfLink: /api/v1/namespaces/spark-operator/configmaps/spark-operator-lock
  uid: 40d1b369-3384-42d0-b585-5565c4b08c97
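
To quickly see which pod currently holds the lock, the holderIdentity field can be read straight from that annotation; the command below assumes the default lock name and namespace:

$ kubectl get configmap spark-operator-lock -n spark-operator \
    -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'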

Observe the pod logs to check that the leader election process is working and a leader has been elected:

$ kubectl logs spark-instance-5c8b54d7fd-t7fbq -n spark-operator
...
I1212 14:41:26.988989      10 main.go:206] Waiting to be elected leader before starting application controller goroutines
I1212 14:41:27.011785      10 leaderelection.go:214] successfully acquired lease spark-operator/spark-operator-lock
...
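
To exercise the failover behavior described earlier, delete the current leader pod and inspect the lock ConfigMap again: once another replica acquires the lease, the holderIdentity annotation should point to the new leader (the pod name below is taken from the listing above):

$ kubectl delete pod spark-instance-5c8b54d7fd-t7fbq -n spark-operator

$ kubectl get configmaps spark-operator-lock -o yaml -n spark-operator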

## Using Volcano as a Batch Scheduler

Volcano is a batch scheduling system built on Kubernetes. It provides a suite of mechanisms commonly required by many classes of batch and elastic workloads, including distributed data processing and machine learning. Spark Operator can be integrated with Volcano to get fine-grained control over Spark application scheduling via queues and priority classes. For more information about the Volcano system and how to install it on your Kubernetes cluster, visit https://volcano.sh/.

To enable the batch scheduler (disabled by default), install Spark Operator with the following parameter:

$ kubectl kudo install spark \
    --namespace=spark-operator \
    -p enableBatchScheduler=true

In the SparkApplication YAML file, add the following parameter to the spec section:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: <Spark application name>
  namespace: <Spark application namespace>
spec:
  batchScheduler: "volcano"
  <the rest of the configuration>
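
Depending on the operator version, the queues and priority classes mentioned above can also be referenced from the same spec via the batchSchedulerOptions field; the following is a sketch that assumes a Volcano queue named spark-queue and a PriorityClass named high-priority already exist in the cluster:

spec:
  batchScheduler: "volcano"
  batchSchedulerOptions:
    queue: spark-queue                 # existing Volcano queue to place the application in
    priorityClassName: high-priority   # existing PriorityClass applied to the application pods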

After the application is submitted, verify the driver pod is scheduled by Volcano:

$ kubectl describe pod spark-pi-driver -n spark-operator
...
Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    76s   volcano            Successfully assigned spark-operator/spark-pi-driver to <node-name>
...
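
As an optional extra check, Volcano tracks each application it schedules as a PodGroup resource, so a PodGroup for the submitted application should also be visible in the application namespace (names vary by application):

$ kubectl get podgroups -n spark-operator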