# Security
## Securing Spark RPC communication
Spark supports authentication for RPC channels (the protocol used between Spark processes), which allows secure communication between the driver and executors and also makes it possible to enable network encryption between Spark processes. For more information, refer to the official Spark security documentation.
### Authentication and encryption
To enable RPC authentication:

- set the `spark.authenticate` configuration property to `true`
- mount the `SPARK_AUTHENTICATE_SECRET` environment variable from a secret for both the driver and executors
To enable encryption for RPC connections, set the `spark.network.crypto.enabled` configuration property to `true`. Spark authentication must be enabled for encryption to work.

Additional configuration properties can be found in the Spark documentation.
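For example, the AES-based encryption can be tuned further via `sparkConf`; the snippet below shows a few related properties for illustration (the values are the Spark defaults, shown explicitly here):

```yaml
sparkConf:
  # Enable authentication and AES-based RPC encryption
  "spark.authenticate": "true"
  "spark.network.crypto.enabled": "true"
  # Key length (in bits) and key factory algorithm used to derive the encryption key
  "spark.network.crypto.keyLength": "128"
  "spark.network.crypto.keyFactoryAlgorithm": "PBKDF2WithHmacSHA1"
  # Fall back to SASL if the remote side does not support AES-based encryption
  "spark.network.crypto.saslFallback": "true"
```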
The example below describes how to set up authentication and encryption for a `SparkApplication` on Kubernetes.
- Create an authentication secret, which will be securely mounted to the driver and executor pods:

```shell
$ kubectl create secret generic spark-secret --from-literal secret=my-secret
```

- Set the log level to `DEBUG` as described in the Configuring Logging section of the documentation.
- Apply the following `SparkApplication` specification with the `kubectl apply -f <application_spec.yaml>` command.
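The secret value in the first step (`my-secret`) is only an example; for real deployments, a random value should be generated instead. A minimal sketch using `openssl`:

```shell
# Generate a 32-byte random value, base64-encoded, to use as the shared secret.
SECRET_VALUE=$(openssl rand -base64 32)

# The value can then be used in place of the literal above, e.g.:
#   kubectl create secret generic spark-secret --from-literal secret="$SECRET_VALUE"
echo "$SECRET_VALUE"
```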
Note: If you are using the block transfer service, you may also want to set the `spark.authenticate.enableSaslEncryption` property to `true` to enable SASL encryption for Spark RPCs.
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-rpc-auth-enctryption-app
  namespace: <namespace>
spec:
  type: Scala
  mode: cluster
  image: "mesosphere/spark:spark-3.0.0-hadoop-2.9-k8s"
  imagePullPolicy: Always
  mainClass: MockTaskRunner
  mainApplicationFile: "https://kudo-spark.s3-us-west-2.amazonaws.com/spark-scala-tests-3.0.0-20200819.jar"
  arguments:
    - "1"
    - "600"
  sparkConf:
    "spark.scheduler.maxRegisteredResourcesWaitingTime": "2400s"
    "spark.scheduler.minRegisteredResourcesRatio": "1.0"
    "spark.authenticate": "true"
    "spark.network.crypto.enabled": "true"
    "spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET": "spark-secret:secret"
    "spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET": "spark-secret:secret"
  sparkVersion: 3.0.0
  sparkConfigMap: spark-conf-map
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: <service-account>
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
    javaOptions: "-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties"
```
This will create a test job that emulates a long-running task. The authentication secret will be injected into the Spark pods via the `SPARK_AUTHENTICATE_SECRET` environment variable.
- Check the logs of the running pods:

```shell
$ kubectl logs spark-rpc-auth-enctryption-app-driver -f
```
- The logs should contain auth-related messages similar to the ones in the snippets below.

Driver logs:

```
...
20/02/07 16:13:09 DEBUG TransportServer: New connection accepted for remote address /10.244.0.104:46512.
20/02/07 16:13:09 DEBUG AuthRpcHandler: Received new auth challenge for client /10.244.0.104:46512.
20/02/07 16:13:09 DEBUG AuthRpcHandler: Authenticating challenge for app sparkSaslUser.
20/02/07 16:13:10 DEBUG AuthEngine: Generated key with 1024 iterations in 27735 us.
20/02/07 16:13:10 DEBUG AuthEngine: Generated key with 1024 iterations in 8148 us.
20/02/07 16:13:10 DEBUG AuthRpcHandler: Authorization successful for client /10.244.0.104:46512.
...
```

Executor logs:

```
...
20/02/07 16:13:07 DEBUG TransportClientFactory: Creating new connection to spark-rpc-auth-enctryption-app-1581091973687-driver-svc.default.svc/10.244.0.103:7078
20/02/07 16:13:07 DEBUG TransportClientFactory: Connection to spark-rpc-auth-enctryption-app-1581091973687-driver-svc.default.svc/10.244.0.103:7078 successful, running bootstraps...
20/02/07 16:13:07 DEBUG AuthEngine: Generated key with 1024 iterations in 4715 us.
20/02/07 16:13:08 INFO TransportClientFactory: Successfully created connection to spark-rpc-auth-enctryption-app-1581091973687-driver-svc.default.svc/10.244.0.103:7078 after 115 ms (110 ms spent in bootstraps)
20/02/07 16:13:08 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@spark-rpc-auth-enctryption-app-1581091973687-driver-svc.default.svc:7078
20/02/07 16:13:08 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
...
```
- Verify the network encryption:
  - Attach to the driver pod:

```shell
$ kubectl exec -it spark-rpc-auth-enctryption-app-driver -- bash
```

  - Use the `ngrep` tool to monitor the traffic on port `7078`:

```
$ ngrep port 7078
interface: eth0 (10.244.0.0/255.255.255.0)
filter: ( port 7078 ) and ((ip || ip6) || (vlan && (ip || ip6)))
#
T 10.244.0.104:46480 -> 10.244.0.103:7078 [AP] #1
uVx....."....Qn.0....z|.4-..N.+.(E.yU..;.....5.V..e..t_L.....T....9....Zcj/A..;...7.S.....X.7.7+..Nd..x...1.Qg0D.d...vV...P V....7....\._lgi...*.#]..i..8.\...+Tu/.H.Wx*..=o.....I.K.,.....g...@:...8.;...Q...
.$.D..&...P.@R...I.s.M..`.oAn'.I...g.p..=....$............b.....O.|..v..:X..!H.Fot.....r83.....-Y..,X..W.......PC.....
```

The unreadable payload confirms that the RPC traffic is encrypted.
## TLS configuration
Spark supports TLS configuration for Spark web endpoints, such as the Spark UI and the Spark History Server UI. For more information about SSL configuration in Spark, refer to the Spark documentation.
Here are the steps required to configure TLS for a `SparkApplication`:

Note: TLS setup requires a keystore (a database of cryptographic keys, X.509 certificate chains, and trusted certificates) to be provided. To generate a keystore, follow the official documentation for `keytool`, a key and certificate management utility shipped with the JDK.
- Create a `Secret` containing all the sensitive data (passwords and keystores):

```shell
$ kubectl create secret generic ssl-secrets \
    --from-file keystore.jks \
    --from-file truststore.jks \
    --from-literal key-password=<password for the private key> \
    --from-literal keystore-password=<password for the keystore> \
    --from-literal truststore-password=<password for the truststore>
```
- In the `SparkApplication`, specify `spark.ssl.*` configuration properties via `sparkConf` and mount the secret created in the previous step using `spark.kubernetes.*` properties. `keystore.jks` and `truststore.jks` will be mounted to the `/tmp/spark/ssl` directory, and the passwords will be passed to the driver pod via predefined environment variables.
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: <app-name>
  namespace: <namespace>
spec:
  ...
  image: "mesosphere/spark:spark-3.0.0-hadoop-2.9-k8s"
  sparkConf:
    "spark.ssl.enabled": "true"
    "spark.ssl.keyStore": "/tmp/spark/ssl/keystore.jks"
    "spark.ssl.protocol": "TLSv1.2"
    "spark.ssl.trustStore": "/tmp/spark/ssl/truststore.jks"
    "spark.kubernetes.driver.secretKeyRef.SPARK_SSL_KEYPASSWORD": "ssl-secrets:key-password"
    "spark.kubernetes.driver.secretKeyRef.SPARK_SSL_KEYSTOREPASSWORD": "ssl-secrets:keystore-password"
    "spark.kubernetes.driver.secretKeyRef.SPARK_SSL_TRUSTSTOREPASSWORD": "ssl-secrets:truststore-password"
    "spark.kubernetes.driver.secrets.ssl-secrets": "/tmp/spark/ssl"
  ...
```
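Other `spark.ssl.*` properties can be set the same way; the snippet below shows a few commonly used ones (the values are examples, not requirements — see the Spark security documentation for the full list):

```yaml
sparkConf:
  # Keystore/truststore format (JKS is the keytool default in older JDKs)
  "spark.ssl.keyStoreType": "JKS"
  "spark.ssl.trustStoreType": "JKS"
  # Override the port the SSL-enabled UI listens on
  "spark.ssl.ui.port": "4440"
```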
- Forward a local port to the Spark UI (driver) port (the default port for SSL connections is 4440):

```shell
$ kubectl port-forward <driver-pod-name> 4440
```

- The Spark UI should now be available at https://localhost:4440.