Concept
tl;dr
- Semgr8s is a Kubernetes admission controller
- Semgr8s directly integrates Semgrep under the hood
- Admission requests, which are similar to Kubernetes manifests, are validated against rules
- Admission requests might exhibit some important differences from Kubernetes manifests
- Policy logic should be implemented via Semgrep rules NOT Semgr8s configuration
Basics
Semgr8s is a policy controller for Kubernetes. By configuring rules, resources can be validated or even modified upon deployment to the cluster.
Technically, Semgr8s integrates the Semgrep engine into a Kubernetes admission controller in order to audit Kubernetes resources based on Semgrep-syntax rules. Rules are either provided as local resources via configmaps or referenced from a remote Semgrep registry.
Since admission requests resemble Kubernetes manifests and are converted by Semgr8s to a compatible YAML format, custom rules can be developed based on knowledge of Kubernetes configuration, and existing rules such as Semgrep's Kubernetes rules can be applied. It is, however, important to bear some differences in mind.
Operationally, user changes to Kubernetes are applied via the Kube API, which in turn passes them through admission phases. During admission, Kubernetes sends change requests to admission controllers via webhooks. Semgr8s receives these so-called admission requests, validates them against preconfigured rules, and returns the outcome: admit, modify, or deny. Accordingly, the Kube API then either persists the (modified) request to etcd for application or stops the deployment.
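The three outcomes map onto the response body of a Kubernetes AdmissionReview. The following is a minimal sketch, not the Semgr8s implementation: the helper name `build_admission_response` is hypothetical, while the field names follow the `admission.k8s.io/v1` API.

```python
import base64
import json


def build_admission_response(uid, allowed, message="", patch=None):
    """Build an AdmissionReview (admission.k8s.io/v1) response body.

    Hypothetical helper for illustration; Semgr8s may structure this differently.
    """
    response = {"uid": uid, "allowed": allowed}
    if not allowed and message:
        # A denial can carry a human-readable reason for the user.
        response["status"] = {"code": 403, "message": message}
    if patch is not None:
        # Modifications are returned as a base64-encoded JSON Patch.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Admit corresponds to `allowed=True`, deny to `allowed=False` with a message, and modify to `allowed=True` together with a patch.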
Architecture & Design
Semgr8s is developed for installation via helm to set up the required Kubernetes resources.
However, rendering Kubernetes manifests for use with kubectl apply is expected to work as well.
Configuration is maintained within values.yaml and kept to a minimum in order to maintain policy logic within rules (see philosophy).
values.yaml
chart
deployment:
  image:
    repository: ghcr.io/semgr8ns/semgr8s
    pullPolicy: IfNotPresent
    tag: ""
  imagePullSecrets: []
  replicas: 2
  containerPort: 5000
  podAnnotations: {}
  podSecurityContext: {}
  resources:
    limits:
      cpu: 1000m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 10001 # remove when using openshift or OKD 4
    runAsGroup: 20001 # remove when using openshift or OKD 4
    seccompProfile:
      type: RuntimeDefault
service:
  type: ClusterIP
  port: 443
webhooks: # configuration options for webhooks described under https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration
  validating: # main webhook
    failurePolicy: Fail
    sideEffects: None
    timeoutSeconds: 30
    admissionReviewVersions: ["v1","v1beta1"]
    namespaceSelector:
      matchLabels:
        semgr8s/validation: enabled
    rules:
      - scope: "Namespaced"
        apiGroups: ["", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io"]
        resources: ["*/*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
  mutating: # autofix webhook, only used when enabled
    failurePolicy: Fail
    sideEffects: None
    timeoutSeconds: 30
    admissionReviewVersions: ["v1","v1beta1"]
    namespaceSelector:
      matchLabels:
        semgr8s/validation: enabled
    rules:
      - scope: "Namespaced"
        apiGroups: ["", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io"]
        resources: ["*/*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
application:
  # Configure the log level. Either one of `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Defaults to `INFO`
  logLevel: INFO
  # fail on rule violation (true/false)
  enforce: true
  # remoteRules: Apply remote rules from e.g.
  # * semgrep registry: https://semgrep.dev/r
  # * semgrep-rules github repo: https://github.com/semgrep/semgrep-rules
  # common choices: p/kubernetes, r/yaml.kubernetes
  remoteRules: ["p/kubernetes"]
  # apply semgrep fixes before validation (see https://semgrep.dev/docs/writing-rules/autofix)
  autofix: false
  # requires generic secret with name 'semgrep-app-token' and key 'token' in semgr8ns namespace
  semgrepLogin: false
With the exception of the admission webhooks, all Semgr8s resources reside in its namespace semgr8ns.
Semgr8s exposes a validating admission webhook and an optional mutating admission webhook for use with the autofix feature.
webhook.yaml
chart
{{- $svc := (include "semgr8s.serviceName" .) -}}
{{- $altNames := list -}}
{{- $altNames = append $altNames (printf "%s" $svc) -}}
{{- $altNames = append $altNames (printf "%s.%s" $svc .Release.Namespace) -}}
{{- $altNames = append $altNames (printf "%s.%s.svc" $svc .Release.Namespace) -}}
{{- $altNames = append $altNames (printf "%s.%s.svc.cluster.local" $svc .Release.Namespace) -}}
{{- $certificate := genSelfSignedCert (printf "%s.%s.svc" $svc .Release.Namespace) nil $altNames 36500 -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "semgr8s.TLSName" . }}
  labels:
    {{- include "semgr8s.labels" . | nindent 4 }}
type: Opaque
data:
  tls.crt: {{ $certificate.Cert | b64enc }}
  tls.key: {{ $certificate.Key | b64enc }}
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ include "semgr8s.webhookName" . }}
webhooks:
  - name: {{ .Chart.Name }}-svc.{{ .Release.Namespace }}.svc
    {{- with .Values.webhooks.validating }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
    clientConfig:
      service:
        name: {{ include "semgr8s.serviceName" . }}
        namespace: {{ .Release.Namespace }}
        path: /validate/
      caBundle: {{ $certificate.Cert | b64enc }}
---
{{- if .Values.application.autofix }}
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: {{ include "semgr8s.webhookName" . }}
webhooks:
  - name: {{ .Chart.Name }}-svc.{{ .Release.Namespace }}.svc
    {{- with .Values.webhooks.mutating }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
    clientConfig:
      service:
        name: {{ include "semgr8s.serviceName" . }}
        namespace: {{ .Release.Namespace }}
        path: /mutate/
      caBundle: {{ $certificate.Cert | b64enc }}
{{ end }}
Their default configuration covers all CREATE and UPDATE requests for namespaced resources in the configured apiGroups, across all resources and apiVersions, in namespaces with label semgr8s/validation=enabled. However, Event resources are manually dropped in application logic to suppress unnecessary load.
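Dropping Event resources in application logic could look roughly like the sketch below. The function name `should_skip` is hypothetical; the only assumption about the input is the standard AdmissionReview layout, where `request.kind` is an object with `group`, `version`, and `kind` fields.

```python
def should_skip(admission_review):
    """Return True for requests that are not worth scanning (here: Events).

    Hypothetical helper; the actual Semgr8s check may differ.
    """
    request = admission_review.get("request", {})
    # request.kind is a GroupVersionKind object, e.g. {"group": "", "version": "v1", "kind": "Pod"}
    kind = request.get("kind", {}).get("kind", "")
    return kind == "Event"
```

A skipped request would simply be admitted without invoking Semgrep.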
The corresponding /validate/ and /mutate/ webhooks are exposed via HTTPS as a service that also handles load balancing.
service.yaml
chart
apiVersion: v1
kind: Service
metadata:
  name: {{ include "semgr8s.serviceName" . }}
  labels:
    {{- include "semgr8s.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.deployment.containerPort }}
  selector:
    {{- include "semgr8s.selectorLabels" . | nindent 4 }}
The application logic is written in Python, served via the cheroot web server for performance, and built on the Flask framework for simplicity and maintainability. It is packaged in a minimal container image based on Alpine and deployed as a securely configured Pod with a configurable number of replicas (default: 2) for scalability and availability. The core functionality of rule validation against admission requests is implemented by directly integrating Semgrep.
The Semgr8s application logic performs the following core functions:
- validate admission requests
- mutate admission requests
- update local rules
Semgrep is designed to scan files, and consequently the Semgr8s application logic manages rules, request data, and results data as files.
As the container file system is configured as readOnlyRootFilesystem, the corresponding volumes (/app/rules/, /app/data) and additional Semgrep folders (/.semgrep/, /.cache, /tmp) are provided via volume mounts.
Performance-critical, small-size, ephemeral folders are mounted as tmpfs in order to avoid race conditions and timeouts at the expense of additional memory.
The TLS certificate for HTTPS is provided as secret volume.
For mutation and validation, an incoming admission request is first input validated; the admission object is then converted to YAML and written to a file.
Semgrep is invoked on the admission request file using rules stored under /app/rules/.
Additional configuration is passed as system arguments.
Semgrep writes scan results to a results file that is parsed and rendered for the admission response.
After completion, the request and result files are deleted to keep storage usage constant.
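The file lifecycle described above can be sketched as follows. This is an illustration under stated assumptions, not the Semgr8s code: `scan_request` and the `scan` callable are hypothetical stand-ins for the Semgrep invocation, the request is serialized as JSON only to keep the sketch free of third-party dependencies (Semgr8s writes YAML), and the results file is assumed to follow Semgrep's JSON output with a top-level `results` list whose entries carry a `check_id`.

```python
import json
import os
import uuid


def scan_request(admission_object, scan, data_dir, enforce=True):
    """Write the object to a file, scan it, parse the results, and clean up.

    `scan(request_path, results_path)` is a hypothetical callable standing in
    for the Semgrep invocation; it must write Semgrep-style JSON results.
    """
    run_id = uuid.uuid4().hex  # unique names avoid clashes between concurrent requests
    request_path = os.path.join(data_dir, f"{run_id}.yml")
    results_path = os.path.join(data_dir, f"{run_id}.json")
    try:
        with open(request_path, "w", encoding="utf-8") as handle:
            # Assumption: JSON keeps this sketch stdlib-only; Semgr8s converts to YAML.
            json.dump(admission_object, handle)
        scan(request_path, results_path)
        with open(results_path, encoding="utf-8") as handle:
            findings = json.load(handle).get("results", [])
    finally:
        # Always delete both files so storage usage stays constant.
        for path in (request_path, results_path):
            if os.path.exists(path):
                os.remove(path)
    rule_ids = sorted({finding["check_id"] for finding in findings})
    # With enforce disabled, findings are reported but do not block admission.
    allowed = not (findings and enforce)
    return allowed, rule_ids
```

The `try`/`finally` block ensures cleanup also happens when the scan itself fails, which matters on a size-limited tmpfs mount.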
Semgr8s periodically runs an update job that gets the rule configmaps, decodes them, and writes them to the file system under /app/rules/.
The update job runs once every minute.
Thus, adding new rules, modifying existing ones, or removing them can take up to one minute to propagate.
While local rules provided as configmaps are picked up by the update job without a restart, Semgr8s configuration configmaps (including remote rules) are injected as environment variables upon container creation and require a restart to take effect.
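One pass of the rule update job could be sketched as below. The function `sync_rules`, its input shape (configmap name mapped to its `data` dictionary), and the file naming scheme are assumptions made for illustration; fetching the configmaps from the Kube API is left out.

```python
import os


def sync_rules(configmaps, rules_dir):
    """Mirror rule configmaps into rules_dir and return the file names written.

    `configmaps` maps a configmap name to its `data` dict (file name -> rule text).
    Hypothetical helper; naming and layout in Semgr8s may differ.
    """
    wanted = {}
    for cm_name, data in configmaps.items():
        for file_name, content in data.items():
            # Prefix with the configmap name so equal keys do not collide.
            wanted[f"{cm_name}_{file_name}"] = content
    # Remove rule files whose configmap (or key) no longer exists.
    for existing in os.listdir(rules_dir):
        path = os.path.join(rules_dir, existing)
        if existing not in wanted and os.path.isfile(path):
            os.remove(path)
    # Write new rules and overwrite modified ones.
    for file_name, content in wanted.items():
        with open(os.path.join(rules_dir, file_name), "w", encoding="utf-8") as handle:
            handle.write(content)
    return sorted(wanted)
```

Calling such a function once per minute gives the propagation delay of up to one minute for added, modified, and removed rules.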
Semgr8s uses a service account with list/get configmaps permission in its own namespace to get updated rule configmaps (see role.yaml).
Admission requests
It is important to note that an admission request is in essence similar to a Kubernetes manifest, but not the same, and those differences might matter when writing rules. Consider a simple pod resource: a Pod named mypod in namespace test-semgr8s with a single nginx container named mycontainer.
Semgr8s extracts the object from the admission request and ignores additional information.
The (reduced) admission request takes the following form:
Admission request
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"mypod","namespace":"test-semgr8s"},"spec":{"containers":[{"image":"nginx","name":"mycontainer"}]}}'
  creationTimestamp: '2024-05-10T13:47:01Z'
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        f:containers:
          k:{"name":"mycontainer"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:terminationGracePeriodSeconds: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: '2024-05-10T13:47:01Z'
  name: mypod
  namespace: test-semgr8s
  uid: 2d28a432-6526-4022-96fa-cb9d0ff50756
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: mycontainer
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xrkfj
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xrkfj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  phase: Pending
  qosClass: BestEffort
While the original resource configuration is maintained, considerable additional information is added by the Kube API. Besides metadata and status information, most of the added specification data explicitly declares implicit defaults or cluster specifics. With Semgr8s, both the user-supplied Kubernetes manifest data and the additionally added information can be validated and mutated.
It is certainly instructive to consider Kubernetes manifests and configuration knowledge during rule development, but it is imperative to bear in mind that rules validate admission requests in the end.
Consider, for example, a rule that checks whether a securityContext is explicitly set and otherwise adds a secure configuration (see e.g. run-as-non-root).
Above, we observe that the Kube API adds an explicit empty securityContext when none is provided, and as a result the above rule offers no benefit.
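The effect can be reproduced with plain dictionaries. In this sketch, `manifest` mirrors the user-supplied Pod and `admission_object` is a trimmed copy of the admission request shown above; both helper functions are hypothetical and only illustrate the difference between a key being absent and a key being present but empty.

```python
def lacks_security_context(pod):
    """Naive check: True only if the key spec.securityContext is absent."""
    return "securityContext" not in pod.get("spec", {})


def has_empty_security_context(pod):
    """Robust check: True if spec.securityContext is absent or empty."""
    return not pod.get("spec", {}).get("securityContext")


# What the user wrote (taken from the last-applied-configuration annotation).
manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mypod", "namespace": "test-semgr8s"},
    "spec": {"containers": [{"image": "nginx", "name": "mycontainer"}]},
}

# What the admission controller sees (trimmed): the Kube API added an empty securityContext.
admission_object = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mypod", "namespace": "test-semgr8s"},
    "spec": {
        "containers": [{"image": "nginx", "name": "mycontainer"}],
        "securityContext": {},
    },
}

print(lacks_security_context(manifest))              # True
print(lacks_security_context(admission_object))      # False
print(has_empty_security_context(admission_object))  # True
```

A rule written against the manifest would behave like the naive check and never match at admission time; a rule written against the admission object needs to account for the empty value.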
It is possible to render such an admission object for a given target resource resource.yaml via the --dry-run=server flag, for example with kubectl apply -f resource.yaml --dry-run=server -o yaml.
Note
--dry-run=server submits the server-side request without persisting the resource and consequently also passes through the admission phases of your cluster.
A running Semgr8s instance might therefore block successful execution of the above request.
Philosophy
Implement policy logic via Semgrep rules to leverage its extensive rule syntax and maintain a single source of truth. Keep Semgr8s configuration at a minimum providing only basic global settings.
Some configuration options for Semgr8s are available via charts/semgr8s/values.yaml.
While these might allow implementing certain policies, such as resource type restriction via admission webhook scoping, it is encouraged to keep them to a minimum and maintain the policy logic within rules in order to avoid unexpected conflicts.
Similarly, Semgr8s is restricted to namespaces with label semgr8s/validation=enabled, which is intended as a fail-safe (e.g. to exclude kube-system or semgr8ns), not as a part of policy.
Restricting a certain rule to a specific namespace is entirely possible within the Semgrep rule syntax.
In essence, the configuration via charts/semgr8s/values.yaml should be aimed at the general scope of the policy controller and not at policy itself.
This philosophy should be kept in mind during development and usage of Semgr8s.