Concept

tl;dr

Semgr8s is a Kubernetes admission controller
Semgr8s directly integrates Semgrep under the hood
Rules are applied to admission requests, which are similar to Kubernetes manifests
Admission requests can differ from Kubernetes manifests in some important ways
Policy logic should be implemented via Semgrep rules, NOT via Semgr8s configuration

Basics

Semgr8s is a policy controller for Kubernetes. By configuring rules, resources can be validated or even modified upon deployment to the cluster.

Technically, Semgr8s integrates the Semgrep engine into a Kubernetes admission controller in order to audit Kubernetes resources based on rules written in Semgrep syntax. Rules are either provided as local resources via configmaps or referenced from a remote Semgrep registry.

Since admission requests resemble Kubernetes manifests and are converted by Semgr8s to a compatible YAML format, custom rules can be developed based on knowledge of Kubernetes configuration, and existing rule sets such as Semgrep's Kubernetes rules can be applied. It is, however, important to bear some differences in mind.

Operationally, user changes to Kubernetes are applied via the Kube API, which in turn passes each change through admission phases. During admission, Kubernetes sends change requests to admission controllers via webhooks. Semgr8s receives these so-called admission requests, validates them against preconfigured rules, and returns the outcome: admit, modify, or deny. Accordingly, the Kube API then either persists the (possibly modified) request to etcd for application or stops the deployment.
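
The three outcomes map onto the response part of a Kubernetes AdmissionReview object. The following is a minimal sketch, not the Semgr8s implementation: the function name admission_response is hypothetical, while the field names follow the admission.k8s.io/v1 API.

```python
import base64
import json


def admission_response(uid, allowed, message="", patch=None):
    """Build an AdmissionReview (admission.k8s.io/v1) response body."""
    # The uid must be copied from the incoming admission request.
    response = {"uid": uid, "allowed": allowed}
    if not allowed:
        # The message is shown to the user when the request is denied.
        response["status"] = {"code": 403, "message": message}
    if patch is not None:
        # Modifications are returned as a base64-encoded JSON Patch (RFC 6902).
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Deny: the Kube API stops the deployment and reports the message.
deny = admission_response("123", False, "rule violated")

# Modify: the Kube API applies the patch before persisting the request.
fix = [{"op": "add", "path": "/spec/securityContext/runAsNonRoot", "value": True}]
modify = admission_response("123", True, patch=fix)
```

An admit without changes is simply admission_response(uid, True).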

Architecture & Design

Semgr8s is designed for installation via Helm, which sets up the required Kubernetes resources. However, rendering the Kubernetes manifests for use with kubectl apply is expected to work as well.

Configuration is maintained within values.yaml and kept to a minimum so that policy logic stays within rules (see Philosophy).

values.yaml chart

charts/semgr8s/values.yaml

deployment:
  image:
    repository: ghcr.io/semgr8ns/semgr8s
    pullPolicy: IfNotPresent
    tag: ""
  imagePullSecrets: []
  replicas: 2
  containerPort: 5000
  podAnnotations: {}
  podSecurityContext: {}
  resources:
    limits:
      cpu: 1000m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 10001 # remove when using openshift or OKD 4
    runAsGroup: 20001 # remove when using openshift or OKD 4
    seccompProfile:
      type: RuntimeDefault

service:
  type: ClusterIP
  port: 443

webhooks:  # configuration options for webhooks described under https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration
  validating: # main webhook
    failurePolicy: Fail
    sideEffects: None
    timeoutSeconds: 30
    admissionReviewVersions: ["v1","v1beta1"]
    namespaceSelector:
      matchLabels:
        semgr8s/validation: enabled
    rules:
      - scope: "Namespaced"
        apiGroups: ["", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io"]
        resources: ["*/*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
  mutating: # autofix webhook, only used when enabled
    failurePolicy: Fail
    sideEffects: None
    timeoutSeconds: 30
    admissionReviewVersions: ["v1","v1beta1"]
    namespaceSelector:
      matchLabels:
        semgr8s/validation: enabled
    rules:
      - scope: "Namespaced"
        apiGroups: ["", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io"]
        resources: ["*/*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]

application:
  # Configure the log level. Either one of `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Defaults to `INFO`
  logLevel: INFO
  # fail on rule violation (true/false)
  enforce: true
  # remoteRules: Apply remote rules from e.g.
  # * semgrep registry: https://semgrep.dev/r
  # * semgrep-rules github repo: https://github.com/semgrep/semgrep-rules
  # common choices: p/kubernetes, r/yaml.kubernetes
  remoteRules: ["p/kubernetes"]
  # apply semgrep fixes before validation (see https://semgrep.dev/docs/writing-rules/autofix)
  autofix: false
  # requires generic secret with name 'semgrep-app-token' and key 'token' in semgr8ns namespace
  semgrepLogin: false

With the exception of the admission webhooks, all Semgr8s resources reside in its namespace semgr8ns. Semgr8s provides a validating admission webhook and an optional mutating admission webhook for use with the autofix feature.

webhook.yaml chart

charts/semgr8s/templates/webhook.yaml

{{- $svc := (include "semgr8s.serviceName" .) -}}
{{- $altNames := list -}}
{{- $altNames = append $altNames (printf "%s" $svc) -}}
{{- $altNames = append $altNames (printf "%s.%s" $svc .Release.Namespace) -}}
{{- $altNames = append $altNames (printf "%s.%s.svc" $svc .Release.Namespace) -}}
{{- $altNames = append $altNames (printf "%s.%s.svc.cluster.local" $svc .Release.Namespace) -}}
{{- $certificate := genSelfSignedCert (printf "%s.%s.svc" $svc .Release.Namespace) nil $altNames 36500 -}}

apiVersion: v1
kind: Secret
metadata:
  name: {{ include "semgr8s.TLSName" . }}
  labels:
    {{- include "semgr8s.labels" . | nindent 4 }}
type: Opaque
data:
  tls.crt: {{ $certificate.Cert | b64enc }}
  tls.key: {{ $certificate.Key | b64enc }}
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ include "semgr8s.webhookName" . }}
webhooks:
  - name: {{ .Chart.Name }}-svc.{{ .Release.Namespace }}.svc
    {{- with .Values.webhooks.validating }}
      {{- toYaml . | nindent 4 }}
    {{- end }}
    clientConfig:
      service:
        name: {{ include "semgr8s.serviceName" . }}
        namespace: {{ .Release.Namespace }}
        path: /validate/
      caBundle: {{ $certificate.Cert | b64enc }}
---
{{- if .Values.application.autofix }}
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: {{ include "semgr8s.webhookName" . }}
webhooks:
  - name: {{ .Chart.Name }}-svc.{{ .Release.Namespace }}.svc
    {{- with .Values.webhooks.mutating }}
      {{- toYaml . | nindent 4 }}
    {{- end }}
    clientConfig:
      service:
        name: {{ include "semgr8s.serviceName" . }}
        namespace: {{ .Release.Namespace }}
        path: /mutate/
      caBundle: {{ $certificate.Cert | b64enc }}
{{ end }}

Their default configuration covers all CREATE and UPDATE requests for all resources and apiVersions of the configured apiGroups (core, apps, batch, networking.k8s.io, and rbac.authorization.k8s.io) in namespaces with the label semgr8s/validation=enabled. However, Event resources are explicitly dropped in application logic to avoid unnecessary load. The corresponding /validate/ and /mutate/ endpoints are exposed via HTTPS through a service that also handles load balancing.
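
Dropping Event resources amounts to a check on the kind of the incoming request. A minimal sketch, assuming the standard AdmissionReview layout in which request.kind holds group, version, and kind; the function name is_ignored is hypothetical and not taken from the Semgr8s code base:

```python
def is_ignored(admission_review: dict) -> bool:
    """Return True for requests that are admitted without scanning."""
    kind = admission_review.get("request", {}).get("kind", {})
    return kind.get("kind") == "Event"


event = {"request": {"kind": {"group": "", "version": "v1", "kind": "Event"}}}
pod = {"request": {"kind": {"group": "", "version": "v1", "kind": "Pod"}}}
```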

service.yaml chart

charts/semgr8s/templates/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: {{ include "semgr8s.serviceName" . }}
  labels:
    {{- include "semgr8s.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.deployment.containerPort }}
  selector:
    {{- include "semgr8s.selectorLabels" . | nindent 4 }}

The application logic is written in Python, uses the Flask framework for simplicity and maintainability, and is served by the cheroot web server for performance. It is packaged in a minimal container image based on Alpine and deployed as a securely configured Pod with a configurable number of replicas (default: 2) for scalability and availability. The core functionality, validating admission requests against rules, is implemented by directly integrating Semgrep.

The Semgr8s application logic performs the following core functions:

validate admission requests
mutate admission requests
update local rules

Semgrep is designed to scan files, and consequently the Semgr8s application logic manages rules, requests, and results as files. As the container root file system is read-only (readOnlyRootFilesystem), the corresponding folders (/app/rules/, /app/data) and additional Semgrep folders (/.semgrep/, /.cache, /tmp) are provided via volume mounts. Performance-critical, small, ephemeral folders are mounted as tmpfs in order to avoid race conditions and timeouts at the expense of additional memory. The TLS certificate for HTTPS is provided as a secret volume.

For mutation and validation, the input of an incoming admission request is validated, and the admission object is converted to YAML and written to a file. Semgrep is invoked on the admission request file using the rules stored under /app/rules/. Additional configuration is passed as system arguments. Semgrep writes scan results to a results file that is parsed and rendered into the admission response. After completion, the request and results files are deleted to keep storage usage constant.
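
The file lifecycle described above can be sketched as follows. This is an illustration under assumptions, not the actual implementation: it calls the Semgrep CLI through subprocess, whereas Semgr8s integrates Semgrep directly, and the function names scan_command, summarize, and scan are hypothetical. The CLI flags --config, --json, and --output, as well as the results, check_id, and extra.message fields of the JSON report, are part of Semgrep.

```python
import json
import subprocess
import uuid
from pathlib import Path


def scan_command(rules_dir, target, results):
    """Semgrep CLI call: scan one file with all rules in a directory."""
    return ["semgrep", "scan", "--config", str(rules_dir),
            "--json", "--output", str(results), str(target)]


def summarize(results_text):
    """Turn Semgrep's JSON report into one line per finding."""
    report = json.loads(results_text)
    return [f"{r['check_id']}: {r['extra']['message']}"
            for r in report.get("results", [])]


def scan(manifest_yaml, rules_dir="/app/rules/", data_dir="/app/data",
         run=subprocess.run):
    """Write the request to a file, scan it, and always clean up afterwards."""
    stem = Path(data_dir) / uuid.uuid4().hex  # unique file name per request
    target, results = stem.with_suffix(".yml"), stem.with_suffix(".json")
    try:
        target.write_text(manifest_yaml)
        run(scan_command(rules_dir, target, results), check=True)
        return summarize(results.read_text())
    finally:
        # Delete both files even if the scan fails, so storage stays constant.
        target.unlink(missing_ok=True)
        results.unlink(missing_ok=True)
```

A non-empty return value corresponds to rule violations, which would be rendered into the message of a denying admission response. The run parameter exists only so that the Semgrep call can be replaced in tests.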

Semgr8s periodically runs an update job that gets the rule configmaps, decodes them, and writes them to the file system under /app/rules/. The update job runs once every minute. Thus, adding new rules, modifying existing ones, or removing them can take up to 1 minute to propagate.
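
One pass of such an update job could look like the sketch below. It assumes the configmaps have already been fetched from the Kubernetes API and are passed in as a mapping from configmap name to its data dict; the function name sync_rules and the file naming scheme are hypothetical.

```python
from pathlib import Path


def sync_rules(configmaps, rules_dir="/app/rules/"):
    """Mirror rule configmaps into rules_dir and remove stale rule files.

    configmaps maps a configmap name to its data dict (key -> rule text).
    """
    rules_path = Path(rules_dir)
    rules_path.mkdir(parents=True, exist_ok=True)

    # One file per configmap entry; the name scheme is an assumption.
    wanted = {}
    for name, data in configmaps.items():
        for key, content in data.items():
            wanted[f"{name}_{key}"] = content

    for filename, content in wanted.items():
        (rules_path / filename).write_text(content)

    # Rules whose configmap was deleted from the cluster must disappear too.
    for existing in rules_path.iterdir():
        if existing.is_file() and existing.name not in wanted:
            existing.unlink()
    return sorted(wanted)
```

Running this function once every 60 seconds gives the propagation delay of up to one minute for added, modified, and removed rules.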

While the local rules provided as configmaps are updated periodically at runtime, Semgr8s configuration configmaps (including remote rules) are mounted as environment variables upon container creation and require a restart to take effect.

Semgr8s uses a service account with list/get configmaps permission in its own namespace to get updated rule configmaps:

role.yaml

charts/semgr8s/templates/role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "semgr8s.roleName" . }}
  labels:
    {{- include "semgr8s.labels" . | nindent 4 }}
rules:
- apiGroups: ["*"]
  resources: ["configmaps"]
  verbs: ["list", "get"]

Admission requests

It is important to note that an admission request is similar to a Kubernetes manifest but not identical, and the differences can matter when writing rules. Consider a simple pod resource:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mycontainer
      image: nginx

Semgr8s extracts the object from the admission request and ignores additional information. The (reduced) admission request takes the following form:

Admission request

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"mypod","namespace":"test-semgr8s"},"spec":{"containers":[{"image":"nginx","name":"mycontainer"}]}}'
  creationTimestamp: '2024-05-10T13:47:01Z'
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        f:containers:
          k:{"name":"mycontainer"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:terminationGracePeriodSeconds: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: '2024-05-10T13:47:01Z'
  name: mypod
  namespace: test-semgr8s
  uid: 2d28a432-6526-4022-96fa-cb9d0ff50756
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: mycontainer
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xrkfj
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xrkfj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  phase: Pending
  qosClass: BestEffort

While the original resource configuration is maintained, considerable additional information is added by the Kube API. Besides metadata and status information, most of the added specification data explicitly declares implicit defaults or cluster specifics. With Semgr8s, both the user-supplied Kubernetes manifest data and the additionally added information can be validated and mutated.

It is certainly instructive to consider Kubernetes manifests and configuration knowledge during rule development, but it is imperative to bear in mind that rules validate admission requests in the end. Consider for example a rule that checks whether a securityContext is explicitly set and otherwise adds a secure configuration (see e.g. run-as-non-root). Above, we observe that the Kube API adds an explicit empty securityContext when none is provided and as a result the above rule offers no benefit.
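
The effect can be reproduced with plain dictionaries. The sketch below is a Python stand-in for rule logic, not Semgrep rule syntax, and both function names are hypothetical. It contrasts a presence check with a check on the effective value:

```python
def has_security_context_key(pod):
    """Naive check: is a pod-level securityContext present at all?"""
    return "securityContext" in pod.get("spec", {})


def runs_as_non_root(pod):
    """Robust check: inspect the effective value instead of the key."""
    context = pod.get("spec", {}).get("securityContext") or {}
    return context.get("runAsNonRoot") is True


containers = [{"name": "mycontainer", "image": "nginx"}]
manifest = {"spec": {"containers": containers}}
# The Kube API adds an explicit empty securityContext to the admission object.
admission_object = {"spec": {"containers": containers, "securityContext": {}}}
```

On the manifest the naive check reports a missing securityContext, but on the admission object it finds the key and never triggers. The value-based check behaves the same on both inputs, which is what a rule evaluated by Semgr8s needs.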

It is possible to render such an admission object for a given target resource resource.yaml via the --dry-run=server flag:

kubectl apply -f resource.yaml --dry-run=server -o yaml

Note

--dry-run=server submits the server-side request without persisting the resource and consequently also passes admission phases of your cluster. A running Semgr8s instance might therefore block successful execution of the above request.

Philosophy

Implement policy logic via Semgrep rules to leverage its extensive rule syntax and maintain a single source of truth. Keep Semgr8s configuration to a minimum, providing only basic global settings.

Some configuration options for Semgr8s are available via charts/semgr8s/values.yaml. While these might allow implementing certain policies, such as resource type restriction via admission webhook scoping, it is encouraged to keep them to a minimum and maintain the policy logic within rules in order to avoid unexpected conflicts.

Similarly, Semgr8s only acts on namespaces with the label semgr8s/validation=enabled. This restriction is intended as a fail-safe (e.g. to exclude kube-system or semgr8ns), not as a part of policy. Restricting a certain rule to a specific namespace is entirely possible within the Semgrep rule syntax.

In essence, the configuration via charts/semgr8s/values.yaml should be aimed at the general scope of the policy controller and not towards policy itself. This philosophy should be kept in mind during development and usage of Semgr8s.