...
Test that the CNF does not crash when disk fill occurs)
Issues raised to CNCF CNF Testsuite during this work
The analysis
Test | Note | Verdict |
---|---|---|
To test the increasing and decreasing of capacity | Do we require horizontal scaling from all CNFs? Most (data plane, signalling, etc.) but not all (e.g. OSS). | Should be optional, or fail only if the CNF scales incorrectly in case it does scale |
Test if the Helm chart is published | At the moment RA2 does not mandate the use of Helm. We should first decide on CNF packaging: RA2 can stay neutral, follow the O-RAN/ONAP ASD path, or propose its own solution. | Should be fine - there are no Helm specs in RA2 today, unless some incompatible CNF packaging spec appears (unlikely) |
Test if the Helm chart is valid | At the moment RA2 does not mandate the use of Helm. | |
Test if the Helm chart deploys | At the moment RA2 does not mandate the use of Helm. This should be more generic, e.g. testing whether the CNF deploys. | |
Test if the install script uses Helm v3 | At the moment RA2 does not mandate the use of Helm. | |
To test if the CNF can perform a rolling update | Some CNFs perform a rolling update without keeping the service alive (because they require post-configuration), so the test should verify service continuity. This might be a health probe, a check against the k8s Service, or something similarly straightforward. In other words, the CNF service/traffic should work throughout the whole process (before, during, and after a rolling upgrade). | Needed |
To check if a CNF version can be downgraded through a rolling_version_change | It is not clear what the difference is between a rolling downgrade and a rolling version change. Perhaps the latter requests an arbitrary version? | |
To check if a CNF version can be downgraded through a rolling_downgrade | Same as above? | Needed |
To check if a CNF version can be rolled back through a rollback | It is not clear what the difference is between a rolling downgrade and a rollback. | |
To check if the CNF is compatible with different CNIs | This covers only the default CNI and does not cover the metaplugin part. Additional tests are needed for cases with multiple interfaces. | Ok, but needs additional tests for multiple interfaces |
(PoC) To check if a CNF uses Kubernetes alpha APIs | Alpha APIs are not recommended by ra2.k8s.012. It is not clear what the pass criterion of this test is. | Ok if it fails with alpha APIs |
To check if the CNF has a reasonable image size | It passes if the image size is smaller than 5 GB. | Ok, but should be documented or configurable? |
To check if the CNF has a reasonable startup time | It is not clear what a reasonable startup time is. | Ok, but should be documented or configurable? |
To check if the CNF has multiple process types within one container | Containers in the CNF should have only one process type. | What's the rationale? |
To check if the CNF exposes any of its containers as a service | Which service type? RA2 mandates that clusters must support LoadBalancer and ClusterIP, and should support NodePort and ExternalName. Should there also be a test for the CNF using Ingress or Gateway objects? | May need tweaking to add Ingress? |
To check if the CNF has multiple microservices that share a database | Clarify the rationale. In some cases it is good for multiple microservices to share a DB, e.g. when restoring the state of a transaction from a failed service. A shared DB across multiple services is also useful for things like an HSS. | Clarify |
Test if the CNF crashes when node drain and rescheduling occurs. All configuration should be stateless | The CNF should react gracefully to eviction and node draining (no loss of context/sessions/data/logs, and the service continues to run). The statelessness test should be made independent, and should be skipped for stateful pods, e.g. DNS. | Needed - but replace "crash" with "react gracefully" (no loss of context/sessions/data/logs, and the service continues to run). The statelessness test should be separate |
To test if the CNF uses a volume host path | Should pass if the CNF doesn't have a hostPath volume. What's the rationale? | |
To test if the CNF uses local storage | Should fail if a local storage configuration is found. What's the rationale? | |
To test if the CNF uses elastic volumes | Should pass if the CNF uses an elastic volume. There should be a definition of what an elastic volume is (besides ELASTIC_PROVISIONING_DRIVERS_REGEX). | What's an elastic volume? Does this mean ephemeral? Or is this an AWS-specific test? |
To test if the CNF uses a database with either statefulsets, elastic volumes, or both | A database may use statefulsets along with elastic volumes to achieve a high level of resiliency. Any database in K8s should at least use elastic volumes to achieve a minimum level of resilience regardless of whether a statefulset is used. Statefulsets without elastic volumes is not recommended, especially if it explicitly uses local storage. The least optimal storage configuration for a database managed by K8s is local storage and no statefulsets, as this is not tolerant to node failure. There should be a definition of what an elastic volume is (besides ELASTIC_PROVISIONING_DRIVERS_REGEX) | What's an elastic volume? Does this mean Ephemeral? Or is this an AWS-specific test? |
Test if the CNF crashes when network latency occurs | How is this tested, and where does the test run? Some traffic against a service? Should the latency be configurable (the default is 2 s)? What should happen if the latency is exceeded? Should this be more stringent than "not crashing"? What is the expectation? | Needed, but needs clarification |
Test if the CNF crashes when disk fill occurs | | Needed |
Test if the CNF crashes when pod delete occurs | | Needed |
Test if the CNF crashes when pod memory hog occurs | | |
Test if the CNF crashes when pod io stress occurs | | Needed |
Test if the CNF crashes when pod network corruption occurs | It is not clear what network corruption is in this context. | |
Test if the CNF crashes when pod network duplication occurs | It is not clear what network duplication is in this context. | |
To test if there is a liveness entry in the Helm chart | Liveness probe should be mandatory, but RA2 does not mandate Helm at the moment. | |
To test if there is a readiness entry in the Helm chart | Readiness probe should be mandatory, but RA2 does not mandate Helm at the moment. | |
To check if logs are being sent to stdout/stderr | ||
To check if Prometheus is installed and configured for the CNF | There is a chapter for Additional required components (4.10), but it is currently empty. | |
To check if logs and data are being routed through Fluentd | There is a chapter for Additional required components (4.10), but it is currently empty. | |
To check if OpenMetrics is being used and/or compatible | There is a chapter for Additional required components (4.10), but it is currently empty. | |
To check if tracing is being used with Jaeger | There is a chapter for Additional required components (4.10), but it is currently empty. | |
To check if a CNF is using container socket mounts | Make sure not to mount /var/run/docker.sock, /var/run/containerd.sock or /var/run/crio.sock into the containers | Needed |
To check if containers are using any Tiller images | ||
To check if any containers are running in privileged mode | ||
To check if a CNF is running services with external IPs | ||
To check if any containers are running as a root user | ||
To check if any containers allow for privilege escalation | ||
To check if an attacker can use a symlink for arbitrary host file system access | ||
To check if there are service accounts that are automatically mapped | ||
To check if there is a host network attached to a pod | ||
To check if there is an ingress and egress policy defined | ||
To check if there are any privileged containers | ||
To check for insecure capabilities | ||
To check for dangerous capabilities | ||
To check if namespaces have network policies defined | ||
To check if containers are running with non-root user with non-root membership | ||
To check if containers are running with hostPID or hostIPC privileges | ||
To check if security services are being used to harden containers | ||
To check if containers have resource limits defined | ||
To check if containers have immutable file systems | ||
To check if containers have hostPath mounts | ||
To check if containers are using labels | ||
To test if there are versioned tags on all images using OPA Gatekeeper | ||
To test if there are any (non-declarative) hardcoded IP addresses or subnet masks | ||
To test if there are node ports used in the service configuration | ||
To test if there are host ports used in the service configuration | ||
To test if there are any (non-declarative) hardcoded IP addresses or subnet masks in the K8s runtime configuration | ||
To check if a CNF version uses immutable configmaps | ||
Test if the CNF crashes when pod DNS error occurs | ||
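As an illustration of the alpha-API test in the table above (ra2.k8s.012), a minimal check could flag any manifest whose `apiVersion` is an alpha version. This is a sketch under assumptions: the function name and the `vNalphaM` regex are illustrative, not the testsuite's actual implementation.

```python
import re

# Kubernetes alpha API versions look like "v1alpha1", "v2alpha3", etc.,
# optionally prefixed by an API group such as "storage.k8s.io/".
ALPHA_VERSION = re.compile(r"v\d+alpha\d+$")

def uses_alpha_api(manifest: dict) -> bool:
    """Return True if the manifest's apiVersion is an alpha version."""
    api_version = manifest.get("apiVersion", "")
    version = api_version.split("/")[-1]  # strip the API group, keep the version
    return bool(ALPHA_VERSION.search(version))

print(uses_alpha_api({"apiVersion": "storage.k8s.io/v1alpha1"}))  # True
print(uses_alpha_api({"apiVersion": "apps/v1"}))                  # False
```

Under the "Ok if fails with alpha" verdict, a CNF would fail the test whenever this returns `True` for any of its manifests.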
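The image-size test above passes when the image is smaller than 5 GB; making the threshold a parameter, as sketched below, illustrates the "should be documented or configurable" suggestion in the verdict. The function name and default are assumptions for illustration.

```python
# 5 GiB default threshold, taken from the test note above (assumed to be GiB).
DEFAULT_LIMIT_BYTES = 5 * 1024**3

def image_size_ok(size_bytes: int, limit: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Pass if the container image is smaller than the configured limit."""
    return size_bytes < limit

print(image_size_ok(800 * 1024**2))  # True: an 800 MiB image passes
print(image_size_ok(6 * 1024**3))    # False: a 6 GiB image fails
```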
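The container-socket-mount test above names three forbidden paths (/var/run/docker.sock, /var/run/containerd.sock, /var/run/crio.sock). A minimal sketch of that check, scanning a parsed pod spec for hostPath volumes that expose a runtime socket (function name and spec shape are assumptions):

```python
# Runtime sockets that must not be mounted into containers,
# per the test note above.
FORBIDDEN_SOCKETS = {
    "/var/run/docker.sock",
    "/var/run/containerd.sock",
    "/var/run/crio.sock",
}

def socket_mounts(pod_spec: dict) -> list:
    """Return any forbidden runtime sockets exposed via hostPath volumes."""
    found = []
    for volume in pod_spec.get("volumes", []):
        path = volume.get("hostPath", {}).get("path")
        if path in FORBIDDEN_SOCKETS:
            found.append(path)
    return found

# Example: a pod spec that mounts the Docker socket should fail the test.
spec = {"volumes": [{"name": "docker", "hostPath": {"path": "/var/run/docker.sock"}}]}
print(socket_mounts(spec))  # ['/var/run/docker.sock']
```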
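The hardcoded-IP tests above could be sketched as a scan of configuration text for IPv4 literals, optionally with a /prefix. Treating 0.0.0.0 and 127.0.0.1 as allowed is an assumption made here for illustration, not part of the original test description.

```python
import re

# IPv4 address, optionally followed by a /prefix (e.g. 192.168.1.0/24).
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})(?:/\d{1,2})?\b")

# Assumption: loopback and the unspecified address are not "hardcoded" config.
ALLOWED = {"0.0.0.0", "127.0.0.1"}

def hardcoded_ips(text: str) -> list:
    """Return IPv4 literals (with any /prefix) found in the given config text."""
    return [m.group(0) for m in IPV4.finditer(text) if m.group(1) not in ALLOWED]

config = "upstream: 10.0.0.5\nsubnet: 192.168.1.0/24\nbind: 0.0.0.0"
print(hardcoded_ips(config))  # ['10.0.0.5', '192.168.1.0/24']
```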
...