Serverless Architecture Conference Blog

Welcome to Knative!

Serverless workloads on Kubernetes with Knative

Apr 8, 2019

Serverless is a buzzword that everybody seems to be talking about nowadays. You may like the term or not, but the important thing here is what it describes. In a nutshell, Serverless means the application's scale is constantly adapted to ensure that you always have the exact amount of resources you currently need available. In case of doubt, this may even mean: none at all! For you as a user, this means that you always pay only for the capacity you need to provide a response to the queries to your application. If there are no users or requests, you pay nothing at all.

Another subject that can no longer be ignored is containers, along with the related area of container orchestration, which deals with the efficient distribution and management of containers within a cluster of machines. Kubernetes can be mentioned in the same breath, since it is the de-facto standard [1] for the orchestration of containers.

What would happen if you were to put together the possibilities of both worlds and had a platform which combined the properties and features of Serverless with the sophisticated container and application management of Kubernetes? There are a few solutions which try to do this: on one side, you have the so-called Function-as-a-Service (FaaS) frameworks such as OpenWhisk, Kubeless and OpenFaaS; on the other side, there are Platform-as-a-Service (PaaS) frameworks such as CloudFoundry.

FaaS frameworks give users the possibility to literally deploy a function in the programming sense of the word. The function is then executed as a response to an event. PaaS frameworks on the other hand are more focused on long-running processes which are available via an HTTP interface. Both approaches usually have a mechanism which creates a deployable unit from a piece of source code. Ultimately, all these frameworks therefore concentrate on a specific type of workload and can greatly differ from each other in the manner they are used.

What would you get if you had a platform which would combine long-running applications as well as very short-lived functions and generally applications of any size into a common topology and terminology set?

Welcome to Knative!

Knative is an open source project initiated by Google and supported by a few other giants in the industry. These include Pivotal, IBM, Red Hat and SAP, just to name a few. The participants refer to Knative as a set of Kubernetes-based middleware components which allow you to build modern applications which are container-based and source-code-oriented. The idea was inspired by other Kubernetes-based frameworks in the same environment (see above). It combines all the best practices into a single framework and fulfils the following requirements:


  1. It is Kubernetes-native (thus the name Knative) and works like an extension of Kubernetes.
  2. It covers any possible type of Serverless workload.
  3. It aggregates all these workloads under a common topology and terminology set.


Beyond the Serverless features, it also complies with a number of additional standards required of a modern development platform:


  1. It is easy to use for developers (see example)
  2. It supports a variety of build methods to create a deployable unit (in this case a container image) from source code.
  3. It supports modern deployment methods (such as automated deployment after commits).


How does Knative work?

To go into more detail on how Knative works and how it is structured, we must first determine which part of Knative we are actually talking about. The Knative organisation is made up of several repositories. From the user's point of view, three main components are relevant: Serving, Build and Eventing.

  • Serving: The Serving project is responsible for any features revolving around deployment as well as the scaling of applications to be deployed. This also includes the setup of suitable network topology to provide access to an application under a given hostname. There is certainly a significant overlap in terms of content between this part and the description of Serverless stated above.
  • Build: As the name itself suggests, the Build project is responsible for “building” a container image from the program code. This container image can then, for example, be taken by Serving and be deployed. An interesting feature is that Build and Serving run in the same cluster. The built image therefore does not need to be transported through an external registry, but is ready and available on the spot.
  • Eventing: The Eventing project covers the event-driven nature of serverless applications. It gives you the possibility to establish a buffered connection between an event source and a consumer. This type of consumer can for example be a service managed by Serving.

In this article, we will focus on the Serving project in particular, since it is the most central project of Knative, as can be seen from the description above. Both the Build and Eventing projects can be used as a supplement to Serving or on their own.

Interaction with the Knative API happens by creating entities using Kubernetes' own kubectl CLI. Serving, Build and Eventing each define their own CRDs (Custom Resource Definitions) to map the various functionalities of the respective project. The Serving CRDs are as follows:


Configuration

A Configuration describes what the corresponding deployment of the application should look like. It specifies a variety of parameters, which include the name of the application, the container image to be used or the maximum number of parallel connections which may be open for one instance of the application. The Configuration essentially describes the application to be deployed and its properties.


Revision

As the name already suggests, a Revision represents the state of a Configuration at a specific point in time. A Revision is therefore created from the Configuration. This also means that a Revision cannot be modified, while a Configuration very well can. If a user would, for example, like to deploy a new version of her application, she changes the image in the Configuration, which in turn triggers the creation of a new Revision. The same applies to every change that requires a new deployment (such as changing the image or the environment variables): the user adapts the Configuration and never works on the Revision itself. Whether or not a change requires the creation of a new Revision is Serving's job to figure out.

Several Revisions per Configuration can be active at a given point in time. The name of a Revision corresponds to that of the Configuration, followed by a suffix (for example 00001 or 00004), which reflects the number of Revisions generated so far.
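The relationship between a mutable Configuration and its immutable Revisions can be sketched in a few lines of JavaScript. This is only an illustrative model: the function names and the image names are hypothetical, and the code is not how Knative Serving is implemented.

```javascript
// Illustrative model only; all names here are assumptions, not Knative code.
function createConfiguration(name, spec) {
  return { name, spec: { ...spec }, generation: 0, revisions: [] }
}

// Stamp out an immutable Revision from the current state of the Configuration.
function snapshot(configuration) {
  configuration.generation += 1
  const suffix = String(configuration.generation).padStart(5, '0')
  const revision = Object.freeze({
    name: `${configuration.name}-${suffix}`,
    spec: Object.freeze({ ...configuration.spec }),
  })
  configuration.revisions.push(revision)
  return revision
}

// Changing the Configuration never touches old Revisions; it creates a new one.
function updateConfiguration(configuration, changes) {
  configuration.spec = { ...configuration.spec, ...changes }
  return snapshot(configuration)
}

const config = createConfiguration('knative-example', { image: 'example/app:v1' })
const first = snapshot(config)
const second = updateConfiguration(config, { image: 'example/app:v2' })

console.log(first.name, first.spec.image)   // knative-example-00001 example/app:v1
console.log(second.name, second.spec.image) // knative-example-00002 example/app:v2
```

The first Revision keeps its original image even after the Configuration has been updated, which is what makes it possible to keep several Revisions active side by side.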


Route

A Route describes how a particular application can be called and in which proportion the traffic is distributed across the different Revisions. It couples a host name accessible from the outside with potentially multiple Revisions in the system.

As already mentioned above, several revisions can be active in the system at any given time. If for example the image of an application is updated, depending on the size and importance of the deployment it would make sense not to immediately release the new version of the application to all users, but rather carry out the change step-by-step. For this purpose, a specific proportion of the traffic can be assigned to the different revisions using the Route.

To implement routes, Serving uses Istio, a so-called service mesh. It makes sure that all requests to applications in the system run through a router, no matter if the request originates from inside or outside the system. That way, a dedicated decision can be made as to where the corresponding request ends up. Since these routers are programmable, the proportional distribution of traffic to the revisions described above is possible.
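The proportional distribution of traffic can be illustrated with a small JavaScript sketch. The traffic table and the pickRevision function are hypothetical and only model the idea; the actual routing is done by the programmable routers of the service mesh, not by code like this.

```javascript
// Hypothetical traffic table: percentages per Revision, summing to 100.
const traffic = [
  { revision: 'knative-example-00001', percent: 90 },
  { revision: 'knative-example-00002', percent: 10 },
]

// Pick a Revision for one request. `roll` is a number in [0, 100).
function pickRevision(table, roll = Math.random() * 100) {
  let upperBound = 0
  for (const entry of table) {
    upperBound += entry.percent
    if (roll < upperBound) return entry.revision
  }
  return table[table.length - 1].revision // guard against rounding gaps
}

// Route 1000 simulated requests with evenly spaced rolls and count the result.
const counts = {}
for (let i = 0; i < 1000; i++) {
  const revision = pickRevision(traffic, i / 10)
  counts[revision] = (counts[revision] || 0) + 1
}

console.log(counts) // { 'knative-example-00001': 900, 'knative-example-00002': 100 }
```

With a table like this, a new Revision could first receive a small share of the traffic and be given more step by step.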


Service

Things get a little hairy here. In the Kubernetes environment, the term “service” is quite overloaded. Knative is no exception here either and uses “service” as a name for one of its CRDs: A Knative Service describes a combination of a Route and a Configuration. It is a higher-level entity that does not provide any additional functionality. It should in fact make it easier to deploy an application quickly (similar to the creation of a configuration) and make it available (similar to the creation of a route).

During the creation of a service, the entities mentioned above are also automatically created. However, they should not be changed, because changes to the Knative Service itself would override the changes applied directly. If fine-grained flexibility is needed for adjustments and changes, the Configuration and Route should be created manually.

Knative Serving also defines a few more CRDs; these however are used internally in the system and do not play a major role from the point of view of the user.


If the user would now like to deploy an application with a given container image, for example, she will first create a Configuration with that very image. Once the Configuration has been created, the Knative Serving system will generate a Revision from the Configuration. Here, we bid farewell to Knative Land and make our way to the usual Kubernetes entities. After all, the system must also be able to create containers and scale their number for the final deployment of an application. In Kubernetes, this is the job of a so-called ReplicaSet, which creates a specific number of Pods, depending on how many replicas are requested. The Pods in turn contain a number of containers. In this case, the most important is the user container, which is created from the image that the user had specified when creating the Configuration.

Once the Revision is created, a so-called ReplicaSet is thus generated for the Revision, set with an initial scale of 1. This means that at first, at least one Pod is started with a user container. The purpose of this is to determine whether the Revision can be deployed and used as such at all as well as if the Container can even be started.

If everything has been successful so far, you now need to provide access to the Revision from the outside. For this purpose, a Route is created, which assigns a host name to the configuration (and thus indirectly to the Revision and the deployed Pods) using Istio. That way, the application can be accessed and queried via HTTP.

As soon as requests reach the deployed application, metrics are collected per pod. In addition to the user container, you have the so-called queue-proxy which proxies all the requests to the user container and additionally produces the metrics mentioned above. These metrics are sent to an Autoscaler, which based on the incoming number of requests decides how many Pods (and thus user containers) are needed. The algorithm essentially bases this decision on the incoming number of parallel requests and the configured maximum number of parallel requests per container. If for example, 10 parallel requests per container are set in the Configuration (containerConcurrency: 10) and 200 requests reach the application in parallel, the Autoscaler will decide to provide 20 Pods to process the load. Likewise, the number of Pods is reduced as soon as the volume of incoming requests goes down.
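The arithmetic behind this decision can be written down as a short JavaScript sketch. It only reproduces the simplified calculation from the paragraph above; the function name desiredPods is hypothetical, and the real Autoscaler takes more factors into account.

```javascript
// Simplified scaling rule: enough Pods so that no container has to handle
// more than `containerConcurrency` requests in parallel.
function desiredPods(parallelRequests, containerConcurrency) {
  if (containerConcurrency <= 0) {
    throw new RangeError('containerConcurrency must be positive')
  }
  return Math.ceil(parallelRequests / containerConcurrency)
}

console.log(desiredPods(200, 10)) // 20
console.log(desiredPods(201, 10)) // 21
console.log(desiredPods(0, 10))   // 0
```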

If the number of incoming requests goes back to 0, the last Pod with a user container will be deleted after a defined time has passed. Among others, this characteristic referred to as “Scale to Zero” makes Knative serverless. In this state, there is another global proxy called the Activator which accepts all the requests of a Revision that is scaled to 0. Since there are no Pods which can generate metrics for scaling, the Activator generates those very metrics and causes the Revision to be scaled according to the request load once again.

This Autoscaler/Activator team stands out from the usual Kubernetes Horizontal Pod Autoscaler (HPA) in that it allows for this scaling to 0. The HPA works on the basis of metrics such as CPU load or memory usage and performs scaling as soon as these values exceed a certain threshold. However, these values can never reach 0 and there is no way to generate a signal where there are no more Pods. Knative solves this issue with scaling based on HTTP requests, as was already described above. The Activator solves the issue of generating a signal where there are no Pods present anymore.
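How scaling to zero and the Activator fit together can be modelled roughly as follows. This is a sketch under simplifying assumptions: the function names are hypothetical, the idle period is assumed to be five minutes, and the decision logic is far simpler than that of the real Autoscaler.

```javascript
// Assumed idle period after which the last Pod is removed (five minutes).
const IDLE_PERIOD_MS = 5 * 60 * 1000

// Who reports request metrics for a Revision? Without Pods there is no
// queue-proxy, so the Activator has to step in.
function metricsSource(podCount) {
  return podCount > 0 ? 'queue-proxy' : 'activator'
}

// Decide the next Pod count from the current number of parallel requests.
function nextPodCount({ podCount, parallelRequests, containerConcurrency, idleMs }) {
  if (parallelRequests > 0) {
    return Math.ceil(parallelRequests / containerConcurrency)
  }
  if (podCount > 0 && idleMs < IDLE_PERIOD_MS) {
    return 1 // no traffic, but the idle period has not elapsed yet
  }
  return 0 // scale to zero
}

const base = { containerConcurrency: 10 }
console.log(nextPodCount({ ...base, podCount: 3, parallelRequests: 0, idleMs: 60 * 1000 }))     // 1
console.log(nextPodCount({ ...base, podCount: 1, parallelRequests: 0, idleMs: 6 * 60 * 1000 })) // 0
console.log(metricsSource(0))                                                                    // activator
console.log(nextPodCount({ ...base, podCount: 0, parallelRequests: 25, idleMs: 0 }))            // 3
```

A scaler driven by CPU load or memory usage has no equivalent of the last call: with zero Pods there is nothing left that could report such values.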


Practical application

There is nothing more helpful in understanding how such a complex system works than showing an example. We will create a small node.js webserver application, containerize it and then deploy and scale it using Knative. It is important to note here that node.js is only one of many possible examples. Using Knative, you can basically deploy and scale any application based on an HTTP interface. Here it is irrelevant whether the interface complies with specific guidelines such as REST, as long as the application follows a request/response model, meaning that a user makes a request via HTTP and, as soon as the corresponding result is calculated, receives it back as a response to the request. This point is important because the automatic scaling in Knative, as described above, is based on the volume of currently active requests. Applications deployed using Knative usually do not need any kind of adjustment.

To execute the steps, a working installation of Node.js [2] and Docker [3] as well as a valid Docker Hub account [4] are required. In addition, we assume that a Kubernetes cluster with Knative [5] is already available and that kubectl is correctly configured to communicate with that cluster. A detailed description of the various ways to install Knative is provided in the accompanying documentation.

Creating the application

Back to the example. A minimal web server based on Node.js does not need more than a few lines of code at first:

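const http = require('http')

// Knative expects the user container to listen on port 8080 by default.
const port = process.env.PORT || 8080

// Answer every request with "Hello World!" after a delay of one second.
const server = http.createServer((_, response) => {
  setTimeout(() => response.end('Hello World!'), 1000)
})

server.listen(port, () => console.log(`server is listening on ${port}`))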

After putting this in a file called index.js, we can start the web server using the following command:

$ node index.js

In this case, the HTTP server will respond after a second with the string Hello World! to a request via HTTP. CTRL + C closes the HTTP server. The fact that the application is listening on port 8080 is no accident here. Knative is expecting this port as the default port for the user container.

Creating the container image

To now be able to deploy this HTTP server on Knative, we need to create a container image which holds the application and its runtime. To build a container image with Docker, we first need a so-called Dockerfile. The file contains instructions that Docker needs to know what to put in the image and how the image should behave later on. For our purposes, a very minimalistic file is quite sufficient here:

FROM node
ADD index.js /

CMD ["node", "index.js"]

The FROM line specifies that we want to build a Node.js image. Here, the so-called base image contains all the dependencies needed to be able to start a node.js application. Then we add the newly created index.js file to it. Finally, we set the command with which the server is started.

And that's it. Using the following commands, we will now create the container image and publish it in a so-called Registry. Here, container images can be made available to allow them to be transported over the network to another machine (our Knative cluster later on). In this simple case, the Registry is Docker Hub. $DOCKERHUB_NAME corresponds to the login name to Docker Hub here.

$ docker build -t "$DOCKERHUB_NAME/knative-example" .
$ docker push "$DOCKERHUB_NAME/knative-example"

And that way, we have created the image, given it a meaningful name and published the image under this name.

Creating the Knative Service

Now we get down to business: We will now convert the newly created image into a scalable service using Knative. For the sake of simplicity, we will use the Knative Service described above here, which combines the functionality of Configuration and Route and makes things deployable in a way which is easy to understand. As is usually the case with Kubernetes, a resource is created by applying a YAML file; in our case it looks like Listing 1.

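Listing 1
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: knative-example
  namespace: default
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: docker.io/##DOCKERHUB_NAME##/knative-example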

Above all, the spec-area is interesting here: In this position, runLatest defines that the generated Route always points to the latest revision. After each update, the traffic will therefore point to the updated version.

As the name already suggests, configuration contains all the parts needed to create a Configuration. If we were to create a Configuration manually, the following options would be identical. A Configuration generates several Revisions, as we have already learned. To generate these Revisions, it uses a so-called revisionTemplate. It ultimately includes the container and all the parameters needed to generate a container (in a Pod). In this very simple case, we will only specify the image. ##DOCKERHUB_NAME## should be replaced with the login name to Docker Hub, just as when the image was built. The docker.io prefix of the image name is the host name of the Registry belonging to Docker Hub. We put the YAML above into an app.yaml file and generate the Knative Service via

$ kubectl apply -f app.yaml
service.serving.knative.dev/knative-example created
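Since kubectl accepts JSON as well as YAML, a manifest of this kind could also be generated from a script. The following sketch assumes the serving.knative.dev/v1alpha1 API version and the runLatest structure described above; the helper buildServiceManifest and the placeholder login name are hypothetical.

```javascript
// Build a Knative Service manifest as a plain object.
// Assumption: API version v1alpha1 with the runLatest/revisionTemplate layout.
function buildServiceManifest(dockerHubName, containerConcurrency) {
  const revisionSpec = {
    container: { image: `docker.io/${dockerHubName}/knative-example` },
  }
  // Only set containerConcurrency if the caller asked for a specific value.
  if (containerConcurrency !== undefined) {
    revisionSpec.containerConcurrency = containerConcurrency
  }
  return {
    apiVersion: 'serving.knative.dev/v1alpha1',
    kind: 'Service',
    metadata: { name: 'knative-example', namespace: 'default' },
    spec: {
      runLatest: {
        configuration: { revisionTemplate: { spec: revisionSpec } },
      },
    },
  }
}

// 'your-login' is a placeholder for the Docker Hub login name.
const manifest = buildServiceManifest(process.env.DOCKERHUB_NAME || 'your-login')
console.log(JSON.stringify(manifest, null, 2))
```

The printed JSON can be redirected into a file and applied with kubectl apply -f in the same way as the YAML file.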

And that’s it! We can now observe how the different resources and entities are created by the service:

The Configuration:

$ kubectl get configuration
NAME             CREATED AT
knative-example  1h

The Revision:

$ kubectl get revision
NAME                   CREATED AT
knative-example-00001  1h

The Deployment/ReplicaSet:

$ kubectl get replicaset
NAME                                        DESIRED   CURRENT   READY   AGE
knative-example-00001-deployment-d65cfb48d  1         1         1       1h

The Route:

$ kubectl get route
NAME             CREATED AT
knative-example  1h

And last but not least, the Pods in which the user container runs.

$ kubectl get pods
NAME                                              READY  STATUS   RESTARTS  AGE
knative-example-00001-deployment-d65cfb48d-plbrq  3/3    Running  0         5s

To receive a response from our application, as at the beginning of the example, we now need its host name and the IP address of our Knative instance. The host name is composed of the name of the service (knative-example in our case), the namespace in which the service runs (default in the example) and a pre-defined top-level domain (example.com for a standard installation). The resulting host name should therefore be knative-example.default.example.com. It can be retrieved programmatically as follows:

$ kubectl get route knative-example -ojsonpath="{.status.domain}"

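Things get trickier with the IP address of the Knative cluster, since it differs between deployment topologies. In most cases though, the following should work:

$ kubectl get svc knative-ingressgateway -n istio-system -ojsonpath="{.status.loadBalancer.ingress[*].ip}"

With the host name and the IP address at hand, the application can be called via curl. Here, $IP_ADDRESS stands for the address returned by the previous command:

$ curl -H "Host: knative-example.default.example.com" "http://$IP_ADDRESS"
Hello World!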
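The same call can be made from Node.js. The sketch below assumes example.com as the domain of a standard installation and uses 203.0.113.10 as a placeholder for the IP address of the cluster; the helper names are hypothetical. The important detail is that the connection goes to the IP address while the Host header carries the host name of the Route.

```javascript
const http = require('http')

// Compose the host name from service name, namespace and domain.
function routeHostName(serviceName, namespace, domain) {
  return `${serviceName}.${namespace}.${domain}`
}

// Options for http.get: connect to the ingress IP, but present the Route's
// host name in the Host header so the request is matched to the service.
function requestOptions(ingressIp, hostName) {
  return { host: ingressIp, port: 80, path: '/', headers: { Host: hostName } }
}

// Send the request and resolve with the response body.
function callService(ingressIp, hostName) {
  return new Promise((resolve, reject) => {
    http.get(requestOptions(ingressIp, hostName), (response) => {
      let body = ''
      response.setEncoding('utf8')
      response.on('data', (chunk) => { body += chunk })
      response.on('end', () => resolve(body))
    }).on('error', reject)
  })
}

const hostName = routeHostName('knative-example', 'default', 'example.com')
console.log(hostName) // knative-example.default.example.com
console.log(requestOptions('203.0.113.10', hostName))

// With a real cluster IP, the application could be called like this:
// callService('203.0.113.10', hostName).then(console.log)
```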

Having arrived at this spot, we could now state: Kubernetes can do all of that too. And that is true, as up until now we haven’t seen anything groundbreaking, and we haven’t shown anything that would make Knative stand out from a base Kubernetes. We’ll be coming to that now.

Scaling to 0

As we have already explained above, Knative scales the Pods of a Revision all the way down to 0, if no request is made to the application long enough. To achieve that, we simply wait until the system decides that the resources are no longer needed. In a default setting, this happens if the application does not receive any more requests for 5 minutes.

$ kubectl get pods
No resources found.

The application is now scaled to 0 instances and no longer needs any resources. And this is, as explained at the beginning, what Serverless is really all about: If no resources are needed, then none will be consumed.

Scaling from 0

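However, as soon as the application is used again, meaning as soon as a request to the application comes into the system, it is immediately scaled to an appropriate number of Pods. We can see that by using the familiar curl command, where $IP_ADDRESS stands for the IP address of the Knative cluster determined earlier:

$ curl -H "Host: knative-example.default.example.com" "http://$IP_ADDRESS"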
Hello World!

Since scaling needs to occur first and at least one Pod must be created, this first request usually takes a bit longer. Once it has successfully finished, the Pod list looks just like before:

$ kubectl get pods
NAME                                              READY  STATUS   RESTARTS  AGE
knative-example-00001-deployment-d65cfb48d-tzcgh  3/3    Running  0         11s

You can tell by the Pod name that you are looking at a fresh Pod, because it does not match the previous name.

Scaling above 1 and updating a service

We have already explained in detail how scaling works within Knative Serving. Essentially, the incoming parallel request volume is compared to what the application can process in parallel. This information must be provided by the developer of the Knative Service. The default setting is 100 parallel requests (containerConcurrency: 100). To make the effect easier to see, we will decrease this maximum number of parallel requests to 1 (Listing 2).

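Listing 2
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: knative-example
  namespace: default
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          containerConcurrency: 1
          container:
            image: docker.io/##DOCKERHUB_NAME##/knative-example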

We apply this change, as is usually the case in Kubernetes, by using the following command

$ kubectl apply -f app.yaml
service.serving.knative.dev/knative-example configured

And we can then observe how Knative creates a second Revision and generates Pods for it (as can be seen by the 00001 and 00002 suffix for the respective Revision).

$ kubectl get pods
NAME                                              READY  STATUS   RESTARTS  AGE
knative-example-00001-deployment-d65cfb48d-tzcgh  3/3    Running  0         2m
knative-example-00002-deployment-95dfb8f67-mgtd7  3/3    Running  0         9s

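Since the service contains the runLatest setting, all the requests will from this moment on run against the last Revision created. We will now trigger 20 parallel requests against the system, with $IP_ADDRESS standing for the IP address of the Knative cluster:

$ for i in `seq 1 20`; do curl -H "Host: knative-example.default.example.com" "http://$IP_ADDRESS" & done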

As we observe the Pod list, we’ll see that the application has been scaled accordingly (Listing 3). Warning: Most likely, a scaling of exactly 20 will not be achieved in this example, since the Autoscaler works on a feedback loop and only a sustained load would enable reliable scaling.

Listing 3
$ kubectl get pods
NAME                                              READY  STATUS    RESTARTS  AGE
knative-example-00002-deployment-95dfb8f67-9wz7m  3/3    Running   0         17s
knative-example-00002-deployment-95dfb8f67-fh8vm  3/3    Running   0         12s
knative-example-00002-deployment-95dfb8f67-mgtd7  3/3    Running   0         45s
knative-example-00002-deployment-95dfb8f67-h9t9r  3/3    Running   0         15s
knative-example-00002-deployment-95dfb8f67-hln4j  3/3    Running   0         12s
knative-example-00002-deployment-95dfb8f67-j8s8z  3/3    Running   0         15s
knative-example-00002-deployment-95dfb8f67-lbsgp  3/3    Running   0         17s
knative-example-00002-deployment-95dfb8f67-rx84n  3/3    Running   0         15s
knative-example-00002-deployment-95dfb8f67-tbvk9  3/3    Running   0         12s
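Why a short burst does not lead to exactly 20 Pods can be illustrated with a simplified model in which the scaling decision is based on the average concurrency over a time window. The window size and the sample values below are hypothetical, and the real Autoscaler is more involved; the sketch only shows why a sustained load is needed for reliable scaling.

```javascript
// Average concurrency over the most recent `windowSize` samples.
function averageConcurrency(samples, windowSize) {
  const recent = samples.slice(-windowSize)
  return recent.reduce((sum, value) => sum + value, 0) / recent.length
}

// Pods needed to serve the averaged concurrency.
function desiredPods(samples, windowSize, containerConcurrency) {
  return Math.ceil(averageConcurrency(samples, windowSize) / containerConcurrency)
}

// One sample per second: a one-second burst of 20 requests, then silence.
const burst = [0, 0, 0, 20, 0, 0]
// The same 20 parallel requests sustained for the whole window.
const sustained = [20, 20, 20, 20, 20, 20]

console.log(desiredPods(burst, 6, 1))     // 4
console.log(desiredPods(sustained, 6, 1)) // 20
```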

A relatively short time after this small flood of requests, the application will be scaled back down to one Pod, which then disappears completely 5 minutes after the last request, as was already described above.


Knative is a project which combines the features of many other projects. It comprises the best practices of frameworks from various areas which specialise in different workloads and combines them into a single, easy-to-use platform. For developers who already use, know and appreciate Kubernetes, Knative is an extension that is immediately accessible and understandable. Yet even developers who have had nothing to do with Kubernetes do not need to understand all of its basics to be able to use Knative.

We have learned how Knative Serving works in detail, how it achieves the quick scaling it needs, how it implements the features of Serverless, and how you containerise and deploy a service. The Knative project is still very young. Nevertheless, an extensive group of well-known industry giants has come together and is constantly pushing the project forward along with developers. The atmosphere in the community is very open and the members are extremely helpful, so it is worthwhile even for open source newcomers to take a look at the GitHub repositories. If you’re already itching to work on the project: Good entry-level issues are tagged with good-first-issue.

All in all, Knative is a very interesting project and thanks to the investments of many companies known in the industry, it is a platform that should not be ignored.
