DX is Awesome (DX rework mid 2020)

This RFC is about how we can improve our Developer Experience: how we build, test, deploy and manage applications.

Our initial focus was on the testing, deployment and management of applications ("DevOps"). However, the RFC has expanded to include a proposal for application architecture because we have come to realise that the architecture is key to how we build, test and deploy the applications – both to the problems and to the solutions.

:::tip Terminology: DevOps vs Developer Experience We prefer the term Developer Experience (DX) to DevOps as more encompassing and more accurate. This work isn't just about deployment, it's also about how you boot a new project and do your work day to day. In addition, the modern approach means continuous everything: testing and deployment are an essential part of a developer's day-to-day work – not some afterthought or final step (though that is still often the case!). :::

Principles

Principles (not rules!):

You build it, you run it¹ => DevOps is not a separate team, it's an activity we all do
Use existing patterns and technologies (rather than rolling our own) => Terraform vs home-rolled, GitLab for CI, GitOps + Argo for deployment etc
Automate everything => Infrastructure and deployment as code => GitOps, no manual deploys in Jenkins
- CI: everything that has tests should have automated tests that are continuously run
- CD: everything deployable should be deployed in an automatic manner

There's a major implication of the last item: setting up CI and CD has got to be easy, otherwise people won't do it :smile:

Developer Experience of what?

When talking about Developer Experience (DX) we need to ask: experience of what?

This has two parts: what we are developing, and how we are doing that development (the process of development, if you like).

What are we developing?

What we are developing obviously varies from project to project. At Datopian, a full list would be something like:

Data management solutions (esp data portals)
Data integration solutions
Simple apps or dashboards
Simple static websites

Our focus is on the first two: data portals / data management systems (usually based on CKAN). These are both the majority of our work and the more complex. That said, our approach could apply to all of them.

How are we doing development?

The development of a classic, small web application:

Good agile practice would have us going round the core loop repeatedly and rapidly, so we want to automate as much of this as possible. In particular, we want to automate the test and deploy stages. This leads us first to automated testing and deployment and then to continuous testing (integration) and deployment: CI and CD.

Analysing a bit further, both testing and deployment usually involve one-off (or infrequent) infrastructure setup, such as a database and an application container, followed by repeated "pushing" of the latest application code onto that infrastructure (and running of it).
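The split between one-off setup and repeated pushing can be sketched as follows. This is a minimal illustration only: the `Environment` class, its fields and the version strings are hypothetical names invented for the example, not part of any existing tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Environment:
    """Hypothetical stand-in for a test or deployment target."""
    name: str
    provisioned: bool = False
    setup_runs: int = 0
    deployed_versions: List[str] = field(default_factory=list)

    def provision(self) -> None:
        """One-off infrastructure setup (e.g. database, application container)."""
        if self.provisioned:
            return  # idempotent: already set up, nothing to do
        self.setup_runs += 1
        self.provisioned = True

    def push(self, version: str) -> None:
        """Repeated step: put the latest application code onto the infrastructure."""
        self.provision()  # safe to call every time; only does work once
        self.deployed_versions.append(version)


staging = Environment("staging")
for version in ["v1", "v2", "v3"]:
    staging.push(version)
```

Because `provision` is idempotent, every push can call it unconditionally, which is roughly what "infrastructure as code" buys us: the setup runs once, the push runs as often as we like.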

Developer Experience defined

Developer Experience means the experience of developers in carrying out their work, specifically in boot

Domain Model

Instance: a deployed instance of a solution e.g. "the Staging instance of the XYZ data portal"
Application: a complete solution (deployed or ready to deploy) "the XYZ data portal"
Framework: a combination of services that provide
Service: an individual service (i.e. something runnable). We could also include CKAN extensions, even though they are not strictly runnable on their own at the moment.
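The relationships between these four terms can be sketched as a small data model. This is an illustrative sketch only: the class names, field names and example values are assumptions made for the example and do not correspond to any existing CKAN API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """An individual runnable component."""
    name: str


@dataclass
class Framework:
    """A combination of services."""
    name: str
    services: List[Service] = field(default_factory=list)


@dataclass
class Application:
    """A complete solution built on a framework, possibly with extra services."""
    name: str
    framework: Framework
    extra_services: List[Service] = field(default_factory=list)

    def all_services(self) -> List[Service]:
        """Framework services plus anything application-specific."""
        return list(self.framework.services) + list(self.extra_services)


@dataclass
class Instance:
    """A deployed copy of an application in a named environment."""
    application: Application
    environment: str

    def describe(self) -> str:
        return f"the {self.environment} instance of the {self.application.name}"


# Example values are hypothetical.
ckan = Framework("CKAN", [Service("CKAN core service"), Service("datapusher")])
portal = Application("XYZ data portal", ckan, [Service("CMS")])
staging = Instance(portal, "Staging")
print(staging.describe())
```

The point of the model is that one framework can back many applications, and one application can have many instances, so the three should not share a single name.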

NB: this distinction is important because these terms are often confused when discussing CKAN today. For example, we say things like:

"this is CKAN" pointing at a website (i.e. an application)
"this is CKAN" pointing at a specific code base including multiple extensions (a framework)
"this is CKAN" pointing at github.com/ckan/ckan

Originally, there was no need for this distinction because application, framework and service were one: CKAN was a single monolithic application (strictly, the DB was separated but we can ignore that). However, now this distinction is important. Applications are built "on CKAN" or "with CKAN" and may include more than just CKAN (e.g. content from a CMS or data from multiple different subsystems). And CKAN itself is a rich framework with key functionality in extensions rather than a single monolithic codebase.

Going forward, I think we could use CKAN solely for the framework and say things like:

"This is a CKAN-based application" for applications
"CKAN is a framework" or more elaborately: "CKAN is a powerful framework for building enterprise-grade data infrastructure especially data portals and metadata catalogs"
"This is a CKAN service" for component services

The current "core" CKAN would just be referred to as the "CKAN core service".

RFC draft separate (may) - TODO Merge

This RFC needs to contain two visions:

Developer Experience: Why these changes are good for CKAN developers
Product Experience: Why these changes are good for maintaining and expanding CKAN's presence in the data portal and metadata management market.

References

https://tech.datopian.com/ckan-v3/
SCQH for DX is awesome (focused on DX): https://docs.google.com/document/d/1n3nquQIkfI81onGwYe31_UENhSxE6-TYRQXX6XjIiQU/edit

Need

Developer experience is sub-optimal, and this relates to the monolithic nature of CKAN and esp. its extensions.

For Developer Experience

Today CKAN is monolithic and the way you modify it is with extensions that run in process. This means that:

- Testing and CI are painful (for extensions), as you need to run the whole of CKAN to test an extension
- CD is also painful and complex, because you can't configure and deploy a "service" (aka extension) on its own
- We can't hot reload individual services (extensions)

The overall impact is that:

- It is hard and slow to onboard developers
- Much development work (on extensions) is untested, leading to poor quality code with bugs found late (e.g. in deployment), which is expensive
- Subsystems are wired together (e.g. datapusher and main CKAN) and we can't scale individual services

All of which makes us less efficient, agile and innovative.

For the Product

Theming can be done more rapidly, more easily and with greater flexibility and ability to use best of breed tooling

Approach

The main way to address these problems while gaining extra benefits is to move to a microservices-based architecture. Thus, we recommend building the next version of CKAN – CKAN v3 – on a microservices approach.

With microservices, each piece of functionality runs in its own service and process.
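To make "its own service" concrete, here is a minimal sketch of a self-contained service that can be exercised without booting anything else. It uses only the Python standard library (`wsgiref`). The `status_service` app and the `call` helper are hypothetical names for illustration and are not part of CKAN.

```python
import json
from wsgiref.util import setup_testing_defaults


def status_service(environ, start_response):
    """A tiny self-contained WSGI service (hypothetical example)."""
    body = json.dumps({"service": "status", "ok": True}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, path="/"):
    """Invoke a WSGI app directly, without starting a server or any other service."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


status, payload = call(status_service)
```

In deployment the same callable could be served in its own process, for example with `wsgiref.simple_server.make_server`, but the test above needs neither a server nor the rest of the system.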

Challenges

Monitoring

It's much harder to monitor multiple interconnected microservices than one single monolith.

Logging

It gets increasingly complicated, as there are more things that can fail in the communication between one function and another.

Debuggability

Same as for logging: with more moving parts between one function and another, failures are harder to trace.

Syncing releases in multiple projects

It's harder when you can't do it in a single branch or commit.

Complexity

It's harder to understand the whole system.

Testing is not always easier

There's an extra effort for generating fixtures for external services, and for doing integration testing.
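As a rough sketch of what such a fixture looks like, the example below replaces an external service with a stand-in from the standard library's `unittest.mock`. The `dataset_title` function, the `get_dataset` method and the sample record are hypothetical and only illustrate the pattern.

```python
from unittest.mock import Mock


def dataset_title(catalog_client, dataset_id):
    """Code under test: asks a (hypothetical) catalog service for a dataset."""
    record = catalog_client.get_dataset(dataset_id)
    return record["title"].strip()


# Fixture standing in for the external service's response.
fake_catalog = Mock()
fake_catalog.get_dataset.return_value = {"id": "abc", "title": "  Air quality  "}

title = dataset_title(fake_catalog, "abc")

# Check that the code talked to the service the way we expect.
fake_catalog.get_dataset.assert_called_once_with("abc")
```

The extra effort is in keeping fixtures like `fake_catalog` in step with the real service: if the real response changes shape, this test keeps passing, which is why integration tests are still needed alongside it.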

Added penalty in the use of resources

It's more cost effective to run two functions in a monolith than in different services.

Local Development

While it is easier to run a single service, it's harder to run the complete system (multiple services) on a single machine.

Larger attack surface

It gets harder to maintain a good level of security.

Existing CKAN extensions

What happens with all the existing extensions?

Extra effort in tooling

We need to dedicate time to improve tooling e.g., how to start a new service, how to test the interface with other services, how to start multiple services at the same time.

Extra complexity

In a single application/service, there is only one viable way for its parts to communicate: internally. With 4 services there are 6 possible service-to-service paths; with 5 services, 10. In general, n services have n(n-1)/2 possible pairs.
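The figures for 4 and 5 services are counts of distinct service pairs, which grow quadratically. A short sketch, assuming every service may talk to every other one (the service names are placeholders):

```python
from itertools import combinations


def communication_paths(n_services: int) -> int:
    """Number of distinct service pairs: n * (n - 1) / 2."""
    return n_services * (n_services - 1) // 2


# Enumerate the pairs explicitly for four placeholder services.
services = ["a", "b", "c", "d"]
pairs = list(combinations(services, 2))

for n in (4, 5, 10):
    print(n, "services ->", communication_paths(n), "possible paths")
```

In practice not every pair will communicate, so this is an upper bound on the interfaces that need to be understood and tested.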

Benefits

For Developer Experience

Better Developer Experience
Innovability
Scalability
Less coupling
Stability
Possibility of using different languages
Extension usability and testing
Easier to test
Vertical scalability

For the Product

???

Alternatives

Do Nothing

???

¹ Werner Vogels (2006): "Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service." https://queue.acm.org/detail.cfm?id=1142065
