Background
The world of end-user compute (EUC) can be complex and often challenging for IT and Operations teams whose mission is to continuously evolve and support complex, user-facing capabilities. Environments often comprise many dependencies and can be bound by the constraints of legacy technologies or historic architectural decisions. Frequently, this leads to an environment where teams are forced to deploy and manage workloads using complicated and lengthy manual processes. The result can be lower end quality, a slow update deployment cadence and high development and operating costs, all of which are underpinned by poor communication and collaboration between the IT development and Operations teams. These are problems that the software development domain often encountered but which are now, in many respects, part of a bygone era. So, what can we learn and apply to the development and operation of EUC from the ways of working that the software development domain has adapted and matured in its quest to solve these issues?
At HM3 we were recently asked this very question by one of our major clients. Our client operates a vast and complex EUC estate with many users working in different geographical locations, across a variety of platforms, and they face the exact challenges that we have described. They are on a multi-year transformation journey to move applications and services away from on-premises hosting to a top-tier Cloud where it makes sense. Operating models will be transformed in order to maximise the opportunities presented by the new environment, with strategic objectives to reduce waste and increase operational velocity and efficiency. We were set the challenge of identifying and investigating ways to adopt a DevOps approach in the target state for development and operation of the EUC services. The simple goal was to introduce a new way of working to help realise their objectives at pace. In this blog we journal the investigation work that we undertook for our client, outline the approach, and conclude with what was learned and played back to the client.
What is DevOps to EUC?
Firstly, we need to be clear about what DevOps means to us, since it is so often an overloaded and misunderstood term. DevOps is explained well by AWS in their article What is DevOps? At a basic level, however, it is a union of cultural philosophies, practices and tooling that brings the typically siloed disciplines of development and IT operations more tightly together to work in unison towards the same goals. It focusses on giving teams autonomy, automating by default, and promoting cross-team communication and collaboration. This approach seeks to increase team velocity for feature release to customers, increase team efficiency, provide higher reliability and quality of products, and increase security, all of which should hopefully lead to a much happier end user.
What we did
Unsurprisingly, promoting DevOps practices for EUC, which are so well known for the benefits they have brought to the software development domain, was met with some scepticism by some long-standing members of the EUC team. After all, what has software development got to do with development and operation of EUC services, right? Well, we believe that, although there are clear differences in many areas between the two disciplines, there are some very relevant areas where the same challenges apply, and DevOps can help if approached in the right way. For that reason, our time-boxed client investigation focused on evaluating the DevOps practices that we felt offered the maximum bang-for-buck for our EUC challenges. These were:
· Automation for provisioning core EUC infrastructure and deployment of workloads.
· Continuous Integration and Continuous Delivery (CI/CD) for automating the building, testing and deployment of EUC infrastructure and services.
· Collaboration and Communication by bringing together a cross-functional team of architects, infrastructure engineers, operations specialists and security experts under a shared responsibility model.
Our approach was to test our hypothesis by building a minimum viable product (MVP) in a rapid, fail-fast fashion so that the team could learn by doing and we could evaluate against real business problems. We agreed that our MVP goal was to take an EUC workload through to live deployment in an automated way. Put simply, this meant specifying and building our EUC environments (development, reference, production) and core infrastructure using infrastructure as code (IaC), and using GitLab CI/CD pipelines to configure, build, test and deploy both the infrastructure and the workload. The work was time-boxed to a few weeks, allowing us to get just enough in place to evaluate what we had learned and where we would go next.
We started small, using a representative but straightforward, low-complexity workload: a Commercial Off The Shelf (COTS) application used by the client’s IT teams. We analysed the workload’s lifecycle as it is today and mapped its value stream in its entirety. Value stream mapping is a critical early job, as it flushes out every single step and action that is required to take a workload (service, product, capability, etc.) from the initial customer request right through to the realisation of value by the customer. In simple terms, this means mapping every bit of work that is required to take something from undeployed to deployed and in service with users in the production environment.
The value stream mapping exposed a shocking reality for our client’s IT leadership team. For our simple workload there were many preparation, development and deployment steps, the majority of which were manual, repetitive and error-prone, with some having dependencies on overburdened IT operations teams external to our development team. Infrastructure had to be built to host the application, with many steps taking a significant amount of time to complete by following manual build and installation guides. The documentation set was large and commonly had gaps. The team needed to wait for the external IT operations teams to complete tasks on which they were dependent, resulting in a huge amount of wasted time ‘blocked-waiting’.
Infrastructure engineers were required to go back and repeat a series of steps because errors made in the manual steps were not spotted until all the steps had been completed - again a huge amount of wasted time and effort. The quality and security engineering SMEs were also under-utilised for most of these foundational steps; their skills only came into play at the end of the process, after the architects and infrastructure engineers had done most of the work to set up the environment and deploy the workload to it. When these specialists identified an issue, it was late in the process, and all the deployment steps would need to be revisited in order to rectify it - more wasted time and effort. With the map in front of them, the IT leadership team could see why everything took so long and why end products were often not of the highest quality on ‘go-live’. We now had a good understanding of the problem, and this enabled us to quickly identify the steps which should be automated, covering core infrastructure provisioning, application packaging and deployment, and quality assurance and compliance testing.
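To make the arithmetic behind a value stream map concrete, here is a minimal Python sketch. The step names, the durations and the `Step` type are all hypothetical and invented for illustration; they are not the client's figures. The sketch totals hands-on time and blocked-waiting time, derives a flow efficiency ratio, and ranks the manual steps as candidates for automation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step in a value stream (all example values are hypothetical)."""
    name: str
    work_hours: float   # hands-on effort spent on the step
    wait_hours: float   # time spent blocked before the step can start
    manual: bool        # True if a person performs the step by hand


def summarise(steps):
    """Return total work, total wait, lead time and flow efficiency."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    lead = work + wait
    efficiency = work / lead if lead else 0.0
    return {"work": work, "wait": wait, "lead": lead, "efficiency": efficiency}


def automation_candidates(steps):
    """Return the manual steps, largest total elapsed time first."""
    manual = [s for s in steps if s.manual]
    return sorted(manual, key=lambda s: s.work_hours + s.wait_hours, reverse=True)


# Illustrative data only: not taken from the client engagement.
EXAMPLE_STEPS = [
    Step("Request environment", 1.0, 40.0, True),
    Step("Build host infrastructure", 16.0, 8.0, True),
    Step("Install application", 6.0, 2.0, True),
    Step("Security review", 4.0, 24.0, True),
    Step("Smoke test", 1.0, 0.0, False),
]

if __name__ == "__main__":
    summary = summarise(EXAMPLE_STEPS)
    print(f"Lead time: {summary['lead']} h")
    print(f"Flow efficiency: {summary['efficiency']:.0%}")
    for step in automation_candidates(EXAMPLE_STEPS):
        print(f"Automation candidate: {step.name}")
```

Even with made-up numbers, the shape of the result matches the finding above: most of the lead time is waiting rather than working, and the steps with the longest elapsed time are the manual ones.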
The team swarmed to design patterns for the core infrastructure. Within a short period of time compared with the manual processes, we had the core infrastructure specified in IaC and under version control in the DevOps platform. The team worked together, sometimes in pairs, to codify the infrastructure; the power of the GitLab DevOps platform allowed this parallel collaboration to happen without complexity. As they forged ahead, a Continuous Delivery (CD) pipeline was designed and built in the DevOps platform. They realised that Continuous Integration (CI) steps weren’t required, as the team’s workloads were normally all COTS applications rather than software code components that required integration before deployment, but that was just fine.
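The behaviour of a delivery pipeline of this kind can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the client's pipeline and not GitLab configuration: the stage functions are hypothetical stand-ins for real IaC and deployment tooling, and one of them is rigged to fail so that the fail-fast behaviour is visible. Stages run in order, later stages are skipped after the first failure, and promotion through the environments stops at the first environment that fails.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a name plus an action that returns True on success.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(stages: List[Stage], environment: str) -> Dict[str, str]:
    """Run stages in order; mark everything after the first failure as skipped."""
    results: Dict[str, str] = {}
    failed = False
    for name, action in stages:
        if failed:
            results[name] = "skipped"
            continue
        ok = action(environment)
        results[name] = "passed" if ok else "failed"
        failed = not ok
    return results


def promote(stages: List[Stage], environments: List[str]) -> Dict[str, Dict[str, str]]:
    """Run the pipeline per environment, stopping at the first failing one."""
    report: Dict[str, Dict[str, str]] = {}
    for env in environments:
        report[env] = run_pipeline(stages, env)
        if "failed" in report[env].values():
            break
    return report


# Hypothetical stage actions; real ones would call IaC and deployment tools.
def configure(env: str) -> bool:
    return True


def build(env: str) -> bool:
    return True


def run_tests(env: str) -> bool:
    # Simulate a test failure in one environment to show fail-fast behaviour.
    return env != "reference"


def deploy(env: str) -> bool:
    return True


STAGES: List[Stage] = [
    ("configure", configure),
    ("build", build),
    ("test", run_tests),
    ("deploy", deploy),
]

if __name__ == "__main__":
    for env, result in promote(STAGES, ["development", "reference", "production"]).items():
        print(env, result)
```

In this model a failure in the reference environment means production is never attempted, which is the property that makes an automated pipeline safer than a manual runbook.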
The cross-functional team now included the quality, operational and security specialists, who worked in pairs with the infrastructure engineers. Early opportunities were exploited to codify security checks against industry and internal standards. Out-of-the-box tools were enabled in the DevOps platform to perform security scans and inspections against industry standards within the pipeline. Functional and acceptance tests were written alongside the code to test the stack as it was incrementally built in the pipeline. This resulted in instant value return, as we now had quality and security being designed in from the start rather than as an afterthought. Moreover, we had dramatically shortened the feedback loop from weeks to seconds, and this had happened through simple team organisation changes without much thinking! The team found security vulnerabilities in the core infrastructure stack as the IaC was being written and at preparation and deployment time, and the security specialist set about writing auto-remediation policies to plug them before we were anywhere near production. The team instantly swarmed to fix these issues before moving on to the next step – security and quality shifted as far left as we could get them.
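As a rough illustration of what codifying a security check can look like, the Python sketch below evaluates resource definitions against a small rule set and reports violations. Everything in it is an assumption made for the example: the rule IDs, the attribute names (`disk_encrypted`, `public_ip`, `tags`) and the resource names are hypothetical, and the client's actual checks used the scanning tools built into the DevOps platform rather than code like this.

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    rule_id: str
    description: str
    check: Callable[[Dict], bool]   # returns True when the resource complies


# Hypothetical rules and attribute names, for illustration only.
RULES: List[Rule] = [
    Rule("ENC-01", "Disk encryption must be enabled",
         lambda r: r.get("disk_encrypted") is True),
    Rule("NET-01", "No public IP address",
         lambda r: not r.get("public_ip", False)),
    Rule("TAG-01", "An owner tag is required",
         lambda r: bool(r.get("tags", {}).get("owner"))),
]


def evaluate(resource: Dict, rules: List[Rule] = RULES) -> List[str]:
    """Return the IDs of the rules that the resource violates."""
    return [rule.rule_id for rule in rules if not rule.check(resource)]


def gate(resources: Dict[str, Dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource name to its violated rule IDs."""
    findings = {name: evaluate(spec) for name, spec in resources.items()}
    return {name: ids for name, ids in findings.items() if ids}


# Hypothetical resource definitions, as they might be parsed from IaC.
EXAMPLE_RESOURCES = {
    "vdi-host": {"disk_encrypted": True, "public_ip": False, "tags": {"owner": "euc"}},
    "jump-box": {"disk_encrypted": False, "public_ip": True, "tags": {}},
}

if __name__ == "__main__":
    violations = gate(EXAMPLE_RESOURCES)
    for name, rule_ids in violations.items():
        print(f"{name}: {', '.join(rule_ids)}")
    # A pipeline job would typically exit non-zero when violations exist.
    raise SystemExit(1 if violations else 0)
```

Run as a pipeline step on every commit, a check like this gives feedback in seconds, which is the shortened loop described above.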
What was being born here, without anyone in the team really realising it, was the DevOps shared responsibility culture of enhanced collaboration. Exposing issues early in the process and making them highly visible, alongside having the normally ‘external’ quality and security stakeholders as part of the team, simply focused minds. The team were concerned with doing the job right as a whole rather than just shipping something that would be someone else’s problem down the line when it was thrown over the wall and found to be unreliable and non-compliant. We now had a complex core infrastructure written as IaC, under version control, with built-in automated functional and security compliance checks. This was being deployed not through lengthy, highly documented, manual processes, but automatically by the Continuous Delivery pipeline as and when we changed the IaC and committed it back to the repository.
So why did our client care about this? Simple: we had eliminated most of the need for our valuable and highly constrained engineering resources to go through these time-consuming manual steps ever again. We had built a repository of highly valuable core infrastructure components that could be reused over and over again and that came with a guaranteed level of quality. This alone would increase the quality when deploying new workloads, but it would also allow us to do it at a much higher velocity, freeing those valuable resources to concentrate on the bespoke deployment elements for the different workloads.
We had also created CD pipeline patterns that could be copied, pasted and enhanced for different workload types. This would enable a new workload to be stood up and deployed, along with its core infrastructure, to many environments at a much higher pace. Finally, we had demonstrated a win with an instant culture change. Our operations, security and quality specialists were now a team alongside the architects and infrastructure engineers, and they were working in harmony to solve problems together rather than creating friction. Most importantly, the client’s leadership team could feel the excitement, enthusiasm and buzz that the whole team had generated by being part of this MVP, and it was clear that this was a much better way of working from a collaboration point of view.
In this short MVP cycle, we proved the hypothesis that basic DevOps principles can be applied to the development and deployment of EUC infrastructure and workloads with great advantage. DevOps in EUC can reduce cycle time, shorten feedback loops, enhance quality and remove repetitive, manual steps through more automation. However, we think the biggest win was the coming together of IT development and Operations specialists, through seemingly transparent and simple adjustments to the team’s structure and culture. The buzz and new energy where everyone worked together to get the job done was infectious. This short MVP opened our client’s eyes, and they are now undertaking a more widespread DevSecOps transformation to put these learnings into practice.