Hive has become one of the big successes of Centrica’s British Gas business, establishing the company as a viable alternative to Google’s Nest.
But being at the forefront of smart home technology means Hive requires a 24 by 7 way of working and an approach to software development that ensures there are no incidents on the back-end software platform on which Hive runs, while giving developers the freedom to create, build and deploy new features quickly and efficiently.
It starts with DevOps, but monitoring has become a key aspect of the DevOps process, and developers are expected to take full responsibility for the code they push into production.
“The challenge with Hive is that we are in quite an innovative space,” says Chris Livingston, head of the cyber reliability engineering team for Hive Home at Centrica. “We know what good looks like and have a very clear idea of things we want to do and things we don’t want to do. But as we innovate there is a grey area in the middle.
“There is no up-front approval process at Hive,” he says. “Instead, the developer teams are provided with a set of guard rails that give our developers a lot of freedom, so long as they are doing everything right. We have a lot of continual compliance.”
As an example, Hive runs a million compliance checks an hour. For Livingston, monitoring is a joint responsibility. “The only people who know if their software is working are the people who wrote the code,” he says. “They have to make sure they send the right data to the monitoring system and they set the right thresholds.
“More and more people are running 24×7 services. The days of turning up to work at 9 o’clock and going home at 5:30 are a rarity. In my job, I work 24 by 7. If there is an issue with the system out of hours there is an expectation we fix it.”
Livingston’s role is to run all of the infrastructure that keeps Hive running. He says this involves supporting all the teams developing for the Hive platform. “My job is to give the developers an environment where they can focus on their code.”
“We are very much trying to empower our developers to be responsible for the software and the services we develop. We want the developer teams to be 100% focused on delivering value and features to the customer.”
This involves providing an environment for developers to build, test and deploy the code they create. “I worry about monitoring, log aggregation, security and compliance,” says Livingston.
The cyber reliability engineering team provides a set of tools to support developer teams. Livingston says the developer teams are “absolutely responsible” for monitoring the software they produce. “When there is a problem with their software out of hours, they are on call to fix it.”
The company is a big user of VMware’s real-time cloud monitoring tool, Wavefront, and also uses CloudHealth, a cloud cost management product that VMware has announced it will be acquiring. Livingston says Wavefront has transformed the way the Hive platform is monitored.
“We define an incident as the software not doing what it is supposed to,” he says. “Sometimes, we can correct an incident before it becomes a problem, which is why Wavefront is useful.” If the system monitoring is trending in a way that could lead to an incident, the problem can be fixed before any issues arise, according to Livingston.
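The idea of acting on a trend before it becomes an incident can be illustrated with a minimal sketch. This is not Wavefront's API or Hive's actual alerting logic; the function names, the threshold and the horizon are hypothetical, and the sketch simply assumes a metric sampled at equal intervals and fits a straight line to the recent readings.

```python
from typing import Optional, Sequence


def linear_slope(values: Sequence[float]) -> float:
    """Least-squares slope of equally spaced samples (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def samples_until_breach(values: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate how many more samples pass before the threshold is crossed.

    Returns 0.0 if the latest reading already breaches the threshold,
    and None if there is no data or the trend is flat or falling.
    """
    if not values:
        return None
    latest = values[-1]
    if latest >= threshold:
        return 0.0
    slope = linear_slope(values)
    if slope <= 0:
        return None
    return (threshold - latest) / slope


def should_warn(values: Sequence[float], threshold: float, horizon: float) -> bool:
    """Warn if the projected breach falls within the next `horizon` samples."""
    eta = samples_until_breach(values, threshold)
    return eta is not None and eta <= horizon


if __name__ == "__main__":
    # Hypothetical readings, e.g. queue depth sampled once a minute.
    readings = [50, 52, 54, 56, 58]
    print(should_warn(readings, threshold=70, horizon=10))
```

In this sketch the developer team would own the choice of threshold and horizon, which matches Livingston's point that the people who wrote the code are the ones who know what healthy looks like.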
The entire end-to-end infrastructure on which the Hive platform is based (including marketing and support websites, data collection services, and the real-time store for user and analytics data) runs on AWS technologies. “We’ve been in the AWS cloud from Day One,” says Livingston. The core technologies used to power Hive are Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS), and Amazon Simple Storage Service (Amazon S3).
A choice between private and public cloud
According to Livingston, businesses have until now needed to choose between a private and a public cloud. He says: “Our developers don’t have to care about where their code runs.” But having seen a demonstration of VMware orchestration on top of AWS at VMworld in Barcelona, he says: “I can see a huge benefit, because you no longer have to choose [to deploy just] on-premise.
“As I look at all the products bridging physical on-prem and hybrid clouds, it is really powerful not to have to worry where your workloads are. You can have the best of both worlds and leverage all your legacy investments.”
Given that pretty much 100% of Hive runs on AWS, Livingston says: “We take a proactive view of cost management.” For instance, the company uses a system that analyses AWS spending on a daily basis and points out spending anomalies.
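The article does not say how the spending analysis works. As a rough illustration only, a daily anomaly check could compare each day's spend with the average of the preceding week. Everything in the sketch below is an assumption: the function name, the seven-day window, the three-standard-deviation rule and the 5% floor are hypothetical choices, not a description of CloudHealth or of Hive's system.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def spending_anomalies(
    daily_spend: Sequence[float],
    window: int = 7,
    k: float = 3.0,
    min_fraction: float = 0.05,
) -> List[Tuple[int, float, float]]:
    """Flag days whose spend deviates sharply from the trailing average.

    Returns (day_index, spend, trailing_mean) for each flagged day.
    A day is flagged when it differs from the mean of the previous
    `window` days by more than k standard deviations, with a floor of
    `min_fraction` of the mean so a very flat history does not flag
    trivial changes.
    """
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        tolerance = max(k * sigma, min_fraction * mu)
        if abs(daily_spend[i] - mu) > tolerance:
            anomalies.append((i, daily_spend[i], mu))
    return anomalies


if __name__ == "__main__":
    # Hypothetical daily totals in an arbitrary currency.
    spend = [100, 102, 98, 101, 99, 100, 100, 101, 180, 100]
    for day, amount, baseline in spending_anomalies(spend):
        print(f"day {day}: spent {amount}, trailing mean {baseline:.2f}")
```

A trailing window keeps the baseline current as the platform grows, at the cost of letting one large spike inflate the baseline for the following week.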
He adds that the cyber reliability engineering team’s role is not to become a blocker. “I am trying to provide a set of tooling that enables developers to do their work.”
However, there still needs to be some form of process. “I’m not a fan of process for process’s sake. But I believe good process can empower a business.”
He works with the developer teams to create a process that works both for the teams and for the business. This means developers can deploy their own code. “We don’t work in a more traditional environment where someone else deploys code. Our developers have access to their production environment to deploy their code.”