Skyscanner: Scaling a digital business through a Cloud Native architecture on AWS
The featured video describes how Skyscanner scales its business through a Cloud Native architecture.
Paul Gillespie from Skyscanner shows how they build their Kubernetes clusters with 100% Amazon EC2 Spot instances. What’s the secret to making this reliable at scale? Diversification: multiple Availability Zones, multiple Regions, multiple clusters, and multiple Amazon EC2 instance types. You’ll learn how they leverage EC2 Spot, Auto Scaling, ELB and some novel design patterns to make their Kubernetes clusters both cost-effective and highly available.
At 0:23, Paul Gillespie, Senior Principal at Skyscanner, describes the company and its focus, explaining that it is a travel search company based in the UK.
He adds that Skyscanner has around 80 million unique users. At 0:30, the host, Matt Yanchyshyn, Director of Solution Architecture for AWS, asks Paul how the infrastructure is maintained at Skyscanner, noting that infrastructure is vital for a large-scale application like Skyscanner, which has to handle a high volume of queries per second.
Kubernetes on EC2 instances
At 1:17, Paul begins to explain why Skyscanner runs Kubernetes on EC2 instances rather than opting for other approaches.
He adds that the Kubernetes cluster is the central part of Skyscanner's infrastructure. At 1:59, he explains the role of Auto Scaling Groups (ASGs) and Availability Zones. He states that each individual ASG is deployed across multiple instances. Each Skyscanner cluster has five ASGs, which together provide the required diversification.
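To make the diversification argument concrete, here is a minimal sketch of a cluster split across five ASGs, one per instance type, and the share of capacity lost if a single instance type is reclaimed. The instance types, zone names, and node counts are illustrative assumptions, not figures from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoScalingGroup:
    name: str
    instance_type: str          # hypothetical instance type, not from the video
    availability_zones: tuple   # zones the group spans (assumed)
    nodes: int                  # current node count (assumed)


def capacity_lost_fraction(groups, reclaimed_instance_type):
    """Fraction of all nodes lost if one instance type is reclaimed everywhere."""
    total = sum(g.nodes for g in groups)
    lost = sum(g.nodes for g in groups if g.instance_type == reclaimed_instance_type)
    return lost / total if total else 0.0


# Assumed layout: five ASGs, each with a different instance type, 25 nodes apiece.
ZONES = ("eu-west-1a", "eu-west-1b", "eu-west-1c")
INSTANCE_TYPES = ("m5.2xlarge", "m5a.2xlarge", "c5.4xlarge", "r5.2xlarge", "m4.2xlarge")
cluster = [
    AutoScalingGroup(f"asg-{i}", itype, ZONES, nodes=25)
    for i, itype in enumerate(INSTANCE_TYPES, start=1)
]

# Losing one of five equally sized instance-type pools removes a fifth of the nodes.
print(capacity_lost_fraction(cluster, "m5.2xlarge"))
```

With a single instance type the same event could remove every node at once, which is why spreading capacity over several ASGs matters when everything runs on Spot.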
At 3:03, he explains how this infrastructure handles scalability at Skyscanner. A cluster can potentially grow to 120-130 nodes, which allows it to serve a large number of users. At 3:24, he begins to explain the types of queries and how they are handled.
A single cluster in a busy region generally handles between 60 and 70 thousand queries per second. At 4:44, he explains that the Kubernetes clusters run on one hundred percent Spot Instances. At 5:02, Paul states that a shell script prevents the Kubernetes scheduler from placing work on a Spot Instance that is being terminated.
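The video does not show the script itself, so the following is only a sketch of one common pattern for handling Spot interruptions, not Skyscanner's implementation. It polls the EC2 instance metadata path for a Spot interruption notice and, once one appears, drains the node with `kubectl drain`, which also marks the node unschedulable. The function names, the polling interval, and the use of unauthenticated metadata requests (no IMDSv2 session token) are assumptions made for illustration.

```python
import json
import subprocess
import time
import urllib.error
import urllib.request

# Instance metadata path where EC2 publishes a Spot interruption notice.
# It returns 404 until an interruption has been scheduled.
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_notice(status, body):
    """Return the interruption action (e.g. 'terminate') or None if there is none."""
    if status != 200:
        return None
    try:
        return json.loads(body).get("action")
    except (ValueError, TypeError, AttributeError):
        return None


def fetch_notice(url=NOTICE_URL, timeout=2):
    """Fetch the notice as (status, body); assumes no IMDSv2 token is required."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""


def drain_command(node_name):
    """Build the kubectl call that cordons the node and evicts its pods."""
    return ["kubectl", "drain", node_name,
            "--ignore-daemonsets", "--delete-emptydir-data", "--force"]


def watch(node_name, poll_seconds=5):
    """Poll until an interruption notice appears, then drain this node once."""
    while True:
        if parse_notice(*fetch_notice()) is not None:
            subprocess.run(drain_command(node_name), check=False)
            return
        time.sleep(poll_seconds)


# Offline demonstration of the parsing step with a sample notice body.
sample = '{"action": "terminate", "time": "2024-01-01T00:00:00Z"}'
print(parse_notice(200, sample))
```

In practice something like `watch("<node-name>")` would run on every node, for example as a DaemonSet, so that each instance reacts to its own interruption notice.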
The presence of Auto Scaling Groups
At 6:08, Paul explains that Kubernetes must be told which nodes are present and which are currently unavailable.
At 6:27, he adds that kube-proxy and the DaemonSets run on all the nodes. He further adds that the Skyscanner team ran into a problem with the cluster autoscaler while building the infrastructure: auto-scaling misbehaved when a single ASG reached zero nodes, and the clusters had to be restarted often. At 7:16, he states that this problem was resolved using the Londo patch.
At 7:37, Paul states that the reserved instances are used mainly for diversification, for their cost, and to build multiple clusters efficiently. The reserved instances also allow the use of different instance types.
At 8:15, he states that the different workloads are taken into account to ensure that all test case queries pass successfully. As a member of the infrastructure team at Skyscanner, Paul stresses the importance of achieving scalability at minimal cost. He puts forward reserved instances and the patch as the way to keep the Kubernetes clusters running efficiently.