The landscape of IT is going through some extraordinary changes, and technology is beginning to permeate every aspect of our lives. With the near-complete saturation of internet-connected devices across the world, the number of potential consumers has increased massively. Applications built today have the potential to reach billions of consumers globally. A strong product, given the right exposure, can grow from hundreds to millions of users nearly overnight. For example, if a product gets exposure on a site like Reddit, the load can spike within minutes and crash the site (known as the Reddit “Hug of Death”). While access to this massive global market provides amazing opportunities, there’s also a risk: if the application fails to scale to meet demand, even existing consumers will be lost.
So how do you prepare an application to scale from hundreds to millions of users? If you’re targeting a larger market and looking to scale, there are some key ways to address the problem at both a technological and an organisational level.
Gathering a Baseline
Before making any other changes, it’s worth establishing a baseline by creating a performance testing harness that gathers throughput, response times, network stats, memory and CPU usage, load statistics and patterns, etc. while the system is under load. With this in place, you’ll be able to do a few different things:
- Measure under load – gather as much information as possible on the system’s behaviour under load and use it to understand where improvements can be made.
- Find the breakpoints – push the system to the point where it breaks to discover more precisely what your system can handle, and when and how it breaks.
- Verify effectiveness – with a baseline recorded, re-running the load testing after changes will give an indication of the effectiveness of any optimisation.
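As a rough illustration of what such a harness might record, here is a minimal Python sketch. It assumes a hypothetical `handle_request` function standing in for the system under test; a real harness would call the actual application over the network and would also capture memory, CPU and network statistics. The names `run_load_test` and `improvement` are made up for this example.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload: int) -> int:
    """Hypothetical stand-in for the system under test (an assumption)."""
    time.sleep(0.001)  # simulate a small amount of work
    return payload * 2


def timed_call(payload: int) -> float:
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    handle_request(payload)
    return time.perf_counter() - start


def run_load_test(total_requests: int, concurrency: int) -> dict:
    """Send total_requests to the target using `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": len(latencies),
        "throughput_rps": len(latencies) / elapsed,
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": latencies[p95_index],
    }


def improvement(baseline: dict, candidate: dict) -> float:
    """Relative throughput change versus the baseline (0.1 means +10%)."""
    return candidate["throughput_rps"] / baseline["throughput_rps"] - 1.0


if __name__ == "__main__":
    baseline = run_load_test(total_requests=200, concurrency=10)
    print(baseline)
```

Raising `concurrency` step by step until latency or errors climb sharply is one way to look for the breakpoint, and comparing a later run against the stored baseline with `improvement` gives a first indication of whether a change helped.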
Further down there are some technological and architectural aspects to consider, but before that, it’s worth considering how to support the scaling of an application.
- Understand the traffic – gather as much information as possible on the actual or expected traffic, such as geographical origin, the end user device (e.g. mobile vs. desktop) and the nature of the data (dynamic data vs. static content). These sorts of insights can be used as input to the technological optimisation.
- Performance testing capabilities – in order to support the above, there needs to be the capacity to provide regular performance testing on the system. Organisations looking to scale will need to either bring the capability in-house or establish a partnership with a provider that can.
- Support and maintenance structures – small applications can be supported by small teams; however, as the scale of the application grows, make sure that you have the right resources in place to support the increase.
- Capacity planning – solutions won’t always need to scale to the full volume overnight, and it may be more prudent to scale the application progressively. To do this, you’ll need an accurate view of the current capacity, as well as the projected trends, to ensure you’re proactively increasing the capacity well before it’s required, not after.
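To make the capacity planning point concrete, the sketch below fits a straight line to past peak-load samples and estimates how many periods remain before the projected load reaches a chosen share of capacity. It is only an illustration: the linear growth model, the 80% headroom default and the function names are all assumptions, and real traffic is often better described by seasonal or exponential models.

```python
import math
from typing import Optional, Sequence, Tuple


def linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept over x = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def periods_until_threshold(
    samples: Sequence[float], capacity: float, headroom: float = 0.8
) -> Optional[int]:
    """Periods after the latest sample until the trend reaches headroom * capacity.

    Returns 0 if the fitted load is already at or above the limit, and
    None if the trend is flat or falling (the limit is never reached).
    """
    slope, intercept = linear_trend(samples)
    limit = capacity * headroom
    current = slope * (len(samples) - 1) + intercept
    if current >= limit:
        return 0
    if slope <= 0:
        return None
    return math.ceil((limit - current) / slope)


if __name__ == "__main__":
    # Hypothetical monthly peak requests per second.
    monthly_peaks = [100, 200, 300, 400]
    print(periods_until_threshold(monthly_peaks, capacity=1600, headroom=0.5))
```

Acting when the result drops below the lead time needed to add capacity is one way to stay ahead of demand rather than reacting to it.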
To ensure that an application can scale while still maintaining acceptable throughput and response times, it will need to be assessed at a technological level.
- Optimise the code – at a technical level, the obvious answer is to optimise the application code. Techniques will vary by the architecture of the application, but with the right metrics and timing points in the code, you’ll be able to target precisely where the bottlenecks are. When you’re running at load, even small inefficiencies can have an outsized impact on the capacity of the system. Bear in mind, this is potentially a double-edged sword: sometimes code optimisation can be an arduous task with unforeseen consequences, and there may be more effective ways to scale first.
- Application architecture – many architectural decisions may impose significant scalability constraints. For low volume scenarios, decisions such as the persistence technology are often made for maintainability and agility considerations, rather than performance and scale. You’ll need to reconsider many decisions carefully to ensure you have the right architecture in place to scale the system.
As an example, a lot of value can often be achieved with little effort by introducing caching over the persistence layer. While this will create another layer of complexity and some data integrity concerns, it’s often a good way to get more out of a system. It’s also worth considering ‘sharding’ techniques, and other ways of improving persistence performance. I’m also aware, however, that it’s not always possible or feasible to re-architect a solution.
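A minimal sketch of caching over the persistence layer, assuming a hypothetical `loader` callable that stands in for the database query. The class name, the time-to-live default and the explicit `invalidate` method are illustrative choices rather than a prescribed design; production systems would more commonly use a shared cache such as Redis or Memcached instead of an in-process dictionary.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """In-process cache that falls back to a loader (e.g. a database query)."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # hypothetical persistence-layer lookup
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the persistence layer and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry; call this whenever the underlying record changes."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    def load_user(user_id):
        # Stand-in for a slow database read.
        return {"id": user_id, "name": f"user-{user_id}"}

    cache = ReadThroughCache(load_user, ttl_seconds=60)
    cache.get(1)
    cache.get(1)
    print(cache.hits, cache.misses)
```

The time-to-live and the `invalidate` call are where the data integrity concerns show up: a longer TTL takes more load off the database but lets readers see stale data for longer.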
- Infrastructure scaling – increasing the compute or memory capacity of the infrastructure can be a quick win as additional hardware is usually relatively cheap. There will, however, be limitations to vertical scalability of the infrastructure, and you’ll need to look at scaling horizontally with more instances. As an aside, this is one of the strengths of a microservices architecture in that it’s inherently designed for horizontal scaling.
- Hosting options – the rise of cloud solutions and IaaS (Infrastructure as a Service) capabilities means that there is often very little reason to keep an application on premises. IaaS cloud providers such as Amazon Web Services (AWS) and Microsoft Azure provide a level of scalability and redundancy that often makes a lot more economic sense than an on-premises solution. Understanding the geographic profile of the expected traffic can be useful: it may be worth hosting some or all of the infrastructure in different geographical regions. Online gaming platforms often take this approach and introduce regional servers (League of Legends, for example, has 100 million monthly players spread across 10+ regions). In the Australian context, given the tendency for undersea cables to be damaged by sharks and typhoons, hosting locally for an international user base (or vice versa) can be problematic, and geographically segregating data may make sense.
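The “timing points in the code” mentioned under code optimisation can be as simple as the sketch below, which records how long labelled sections take and reports where most of the time goes. The `TimingRecorder` class and the labels are hypothetical; in practice an application performance monitoring tool or a profiler would usually fill this role.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, List, Optional


class TimingRecorder:
    """Collects wall-clock durations for labelled sections of code."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def measure(self, label: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped code raises.
            self._samples[label].append(time.perf_counter() - start)

    def summary(self) -> Dict[str, Dict[str, float]]:
        return {
            label: {"count": len(v), "total_s": sum(v), "max_s": max(v)}
            for label, v in self._samples.items()
        }

    def slowest(self) -> Optional[str]:
        """Label with the largest total time, or None if nothing was recorded."""
        totals = {label: sum(v) for label, v in self._samples.items()}
        return max(totals, key=totals.get) if totals else None


if __name__ == "__main__":
    timings = TimingRecorder()
    with timings.measure("database"):
        time.sleep(0.02)  # stand-in for a slow query
    with timings.measure("render"):
        sum(range(1000))  # stand-in for cheap in-memory work
    print(timings.slowest(), timings.summary())
```

Wrapping the main stages of a request this way while the load test runs shows which stage dominates, which helps decide whether code optimisation, caching or more infrastructure is the better first step.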
Warnings and risks
Nothing is without risks, and there are some things you’ll want to consider before jumping into systems optimisations:
- Don’t compromise quality for scale – make sure that any optimisations don’t compromise the quality; too many bugs and defects are a surefire way to lose the faith of your customers.
- Don’t neglect what made the application popular – make sure you understand what’s working well before you try to optimise the system. A small number of happy and engaged consumers can be worth more than many largely anonymous users who quickly abandon the application.
- Don’t prematurely optimise – this is particularly true early in the lifecycle of the application: it’s often better to focus on delivering the capability of the system before refactoring for scalability.
The above is some general advice on how to scale an application effectively. These suggestions should be considered as guidelines: each application is different and there’s no “one size fits all” solution. Creating and maintaining applications for hundreds of users is very different to having millions of users, and it’s definitely worth engaging an external party to work alongside you to establish the best way of scaling your solution.