It is not uncommon for a service at PayPal to span 1,000 VMs or more. These services run on very small VMs and deliver very low throughput per VM. At the same time, the large number of nodes takes a toll on the network and routing infrastructure. Several of these services are interconnected in a complicated mesh, so a single user request travels through many network hops. As the number of such services grows, latency gradually increases and the user experience deteriorates.
While it is good for a service to have a critical mass of VMs spread across many data centers for redundancy, VMs added beyond that critical mass yield diminishing returns. Having too many services that each span hundreds of VMs carries an inherent cost in management and monitoring, in ineffective caching, and, more importantly, in agility. Rolling out a new version of a service across 100 VMs may take anywhere from a few minutes to an hour. Rolling it out across 1,000 VMs takes ten times longer.
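The rollout arithmetic can be made concrete with a rough sketch. It assumes a deployment that proceeds in fixed-size batches, one batch after another, with each batch taking about the same time. The function name `rollout_minutes`, the batch size of 10, and the 6 minutes per batch are illustrative assumptions, not figures from PayPal's deployment tooling.

```python
import math


def rollout_minutes(total_vms: int, batch_size: int, minutes_per_batch: float) -> float:
    """Estimate wall-clock time for a sequential, batched rollout.

    Hypothetical model: VMs are updated batch by batch, and every batch
    takes the same amount of time regardless of fleet size.
    """
    if total_vms < 0 or batch_size <= 0:
        raise ValueError("total_vms must be >= 0 and batch_size must be > 0")
    batches = math.ceil(total_vms / batch_size)
    return batches * minutes_per_batch


# Assumed values for illustration only: 10 VMs per batch, 6 minutes per batch.
small_fleet = rollout_minutes(100, batch_size=10, minutes_per_batch=6.0)
large_fleet = rollout_minutes(1000, batch_size=10, minutes_per_batch=6.0)

print(f"100 VMs:  {small_fleet:.0f} minutes")
print(f"1000 VMs: {large_fleet:.0f} minutes")
print(f"ratio:    {large_fleet / small_fleet:.0f}x")
```

Under this model the rollout time grows linearly with the number of VMs, which matches the tenfold difference described above. Larger batches shorten the rollout but put more capacity out of service at once, so the batch size cannot simply be scaled up with the fleet.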
Tuesday, August 16, 2016