High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
Authors
Abstract
Google’s Ads Data Infrastructure systems run the multibillion-dollar ads business at Google. High availability and strong consistency are critical for these systems. While most distributed systems handle machine-level failures well, handling datacenter-level failures is less common. In our experience, handling datacenter-level failures is critical for running true high-availability systems. Most of our systems (e.g., Photon, F1, Mesa) now support multi-homing as a fundamental design property. Multi-homed systems run live in multiple datacenters all the time, adaptively moving load between datacenters, with the ability to handle outages of any scale completely transparently. This paper focuses primarily on stream-processing systems, and describes our general approaches for building high-availability multi-homed systems, discusses common challenges and solutions, and shares what we have learned in building and running these large-scale systems for over ten years.
Related Resources
Managing Google's data lake: an overview of the Goods system
For most large enterprises today, data constitutes their core asset, along with code and infrastructure. For most enterprises, the amount of data that they produce internally has exploded in recent years. At the same time, in many cases, engineers and data scientists do not use centralized data-management systems and end up creating what became known as a data lake—a collection of datasets that...
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud...
Datacenters as Computers: Google Engineering & Database Research Perspectives
In this collaborative keynote address, we will share Google’s experience in building a scalable data infrastructure that leverages datacenters for managing Google’s advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and analytical workloads. The focus of this talk wi...
An empirical investigation on search engine ad disclosure
This representative study of German search engine users (N=1,000) focuses on the ability of users to distinguish between organic results and advertisements on Google results pages. We combine questions about Google’s business with task-based studies in which users were asked to distinguish between ads and organic results in screenshots of results pages. We find that only a small percentage of u...
SLOD-BI: An Open Data Infrastructure for Enabling Social Business Intelligence
The tremendous popularity of web-based social media is attracting the attention of the industry to take profit from the massive availability of sentiment data, which is considered of a high value for Business Intelligence (BI). So far, BI has been mainly concerned with corporate data with little or null attention to the external world. However, for BI analysts, taking into account the Voice of ...