Facebook

==== 2012 architecture ====
As of 2012, Facebook was developed as one monolithic application. According to a 2012 interview with Facebook build engineer Chuck Rossi, Facebook compiled into a 1.5&nbsp;GB binary blob, which was then distributed to the servers using a custom [[BitTorrent]]-based release system. Rossi stated that it took about 15 minutes to build and 15 minutes to release to the servers. The build and release process had zero downtime, and changes to Facebook were rolled out daily.<ref name="Ars Technica BTS">{{cite web|first=Ryan|last=Paul|title=Exclusive: a behind-the-scenes look at Facebook release engineering|url=https://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/|website=[[Ars Technica]]|publisher=[[Condé Nast]]|date=April 5, 2012|access-date=June 14, 2017|archive-date=July 4, 2017|archive-url=https://web.archive.org/web/20170704145032/https://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/|url-status=live}}</ref>

Facebook used a combination platform based on [[HBase]] to store data across distributed machines. User actions reached Facebook as [[Ajax (programming)|AJAX]] requests, which were written to log files using [[Scribe (log server)|Scribe]] (developed by Facebook). Using a tailing architecture, the system tailed these log files, rolled the events up, and wrote them to storage. The user interface then pulled the data out and displayed it to users.<ref name="high scalability">{{cite web|url=http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html|title=Facebook's New Real-time Analytics System: HBase To Process 20 Billion Events Per Day|publisher=Highscalability.com|date=March 22, 2011|access-date=December 26, 2012|archive-date=January 26, 2021|archive-url=https://web.archive.org/web/20210126214712/http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html|url-status=live}}</ref>
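
The tailing approach described above can be illustrated with a minimal sketch. It is not Facebook's code: the tab-separated event format, the function names <code>read_new_lines</code> and <code>roll_up</code>, and the in-memory log are all assumptions made for the example. The sketch reads only the lines appended since the last read and rolls them up into per-URL counts.

```python
import io
from collections import Counter


def read_new_lines(log, offset):
    """Return the complete lines appended to `log` after `offset`, and the new offset."""
    log.seek(offset)
    data = log.read()
    # Stop at the last newline so a half-written final line is left for the next pass.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def roll_up(lines, totals):
    """Count events per (url, event_type); the line format is hypothetical."""
    for line in lines:
        url, event_type = line.split("\t")
        totals[(url, event_type)] += 1
    return totals


# An in-memory stand-in for a log file that a writer keeps appending to.
log = io.BytesIO()
totals = Counter()
offset = 0

# First pass: two complete events and one that is still being written.
log.seek(0, io.SEEK_END)
log.write(b"example.com/a\tlike\nexample.com/a\timpression\nexample.com/a\tli")
lines, offset = read_new_lines(log, offset)
roll_up(lines, totals)

# Second pass: the writer finishes the third event.
log.seek(0, io.SEEK_END)
log.write(b"ke\n")
lines, offset = read_new_lines(log, offset)
roll_up(lines, totals)
```

Tracking a byte offset rather than re-reading the whole file means each pass touches only new data, and an event is never counted before it has been fully written.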

Data was read from these log files using Ptail, an internally built tool that aggregated data from multiple Scribe stores by tailing the log files and pulling data out. Ptail data was separated into three streams (plugin impressions, news feed impressions, and actions from both plugins and the news feed) and sent to clusters in different [[data center]]s. Puma was used to manage periods of high data flow (input/output, or I/O). Data was processed in batches to reduce the number of reads and writes during high-demand periods; a hot article, for example, generated many impressions and news feed impressions, which caused huge data skews. Batches were taken every 1.5 seconds, an interval limited by the memory used when creating a [[hash table]].<ref name="high scalability" />
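
The batching idea can be sketched as follows. This is an illustration under stated assumptions, not a description of Puma's actual implementation: the class name <code>BatchAggregator</code>, the <code>flush</code> callback, and the <code>max_keys</code> limit (a simple stand-in for the memory constraint) are hypothetical. Only the 1.5-second interval comes from the cited source.

```python
from collections import defaultdict


class BatchAggregator:
    """Accumulate event counts in a hash table and write them out in batches."""

    def __init__(self, flush, interval=1.5, max_keys=100_000):
        self.flush = flush            # callback that receives one dict per batch
        self.interval = interval      # seconds per batch (1.5 in the source)
        self.max_keys = max_keys      # hypothetical cap standing in for memory
        self.counts = defaultdict(int)
        self.batch_started = None

    def add(self, key, now):
        """Record one event observed at time `now` (seconds)."""
        if self.batch_started is None:
            self.batch_started = now
        self.counts[key] += 1
        too_old = now - self.batch_started >= self.interval
        too_big = len(self.counts) >= self.max_keys
        if too_old or too_big:
            self.flush_now()

    def flush_now(self):
        """Emit the current batch, if any, and start a new one."""
        if self.counts:
            self.flush(dict(self.counts))
        self.counts.clear()
        self.batch_started = None


# Usage: five events, four of them for the same hot key, produce one write.
writes = []
aggregator = BatchAggregator(flush=writes.append)
for t in (0.0, 0.1, 0.2):
    aggregator.add("hot-article", now=t)
aggregator.add("other-article", now=1.0)
aggregator.add("hot-article", now=1.6)  # 1.6 s since the batch began: flush
```

Because repeated events for the same key are merged in the hash table before being written, a skewed burst for one hot article costs a single storage write per batch rather than one write per event.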

Data was then output in PHP format. The backend was written in [[Java (programming language)|Java]], and Thrift was used as the messaging format so that PHP programs could query Java services. Caching solutions were used to display pages more quickly. The data was also sent to [[MapReduce]] servers, where it could be queried via Hive. This served as a backup, since the data could be recovered from Hive.<ref name="high scalability" />