Facebook Thrift
Prepared by: SAVARA GOVIND
Roll No.: U12CO086
Year: 2015-16
Table of Contents
Chapter 1 Introduction
1.1 Facebook Thrift
1.2 Organisation of the Report
Chapter 3 Thrift Architecture
3.1 Architecture
3.2 Supported protocols, transports and servers
3.3 Advantages of Thrift
Chapter 6 Challenges of FBThrift
6.1 Evolving the architecture
6.2 Re-open-sourcing Thrift as FBThrift
Chapter 7 Conclusion
References
Acknowledgement
List of Figures
Fig. 1 Thrift architecture
Nomenclature
HHVM HipHop virtual machine
IDL Interface Definition Language
I/O input/output
JSON JavaScript Object Notation
LAMP Linux, Apache, MySQL and PHP
OOP Object-oriented Programming
REST Representational State Transfer
RMI Remote Method Invocation
RPC Remote Procedure Call
SOAP Simple Object Access Protocol
STL Standard template library
TCP Transmission Control Protocol
XML Extensible Markup Language
ABSTRACT
Facebook, a popular social networking site, emphasizes choosing the best tools and
implementations for its backend services, irrespective of programming language. Facebook
Thrift is a framework that lets a program written in one language communicate easily and
efficiently with programs written in many other languages, including C++, Java, Python and
PHP. Individual programming languages are designed with particular strengths in mind, and
each offers functions and libraries that are especially efficient for certain tasks. Thrift lets a
system draw on these strengths across languages, making it a strong and reliable technology
for developing software products. Facebook uses Thrift internally for many features of the
site, including search and the News Feed, which provides updates on users' status. This report
briefly discusses the architecture, applications and services of Thrift and the challenges faced
by this technology.
Keywords: Facebook Thrift, Remote Procedure Call, Cross-language Development
Environment, Apache Thrift.
Chapter 1 Introduction
Facebook is a popular social network used throughout the globe. Statistics show that
968 million active users visit the site daily. Presenting and retrieving data efficiently for each
user account is therefore an important task. For backend development, Facebook accordingly
chooses the best and most efficient tools and implementations from different programming
languages. Various languages are used to optimize for the right combination of performance,
ease and speed of development, availability of libraries and so on.
When Facebook started, its backend was built on the LAMP framework [1]. LAMP is
an acronym for Linux, Apache, MySQL and PHP. As the number of users increased, network
traffic grew, creating the need to scale many of the site's applications, such as search, ad
selection and delivery, and event logging. Scaling these operations to match the resource
demands was not possible within the LAMP framework. To address this, a cross-language
framework known as Facebook Thrift was developed at Facebook in 2006. To foster wider
use, Facebook Thrift was open-sourced under the Apache license in 2007. It is also known as
Apache Thrift, and Facebook's later version is called FBThrift.
1.1 Facebook Thrift
Thrift is a software library and a set of code-generation tools for developing and
implementing scalable and efficient backend services. Its primary goal is to enable efficient
and reliable communication across programming languages by abstracting the portions of each
language that tend to require the most customization into a common library that is implemented
in each language. Users define data types and service interfaces in a single language-neutral
Interface Definition Language (IDL) file, and the Thrift compiler generates from it all the code
necessary to build Remote Procedure Call (RPC) clients and servers.
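As a rough illustration of what such a definition corresponds to, the following Python sketch hand-writes the kind of data type and service interface that a code generator would normally emit. The IDL fragment shown in the comment and the names UserProfile, UserStorage and InMemoryUserStorage are illustrative assumptions, not actual Thrift compiler output.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical definition file contents (illustrative only):
#   struct UserProfile { 1: i32 uid, 2: string name }
#   service UserStorage {
#     void store(1: UserProfile user),
#     UserProfile retrieve(1: i32 uid)
#   }


@dataclass
class UserProfile:
    """Stand-in for the data type a code generator would emit."""
    uid: int
    name: str


class UserStorage(ABC):
    """Stand-in for the generated service interface."""

    @abstractmethod
    def store(self, user: UserProfile) -> None: ...

    @abstractmethod
    def retrieve(self, uid: int) -> UserProfile: ...


class InMemoryUserStorage(UserStorage):
    """Application code: the part a service author writes by hand."""

    def __init__(self) -> None:
        self._users = {}

    def store(self, user: UserProfile) -> None:
        self._users[user.uid] = user

    def retrieve(self, uid: int) -> UserProfile:
        return self._users[uid]
```

In this sketch only InMemoryUserStorage is application logic; everything above it stands for code that would be derived from the definition file.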
1.2 Organisation of the Report
Chapter 1 briefly introduces the cross-language service technology Facebook Thrift.
Chapter 2 discusses the salient features of Thrift.
Chapter 3 briefly discusses the architecture; the supported protocols, transports and servers;
and the advantages of FBThrift.
Chapter 4 discusses two major services built on FBThrift: search and logging.
Chapter 5 shows the usefulness of Thrift relative to other service technologies such as REST,
RMI and Protocol Buffers, using the comparison drawn by Andrew Prunicki, a Senior Software
Engineer at Object Computing, Inc. (OCI).
Chapter 6 focuses on the challenges faced by FBThrift and the expected feature
developments.
Chapter 7 presents concluding notes on FBThrift.
able to run against TCP Stream Sockets, raw data in memory or files on disk. The transport
interface is designed to support easy extension using common OOP techniques, such as
composition.
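The idea of a small transport interface that can be extended by composition can be sketched in Python as follows. The class names Transport, MemoryTransport and CountingTransport are hypothetical and chosen for this illustration; they are not classes of the Thrift library.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Minimal byte-oriented transport interface (illustrative)."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self, n: int) -> bytes: ...


class MemoryTransport(Transport):
    """Keeps all written bytes in memory and reads them back in order."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._pos = 0

    def write(self, data: bytes) -> None:
        self._data += data

    def read(self, n: int) -> bytes:
        chunk = bytes(self._data[self._pos:self._pos + n])
        self._pos += len(chunk)
        return chunk


class CountingTransport(Transport):
    """Adds byte counting to any other transport by wrapping it."""

    def __init__(self, inner: Transport) -> None:
        self.inner = inner
        self.bytes_written = 0

    def write(self, data: bytes) -> None:
        self.bytes_written += len(data)
        self.inner.write(data)

    def read(self, n: int) -> bytes:
        return self.inner.read(n)
```

Because CountingTransport only depends on the interface, the same wrapper would work unchanged around a socket-backed or file-backed transport.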
2.3 Protocol
Data types must have some way of using the transport layer to encode and decode
themselves. Again, the application developer need not be concerned with this layer: whether
the service uses an XML or a binary protocol is immaterial to the application code. All that
matters is that the data can be read and written in a consistent, deterministic manner.
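A minimal Python sketch of this separation is given below: the same application function works with either a binary or a text encoding. BinaryProtocol and JsonProtocol are hypothetical classes written for this example, and their byte layouts are assumptions, not the wire formats used by Thrift.

```python
import json
import struct


class BinaryProtocol:
    """Fixed-width binary layout: 4-byte id, 4-byte name length, name bytes."""

    def encode(self, uid: int, name: str) -> bytes:
        raw = name.encode("utf-8")
        return struct.pack(">iI", uid, len(raw)) + raw

    def decode(self, data: bytes):
        uid, length = struct.unpack_from(">iI", data, 0)
        return uid, data[8:8 + length].decode("utf-8")


class JsonProtocol:
    """Human-readable text layout carrying the same two fields."""

    def encode(self, uid: int, name: str) -> bytes:
        return json.dumps({"uid": uid, "name": name}, sort_keys=True).encode("utf-8")

    def decode(self, data: bytes):
        obj = json.loads(data.decode("utf-8"))
        return obj["uid"], obj["name"]


def round_trip(protocol, uid: int, name: str):
    """Application code: identical regardless of the protocol chosen."""
    return protocol.decode(protocol.encode(uid, name))
```

Swapping one protocol object for the other changes the bytes on the wire but not the values the application sees.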
2.4 Versioning
For services to be robust, the data types involved must be able to evolve over time.
More precisely, it must be possible to add or remove fields in an object, or to alter the argument
list of a function, without any interruption in service. The system must be able to read old data
from log files, and to handle requests from out-of-date clients to new servers and vice versa.
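One common way to meet this requirement is to tag every field with a numeric identifier and let readers skip identifiers they do not recognise. The Python sketch below illustrates that general idea; the three-byte field header (one byte of field id, two bytes of length) is an assumption made for this example and is not Thrift's actual encoding.

```python
import struct


def encode_fields(fields):
    """Encode {field_id: payload_bytes} as a sequence of tagged fields."""
    out = bytearray()
    for field_id, payload in sorted(fields.items()):
        out += struct.pack(">BH", field_id, len(payload)) + payload
    return bytes(out)


def decode_fields(data, known_ids):
    """Decode tagged fields, silently skipping ids the reader does not know."""
    result = {}
    pos = 0
    while pos < len(data):
        field_id, length = struct.unpack_from(">BH", data, pos)
        pos += 3  # size of the field header
        if field_id in known_ids:
            result[field_id] = data[pos:pos + length]
        pos += length  # always advance past the payload, known or not
    return result
```

An old reader that knows only field 1 can still read a message to which a newer writer has added field 2, and a new reader simply finds field 2 absent in old data and can fall back to a default.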
2.5 Processors
Processors are the generated code capable of processing data streams to accomplish
remote procedure calls.
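The role of a processor can be sketched in Python as a small dispatcher that decodes a request, calls the matching handler method and encodes the reply. JSON is used here only to keep the example readable, and the message fields (seqid, method, args) as well as the CalculatorHandler class are assumptions of this sketch rather than generated Thrift code.

```python
import json


class CalculatorHandler:
    """Application logic written by the service author (hypothetical)."""

    def add(self, a, b):
        return a + b


class Processor:
    """Reads one encoded request, calls the handler, and encodes the reply."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, request: bytes) -> bytes:
        message = json.loads(request.decode("utf-8"))
        name = message["method"]
        # Only public handler methods may be invoked remotely.
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if callable(method):
            reply = {"seqid": message["seqid"], "result": method(*message["args"])}
        else:
            reply = {"seqid": message["seqid"], "error": "unknown method: " + name}
        return json.dumps(reply).encode("utf-8")
```

The reply echoes the sequence id of the request so that a client can match responses to the calls it made.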
Chapter 3 Thrift Architecture
This chapter briefly discusses the basic architecture of Thrift, lists the protocols,
transports and servers it supports, and outlines its benefits.
3.1 Architecture
Thrift includes a complete stack for creating clients and servers [3]. The top portion of
the stack is code generated from the Thrift definition file: client and processor code is
generated for each service, and code is also generated for the data structures that are sent
(other than built-in types). The protocol and transport are part of the Thrift runtime library.
Therefore, with Thrift it is easy to define a service and remain free to change the protocol and
transport without regenerating the code.
Thrift also includes server infrastructure that ties protocols and transports together,
such as blocking, non-blocking and multi-threaded servers. The underlying I/O portion of the
stack is implemented differently for different languages. For Java and Python network I/O, the
Thrift library leverages the built-in libraries, while the C++ implementation uses its own
custom implementation.
3.2 Supported protocols, transports and servers
Thrift allows the protocol, transport and server to be chosen independently. Because
Thrift was originally developed in C++, the C++ implementation offers the greatest variety of
each.
Thrift supports both binary and text protocols. The binary protocols outperform the text
protocols, but text protocols are useful at times, for example in debugging. Some of the
protocols Thrift supports are:
1. TBinaryProtocol - A straightforward binary format that encodes numeric values as
binary rather than as text. Simple, but not optimized for space efficiency. Faster to
process than the text protocols but more difficult to debug.
2. TCompactProtocol - A more compact binary format, and typically the most efficient.
3. TDebugProtocol - A human-readable text format that is easy to debug.
4. TDenseProtocol - Similar to TCompactProtocol, but strips the meta information
from what is transmitted and adds it back at the receiver. TDenseProtocol is still
experimental and not yet available in the Java implementation.
5. TJSONProtocol - Uses JSON for encoding of data.
6. TSimpleJSONProtocol - A write-only protocol using JSON that cannot be parsed by
Thrift because it drops metadata. Suitable for parsing by scripting languages.
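Compact binary formats commonly save space by writing integers in a variable number of bytes instead of a fixed width. The Python sketch below illustrates that general technique (a zigzag mapping followed by a base-128 varint); it is an illustration under that assumption and is not taken from the Thrift library.

```python
import struct


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def unzigzag32(z: int) -> int:
    """Inverse of zigzag32."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Write an unsigned integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Read one unsigned varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return result


fixed_width = struct.pack(">i", 5)           # always 4 bytes
variable_width = encode_varint(zigzag32(5))  # 1 byte for small values
```

Small values, which dominate many real messages, take one byte instead of four, at the cost of slightly more work when encoding and decoding.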
The protocols above describe what is transmitted, while Thrift transports describe how it is
transmitted. Some of the transports Thrift supports are:
1. TFileTransport - Writes to a file. This transport is not included with the Java
implementation, but is simple to implement.
2. TFramedTransport - Sends data in frames, where each frame is preceded by its length.
This transport is required when using a non-blocking server.
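Length-prefixed framing of the kind described for the framed transport can be sketched in Python as follows. The 4-byte big-endian length prefix and the function names are assumptions of this example. The sketch also suggests why framing suits a non-blocking server: the reader always knows how many bytes it still needs before a message is complete.

```python
import struct


def write_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_frames(stream: bytes):
    """Split a byte stream into complete frames.

    Returns (frames, leftover), where leftover holds the bytes of a frame
    that has not fully arrived yet.
    """
    frames = []
    pos = 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        if pos + 4 + length > len(stream):
            break  # incomplete frame: wait for more data
        frames.append(stream[pos + 4:pos + 4 + length])
        pos += 4 + length
    return frames, stream[pos:]
```

A server can call read_frames each time new bytes arrive and keep the leftover for the next call, without ever blocking on a half-received message.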
Thrift has been employed in a large number of services at Facebook, including search,
logging, mobile, ads and the developer platform [2]. This chapter discusses two of these
services, search and logging.
4.1. Search
Thrift is used as the underlying protocol and transport layer for the Facebook search
service. Multi-language code generation is well suited to search because it allows the
application to be developed in an efficient server-side language (C++) while allowing the
PHP-based Facebook web application to call the search service through the Thrift PHP
libraries. A large variety of search stats, deployment and testing functionality is also built on
top of the generated PHP code. Additionally, the Thrift log file format is used as a redo log for
providing real-time search index updates. Thrift has allowed the search team to leverage each
language for its strengths and to develop code at a rapid pace.
4.2. Logging
The Thrift TFileTransport functionality is used for structured logging. Each service
function definition, along with its parameters, can be considered a structured log entry
identified by the function name. This log can then be used for a variety of purposes, including
inline and offline processing, stats aggregation and as a redo log.
This chapter briefly discusses other service technologies and compares them with
Thrift.
REST (Representational State Transfer) is an architectural style for designing networked
applications. It depends on a stateless, client-server, cacheable communication protocol;
usually HTTP is used. REST is a lightweight alternative to mechanisms like RPC
(Remote Procedure Calls) and Web Services. RESTful applications use HTTP requests to post
data (create or update), read data and delete data. Thus REST uses HTTP for all four CRUD
(Create/Read/Update/Delete) operations.
RMI (Remote Method Invocation) is a Java Application Programming Interface (API)
that performs the object-oriented equivalent of remote procedure calls. Protocol Buffers are
Google's language-neutral, platform-neutral, extensible mechanism for serializing structured
data: like XML, but smaller, faster and simpler.
Andrew Prunicki compared Thrift with these technologies and recorded the results to
show the value proposition of Thrift over other service technologies that are also fairly easy
to use in practice [3]. RESTful web services have lately become very popular, so this chapter
compares them with Thrift. Protocol Buffers does not include service infrastructure, but it
transports objects in a similar fashion to Thrift's TCompactProtocol, which makes it a useful
comparison. Lastly, RMI is included because it uses a binary transport and so serves as a
reference implementation of sorts for Java binary object transport. The chapter compares the
payload sizes and run-time performance of each technology. For REST, both JSON-based and
XML-based variants are considered; for Thrift, the most efficient protocol available for Java,
TCompactProtocol, is considered.
Method            Capture Technique
Thrift            Custom client that forked the returning input stream to a file.
Protocol Buffers  Stream to a file. Excludes messaging overhead.
RMI               Object serialization of the response. Excludes messaging overhead.
REST              Use wget from the command line redirecting the response to a file.
The chart and table below show the results. None of the sizes include TCP/IP overhead.
Sizes are in bytes; smaller is better.
Table 2 Size comparison of different techniques
Method                                         Size (bytes)  % larger than TCompactProtocol
Thrift TCompactProtocol                        278           -
Thrift TBinaryProtocol                         460           -
Protocol Buffers                               250           -
RMI (using Object Serialization for estimate)  905           225.54
REST JSON                                      559           101.08
REST XML                                       836           200.72
The comparison shows that Thrift has a clear advantage in payload size, particularly
compared with RMI and REST. Protocol Buffers produces a slightly smaller payload than
Thrift's TCompactProtocol, but, as noted above, it does not include service infrastructure.
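The relative sizes can be checked with a short calculation. The Python sketch below takes the byte counts from Table 2 and expresses each one as a percentage increase over the TCompactProtocol payload; the function and variable names are chosen for this illustration only.

```python
# Payload sizes in bytes, as reported in Table 2.
SIZES_BYTES = {
    "Thrift TCompactProtocol": 278,
    "Thrift TBinaryProtocol": 460,
    "Protocol Buffers": 250,
    "RMI": 905,
    "REST JSON": 559,
    "REST XML": 836,
}


def percent_larger(size: int, baseline: int) -> float:
    """Percentage by which size exceeds baseline (negative if smaller)."""
    return round((size - baseline) / baseline * 100, 2)


BASELINE = SIZES_BYTES["Thrift TCompactProtocol"]
RELATIVE = {
    name: percent_larger(size, BASELINE) for name, size in SIZES_BYTES.items()
}
```

For RMI, REST JSON and REST XML this reproduces the 225.54, 101.08 and 200.72 figures listed with the sizes, and the negative result for Protocol Buffers reflects its slightly smaller payload.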
The test scenario was executed 10,000 times. The tests were run on the following systems:
Table 3 Server System Specifications
Operating System  [value missing]
CPU               [value missing]
Memory            2 GiB
Cores             2
Window System     Shutdown - To avoid any unnecessary spikes from other processes during execution.
Java Version      Sun Java SE Runtime Environment (build 1.6.0_14-b08)
[Table 4: specifications of the second test system, with the same rows (Operating System, CPU, Memory, Cores, Window System, Java Version); values missing.]
[Table 5: Method and Description for Thrift, Protocol Buffers and RMI; descriptions missing.]
The chart and table below summarize the results. Wall times are given in minutes and
seconds (mm:ss.ss).
CPU time is the total amount of time the CPU spent running the code or anything
requested by the code, including kernel time. Wall time is the time required to complete the
given task as measured by the system clock or a stopwatch.
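The difference between the two measurements can be observed directly with the Python standard library: time.perf_counter follows elapsed (wall) time, while time.process_time counts the user and system CPU time of the current process. The helper name measure is an assumption of this sketch.

```python
import time


def measure(task):
    """Run task() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def waiting_task():
    """Spends its time waiting, so it uses almost no CPU."""
    time.sleep(0.2)


def busy_task():
    """Spends its time computing, so CPU time is close to wall time."""
    sum(i * i for i in range(200_000))
```

Calling measure(waiting_task) gives a wall time of roughly 0.2 seconds but a CPU time near zero, which is why a client that mostly waits on the network can show a long wall time alongside a low CPU percentage.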
Method                   Server CPU %  Client CPU %  Wall time
REST XML                 12.00         80.75         05:27.45
REST JSON                20.00         75.00         04:44.83
RMI                      16.00         46.50         02:14.54
Protocol Buffers         30.00         37.75         01:19.48
Thrift TBinaryProtocol   33.00         21.00         01:13.65
Thrift TCompactProtocol  30.00         22.50         01:05.12
Some interesting observations can be drawn from the comparison. In terms of wall
time, Thrift clearly outperformed REST and RMI. In fact, TCompactProtocol took less than
20% of the time REST-XML took to transmit the same data.
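The 20% figure follows from the two wall times reported above, 01:05.12 for TCompactProtocol and 05:27.45 for REST XML. A short Python check, with a helper name chosen for this illustration:

```python
def to_seconds(mmss: str) -> float:
    """Convert a 'mm:ss.ss' wall time into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


compact = to_seconds("01:05.12")   # 65.12 s
rest_xml = to_seconds("05:27.45")  # 327.45 s
ratio = compact / rest_xml         # just under 0.2
```

That is, 65.12 s / 327.45 s is slightly below 0.2, so the TCompactProtocol run needed a little less than one fifth of the REST XML time.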
Overall, Thrift is a very good service technology, and it is open source. Protocol Buffers
is also very useful, but it covers serialization only and does not include service infrastructure.
Thrift is a powerful library for creating high-performance services that can be called from
multiple languages.
Chapter 6 Challenges of FBThrift
Thrift was developed at Facebook in 2006 and released as open source under the Apache
license in 2007 [4]. Since then it has been used throughout Facebook and has undergone
continual development to improve efficiency, speed, memory utilization and reliability.
Today it powers more than 100 services used in production, written in C++, Java, Python and
PHP. After running Thrift for eight years, the developers identified two challenges:
1. Thrift was missing a core set of features
2. Performance
For example, internal service owners were constantly reinventing the same features,
such as transport compression, authentication and counters to track the health of the servers.
To make asynchronous request handling work better, Facebook engineers had to improve the
memory handling of the generated C++ code. Engineers were also spending a large amount of
time improving the performance of their services. Outside Facebook, Thrift gained wide use as
a serialization and RPC framework, but ran into similar performance concerns, as well as
issues in separating the serialization and transport logic.
Over time, developers found that processing requests from the same client in parallel
and sending responses out of order solved many of the performance issues. The benefits of the
former are obvious; the latter helps avoid application-level head-of-line blocking. Even so,
more features were still needed.
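Head-of-line blocking and its avoidance can be illustrated with a small asyncio sketch in Python. Each request carries a sequence id so the client can match replies that arrive in a different order from the requests. The handler, the delays and the request list are assumptions invented for this example and do not model FBThrift's implementation.

```python
import asyncio


async def handle(seqid: int, delay: float) -> int:
    """Pretend to serve one request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return seqid


async def serve_in_order(requests):
    """One request at a time: a slow request delays everything behind it."""
    replies = []
    for seqid, delay in requests:
        replies.append(await handle(seqid, delay))
    return replies


async def serve_out_of_order(requests):
    """All requests run concurrently; replies are sent as each one finishes."""
    tasks = [asyncio.create_task(handle(s, d)) for s, d in requests]
    replies = []
    for finished in asyncio.as_completed(tasks):
        replies.append(await finished)
    return replies


# (sequence id, processing time in seconds); request 1 is the slow one.
REQUESTS = [(1, 0.15), (2, 0.01), (3, 0.06)]
```

With these delays the in-order server replies 1, 2, 3 only after the slow first request has finished, whereas the out-of-order server replies 2, 3, 1, so the two fast requests are not held up.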
6.1 Evolving the architecture
When Thrift was originally conceived, most services were relatively straightforward in
design. A web server would make a Thrift request to some backend service, and the service
would respond. But as the number of Facebook users grew, the complexity of the services
grew with it, and making a Thrift request was no longer so simple. Services came to be
arranged in tiers (services calling other services), and each service had its own feature
demands, such as particular compression or trace/debug needs. Hence Thrift had to be
upgraded for these specific use cases.
To make asynchronous request handling work better, Facebook engineers had to
improve the memory handling of the generated C++ code. Thrift's original C++ generated
code reused the same memory space over and over for each request, which made it
impossible to process requests out of order. So the developers adopted IOBuf, a buffer class
from the open-source folly library, which requests a new buffer for each request, with some
optimizations to reduce the resulting performance hit.
In earlier versions of Thrift, the same memory buffer was reused for all requests, but
memory management quickly became tricky when the buffer had to be updated to send
responses out of order. Instead, Thrift now requests new buffers from the memory allocator on
every request. To reduce the performance impact of allocating new buffers, it allocates
constant-sized buffers from jemalloc so as to hit the thread-local buffer cache as often as
possible. Hitting the thread-local cache was an impressive performance improvement: for the
average Thrift server, it is just as fast as reusing or pooling buffers, without any of the
complicated code. These buffers are then chained together to become as large as needed and
freed when no longer needed, preventing some memory issues seen in previous Thrift servers,
where memory was pooled indefinitely. To support these chained buffers, all of the existing
Thrift protocols had to be rewritten.
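The chaining of constant-sized buffers can be modelled in a few lines of Python. ChainedBuffer is a hypothetical class written for this illustration. It shows only the data-structure idea (grow by adding fixed-size chunks, release them all when done) and does not reproduce folly's IOBuf or the allocator behaviour described above, since Python manages memory itself.

```python
class ChainedBuffer:
    """A growable buffer made of a chain of chunks of at most chunk_size bytes."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks = []  # list of bytearray objects

    def append(self, data: bytes) -> None:
        """Copy data into the chain, adding new chunks as each one fills up."""
        view = memoryview(data)
        while len(view):
            if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
                self.chunks.append(bytearray())
            room = self.chunk_size - len(self.chunks[-1])
            self.chunks[-1] += view[:room]
            view = view[room:]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def to_bytes(self) -> bytes:
        """Join the chain into one contiguous bytes object."""
        return b"".join(self.chunks)

    def release(self) -> None:
        """Drop every chunk so its memory can be reclaimed."""
        self.chunks.clear()
```

Because each request owns its own chain, one response can be written and released without disturbing the buffers of another request that is still in progress.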
To allow for per-request attributes and features, a new THeader protocol and transport
were introduced. Thrift was previously limited in the fields that could be used to add
per-request information, and they were hard to access. As Thrift evolved, a new way was
needed for service owners to add features without changing the core Thrift libraries or
breaking backward compatibility. For example, if a service wanted to start compressing some
responses or to change timeouts, this should be easy to do without completely changing the
transport used. The THeader format is very similar to HTTP headers: each request passes
along headers that the server can interpret. With some clever programming, it was possible to
make the THeader format backward compatible with all the previous Thrift transports and
protocols.
The THeader format is being used by a number of Thrift services at Facebook. Service
requests that travel between data centers are dynamically compressed based on the size of the
message, while in-rack requests skip compression (and thus avoid the CPU hit). A number of
services that have moved to the new cpp2 generated code have seen up to a 50% decrease in
latency and/or large decreases in memory footprint. Additionally, the new C++ async code is
a dependency for newer HHVM releases.
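The combination of per-request headers and size-based compression described above can be sketched in Python. The envelope layout (a 2-byte header length, JSON-encoded headers, then the payload), the header keys and the threshold parameter are all assumptions made for this illustration; the sketch does not reproduce the real THeader wire format.

```python
import json
import struct
import zlib


def wrap(payload: bytes, headers: dict, compress_over=None) -> bytes:
    """Attach headers to a payload, compressing it if it exceeds a threshold.

    compress_over=None models an in-rack call that skips compression.
    """
    headers = dict(headers)
    if compress_over is not None and len(payload) > compress_over:
        payload = zlib.compress(payload)
        headers["transform"] = "zlib"  # tells the receiver how to undo it
    head = json.dumps(headers, sort_keys=True).encode("utf-8")
    return struct.pack(">H", len(head)) + head + payload


def unwrap(message: bytes):
    """Return (headers, payload), undoing any transform named in the headers."""
    (head_len,) = struct.unpack_from(">H", message, 0)
    headers = json.loads(message[2:2 + head_len].decode("utf-8"))
    payload = message[2 + head_len:]
    if headers.get("transform") == "zlib":
        payload = zlib.decompress(payload)
    return headers, payload
```

A sender crossing data centers might call wrap(payload, {"timeout_ms": "50"}, compress_over=1024), while an in-rack sender would leave compress_over unset. In both cases the receiver calls the same unwrap function, so adding the feature does not require a different transport.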
The use of cpp2 and THeader has brought considerable improvement in performance and
memory footprint, although further improvements in both are still needed to achieve the
goals. The new version, FBThrift, adds a number of features aimed at handling larger, more
complex collections of services, a new C++ code generator, and components aimed at
creating services that are less memory-intensive and demand less of the hardware under
heavy load. There remains considerable scope for developing the technology further.
Chapter 7 Conclusion
Thrift is a powerful library for creating high-performance services that can be called
from multiple languages. It is a good choice for an application in which multiple languages
need to communicate, speed is a concern, and the clients and servers are co-located. Thrift may
also be a good choice for IPC on a single machine where speed and/or interoperability is a
concern.
Thrift is already used in a wide variety of applications at Facebook. Many developers
are also contributing at Apache to make Thrift a scalable, efficient and reliable technology.
Thrift is a promising technology for cross-language software development.
References
1. Thrift White Paper, http://thrift.apache.org/static/thrift-20070401.pdf.
2. Mark Slee, Aditya Agarwal and Marc Kwiatkowski, Thrift: Scalable Cross-Language
Services Implementation.
3. Andrew Prunicki, Senior Software Engineer, Apache Thrift, Object Computing, Inc.
(OCI).
4. Dave Watson, Under the Hood: Building and open-sourcing FBThrift.
5. Thor Olavsrud, Facebook Open Sources Thrift Protocol.
6. Sean Gallagher, Facebook open-sources Thrift, again, with FBThrift overhaul.
7. Shane Schick, Facebook shows off Thrift development environment.
8. Michael Cvet, Facebook Thrift Tutorial.
Acknowledgement
I take this opportunity to thank the Facebook Thrift developers at Facebook and Apache
for their tireless work in developing such a versatile technology for cross-language services. I
also convey my sincere gratitude to all those whose vision guided me in completing this work.
I also express my heartfelt gratitude to Mrs. Dipti P. Rana (Asst. Prof.) and the other faculty
members of the Computer Engineering Department, SVNIT, for their valuable guidance, moral
support and belief in me.