Monthly Archives: February 2011

eBay Open Sourced its SOA platform!

Last month, eBay released its open source site, with a mission to open source some of the best-of-breed technologies developed within eBay Inc. to the community. The first open sourced project, code named Turmeric, is a comprehensive SOA platform for scalable development, deployment, management and monitoring of services.
Turmeric is being released under the Apache License (a copy is available here).
As with any first public release of an existing internal code base, you may encounter some rough edges. If that happens, please help us make the project better by filing an issue or posting to a discussion forum!

What is Turmeric?

Turmeric is a comprehensive, policy-driven SOA platform that you can use to develop, deploy, secure, run and monitor SOA services and consumers. It is a Java-based platform, follows open standards (SOAP, XML, JSON, XACML, etc.), and supports WSDL (SOAP style – Doc Lit wrapped mode) and REST styles. It supports a variety of protocols and data formats. Eclipse plugins help with the development of services and consumers. Other important features include:

  • Quality of Service (QoS) features such as authentication, authorization, and rate limiting, which are controlled by defining respective policies
  • Monitoring capabilities
  • A Type Library feature that lets you define, hierarchically organize, and manage reusable schema type definitions across services
  • Error Library capability that lets you define and re-use error definitions across services
  • Local binding, a deployment-time option that binds a service directly to its consumer for optimization, without loss of generality or any code changes
  • Policy Administration console
  • Repository Service enables service registration and governance
  • Assertions Service that enables automating WSDL validations
  • Built-in REST mapping capabilities

The Turmeric platform is highly extensible and customizable. You can easily plug in additional protocol processors, data formats, handlers and various other capabilities. The platform is optimized for large scale environments.
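
To give a flavor of this handler-based extensibility, here is a minimal sketch of a handler pipeline in Java. The interface and class names are purely illustrative, not Turmeric's actual API; they only demonstrate the pattern of chaining handlers that share a common message context.

    // Illustrative handler-pipeline sketch; names are hypothetical, not Turmeric's API.
    import java.util.ArrayList;
    import java.util.List;

    class MessageContext {
        // Holds the request/response payload, headers, and properties shared by handlers.
    }

    interface Handler {
        // A single processing step, e.g. logging, authentication, or rate limiting.
        void invoke(MessageContext ctx) throws Exception;
    }

    class Pipeline {
        private final List<Handler> handlers = new ArrayList<Handler>();

        Pipeline add(Handler h) { handlers.add(h); return this; }

        void process(MessageContext ctx) throws Exception {
            // Each handler sees, and may enrich, the same message context.
            for (Handler h : handlers) {
                h.invoke(ctx);
            }
        }
    }

In this style, plugging in a new capability amounts to implementing the handler interface and adding it to the pipeline configuration, rather than changing the core runtime.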

What does the Turmeric platform include?

The Turmeric platform includes:

  • Core runtime — The core infrastructure library, based on a pipeline architecture, used to run services and consumers. You can customize, or create a plugin for, almost every aspect of the infrastructure. It is divided into three parts:
    • The binding framework library is a flexible and customizable binding implementation. It provides XML, Binary XML, JSON, and Name-value bindings out of the box
    • A server-side framework library, called Service Provider Framework (SPF)
    • A client-side framework library, called Service Invocation Framework (SIF)
  • Developer tools — Simple-to-use Eclipse plugins that help create services and consumers, manage reusable types across services via Type Libraries, and manage error definitions that are reused across services via Error Libraries
  • Monitoring — A comprehensive monitoring platform with multiple components:
    • Runtime aggregation of various metrics on both the client and server side
    • Configurable metrics
    • Storage providers that push the aggregated data from each app server
    • A monitoring service that provides aggregated metrics data across all nodes
    • A monitoring console that lets you view metrics
  • Security Services — Everything in the platform is policy-driven, and the policies follow the XACML structure and syntax. Various services, including authentication, authorization, rate limiting and group membership services, interact with the policy service and act as policy enforcement points
  • Policy Admin Console — Manages policy definitions
  • Repository Service — Service registration, dependency management and governance are key aspects of the end-to-end platform. Repository Service is an abstraction of this functionality, which makes it agnostic to specific repository products. The actual supported capability depends on the capabilities of the underlying repository product
  • WSDL Assertion Service — As part of implementing a governance process and achieving consistency in an enterprise, you can define guidelines for the WSDL interface of a service. Using the assertions service, you can express these guidelines as XQuery assertions and validate them against WSDLs. This Assertion Service capability is also integrated into the Eclipse plugin

Many of the services mentioned above are architected to have pluggable provider implementations: for example, a username/password-based or a token-based provider for the authentication service. eBay-specific providers have been replaced with sample open source providers that demonstrate the capability, and Turmeric users can develop their own providers and plug them in as appropriate.
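
As a rough illustration of that provider model, the sketch below shows a sample username/password provider behind a common authentication interface. The interface and class are hypothetical stand-ins, not Turmeric's actual provider SPI; a token-based provider would implement the same interface.

    // Hypothetical provider interface to illustrate the pluggable-provider pattern.
    interface AuthenticationProvider {
        boolean authenticate(String subject, String credential);
    }

    // A sample provider, analogous to the open source samples shipped in place of
    // eBay-internal ones. Real providers would consult a user store or token service.
    class SimpleUserPasswordProvider implements AuthenticationProvider {
        private final java.util.Map<String, String> users = new java.util.HashMap<String, String>();

        SimpleUserPasswordProvider() {
            users.put("demo", "secret"); // sample data only
        }

        public boolean authenticate(String subject, String credential) {
            String expected = users.get(subject);
            return expected != null && expected.equals(credential);
        }
    }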

Who is the target audience?

This platform is for anyone who is looking to adopt a SOA-based, distributed computing strategy and wants a simple way to develop, deploy and monitor services and applications on a scalable, tested platform that has all the end-to-end capabilities described above.

As with all open source offerings, you are also encouraged to contribute to the platform. I hope you choose to get involved and look forward to your questions and feedback!

Service Oriented or Performance Oriented

We have seen a big push toward a Service Oriented Architecture (SOA) across the IT industry over the last decade. Many companies, including eBay, have made big investments in developing SOA frameworks, evolving their applications, introducing governance processes and breaking up monolithic systems to assume a more service oriented nature. The benefits are many and well understood, but adopting a SOA also has its potential pitfalls.

For web-facing applications the decoupling and isolation along functional boundaries may lead to an increase in latency, a decrease in throughput, and consequently a negative impact on site-speed (check out Hugh’s blog entries on how we define, measure and improve site-speed at eBay). In other words, a service-oriented architecture may not be the same as a performance oriented architecture (POA).

If your application must fulfill stringent site-speed requirements, then a loosely coupled SOA, while beneficial in terms of system maintenance, management and evolution, may nonetheless turn out to be unsatisfactory. One might argue that performance is lacking because the system wasn’t decomposed in the best possible way, i.e. that the lines were drawn inappropriately; but that just reinforces the point. Where you draw the lines, toward a more service-oriented or a more performance-oriented architecture, can impact your bottom line.

How can service orientation limit site-speed? In my experience it comes down to the data path. As in many computer architectures at the micro level, data flow optimizations are becoming increasingly important at the macro level when we optimize for site-speed across complex distributed applications. The latencies encountered down the memory and storage hierarchies can quickly add up and consequently many threads end up running idle waiting for data. Adding disaster recovery (DR) capabilities and ensuring loads are balanced across data centers only exacerbate the problem.

What can we do? Again, a look at the evolution of computer architecture can provide some guidance. Caching, pipelining, prediction, prefetching and out-of-order execution are some of the key milestone achievements; and these techniques can be applied at the macro level as well. When you decompose a system into services, rather than focusing on a pure functional partitioning, you should take into consideration the data flow and examine any data dependencies across functions. At the same time, think of caching, pipelining, prediction, prefetching, memoization, asynchronous I/O, event-driven processing and fork/join frameworks as tools in your optimization toolbox. You will find that services become more coarse-grained due to optimizations along the data path. In other words, two functions that logically represent separate services may end up being coupled due to data dependencies, such that you end up not just colocating them but encapsulating them in a single service. You may also find that the invocation of one service can, as a side effect, produce valuable results more efficiently than if you had to invoke a second service to produce the same results.
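
To make the prefetching idea concrete, here is a minimal sketch in plain Java of starting a high-latency remote lookup early and joining the result only where it is consumed, so that local work overlaps with the remote I/O. The service and method names are hypothetical.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PrefetchExample {
        private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

        String handleRequest(final String userId) throws Exception {
            // Start the remote, high-latency lookup as early as possible (prefetch).
            Future<String> profile = POOL.submit(new Callable<String>() {
                public String call() { return fetchUserProfile(userId); }
            });

            // Cheap, local work proceeds while the remote call is in flight.
            String context = computeLocalContext();

            // Block only at the point where the remote data is actually consumed.
            return render(context, profile.get());
        }

        String fetchUserProfile(String userId) { return "profile-of-" + userId; } // stand-in for a remote call
        String computeLocalContext()           { return "ctx"; }
        String render(String ctx, String data) { return ctx + ":" + data; }
    }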

Another way to look at it is to approach the big picture with a service oriented mindset while tuning critical subsystems with a performance oriented mindset. We don’t suggest you replace a SOA with a POA across the board, but rather pay attention to islands of functionality that might benefit from a POA in an archipelago where SOA dominates.

Let me illustrate with a concrete example. We have spent the last two years optimizing a content delivery platform for real-time user messaging. It is used for advertising, recommendations and loyalty programs across the eBay site. In fact, much of the content you see on the new home page is delivered by this platform. The system is one of the largest at eBay and handles over 2.5 billion calls on a busy day. It relies on numerous data sources and provides two primary capabilities: user segmentation and dynamic, personalized content generation.
[Figure: User Messaging]
The data consumed by the platform can be categorized into the following:

  • Contextual: request-specific (placements, geo, user agent)
  • Demographic: user-specific (age, gender)
  • Behavioral: user-specific (items purchased)
  • Configuration: campaigns and segments
  • Services: on-demand, usually content-specific (ads, recommendations, coupons)

Contextual information is passed to the system as part of the incoming request (typically from a front-end tier) and describes the context for a set of placements. A placement is a reserved area on a web page where a creative is displayed. The demographic and behavioral data is user-specific and includes buyer and seller properties. Behavioral attributes also include real-time data based on recent site activity, e.g. a user’s recently viewed items. Campaigns are deployed to configure the system. They contain segment definitions and may be prioritized or scheduled to run for a specific period of time. All configuration data is cached in memory and refreshed periodically.
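
As a side note, the "cache in memory, refresh periodically" pattern used for the configuration data can be sketched as follows; the loader, the types and the refresh interval are illustrative, not the platform's actual implementation.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    class CampaignConfigCache {
        private final AtomicReference<List<String>> campaigns =
                new AtomicReference<List<String>>(Collections.<String>emptyList());
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        void start() {
            // Reload the configuration periodically; readers always see a consistent snapshot.
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() { campaigns.set(loadCampaignsFromStore()); }
            }, 0, 5, TimeUnit.MINUTES);
        }

        List<String> currentCampaigns() { return campaigns.get(); }

        List<String> loadCampaignsFromStore() {
            // Stand-in for the real configuration source.
            return Arrays.asList("campaign-a", "campaign-b");
        }
    }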

The platform also relies on a set of on-demand services that act as data providers. The campaign configuration dictates which service is required for a given type of message. While colocation of some of these would further improve system performance, most services currently run as separate systems.

In selecting a message for a placement on a page, the request handler may evaluate numerous segment rules to determine which message is the most appropriate for the current user in the given context. This evaluation depends on data lookups that may incur a significant cost in terms of latency. After a message has been selected, a content generator creates the actual message creative, which may require similar data lookups. For example, a JSP template may rely on a recommender system to deliver item data related to a user’s recent purchase.

The platform typically receives a batch request for as many as 15 placements, so the request handler runs a number of concurrent tasks and data is shared across tasks as much as possible. Content generator tasks are prioritized so the ones that depend on remote data providers are dispatched as early as possible (out-of-order execution). Contextual data is used to predict when to prefetch correlated user data based on common usage patterns. Certain asynchronous tasks may continue to run even after the response has been delivered. For example, a task that writes back campaign usage metrics to a persistent store doesn’t unnecessarily hold up the main request handler.
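
A simplified sketch of that dispatch strategy is shown below: generator tasks that depend on remote data providers are submitted first so their I/O overlaps with local work, and the metrics write-back is handed off to the pool so it never blocks the response. The task class and its fields are illustrative, not the platform's actual code.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PlacementDispatcher {
        private final ExecutorService pool = Executors.newFixedThreadPool(16);

        List<String> handle(List<GeneratorTask> tasks) throws Exception {
            // Dispatch tasks that need remote data before purely local ones (out-of-order execution).
            List<GeneratorTask> ordered = new ArrayList<GeneratorTask>(tasks);
            Collections.sort(ordered, new Comparator<GeneratorTask>() {
                public int compare(GeneratorTask a, GeneratorTask b) {
                    return (b.needsRemoteData ? 1 : 0) - (a.needsRemoteData ? 1 : 0);
                }
            });

            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (GeneratorTask t : ordered) {
                futures.add(pool.submit(t));
            }

            List<String> creatives = new ArrayList<String>();
            for (Future<String> f : futures) {
                creatives.add(f.get());
            }

            // Fire-and-forget: persist usage metrics without holding up the response.
            pool.submit(new Runnable() { public void run() { /* write metrics */ } });
            return creatives;
        }
    }

    class GeneratorTask implements Callable<String> {
        final boolean needsRemoteData;
        GeneratorTask(boolean needsRemoteData) { this.needsRemoteData = needsRemoteData; }
        public String call() { return needsRemoteData ? "remote-creative" : "local-creative"; }
    }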

Numerous platform optimizations are aimed at reducing latency by preventing multiple lookups for similar data in the same request. The data path from segmentation to content generation is highly optimized and even customized for certain use cases. For example, if the rule engine determines the user falls into a segment for which a coupon is to be issued, it fetches any coupon data used for content generation as a side effect of segment evaluation, and the task that generates the coupon’s HTML creative then reads the data from memory.
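
Here is a rough sketch of that side-effect optimization: data fetched during segment evaluation is stashed in a per-request context so content generation reads it from memory instead of repeating the lookup. The class and method names are illustrative only.

    import java.util.HashMap;
    import java.util.Map;

    class RequestContext {
        private final Map<String, Object> data = new HashMap<String, Object>();
        void put(String key, Object value) { data.put(key, value); }
        Object get(String key)             { return data.get(key); }
    }

    class SegmentEvaluator {
        boolean isCouponSegment(String userId, RequestContext ctx) {
            boolean eligible = true; // stand-in for the actual rule evaluation
            if (eligible) {
                // Side effect: fetch and cache the coupon while we are already touching user data.
                ctx.put("coupon", "10PercentOff");
            }
            return eligible;
        }
    }

    class CouponCreativeGenerator {
        String render(RequestContext ctx) {
            // No second remote call: the coupon was cached during segmentation.
            return "<div class='coupon'>" + ctx.get("coupon") + "</div>";
        }
    }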

While from a SOA perspective we may be encouraged to decouple segmentation from content generation so as to make each component potentially reusable in other applications, the POA-based combination, with its optimized data path, clearly delivers better results in terms of site-speed. The improvements in real-time content delivery are reflected in the reduced response times measured for the eBay home page. The two measurements below are from late 2009 (dotted curve) and late 2010 (black curve). The two curves represent the cumulative distribution of the duration of calls handled by the platform, based on a large sample at the same time of day on the same day of the week (loads vary substantially depending on the time of day, and some days are busier than others).
[Figure: Response Times]
The chart above indicates that at the 90th percentile response times have improved by 30% over the course of the year. The speed-up is all the more significant given that the response times for critical data providers haven’t changed much and the system remains I/O bound. You may be wondering why response times at the 20th percentile have doubled. This is due to the campaign configuration. The home page configuration in late 2009 wasn’t as dependent on data services as the one in late 2010 was, and thus some of the calls weren’t as I/O bound. The samples for the black curve above were collected during a period when almost all requests from the home page contained at least one placement that depended on a remote data source.

I’ll leave you with two things. First, a SOA is good, but in certain cases a POA is better. And second, if time is of the essence, an awareness of the data flow and data dependencies in your system can help you decide where to begin adopting more of the latter.

CloudCamp Silicon Valley at eBay

CloudCamp is an unconference (a participant-driven meeting) where early adopters of Cloud Computing technologies exchange ideas. The upcoming CloudCamp Silicon Valley on February 10 from 6pm-10pm will be hosted by eBay on our San Jose North Campus (2161 N First St, San Jose, CA 95131).

This public event brings together a wide range of cloud computing enthusiasts, providing a great opportunity to network with peers in the industry and exchange ideas about addressing cloud computing adoption challenges. The focus will be Platform-as-a-Service from the perspective of solution architecture, design, implementation, and management.

For more information and to register, please visit the CloudCamp website. Hope to see you next week!