ColorizerJS — My First Hack Project at eBay


First blog on the eBay Tech Blog, whooooh!

Hello! This blog will mostly be about my project, ColorizerJS, but since this is my first post on the eBay Tech Blog, I thought I’d introduce myself. For those unaware of some of the terminology and nomenclature, I will try to provide some brief short-bracket explanations[“[]”]. If you are submerged in web software or already cool enough to know the terminology, just pretend like there are no brackets.

I started working at eBay in May after graduating from the University of California, Santa Cruz, with a bachelor’s degree in Computer Science Game Design (now called CGPM). CSGD at UCSC was a great program for front-facing developers, as games are as much about the user (or player) as they are the algorithm. Plus, at least at UCSC, I got a whole lot more experience with managing assets and interfacing between technologies in my Game Design classes than in my vanilla CS classes.

The interview process at eBay was great, not too niche-specific and not too broad, with great questions on both niche and broad ends of the interview spectrum. I was hired as a full-time IC [individual contributor] on the Search Front-End [abbreviated to FE, meaning closest to the client] engineering team. This means that whenever you enter a search query, the stuff you get back — JavaScript [functionality, the bulk of it], CSS [look and feel], HTML [structure] — is what my team handles.

I was lucky enough to be on a great team at a great time. When Hack Week 2016 rolled in mid-June, I was too new to have many responsibilities keeping me from joining the Hack Week, yet experienced enough to know our codebase a bit and have some ideas for my potential project.

After some thought, I determined something that would be fun to work on, enhance the user experience, and bring in revenue to eBay through increased buyer activity. The initial problem arose from eBay sellers often having non-standard sized images, either shorter or thinner than a square. With the current redesign of Search Results Page (SRP), the method to combat this problem was to overlay all images onto a grey bounding square. This technically fixed the problem of non-standard image dimensions, but it did not lend itself to good aesthetics. This is where ColorizerJS comes in.

Current background implementation in production — not so hot

I was inspired to do ColorizerJS by my coworker (and one of my many mentors) Yoni Medoff, who made a script that drew loaded images to an HTML Canvas, averaged the RGB [red, green, and blue channels. You definitely know this. These brackets are so condescending.] values of each image in the search results for a query, then created an RGB color out of those average channel numbers, and set the containing square (normally just white or grey) behind the image to be the average color. This was intended to provide a more visually appealing way of standardizing the size and shape of each image in search results than just slapping them on top of a grey square. Yoni’s implementation looked significantly better than just a grey square, but it could definitely be improved upon.

There were four problems with his approach:

  • It did not look good in many cases. Sometimes the generated average does match any color in an image. A strong foreground color can throw that average out of whack, and most of all, the average may not resemble the “background” at all.
  • It depended on HTML <img/> elements [images on the (meta)physical webpage], which limited use to the DOM, so that isomorphic [using the same code in a browser and on the server] server-side use was impossible and parallelized web-workers [multiple CPU threads running JavaScript] wouldn’t work.
  • It converted the <img/> to an HTML Canvas element to extract the pixel data, a slow process that required drawing to large Canvas objects before being of any use. This was not time- or memory-efficient.
  • It was very tightly coupled to our codebase, meaning that it could not be used in other parts of the site or on other websites without huge changes.

My code running average algorithm — it’s alright

I thought, “Oh snap, here’s my chance to win $10k and become VP of engineering or whatever the Hack Week prize is.” Easy! All I needed to do was come up with a way to load the image into the browser cache to reduce double-requests [when a browser makes two requests to the server for the same image, BAD] and then read that cached image as binary [1000100101], analyze the binary file to extract its RGB pixel data, organize that in a logical way to analyze, and then analyze it and determine the most likely background color of the image to be appended to the background bounding square of the image, all while keeping the analysis code modular so as to be used on the client, web-worker, or server, and independent of our codebase and modular so that I could open-source it as an NPM module to be used by anyone on any page. Mayyybe not so easy.

So now I have a considerable job — meet all these criteria in a week — but where do I start? I decided starting with a client-side [user’s browser] image decoding [decoders analyze an image and give back a string of numbers indicating the channel values of each pixel] NPM module. eBay supports JPEG and WebP (Google’s new super-great image compression format) formats, and I figured since JPEG was older, I’d have more luck with it, so off I went looking for a client-side decoder. There weren’t any client-side JPEG decoders. Only Node [server-side]. Nice. After a few PRs [Pull Request] to jpeg-js (to support native UInt8Arrays instead of just Node-specific Buffer arrays) and just like that I had a client-side JPEG decoder. Nice.

Next I had to figure out how to request the image file as binary, and found a great article on JQuery Ajax custom transport overriding. This allowed me to send the data to jpeg-js as a bitstream in a JavaScript typed-array (UInt8Array in this case). jpeg-js sadly only supported 4-channel (including alpha-transparency) output, so in my color analysis code I handled both 3-channel and 4-channel output flags as 4-channel. This increased data overhead by about 1/3 more bytes per image [since each channel of a pixel is one byte] — inconvenient but not a dealbreaker.

With my now merged pull request to support client side analysis and array type, jpeg-js analyzed my binary input (after some array conversion) and gave me back an object with a height, width, and array of bytes [number from 0 – 255], each four array indices corresponding to a pixel in the image binary. I found a great WebP analysis library called libwebp (I got the code by viewing the page source) and got it working.

Now it’s time to do some background analysis!

I started with the simple average-of-the-edge-pixels algorithm and appended the resulting color to the bounding box behind each image result, which expectedly yielded sub-par results, but at least the analysis was working. I then I decided to up the size of the pixels analyzed to around 10 pixels per side. If the image was tall I would check the outer 8 pixels on the left and right side, if it was short I would check the top and bottom. I made a function that determined two modes for each color channel, determined the weight of each mode against each other and against the average, shaving off outliers and conversely, fully using the mode if it was significant enough in the image. This yielded great results, especially for images with fairly flat background colors or heavily dominating colors.

Lookin’ pretty good

But some images, especially those with sporadic colors, colors that are very different from side to side, edges with very different colors than the rest of the image, borders, and other complex images, sometimes did not cooperate with this implementation. Some generated colors would clearly not occur at all in the picture or just not fit as a background.

WHERE IS THAT MINT COMING FROM? — Mr Nice Guy could have fixed this if he were still here.

I did not want this algorithm to be just KINDA similar to the background for an image, I wanted it to be almost EXACT. I came up with a few modifications, more modes, more pixels analyzed, some analysis of sectors and their potential significance to determine their weight, and doing analysis of 2 pixels as well as large sections of 20–30 pixels from each edge and determining their weights. Also I fine-tuned the cutoffs for modes and averages to be more likely to exclude an average and include a mode. Some modifications to weights for each sector was required before I came up with a fairly finished product.

Niiiiiiice. It actually looks like a background.

Particularly impressed with the fighter jet analysis, good job Colorizer

I presented the Colorizer at the end of the week of Hack Week 2016 and got some positive responses from the judges. Fun project, and some hard work for a little more than a week.

I hope to make all magic numbers into parameters that can be defined by the user to widen the range of use cases that ColorizerJS can apply to. I also want to make some more functions in the Colorizer that can make it work with path image files to be used on the server, satisfying my original requirement of the module being isomorphic. If it ends up looking good, I might also come up with separate background colors for each side (top and bottom or left and right) if they are different enough. This would especially work well with unprofessionally shot images in sellers’ homes that might have vastly different upper and lower backgrounds in their images.

Soon I will get the code up and tested on Search Results Page, working with Web workers, and published as an NPM module with some examples of usage so that you and all of your friends can do some sweet image analysis. All that is needed is an image URL passed to ColorizerJS, the image is cached and analyzed, and out comes the background color!

Thanks for reading. I will update soon with more info as the project progresses. When I open-source ColorizerJS, I will post a link to the Git repo.

Big thanks to Yoni Medoff for the idea and initial implementation along with encouragement along the way, Eugene Ware for jpeg-js and the help getting it implemented, and Senthil Padmanabhan for the inspiration to write this blog.

2 thoughts on “ColorizerJS — My First Hack Project at eBay

  1. Tim Reilly

    Awesome work!

    A real-world business problem we are trying to solve is to identify the main color of thousands of individual items in order to properly assign the Color aspect when listing a product for sale. This gets harder as you have things like apparel shot on a model, so you potentially are listing a shirt but also have the colors from the pants, the model’s skin or hair, etc. An algo that could determine the item being sold in listing #262649521024 is Blue would be quite useful. Seems like your approach could be tailored to that somehow.

    1. Weston Mossman Post author

      If the actual item itself is small enough it would be hit-miss whether or not an algorithm could determine what in the picture is actually being sold from a picture alone.

      A potential solution to the problem: When a seller is listing an item, prompt n additional step to draw a square over the object being sold in the main image, this way the algorithm used could focus on only those pixels and not the rest of the image.

      In addition, there are some automatic eBay speed listing projects underway that analyze visual objects and generate their own listings from those images. With these use cases, the generator could save only the pixels that it considers to be part of the object being sold and hand that off to an algorithm like mine to determine the color.


Leave a Reply

Your email address will not be published. Required fields are marked *