Shopping online is devoid of instant gratification. Customers don't walk away, package in hand, with a new [insert product type here]; instead, they select a shipping method and wait. But retailers are trying mightily to change that, starting with Amazon, which was awarded a patent in December on anticipatory shipping -- a system that aims to ship the goods before customers even make the purchase. Pundits have questioned whether Amazon's patent for shipping goods to intermediate hubs in anticipation of sales is anything new (or even tenable for all but the most generic purchases). But that hasn't stopped retailers from deploying sophisticated analytics to fulfill, so to speak, the pleasure of getting an online order sooner rather than later.
Just ask Igor Elbert, data scientist at high-end fashion and home wares retailer Gilt Groupe in New York. With machine learning and predictive models, Elbert is building a practice of pre-emptive shipping at Gilt. Using a slew of factors (some subjective, others objective), the program aims to determine which products will sell better where -- East Coast, West Coast or Middle America -- before customers click "Buy." The data challenges of doing this are enormous, raising such existential questions as, "What is a color?"
Elbert recently sat down with SearchCIO.com to talk about pre-emptive shipping, a topic he'll be presenting on at Strata + Hadoop World in New York next month.
What is pre-emptive shipping?
Igor Elbert: In anticipation that some items will be sold in certain areas, we can start moving them even before we put them on sale. So, by the time they're actually purchased, they're geographically closer to the customer, meaning the customer would get them faster.
How is the problem Gilt is trying to solve different from what Amazon calls 'anticipatory shipping?'
Elbert: We kind of envy Amazon, because their problem is much easier. If you predicted for toothpaste in Orange County, California, you have a reliable past history of toothpaste sales. If it's more or less stable, you can bet people next month will need the same amount of toothpaste they bought from you last month. Retailers have been doing this since forever -- looking at sales forecasts and moving items closer to the customer. But Amazon took it one step further. They said, 'Knowing your previous purchase history, we'll send the product to your doorstep.' The model relies on, from what I understand, knowing what you bought before. So, if you bought toothpaste last month, and you've been buying toothpaste every two months for the last two years, they know you'll need toothpaste next month and they can ship it to you. It's a low risk to them because if you don't need it, you can return it, but it's likely you'll actually need it.
We don't go that far. We're not going to ship a high-end dress to someone [before she's bought it]. But we try to move products in the direction of the intended buyer early on.
What's your biggest data challenge?
Elbert: Assembling the data set: to understand which attributes are predictive, to get reliable data, to clean it. Something like, for example, [the product's] material is highly predictive, but material is, in many cases, supplied by the vendor and hand-typed by someone. So, there are typos. Normalizing materials into something the algorithm can use is a challenge in understanding what's predictive and what's not.
Initially, I ended up with several thousand descriptions of materials of what the items are made from. Several thousand are too much, plus many of them are variations of the same thing. I went through several iterations of cleaning, aggregating and normalizing just to help algorithms deal with this. That's a data challenge.
Understanding what makes an item is also a challenge. Like color: That seems obvious, but it turns out not to be. Color, it depends on which definition -- what vendors call it, what Gilt calls it and what the National Retail Federation calls it. An item could have three colors assigned to it, and they could be very different. So, understanding specifics of high fashion attributes and so on.
And then realizations that it's not just an individual item. The day makes a difference, and the concept of good day/bad day is a predictor for individual item performance. That came after several iterations.
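To make the material-cleanup step Elbert describes concrete, here is a minimal Python sketch of collapsing hand-typed material strings into a small canonical set. It is an illustration only, not Gilt's actual method: the vocabulary list, the function name normalize_material and the 0.8 similarity cutoff are all assumptions chosen for the example, and it relies on approximate string matching from Python's standard difflib module.

```python
import difflib
import re

# Hypothetical canonical vocabulary (an assumption for this sketch);
# a real list would come from the retailer's own catalog.
CANONICAL_MATERIALS = ["cotton", "leather", "polyester", "silk", "wool"]


def normalize_material(raw, vocabulary=CANONICAL_MATERIALS, cutoff=0.8):
    """Map a hand-typed material string to a canonical label.

    Returns None when nothing in the vocabulary is similar enough,
    so unknown materials can be reviewed by a person instead of guessed.
    """
    # Lowercase, then drop digits, punctuation and surrounding whitespace.
    cleaned = re.sub(r"[^a-z ]", "", raw.lower()).strip()
    if not cleaned:
        return None
    # Pick the single closest vocabulary entry above the similarity cutoff.
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Cottton", "100% Wool", "lether", "titanium"]:
        print(repr(raw), "->", normalize_material(raw))
```

Returning None for unmatched strings, rather than forcing a best guess, keeps typos from being silently merged into the wrong material; several passes of this kind, with the vocabulary refined each time, would mirror the iterations of cleaning and aggregating described above.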
What is your biggest technology challenge?
Elbert: Moving the data from the source -- from the operational database -- to the data warehouse and then to the prediction algorithms, and getting the results fast enough to have enough lead time to send products pre-emptively. At Gilt or at any flash-sale business, everything happens quickly. Items are received, photographed, described and put on sale at a very rapid pace. This is different from Amazon or any [traditional] retailer. For me, I have [only] days from us first hearing about the products to the products being put on sale.
And the reliability of the data varies. The closer we get to the sale date, the more solid the data is. One day out or two days out, it could still be fuzzy -- meaning the product is not described, the price is not set up. I need to take this fogginess into account as well.
What is the current goal behind your pre-emptive shipping program?
Elbert: The current objective is to shorten shipping time to customers. As we scale it, we hope to save on costs.
What do you think pre-emptive shipping will look like eventually?
Elbert: Ideally, it would be nice if we could start packing and preparing shipping while the product is in transit. Right now, we just ship it several days in advance, so it arrives at the hub by the time sales start. It would be interesting to shorten this window. Sales would start while items are in transit, and someone inside the truck would start packing them into individual boxes, applying labels and so on. Essentially, the truck would become the hub for a subset of products that would be highly targeted. That's a vision for the future.