Muhammad Haseeb Tariq

Data Scientist
Utrecht, The Netherlands

Challenges

At Coolblue, several departments deal with many different kinds of timeseries data:

  • Number of shipments per day
  • Number of invoices per day
  • Number of customer service calls per day
  • Number of returns per day
  • Number of sales per item per day
  • … and more
For different use cases, decision makers want forecasts at different granularities and over different time spans (a small granularity sketch follows this list):
  • Weekly number of shipments for the next year
  • Daily shipments for the next 2 weeks
  • Number of sales per item for the next day
  • … and more
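As a minimal illustration of the granularity point: the same daily history can serve several of these use cases once it is re-aggregated on the fly. The numbers and names below are made up; this is just a pandas sketch, not GFM's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical daily shipment counts, for illustration only.
idx = pd.date_range("2023-01-01", periods=730, freq="D")
daily = pd.Series(np.random.poisson(1200, size=len(idx)), index=idx, name="shipments")

# Same underlying series, two granularities for two use cases:
weekly = daily.resample("W").sum()   # weekly totals, e.g. as input for a one-year-ahead plan
last_two_weeks = daily.tail(14)      # daily view, e.g. as input for a two-week operational forecast

print(weekly.tail(3))
print(last_two_weeks.head(3))
```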
Modeling even a single timeseries "manually" can be very time-consuming, and the trends and patterns in each timeseries change with every passing day. To top it all off, every timeseries is also different in terms of (see the diagnostics sketch after this list):
  • Volatility
  • Stationarity
  • Magnitude, and
  • Noise
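To make those four properties concrete, here is a rough sketch of how a series could be characterized automatically. The metrics chosen here are illustrative assumptions on our part, not Coolblue's actual diagnostics.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def describe_series(y: pd.Series) -> dict:
    """Rough per-series diagnostics; the metric choices are illustrative only."""
    y = y.dropna()
    return {
        "magnitude": float(y.mean()),             # overall level of the series
        "volatility": float(y.std() / y.mean()),  # coefficient of variation
        "adf_pvalue": float(adfuller(y)[1]),      # low p-value suggests stationarity
        "noise": float((y - y.rolling(7, center=True).mean()).std()),  # spread around a weekly smooth
    }

# Example with a synthetic daily series:
idx = pd.date_range("2024-01-01", periods=365, freq="D")
y = pd.Series(1000 + 50 * np.sin(np.arange(365) / 7) + np.random.normal(0, 30, 365), index=idx)
print(describe_series(y))
```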
It is therefore infeasible to hire several hundred analysts to produce (readily available) forecasts every day. Automating the:
  • Training of the models
  • Validation of the models, in terms of:
    • Accuracy
    • Stability
    • Reliability
    • Robustness
  • Serving the forecasts
  • Controlling, and
  • Monitoring the outputs
…therefore becomes a huge challenge, not only in terms of design complexity but in terms of time and space complexity as well.
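To give an idea of what automated validation can look like, below is a minimal rolling-origin backtest that scores a single series on accuracy (average error across folds) and stability (how much that error varies between folds). The placeholder model and the error metric are assumptions for illustration, not GFM's actual implementation.

```python
import numpy as np
import pandas as pd

def rolling_backtest(y: pd.Series, horizon: int = 14, n_folds: int = 6) -> dict:
    """Rolling-origin backtest for one series: refit at several cut-off points,
    forecast `horizon` steps ahead, and collect per-fold errors."""
    errors = []
    for fold in range(n_folds):
        cutoff = len(y) - horizon * (n_folds - fold)
        train, test = y.iloc[:cutoff], y.iloc[cutoff:cutoff + horizon]
        forecast = np.repeat(train.tail(28).mean(), horizon)  # placeholder model: 4-week mean
        errors.append(np.mean(np.abs(forecast - test.values)) / test.mean())  # normalized MAE per fold
    errors = np.array(errors)
    return {
        "accuracy": errors.mean(),   # average error across folds
        "stability": errors.std(),   # how much the error jumps between folds
    }

idx = pd.date_range("2023-01-01", periods=365, freq="D")
y = pd.Series(500 + np.random.normal(0, 25, 365), index=idx)
print(rolling_backtest(y))
```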

Solution

We came up with the idea of GFM with these requirements in mind: a fully automated "generalized" forecasting machine that is able to perform robust scientific/statistical analyses and tasks at scale.
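The "generalized" part boils down to choosing a sensible model per series without human intervention. The sketch below picks, for one series, whichever simple candidate does best on a holdout window and then refits it on the full history; the candidate models and the selection rule are illustrative only and not how GFM actually works.

```python
import numpy as np
import pandas as pd

# Candidate "models" as simple forecast functions; a real system would plug in richer models.
def naive(train, h):           return np.repeat(train.iloc[-1], h)
def seasonal_naive(train, h):  return np.tile(train.iloc[-7:].values, h // 7 + 1)[:h]
def mean_forecast(train, h):   return np.repeat(train.tail(28).mean(), h)

CANDIDATES = {"naive": naive, "seasonal_naive": seasonal_naive, "mean": mean_forecast}

def auto_forecast(y: pd.Series, horizon: int = 14):
    """Pick the candidate with the lowest error on a holdout window, then refit on all data."""
    train, holdout = y.iloc[:-horizon], y.iloc[-horizon:]
    scores = {name: np.mean(np.abs(f(train, horizon) - holdout.values))
              for name, f in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, CANDIDATES[best](y, horizon)

idx = pd.date_range("2024-01-01", periods=200, freq="D")
y = pd.Series(300 + 40 * (idx.dayofweek < 5) + np.random.normal(0, 10, 200), index=idx)
print(auto_forecast(y)[0])
```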

The time-complexity challenge was solved by using distributed computing on Spark. The design complexity and the topic of validating models at scale require a lot more explanation. Happy to share those details in a conversation 😊
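As an example of the Spark angle: with the history stored in long format (one row per series, date, and value), each series can be fitted independently on the workers. The snippet below uses PySpark's applyInPandas for that; the table layout, column names, and the placeholder model are assumptions for illustration, not GFM's actual code.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("per-series-forecasts").getOrCreate()

# Hypothetical long-format history: one row per (series_id, date, value).
history = spark.createDataFrame(
    pd.DataFrame({
        "series_id": ["shipments"] * 60 + ["returns"] * 60,
        "date": list(pd.date_range("2024-01-01", periods=60)) * 2,
        "value": list(range(60)) + list(range(200, 260)),
    })
)

def forecast_one_series(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs on the workers, once per series; a real model would replace the 28-day mean."""
    pdf = pdf.sort_values("date")
    level = pdf["value"].tail(28).mean()
    future = pd.date_range(pdf["date"].max() + pd.Timedelta(days=1), periods=14)
    return pd.DataFrame({"series_id": pdf["series_id"].iloc[0],
                         "date": future,
                         "forecast": level})

forecasts = (history
             .groupBy("series_id")
             .applyInPandas(forecast_one_series,
                            schema="series_id string, date timestamp, forecast double"))
forecasts.show(5)
```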