Muhammad Haseeb Tariq

Data Scientist
Utrecht, The Netherlands

Challenges

At Coolblue, several departments deal with many different kinds of timeseries data:

  • Number of shipments per day
  • Number of invoices per day
  • Number of customer service calls per day
  • Number of returns per day
  • Number of sales per item per day
  • … and more
For different use cases, decision makers want forecasts at different granularities and over different time spans (a small granularity sketch follows this list):
  • Weekly number of shipments for the next year
  • Daily shipments for the next 2 weeks
  • Number of sales per item for the next day
  • … and more
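As a minimal illustration of the granularity point: the same daily history can serve several of these use cases once it is re-aggregated on the fly. The numbers and names below are made up; this is just a pandas sketch, not GFM's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical daily shipment counts, for illustration only.
idx = pd.date_range("2023-01-01", periods=730, freq="D")
daily = pd.Series(np.random.poisson(1200, size=len(idx)), index=idx, name="shipments")

# Same underlying series, two granularities for two use cases:
weekly = daily.resample("W").sum()   # weekly totals, e.g. as input for a one-year-ahead plan
last_two_weeks = daily.tail(14)      # daily view, e.g. as input for a two-week operational forecast

print(weekly.tail(3))
print(last_two_weeks.head(3))
```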
Modeling even a single timeseries "manually" can be very time-consuming, and the trends and patterns in each timeseries change with every passing day. To top it all off, every timeseries is also different in terms of (see the diagnostics sketch after this list):
  • Volatility
  • Stationarity
  • Magnitude, and
  • Noise
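To make those four properties concrete, here is a rough sketch of how a series could be characterized automatically. The metrics chosen here are illustrative assumptions on our part, not Coolblue's actual diagnostics.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def describe_series(y: pd.Series) -> dict:
    """Rough per-series diagnostics; the metric choices are illustrative only."""
    y = y.dropna()
    return {
        "magnitude": float(y.mean()),             # overall level of the series
        "volatility": float(y.std() / y.mean()),  # coefficient of variation
        "adf_pvalue": float(adfuller(y)[1]),      # low p-value suggests stationarity
        "noise": float((y - y.rolling(7, center=True).mean()).std()),  # spread around a weekly smooth
    }

# Example with a synthetic daily series:
idx = pd.date_range("2024-01-01", periods=365, freq="D")
y = pd.Series(1000 + 50 * np.sin(np.arange(365) / 7) + np.random.normal(0, 30, 365), index=idx)
print(describe_series(y))
```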
It is therefore infeasible to hire several hundred analysts to produce (readily available) forecasts every day. Automating the:
  • Training of the models
  • Validation of the models, in terms of:
    • Accuracy
    • Stability
    • Reliability
    • Robustness
  • Serving the forecasts
  • Controlling, and
  • Monitoring the outputs
…therefore becomes a huge challenge, not only in terms of design complexity but in terms of time and space complexity as well.
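To give an idea of what automated validation can look like, below is a minimal rolling-origin backtest that scores a single series on accuracy (average error across folds) and stability (how much that error varies between folds). The placeholder model and the error metric are assumptions for illustration, not GFM's actual implementation.

```python
import numpy as np
import pandas as pd

def rolling_backtest(y: pd.Series, horizon: int = 14, n_folds: int = 6) -> dict:
    """Rolling-origin backtest for one series: refit at several cut-off points,
    forecast `horizon` steps ahead, and collect per-fold errors."""
    errors = []
    for fold in range(n_folds):
        cutoff = len(y) - horizon * (n_folds - fold)
        train, test = y.iloc[:cutoff], y.iloc[cutoff:cutoff + horizon]
        forecast = np.repeat(train.tail(28).mean(), horizon)  # placeholder model: 4-week mean
        errors.append(np.mean(np.abs(forecast - test.values)) / test.mean())  # normalized MAE per fold
    errors = np.array(errors)
    return {
        "accuracy": errors.mean(),   # average error across folds
        "stability": errors.std(),   # how much the error jumps between folds
    }

idx = pd.date_range("2023-01-01", periods=365, freq="D")
y = pd.Series(500 + np.random.normal(0, 25, 365), index=idx)
print(rolling_backtest(y))
```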

Solution

We came up with the idea of GFM with these requirements in mind: a fully automated "generalized" forecasting machine that is able to perform robust scientific/statistical analyses and tasks at scale.
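The "generalized" part boils down to choosing a sensible model per series without human intervention. The sketch below picks, for one series, whichever simple candidate does best on a holdout window and then refits it on the full history; the candidate models and the selection rule are illustrative only and not how GFM actually works.

```python
import numpy as np
import pandas as pd

# Candidate "models" as simple forecast functions; a real system would plug in richer models.
def naive(train, h):           return np.repeat(train.iloc[-1], h)
def seasonal_naive(train, h):  return np.tile(train.iloc[-7:].values, h // 7 + 1)[:h]
def mean_forecast(train, h):   return np.repeat(train.tail(28).mean(), h)

CANDIDATES = {"naive": naive, "seasonal_naive": seasonal_naive, "mean": mean_forecast}

def auto_forecast(y: pd.Series, horizon: int = 14):
    """Pick the candidate with the lowest error on a holdout window, then refit on all data."""
    train, holdout = y.iloc[:-horizon], y.iloc[-horizon:]
    scores = {name: np.mean(np.abs(f(train, horizon) - holdout.values))
              for name, f in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, CANDIDATES[best](y, horizon)

idx = pd.date_range("2024-01-01", periods=200, freq="D")
y = pd.Series(300 + 40 * (idx.dayofweek < 5) + np.random.normal(0, 10, 200), index=idx)
print(auto_forecast(y)[0])
```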

The time-complexity challenge was solved by using distributed computing on Spark. The design complexity and the topic of validating models at scale require a lot more explanation. Happy to share those details in a conversation 😊
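As an example of the Spark angle: with the history stored in long format (one row per series, date, and value), each series can be fitted independently on the workers. The snippet below uses PySpark's applyInPandas for that; the table layout, column names, and the placeholder model are assumptions for illustration, not GFM's actual code.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("per-series-forecasts").getOrCreate()

# Hypothetical long-format history: one row per (series_id, date, value).
history = spark.createDataFrame(
    pd.DataFrame({
        "series_id": ["shipments"] * 60 + ["returns"] * 60,
        "date": list(pd.date_range("2024-01-01", periods=60)) * 2,
        "value": list(range(60)) + list(range(200, 260)),
    })
)

def forecast_one_series(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs on the workers, once per series; a real model would replace the 28-day mean."""
    pdf = pdf.sort_values("date")
    level = pdf["value"].tail(28).mean()
    future = pd.date_range(pdf["date"].max() + pd.Timedelta(days=1), periods=14)
    return pd.DataFrame({"series_id": pdf["series_id"].iloc[0],
                         "date": future,
                         "forecast": level})

forecasts = (history
             .groupBy("series_id")
             .applyInPandas(forecast_one_series,
                            schema="series_id string, date timestamp, forecast double"))
forecasts.show(5)
```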