This article presents a Transformer-decoder architecture for forecasting on a humidity time-series dataset provided by Woodsense. This project is a follow-up to a previous project that involved training an LSTM on the same dataset. The LSTM was observed to suffer from "short-term memory" over long sequences. Consequently, this project uses a Transformer, which outperforms the LSTM on the same dataset.

(Figure: inspired by the graphic in D2L¹)

Why use a Transformer?

LSTMs process tokens sequentially, as shown above. This architecture maintains a hidden state that is updated with every new input token, representing the entire sequence it has seen. Theoretically, very important information can…
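The sequential update described above can be illustrated with a minimal sketch. This is not the article's model: it is a simplified recurrent cell (without the LSTM's gates) with made-up dimensions, meant only to show that a single fixed-size hidden state is overwritten at every step, so information from early tokens must survive many updates to influence the output.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
rng = np.random.default_rng(0)
hidden_size, input_size, seq_len = 4, 3, 6

W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))

def step(h, x):
    # One recurrent update: the new state depends only on the
    # previous state and the current input token.
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(hidden_size)
for x in rng.normal(size=(seq_len, input_size)):
    h = step(h, x)  # tokens are consumed one at a time, in order

# The final state is a fixed-size summary of the whole sequence.
print(h.shape)
```

A Transformer, by contrast, attends to all positions at once, so no token's information has to be squeezed through this single recurrent bottleneck.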

Natasha Klingenbrunn
