Unveiling the Power of Self-Attention for Shipping Cost Prediction: Conclusion and Future Work

14 Jun 2024


(1) P Aditya Sreekar, Amazon and these authors contributed equally to this work {sreekarp@amazon.com};

(2) Sahil Verma, Amazon and these authors contributed equally to this work {vrsahil@amazon.com};

(3) Varun Madhavan, Indian Institute of Technology, Kharagpur. Work done during internship at Amazon {varunmadhavan@iitkgp.ac.in};

(4) Abhishek Persad, Amazon {persadap@amazon.com}.

5. Conclusion and Future Work

In this paper, we presented a novel framework based on the Transformer architecture for predicting shipping costs on day 0. Our proposed framework encodes the shipping attributes of a package, i.e., the package rate card, into a uniform embedding space. These embeddings are then fed through a Transformer layer, which models complex higher-order interactions and learns an effective representation of the package rate card for predicting shipping costs. Our experimental results demonstrate that the proposed model, called RCT, outperforms the GBDT model by 28.8%. Furthermore, we demonstrate that the RCT performs better than the SOTA model FT-Transformer for our problem statement. We also show that when the rate card representation learned by the RCT is added to the GBDT model, its performance improves by 12.51%. This underscores the fact that the RCT is able to learn sufficient representations of rate card information.
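The encode-then-attend pipeline summarized above can be illustrated with a minimal sketch. This is not the RCT implementation: the class name `TinyRateCardModel`, the feature names (`weight_kg`, `length_cm`, `carrier`, `service`), the embedding size, the single attention head, and the mean pooling are all assumptions chosen for illustration, and the weights are random rather than trained. It only shows how heterogeneous attributes can be mapped to same-sized tokens and mixed by self-attention before a regression head.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_vector(rng, n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def rand_matrix(rng, rows, cols):
    return [rand_vector(rng, cols) for _ in range(rows)]


class TinyRateCardModel:
    """Hypothetical toy model: feature tokens -> self-attention -> scalar cost."""

    def __init__(self, numeric_names, categorical_vocab, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Numeric feature -> token via a per-feature scale vector and bias vector.
        self.num_w = {n: rand_vector(rng, dim) for n in numeric_names}
        self.num_b = {n: rand_vector(rng, dim) for n in numeric_names}
        # Categorical feature -> token via a per-feature lookup table.
        self.cat_emb = {
            name: {value: rand_vector(rng, dim) for value in values}
            for name, values in categorical_vocab.items()
        }
        # Query, key, and value projections for one attention head.
        self.Wq, self.Wk, self.Wv = (rand_matrix(rng, dim, dim) for _ in range(3))
        # Linear regression head applied to the pooled representation.
        self.head = rand_vector(rng, dim)

    def embed(self, numeric, categorical):
        """Map every attribute to a token of the same dimension."""
        tokens = []
        for name, value in numeric.items():
            w, b = self.num_w[name], self.num_b[name]
            tokens.append([value * wi + bi for wi, bi in zip(w, b)])
        for name, value in categorical.items():
            tokens.append(self.cat_emb[name][value])
        return tokens

    def attend(self, tokens):
        """Scaled dot-product self-attention over the attribute tokens."""
        q = [matvec(self.Wq, t) for t in tokens]
        k = [matvec(self.Wk, t) for t in tokens]
        v = [matvec(self.Wv, t) for t in tokens]
        scale = math.sqrt(self.dim)
        outputs, weights = [], []
        for qi in q:
            scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
            w = softmax(scores)
            weights.append(w)
            outputs.append(
                [sum(wj * vj[d] for wj, vj in zip(w, v)) for d in range(self.dim)]
            )
        return outputs, weights

    def predict(self, numeric, categorical):
        """Mean-pool the attended tokens and apply the linear head."""
        attended, _ = self.attend(self.embed(numeric, categorical))
        pooled = [sum(col) / len(attended) for col in zip(*attended)]
        return sum(h * p for h, p in zip(self.head, pooled))


model = TinyRateCardModel(
    numeric_names=["weight_kg", "length_cm"],
    categorical_vocab={"carrier": ["A", "B"], "service": ["ground", "air"]},
)
cost = model.predict(
    {"weight_kg": 2.5, "length_cm": 30.0},
    {"carrier": "A", "service": "air"},
)
```

Because the weights are untrained, `cost` carries no meaning here; the point is that each attention row lets one attribute's token draw on every other attribute, which is the mechanism that allows higher-order feature interactions to be modeled.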

In this work, the route information used was limited to the start and end nodes alone. Future work could explore the use of Graph Neural Networks to encode information about the complete route. Further, the performance of the RCT might be improved by exploring ways to include the item ID as a feature, such as the use of item embeddings which are available internally.

Also, while the RCT was trained to predict only the ship cost, it can be modified to predict all the attributes of the invoice by adding a Transformer decoder layer. This would enable other applications like invoice anomaly detection. Additionally, future research could investigate whether the package representations learnt by the RCT can be used to improve the performance of other related tasks or to quantify the model uncertainty in each prediction via approaches like the one proposed in Amini et al. (2019).
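As an illustration of what a per-prediction uncertainty estimate could look like, the sketch below assumes an evidential-regression style output, in which the model emits the four parameters of a Normal-Inverse-Gamma distribution instead of a single cost. This is an assumption about the kind of approach meant, not something specified in this paper; the names `NIGOutput` and `summarize` are hypothetical, and the parameter values are made up.

```python
from dataclasses import dataclass


@dataclass
class NIGOutput:
    """Hypothetical model output: Normal-Inverse-Gamma parameters."""
    gamma: float  # predicted mean (the point estimate of the ship cost)
    nu: float     # must be > 0
    alpha: float  # must be > 1 for the variances below to be finite
    beta: float   # must be > 0


def summarize(o: NIGOutput) -> dict:
    """Return the point prediction with aleatoric and epistemic variance."""
    if o.nu <= 0 or o.alpha <= 1 or o.beta <= 0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = o.beta / (o.alpha - 1)           # expected data noise, E[sigma^2]
    epistemic = o.beta / (o.nu * (o.alpha - 1))  # variance of the mean, Var[mu]
    return {"prediction": o.gamma, "aleatoric": aleatoric, "epistemic": epistemic}


# Made-up parameters for a single package.
summary = summarize(NIGOutput(gamma=10.0, nu=2.0, alpha=3.0, beta=4.0))
```

Under this assumption, a high epistemic value would flag packages whose rate cards are unlike those seen in training, which could also serve the invoice anomaly detection use case mentioned above.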

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.