
How an Open Model and a Pile of Data are Changing Time Series Analysis

Abstract and 1. Introduction

  2. Related Work

  3. Methodology

  4. Experimental Setup and Results

  5. Conclusion and Future Work

Acknowledgments

Reproducibility statement

Impact statement, and References

5. Conclusion and Future Work

We release the first open-source family of time series foundation models and make contributions at all stages of the development and evaluation process. We first compile a large and diverse collection of public time series, called the Time Series Pile, and demonstrate its efficacy by pre-training high-performing time series foundation models from scratch. Then, we systematically address several time series-specific challenges, which up to now have impeded widespread exploration of extensive large-scale multi-dataset pre-training.

Figure 5. PCA and t-SNE visualizations of representations learned by MOMENT on the 3 largest UCR datasets. Different colors represent different classes. Even without dataset-specific fine-tuning, MOMENT learns distinct representations for different classes.
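These projections can be reproduced with standard tooling. The snippet below is a minimal sketch, assuming embeddings is an array of per-sequence representations extracted from the encoder and labels holds the corresponding class indices; both are hypothetical placeholders here, not part of the released code.

```python
# Minimal sketch: project learned representations to 2-D with PCA and t-SNE.
# `embeddings` and `labels` are random placeholders standing in for a
# hypothetical feature-extraction step; they are not part of the released code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 512))  # placeholder encoder outputs
labels = rng.integers(0, 3, size=300)     # placeholder class labels (3 classes)

pca_2d = PCA(n_components=2).fit_transform(embeddings)
tsne_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(embeddings)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, proj, title in zip(axes, (pca_2d, tsne_2d), ("PCA", "t-SNE")):
    ax.scatter(proj[:, 0], proj[:, 1], c=labels, s=8, cmap="tab10")
    ax.set_title(title)
plt.tight_layout()
plt.show()
```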

Figure 6. Training losses (MSE). A dashed vertical line denotes the first epoch. All models were trained with a batch size of 131072 patches. (left) Larger models obtain lower training loss. (right) Eventually, randomly initialized MOMENT-small outperforms the same model initialized with Flan-T5 weights.

We use the Time Series Pile and these strategies to pre-train transformer models of three different sizes. Finally, we design an experimental benchmark to evaluate time series foundation models on multiple practical time series tasks, particularly focusing on scenarios with constrained compute and supervision, building on prior work by Wu et al. (2023). Using this benchmark, we show that MOMENT is effective for the considered tasks with minimal fine-tuning. MOMENT’s superior performance, especially on anomaly detection and classification problems, which typically have small datasets, can be attributed to pre-training. Moreover, we demonstrate that across many tasks, smaller statistical and shallower deep learning methods perform reasonably well. Lastly, we make several interesting empirical observations about time series foundation models. Our overarching goal is to push the boundaries of open science by publicly releasing the Time Series Pile, along with code, model weights, and training logs.

Table 6. Imputation Results. MOMENT with linear probing achieved the lowest reconstruction error on all ETT datasets. In the zero-shot setting, MOMENT consistently outperformed all statistical interpolation methods with the exception of linear interpolation. Complete results in Tab. 29.

We note several interesting directions of future work, including the application of MOMENT to real-world challenges, investigating multi-modal time series and text foundation models (Cai et al., 2023), and enhancing forecasting performance by pre-training MOMENT using causal attention and forecasting objectives.

Acknowledgments

Funding. This work was partially supported by the National Institutes of Health (NIH) under awards R01HL144692 and 1R01NS124642-01, and also by the U.S. Army Research Office and the U.S. Army Futures Command under Contract No. W911NF-20-D-0002. The content of the information does not necessarily reflect the position or the policy of the government and no official endorsement should be inferred.

Discussions. We would like to express our sincerest gratitude to Barış Kurt, Andrey Kan, Laurent Callot, Gauthier Guinet, Jingchao Ni, and Jonas M. Kübler for insightful discussions regarding the problem setting and experimental design. Their unwavering support was instrumental in the development of MOMENT. We are also thankful to Laurent, Barış, Jingchao and Andrey for their constructive feedback on the writing of this manuscript. Additionally, we acknowledge the insightful exchanges with Yuyang (Bernie) Wang, Abdul Fatir Ansari, Ingo Guering, Xiyuan Zhang, and Anoop Deoras. Special thanks to Cherie Ho for suggesting a creative and befitting name for our model. Lastly, we would like to thank Cecilia Morales for her insightful comments, especially on the broader impacts of this work, and for helping us proofread this manuscript.

Data. We extend our gratitude to the authors and data curators whose meticulous efforts were instrumental in curating the datasets utilized for both pre-training and evaluation purposes: UCR Time Series Classification Archive (Dau et al., 2018), TSB-UAD Anomaly Benchmark (Paparrizos et al., 2022b), Monash Forecasting Archive (Godahewa et al., 2021), and the long-horizon forecasting datasets (Zhou et al., 2021).

Software and Models. Our training and evaluation library was inspired by Time-Series-Library. We would also like to thank the authors of the following libraries for their implementations: universal-computation, Anomaly-Transformer, VUS, tsad-model-selection, One-Fits-All, and Statsforecast (Garza et al., 2022).

Reproducibility statement

All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs, each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU (without any data or model parallelism). We have made MOMENT-large[6] and the Time Series Pile[7] publicly available on Huggingface. We are working on open-sourcing MOMENT-base and MOMENT-small, as well as making our research code public. The latter is currently available anonymously at https://anonymous.4open.science/r/BETT-773F/README.md. We provide an exhaustive list of hyper-parameters in App. E to aid reproducibility. We would like to emphasize that all datasets used in this study are publicly available.
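As one way to retrieve the released artifacts, the sketch below downloads the MOMENT-1-large checkpoint and the Time Series Pile with the huggingface_hub client; the repository ids are taken from the footnotes below, and loading the weights afterwards is left to the project's own library.

```python
# Sketch: fetch the publicly released artifacts from the Hugging Face Hub.
# Repository ids come from footnotes [6] and [7]; loading and using the weights
# is handled by the project's own code and is not shown here.
from huggingface_hub import snapshot_download

# MOMENT-1-large model weights
model_dir = snapshot_download(repo_id="AutonLab/MOMENT-1-large")

# The Time Series Pile (a dataset repository)
data_dir = snapshot_download(repo_id="AutonLab/Timeseries-PILE", repo_type="dataset")

print("Model files in:", model_dir)
print("Time Series Pile in:", data_dir)
```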

Impact statement

Transparency Index. Given the exponential rise in societal reliance on large foundation models, ensuring transparency in their training approach, architecture, and downstream application is crucial for public accountability, scientific advancement, and effective governance. To uphold this objective, we publicly release our training code base, data sources, and evaluation pipeline. We assess the transparency of MOMENT using the criteria outlined by Bommasani et al. (2023), focusing on upstream resources utilized during training and on model description, encompassing 32 and 33 transparency indicators, respectively. We report expected upstream and model transparency scores for MOMENT in Tab. 34. Notably, MOMENT is expected to have one of the highest levels of upstream transparency. However, its model transparency scores are lower, primarily because comprehensive (external and third-party) harm and trustworthiness evaluations are not yet well understood in the context of time series modeling.

Environmental Impact. We train multiple models over many days resulting in significant energy usage and a sizeable carbon footprint. However, we hope that releasing our models will ensure that future time series modeling efforts are quicker and more efficient, resulting in lower carbon emissions.

We follow prior work (Bender et al., 2021; Patterson et al., 2021; Touvron et al., 2023; Wu et al., 2022; Dodge et al., 2022) and estimate the carbon footprint of pre-training all variants of MOMENT based on the GPU device used and the carbon efficiency of the electricity grid. Our CO2 emission estimates are shown in Tab. 8.

We use the Total Graphics Power (TGP) to calculate the total power consumed for training MOMENT models, although the power actually drawn by the GPU will likely vary somewhat with GPU utilization during training. Our calculations do not account for power demands from other components of our compute infrastructure. We use 336.566 kg CO2/MWh as the standard value of CO2 emitted per megawatt-hour of energy consumed in Pittsburgh[8].

We share an upper bound on the individual CO2 emissions for each model, as well as a more realistic estimate of the carbon emissions from MOMENT-small and MOMENT-base: these two were trained simultaneously on a single NVIDIA RTX A6000 GPU, so the power consumed by the GPU was shared between their training runs. MOMENT-large was trained independently on a single RTX A6000 GPU.
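The arithmetic behind these estimates is straightforward. The sketch below reproduces it under the stated assumptions: a 300 W TGP for the RTX A6000 and the Pittsburgh grid factor of 336.566 kg CO2/MWh; the training durations are illustrative placeholders, since the actual values are reported in Tab. 8.

```python
# Sketch of the CO2 estimate: GPU Total Graphics Power (TGP) times wall-clock
# training time, converted to MWh and scaled by the Pittsburgh grid factor.
# Training durations below are illustrative placeholders; see Tab. 8 for real values.
TGP_WATTS = 300.0          # RTX A6000 TGP, used as an upper bound on draw
KG_CO2_PER_MWH = 336.566   # grid carbon intensity for Pittsburgh

def co2_kg(train_hours: float, tgp_watts: float = TGP_WATTS) -> float:
    """Upper-bound CO2 (kg) for one training run on a single GPU."""
    energy_mwh = tgp_watts * train_hours / 1e6  # W * h -> MWh
    return energy_mwh * KG_CO2_PER_MWH

hours = {"small": 100.0, "base": 150.0, "large": 300.0}  # placeholder durations

# Upper bounds: each model charged as if it had the GPU to itself.
upper_bounds = {name: co2_kg(h) for name, h in hours.items()}

# More realistic estimate for small and base, which shared one GPU: the shared
# power is counted once, over the longer of the two runs.
shared_small_base = co2_kg(max(hours["small"], hours["base"]))

print(upper_bounds, shared_small_base)
```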

Ethical considerations and potential misuse. Despite MOMENT’s promising performance in limited-data settings, it is important to use its predictions with care, especially in high-stakes settings such as healthcare. Before MOMENT is used for high-stakes decision-making, we recommend fine-tuning and evaluating the model with task-specific in-domain data.

References

Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Arango, S. P., Kapoor, S., et al. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815, 2024.

Ba, J. L., Kiros, J. R., and Hinton, G. E. Layer normalization, 2016.

Bao, H., Dong, L., Piao, S., and Wei, F. BEiT: BERT pre-training of image transformers. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=p-BhZSz59o4.

Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pp. 610–623, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi: 10.1145/3442188.3445922. URL https://doi.org/10.1145/3442188.3445922.

Table 8. Total carbon emission induced upon training the MOMENT family of models. MOMENT-small and MOMENT-base were trained simultaneously on a single GPU, thus the TGP required for each model would likely be much less than 300W, and the total time for both models combined is equal to the maximum of the time required for each model. Actual total power consumption and carbon emission values account for this.

Bommasani, R., Klyman, K., Longpre, S., Kapoor, S., Maslej, N., Xiong, B., Zhang, D., and Liang, P. The foundation model transparency index, 2023.

Cai, Y., Goswami, M., Choudhry, A., Srinivasan, A., and Dubrawski, A. Jolt: Jointly learned representations of language and time-series. In Deep Generative Models for Health Workshop NeurIPS 2023, 2023.

California Department of Transportation. Performance measurement system (pems), 2024. URL http://pems.dot.ca.gov/. Accessed: 2024-02-01.

Cao, D., Jia, F., Arik, S. O., Pfister, T., Zheng, Y., Ye, W., and Liu, Y. Tempo: Prompt-based generative pre-trained transformer for time series forecasting, 2023.

Centers for Disease Control and Prevention. Fluview: Flu activity & surveillance, 2024. URL https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html. Accessed: 2024-02-01.

Challu, C., Olivares, K. G., Oreshkin, B. N., Garza Ramirez, F., Mergenthaler Canseco, M., and Dubrawski, A. NHITS: Neural hierarchical interpolation for time series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6):6989–6997, Jun. 2023. doi: 10.1609/aaai.v37i6.25854. URL https://ojs.aaai.org/index.php/AAAI/article/view/25854.

Challu, C. I., Jiang, P., Nian Wu, Y., and Callot, L. Deep generative model with hierarchical latent factors for time series anomaly detection. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pp. 1643–1654. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/challu22a.html.

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416, 2022.

Cui, Z., Chen, W., and Chen, Y. Multi-scale convolutional neural networks for time series classification, 2016.

Das, A., Kong, W., Sen, R., and Zhou, Y. A decoderonly foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688, 2023.

Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., and Hexagon-ML. The UCR time series classification archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

Day, K., Christl, D., Salvi, R., and Sriram, P. Video pretrained transformer: A multimodal mixture of pre-trained experts, 2023.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T. (eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.

Dodge, J., Prewitt, T., Tachet des Combes, R., Odmark, E., Schwartz, R., Strubell, E., Luccioni, A. S., Smith, N. A., DeCario, N., and Buchanan, W. Measuring the carbon intensity of ai in cloud instances. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pp. 1877–1894, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393522. doi: 10.1145/3531146.3533234. URL https://doi.org/10.1145/3531146.3533234.

Dong, J., Wu, H., Zhang, H., Zhang, L., Wang, J., and Long, M. Simmtm: A simple pre-training framework for masked time-series modeling. In Advances in Neural Information Processing Systems, 2023.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YicbFdNTTy.

Ekambaram, V., Jati, A., Nguyen, N. H., Dayama, P., Reddy, C., Gifford, W. M., and Kalagnanam, J. Tiny time mixers (ttms): Fast pre-trained models for enhanced zero/fewshot forecasting of multivariate time series, 2024.

Eldele, E., Ragab, M., Chen, Z., Wu, M., Kwoh, C. K., Li, X., and Guan, C. Time-series representation learning via temporal and contextual contrasting. In Zhou, Z.-H. (ed.), Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pp. 2352–2359. International Joint Conferences on Artificial Intelligence Organization, 8 2021. doi: 10.24963/ijcai.2021/324. URL https://doi.org/10.24963/ijcai.2021/324. Main Track.

Franceschi, J.-Y., Dieuleveut, A., and Jaggi, M. Unsupervised scalable representation learning for multivariate time series. In Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/53c6de78244e9f528eb3e1cda69699bb-Paper.pdf.

Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S., and Leahy, C. The pile: An 800gb dataset of diverse text for language modeling, 2020.

Garza, A. and Mergenthaler-Canseco, M. Timegpt-1. arXiv preprint arXiv:2310.03589, 2023.

Garza, F., Mergenthaler Canseco, M., Challu, C., and Olivares, K. StatsForecast: Lightning fast forecasting with statistical and econometric models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/statsforecast.

Godahewa, R. W., Bergmeir, C., Webb, G. I., Hyndman, R., and Montero-Manso, P. Monash time series forecasting archive. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021. URL https://openreview.net/forum?id=wEc1mgAjU-.

Goswami, M., Boecking, B., and Dubrawski, A. Weak supervision for affordable modeling of electrocardiogram data. In AMIA Annual Symposium Proceedings, volume 2021, pp. 536. American Medical Informatics Association, 2021.

Goswami, M., Challu, C. I., Callot, L., Minorics, L., and Kan, A. Unsupervised model selection for time series anomaly detection. In The Eleventh International Conference on Learning Representations, 2023a. URL https://openreview.net/forum?id=gOZ_pKANaPW.

Goswami, M., Sanil, V., Choudhry, A., Srinivasan, A., Udompanyawit, C., and Dubrawski, A. AQua: A benchmarking tool for label quality assessment. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023b. URL https://openreview.net/forum?id=dhJ8VbcEtX.

Gruver, N., Finzi, M. A., Qiu, S., and Wilson, A. G. Large language models are zero-shot time series forecasters. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=md68e8iZK1.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. doi: 10.1109/CVPR.2016.90.

Hundman, K., Constantinou, V., Laporte, C., Colwell, I., and Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 387–395, 2018.

Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., and Muller, P.-A. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, 33(4):917–963, 2019.

Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., Chen, P.-Y., Liang, Y., Li, Y.-F., Pan, S., and Wen, Q. Time-llm: Time series forecasting by reprogramming large language models, 2023.

Kim, T., Kim, J., Tae, Y., Park, C., Choi, J.-H., and Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=cGDAkQo1C0p.

Lai, G., Chang, W.-C., Yang, Y., and Liu, H. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’18, pp. 95–104, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450356572. doi: 10.1145/3209978.3210006. URL https://doi.org/10.1145/3209978.3210006.

Le Guennec, A., Malinowski, S., and Tavenard, R. Data Augmentation for Time Series Classification using Convolutional Neural Networks. In ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data, Riva Del Garda, Italy, September 2016. URL https://shs.hal.science/halshs-01357973.

Li, J., Li, D., Savarese, S., and Hoi, S. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023a.

Li, S., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y.-X., and Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems, 32, 2019.

Li, Y., Fan, H., Hu, R., Feichtenhofer, C., and He, K. Scaling language-image pre-training via masking. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 23390–23400, Los Alamitos, CA, USA, jun 2023b. IEEE Computer Society. doi: 10.1109/CVPR52729.2023.02240. URL https://doi.ieeecomputersociety.org/10.1109/CVPR52729.2023.02240.

Li, Z., Rao, Z., Pan, L., Wang, P., and Xu, Z. Ti-mae: Self-supervised masked time series autoencoders, 2023c.

Liu, S., Yu, H., Liao, C., Li, J., Lin, W., Liu, A. X., and Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=0EXmFzUn5I.

Liu, Y., Hu, T., Zhang, H., Wu, H., Wang, S., Ma, L., and Long, M. itransformer: Inverted transformers are effective for time series forecasting. arXiv preprint arXiv:2310.06625, 2023.

Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.

Lu, K., Grover, A., Abbeel, P., and Mordatch, I. Frozen pretrained transformers as universal computation engines. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7):7628–7636, Jun. 2022. doi: 10.1609/aaai.v36i7.20729. URL https://ojs.aaai.org/index.php/AAAI/article/view/20729.

Ma, Q., Liu, Z., Zheng, Z., Huang, Z., Zhu, S., Yu, Z., and Kwok, J. T. A survey on time-series pre-trained models, 2023.

Max Planck Institute for Biogeochemistry. Weather data, 2024. URL https://www.bgc-jena.mpg.de/wetter/. Accessed: 2024-02-01.

Narwariya, J., Malhotra, P., Vig, L., Shroff, G., and Vishnu, T. V. Meta-learning for few-shot time series classification. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, CoDS COMAD 2020, pp. 28–36, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450377386. doi: 10.1145/3371158.3371162. URL https://doi.org/10.1145/3371158.3371162.

Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Jbdc0vTOcol.

Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. N-beats: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=r1ecqn4YwB.

Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. Meta-learning framework with applications to zero-shot time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10):9242–9250, May 2021. doi: 10.1609/aaai.v35i10.17115. URL https://ojs.aaai.org/index.php/AAAI/article/view/17115.

Paparrizos, J., Boniol, P., Palpanas, T., Tsay, R. S., Elmore, A., and Franklin, M. J. Volume under the surface: A new accuracy evaluation measure for time-series anomaly detection. Proc. VLDB Endow., 15(11):2774–2787, jul 2022a. ISSN 2150-8097. doi: 10.14778/3551793.3551830. URL https://doi.org/10.14778/3551793.3551830.

Paparrizos, J., Kang, Y., Boniol, P., Tsay, R. S., Palpanas, T., and Franklin, M. J. Tsb-uad: An end-to-end benchmark suite for univariate time-series anomaly detection. Proc. VLDB Endow., 15(8):1697–1711, apr 2022b. ISSN 2150-8097. doi: 10.14778/3529337.3529354. URL https://doi.org/10.14778/3529337.3529354.

Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.- M., Rothchild, D., So, D., Texier, M., and Dean, J. Carbon emissions and large neural network training, 2021.

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning transferable visual models from natural language supervision. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8748–8763. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/radford21a.html.

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.

Ramaswamy, S., Rastogi, R., and Shim, K. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pp. 427–438, New York, NY, USA, 2000. Association for Computing Machinery. ISBN 1581132174. doi: 10.1145/342009.335437. URL https://doi.org/10.1145/342009.335437.

Rasul, K., Ashok, A., Williams, A. R., Khorasani, A., Adamopoulos, G., Bhagwatkar, R., Biloš, M., Ghonia, H., Hassen, N. V., Schneider, A., et al. Lag-llama: Towards foundation models for time series forecasting. arXiv preprint arXiv:2310.08278, 2023.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.

Schmidl, S., Wenig, P., and Papenbrock, T. Anomaly detection in time series: A comprehensive evaluation. Proc. VLDB Endow., 15(9):1779–1797, may 2022. ISSN 2150-8097. doi: 10.14778/3538598.3538602. URL https://doi.org/10.14778/3538598.3538602.

Schneider, S. H. and Dickinson, R. E. Climate modeling. Reviews of Geophysics, 12(3):447–493, 1974.

Serrà, J., Pascual, S., and Karatzoglou, A. Towards a universal neural network encoder for time series. In International Conference of the Catalan Association for Artificial Intelligence, 2018. URL https://api.semanticscholar.org/CorpusID:13675490.

Shaw, P., Uszkoreit, J., and Vaswani, A. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.

Shen, J., Li, L., Dery, L. M., Staten, C., Khodak, M., Neubig, G., and Talwalkar, A. Cross-modal fine-tuning: Align then refine, 2023.

Smith, L. N. and Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial intelligence and machine learning for multi-domain operations applications, volume 11006, pp. 369–386. SPIE, 2019.

Su, Y., Zhao, Y., Niu, C., Liu, R., Sun, W., and Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 2828–2837, 2019.

Talukder, S., Yue, Y., and Gkioxari, G. Totem: Tokenized time series embeddings for general time series analysis, 2024.

Tanisaro, P. and Heidemann, G. Time series classification using time warping invariant echo state networks. In 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 831–836, 2016. doi: 10.1109/ICMLA.2016.0149.

Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., et al. Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems, 34:24261–24272, 2021.

Tonekaboni, S., Eytan, D., and Goldenberg, A. Unsupervised representation learning for time series with temporal neighborhood coding. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=8qDwejCuCN.

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P. S., Lachaux, M.-A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E. M., Subramanian, R., Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan, J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., and Scialom, T. Llama 2: Open foundation and fine-tuned chat models, 2023.

Trindade, A. ElectricityLoadDiagrams20112014. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86.

Van Den Oord, A., Vinyals, O., et al. Neural discrete representation learning. Advances in neural information processing systems, 30, 2017.

van der Maaten, L. Accelerating t-sne using tree-based algorithms. Journal of Machine Learning Research, 15(93):3221–3245, 2014. URL http://jmlr.org/papers/v15/vandermaaten14a.html.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.

Wang, Z., Yan, W., and Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585, 2017. doi: 10.1109/IJCNN.2017.7966039.

Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., and Sun, L. Transformers in time series: A survey. In Elkind, E. (ed.), Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pp. 6778–6786. International Joint Conferences on Artificial Intelligence Organization, 8 2023. doi: 10.24963/ijcai.2023/759. URL https://doi.org/10.24963/ijcai.2023/759. Survey Track.

Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., and Sahoo, D. Unified training of universal time series forecasting transformers. arXiv preprint arXiv:2402.02592, 2024.

Wu, C.-J., Raghavendra, R., Gupta, U., Acun, B., Ardalani, N., Maeng, K., Chang, G., Behram, F. A., Huang, J., Bai, C., Gschwind, M., Gupta, A., Ott, M., Melnikov, A., Candido, S., Brooks, D., Chauhan, G., Lee, B., Lee, H.-H. S., Akyildiz, B., Balandat, M., Spisak, J., Jain, R., Rabbat, M., and Hazelwood, K. Sustainable ai: Environmental implications, challenges and opportunities, 2022.

Wu, H., Xu, J., Wang, J., and Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=I55UqU-M11y.

Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J., and Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=ju_Uqw384Oq.

Wu, R. and Keogh, E. J. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE Transactions on Knowledge & Data Engineering, 35(03):2421–2429, mar 2023. ISSN 1558-2191. doi: 10.1109/TKDE.2021.3112126.

Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., and Hu, H. Simmim: a simple framework for masked image modeling. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9643–9653, 2022. doi: 10.1109/CVPR52688.2022.00943.

Xu, H., Chen, W., Zhao, N., Li, Z., Bu, J., Li, Z., Liu, Y., Zhao, Y., Pei, D., Feng, Y., et al. Unsupervised anomaly detection via variational auto-encoder for seasonal kpis in web applications. In Proceedings of the 2018 world wide web conference, pp. 187–196, 2018.

Xu, J., Wu, H., Wang, J., and Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=LzQQ89U1qm_.

Yue, Z., Wang, Y., Duan, J., Yang, T., Huang, C., Tong, Y., and Xu, B. Ts2vec: Towards universal representation of time series. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8):8980–8987, Jun. 2022. doi: 10.1609/aaai.v36i8.20881. URL https://ojs.aaai.org/index.php/AAAI/article/view/20881.

Zebik, M., Korytkowski, M., Angryk, R., and Scherer, R. Convolutional Neural Networks for Time Series Classification, pp. 635–642. Springer International Publishing, Cham, 2017. ISBN 978-3-319-59060-8. doi: 10.1007/978-3-319-59060-8_57. URL https://doi.org/10.1007/978-3-319-59060-8_57.

Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A., and Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, pp. 2114–2124, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383325. doi: 10.1145/3447548.3467401. URL https://doi.org/10.1145/3447548.3467401.

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., and Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):11106–11115, May 2021. doi: 10.1609/aaai.v35i12.17325. URL https://ojs.aaai.org/index.php/AAAI/article/view/17325.

Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L., and Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022), 2022.

Zhou, T., Niu, P., Wang, X., Sun, L., and Jin, R. One fits all: Power general time series analysis by pretrained LM. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=gMS6FVZvmF.

Authors:

(1) Mononito Goswami, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA ([email protected]);

(2) Konrad Szafer, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;

(3) Arjun Choudhry, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;

(4) Yifu Cai, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA;

(5) Shuo Li, University of Pennsylvania, Philadelphia, USA;

(6) Artur Dubrawski, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA.


[6] https://huggingface.co/AutonLab/MOMENT-1-large

[7] https://huggingface.co/datasets/AutonLab/Timeseries-PILE

[8] https://emissionsindex.org/
