
Exploring Neural Network Structure through Sparse Recurrent Neural Networks: A Recasting and Distillation of Neural Network Hyperparameters

1 Citation • 2023
Quincy Hershey, Randy Paffenroth, Harsh Nilesh Pathak
2023 International Conference on Machine Learning and Applications (ICMLA)

The paper finds that the potential of RNNs is better realized through sparse parameterizations, which significantly improve the stability and expressiveness of model performance across a wider array of hyperparameters while improving performance differentials at significantly reduced weight counts.

Abstract

This paper explores the role of sparse parameterizations in Recurrent Neural Network (RNN) performance using anomaly detection tasks. The findings indicate that sparsity plays a significant role in improving both the training stability and the overall performance of RNNs. Sparsity of the weight matrix is initially allowed to vary while the dimensions of the weight matrix remain fixed, causing the total number of trainable weights to vary in response. The sparse RNN models demonstrate surprisingly low performance sensitivity to the resulting changes in total trainable weights. Building on these results, sparsity of the weight matrix is again allowed to vary, this time with a fixed number of trainable weights and with the dimensions of the weight matrix permitted to change with sparsity. By isolating the impact of sparse parameterizations, RNNs show improved training stability and overall performance. Moreover, the results indicate an optimal band of sparsity within which RNNs perform most efficiently on a difficult time-series task. Sparse RNN configurations are compared to LSTM models on anomaly detection tasks. The sparse RNN models demonstrate a strong performance advantage over LSTM models, even outperforming them in scenarios where memory is crucial and LSTM models are expected to perform better. The results indicate that the reputation of RNNs for training instability may be exacerbated by the overparameterization that results from dense configurations. Several characteristics, such as model depth and model sparsity, are presented that significantly improve the stability and expressiveness of model performance across a wider array of hyperparameters while improving performance differentials at significantly reduced weight counts. Ultimately, the paper finds that the potential of RNNs is better realized through sparse parameterizations.
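As a rough illustration of the two experimental setups described in the abstract, the sketch below shows one way to impose a fixed sparsity pattern on the recurrent weight matrix of a vanilla RNN cell. This is not the authors' code; the names `SparseRNNCell` and `hidden_size_for_budget`, the masking scheme, and all parameter values are assumptions made for illustration only.

```python
# Hypothetical sketch (not the paper's implementation): a vanilla RNN cell
# whose recurrent weight matrix is masked to a target sparsity, covering the
# two setups described in the abstract.
import math
import torch
import torch.nn as nn


class SparseRNNCell(nn.Module):
    """Elman-style RNN cell with a fixed random binary mask on the recurrent weights."""

    def __init__(self, input_size: int, hidden_size: int, sparsity: float):
        super().__init__()
        self.W_in = nn.Linear(input_size, hidden_size, bias=True)
        self.W_rec = nn.Parameter(torch.empty(hidden_size, hidden_size))
        nn.init.xavier_uniform_(self.W_rec)
        # Binary mask: a `sparsity` fraction of recurrent weights is zeroed
        # out and held at zero for the whole of training.
        mask = (torch.rand(hidden_size, hidden_size) >= sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # Apply the mask at every step so pruned connections stay inactive.
        return torch.tanh(self.W_in(x) + h @ (self.W_rec * self.mask).T)


def hidden_size_for_budget(weight_budget: int, sparsity: float) -> int:
    """Second setup: keep the number of nonzero recurrent weights roughly
    fixed, so the hidden dimension grows as sparsity increases."""
    return int(math.sqrt(weight_budget / (1.0 - sparsity)))


# Setup 1: fixed matrix dimensions; the trainable-weight count varies with sparsity.
cell_fixed_dim = SparseRNNCell(input_size=8, hidden_size=128, sparsity=0.9)

# Setup 2: fixed weight budget; the matrix dimensions vary with sparsity.
h = hidden_size_for_budget(weight_budget=128 * 128, sparsity=0.9)
cell_fixed_budget = SparseRNNCell(input_size=8, hidden_size=h, sparsity=0.9)
```

In a sketch like this, holding the mask fixed throughout training keeps the comparison clean: only the number and placement of nonzero recurrent weights changes between configurations, not the training procedure itself.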