Approximating Activation Functions

17 Jan 2020 · Nicholas Gerard Timmons, Andrew Rice

ReLU is widely seen as the default choice for activation functions in neural networks. However, there are cases where more complicated functions are required. In particular, recurrent neural networks (such as LSTMs) make extensive use of both the hyperbolic tangent and sigmoid functions, which are expensive to compute. We used function approximation techniques to develop replacements for these functions and evaluated them empirically on three popular network configurations. We find safe approximations that yield a 10% to 37% improvement in training times on the CPU. These approximations are suitable for all cases we considered, and we believe they are appropriate replacements for all networks using these activation functions. We also develop ranged approximations which only apply in some cases due to restrictions on their input domain. Our ranged approximations yield a performance improvement of 20% to 53% in network training time. Our functions also match or considerably outperform the ad-hoc approximations used in Theano and in the implementation of Word2Vec.
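
The abstract does not give the closed forms of the approximations themselves. As a rough illustration of the two categories it distinguishes, the Python/NumPy sketch below contrasts a "safe" approximation (defined for every input) with a "ranged" lookup-table approximation (valid only on a bounded input domain, in the spirit of the precomputed table in the Word2Vec reference implementation). The softsign-based form, the bound of 6, and the table size are assumptions chosen for illustration, not the functions developed in the paper.

    import numpy as np

    # Reference implementation: exact logistic sigmoid.
    def sigmoid_exact(x):
        return 1.0 / (1.0 + np.exp(-x))

    # "Safe"-style approximation: defined for every input and free of exp().
    # This softsign-based form is illustrative, not the paper's function.
    def sigmoid_safe_approx(x):
        x = np.asarray(x, dtype=np.float64)
        return 0.5 * x / (1.0 + np.abs(x)) + 0.5

    # "Ranged"-style approximation: a precomputed lookup table that is only
    # accurate for |x| <= MAX_INPUT. The bound and table size are arbitrary
    # choices for illustration.
    MAX_INPUT = 6.0
    TABLE_SIZE = 1000
    _TABLE = 1.0 / (1.0 + np.exp(-np.linspace(-MAX_INPUT, MAX_INPUT, TABLE_SIZE)))

    def sigmoid_ranged_approx(x):
        x = np.asarray(x, dtype=np.float64)
        idx = ((x + MAX_INPUT) * (TABLE_SIZE - 1) / (2.0 * MAX_INPUT)).astype(np.intp)
        return _TABLE[np.clip(idx, 0, TABLE_SIZE - 1)]

    # tanh can reuse a sigmoid approximation via tanh(x) = 2*sigmoid(2x) - 1.
    def tanh_ranged_approx(x):
        return 2.0 * sigmoid_ranged_approx(2.0 * np.asarray(x)) - 1.0

The ranged variant trades generality for speed: inputs outside the bounded domain saturate to the table's endpoints, which is acceptable only when the network's pre-activations are known to stay within that range.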
