I’m working on a Swift port of ConvnetJS, a deep learning library, and I’m considering GPU acceleration with Metal for this project. I ran some quick tests to see how much performance I can gain with Metal, and at which data sizes. I chose the sigmoid $y = {1 \over 1+e^{-x}}$, rectifier $y = \max(0, x)$ and hyperbolic tangent $y = {e^{2x}-1 \over e^{2x}+1}$ functions for my experiments because they are important activation functions in neural networks. All tests were performed on an iPhone 5s running iOS 9.1 (13B143).

## Functions to test

Metal kernels:
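A sigmoid kernel of this kind, together with the Swift host code that dispatches it, might look like the sketch below. The kernel name `sigmoid_kernel`, the function `sigmoidMetal` and the buffer indices are hypothetical and not taken from the project, and the host code assumes the current Swift Metal API rather than the one shipped with the iOS 9 SDK used for the tests.

```swift
#if canImport(Metal)
import Metal

// Metal Shading Language source for a hypothetical element-wise sigmoid kernel.
let sigmoidKernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void sigmoid_kernel(const device float *input [[ buffer(0) ]],
                           device float *output      [[ buffer(1) ]],
                           constant uint &count      [[ buffer(2) ]],
                           uint id [[ thread_position_in_grid ]]) {
    if (id >= count) { return; }   // skip the padded tail of the grid
    output[id] = 1.0f / (1.0f + exp(-input[id]));
}
"""

/// Runs the kernel above on `input`. Returns nil if Metal is unavailable
/// or any setup step fails.
func sigmoidMetal(_ input: [Float]) -> [Float]? {
    guard !input.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: sigmoidKernelSource, options: nil),
          let function = library.makeFunction(name: "sigmoid_kernel"),
          let pipeline = try? device.makeComputePipelineState(function: function)
    else { return nil }

    let byteCount = input.count * MemoryLayout<Float>.stride
    guard let inBuffer = device.makeBuffer(bytes: input, length: byteCount, options: []),
          let outBuffer = device.makeBuffer(length: byteCount, options: []),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { return nil }

    var count = UInt32(input.count)
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(inBuffer, offset: 0, index: 0)
    encoder.setBuffer(outBuffer, offset: 0, index: 1)
    encoder.setBytes(&count, length: MemoryLayout<UInt32>.size, index: 2)

    // Round the grid up to a whole number of threadgroups.
    let width = pipeline.threadExecutionWidth
    let groups = MTLSize(width: (input.count + width - 1) / width, height: 1, depth: 1)
    encoder.dispatchThreadgroups(groups,
                                 threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()

    // Block until the GPU has finished, then copy the results out.
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let pointer = outBuffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: pointer, count: input.count))
}
#endif
```

Every call in this sketch pays for library compilation, buffer allocation and a blocking wait. Reusing the pipeline state and command queue across calls would remove part of that fixed cost.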

Two Swift implementations, using `map` and a `for` loop respectively:

The rectifier and hyperbolic tangent functions were implemented in a similar fashion.
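To illustrate the two CPU variants, here is a minimal sketch, assuming `Float` arrays. The names `applyWithMap` and `applyWithLoop` and the three scalar functions are hypothetical, written from the formulas in the introduction, and may differ from the code that was actually benchmarked.

```swift
import Foundation

// Scalar activation functions, written directly from the formulas.
func sigmoid(_ x: Float) -> Float {
    return 1 / (1 + exp(-x))
}

func rectifier(_ x: Float) -> Float {
    return max(0, x)
}

func hyperbolicTangent(_ x: Float) -> Float {
    let e = exp(2 * x)
    return (e - 1) / (e + 1)
}

// Variant 1: functional style, one call to map.
func applyWithMap(_ f: (Float) -> Float, to input: [Float]) -> [Float] {
    return input.map(f)
}

// Variant 2: preallocate the output and fill it in a for loop.
func applyWithLoop(_ f: (Float) -> Float, to input: [Float]) -> [Float] {
    var output = [Float](repeating: 0, count: input.count)
    for i in 0..<input.count {
        output[i] = f(input[i])
    }
    return output
}
```

Both variants should produce identical results, for example `applyWithMap(sigmoid, to: xs)` and `applyWithLoop(sigmoid, to: xs)`. Note that the literal hyperbolic tangent formula overflows for large positive `x`; the standard `tanh` function avoids this.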

## Results

Data (in seconds)

Sigmoid:

| Size | Metal | for loop | map |
|---|---|---|---|
| ×1 | 0.0564465416 | 0.000349125 | 0.00016975 |
| ×10 | 0.0021658333 | 0.0000578 | 0.0000655 |
| ×100 | 0.0028236667 | 0.0000907 | 0.0000672 |
| ×1000 | 0.0021074583 | 0.0006860833 | 0.0006257084 |
| ×10000 | 0.0038219166 | 0.0063114166 | 0.0065755 |
| ×100000 | 0.0098142499 | 0.0634174166 | 0.0634255416 |
| ×1000000 | 0.06469075 | 0.63862575 | 0.6045135832 |
| ×10000000 | 0.7409762917 | 6.3516480417 | 6.35516875 |

Rectifier:

| Size | Metal | for loop | map |
|---|---|---|---|
| ×1 | 0.0049757083 | 0.00018075 | 0.0001782917 |
| ×10 | 0.0032745417 | 0.0001197084 | 0.0000144 |
| ×100 | 0.0020335417 | 0.0000966 | 0.0000674 |
| ×1000 | 0.003079125 | 0.0007750001 | 0.0006028333 |
| ×10000 | 0.0038103334 | 0.0072056249 | 0.0060474999 |
| ×100000 | 0.0098598334 | 0.0687627917 | 0.0607689167 |
| ×1000000 | 0.0700035417 | 0.6864432917 | 0.607680875 |
| ×10000000 | 0.7093120418 | 6.8916719584 | 6.093184 |

Hyperbolic tangent:

| Size | Metal | for loop | map |
|---|---|---|---|
| ×1 | 0.095888125 | 0.000841625 | 0.000172875 |
| ×10 | 0.0030728333 | 0.0000663 | 0.0000433 |
| ×100 | 0.0019839583 | 0.00016625 | 0.000088 |
| ×1000 | 0.0029159584 | 0.0006575417 | 0.0006404166 |
| ×10000 | 0.0037110833 | 0.0062489166 | 0.00606475 |
| ×100000 | 0.0097689166 | 0.0632590416 | 0.0606999584 |
| ×1000000 | 0.0587829167 | 0.63312125 | 0.6082397084 |
| ×10000000 | 0.5905967084 | 6.3275603751 | 6.08179275 |

## Bar charts

The full height of each bar represents the total time spent by all three implementations (Metal, for loop and map). The height of each coloured segment shows the share of that total spent by the corresponding implementation.

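It should be clear from the plots that if you’re operating on vectors of fewer than 10 000 elements, there is no point in using Metal. On the other hand, if your data is sufficiently big, there is no reason not to use Metal on iOS or the latest OS X: from 100 000 elements upwards, Metal was roughly 6 to 11 times faster than either Swift implementation in the timings above.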
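In code, this rule of thumb could be applied as a simple size check before each operation. The following is a minimal sketch: the names `Backend`, `metalThreshold` and `chooseBackend` are hypothetical, and the cut-over value is only the one suggested by these iPhone 5s measurements, so it would need re-measuring on other hardware.

```swift
/// Where an element-wise operation should run.
enum Backend {
    case cpu
    case metal
}

/// Hypothetical cut-over point, taken from the measurements in this post.
let metalThreshold = 10_000

/// Picks the GPU only when it is available and the vector is large enough
/// for the speed-up to outweigh Metal's fixed dispatch cost.
func chooseBackend(elementCount: Int, metalAvailable: Bool) -> Backend {
    if metalAvailable && elementCount >= metalThreshold {
        return .metal
    }
    return .cpu
}
```

For example, `chooseBackend(elementCount: 1_000, metalAvailable: true)` selects the CPU path, while a vector of 1 000 000 elements would go to Metal.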

R script to generate bar graphs