Metal for GPGPU
I’m working on the Swift port of ConvnetJS - a deep learning library - and I’m considering GPU acceleration for this project with Metal. So I ran some quick tests to see how much performance I can gain with Metal and at which data sizes. I chose the sigmoid \(y = {1 \over 1+e^{-x}}\), rectifier \(y = \max(0, x)\) and hyperbolic tangent \(y = {e^{2x}-1 \over e^{2x}+1}\) functions for my experiments because they are important activation functions in neural networks. All tests were performed on an iPhone 5s running iOS 9.1 (13B143).
Functions to test
Metal kernels:
Two Swift implementations, with map and with a for loop respectively:
The rectifier and hyperbolic tangent functions were implemented in a similar fashion.
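As a minimal sketch of what the two CPU variants of the sigmoid could look like in current Swift syntax: the names `sigmoidMap` and `sigmoidLoop` and the choice of `Float` are assumptions made for illustration, not necessarily the code used for the measurements.

```swift
import Foundation

// Hypothetical map-based variant: applies y = 1 / (1 + e^(-x)) to every element.
func sigmoidMap(_ input: [Float]) -> [Float] {
    return input.map { 1 / (1 + exp(-$0)) }
}

// Hypothetical loop-based variant: same formula, written with an explicit for loop.
func sigmoidLoop(_ input: [Float]) -> [Float] {
    var output = [Float](repeating: 0, count: input.count)
    for i in 0..<input.count {
        output[i] = 1 / (1 + exp(-input[i]))
    }
    return output
}
```

Both functions take an array and return a new array of the same length, so they can be swapped freely inside a timing harness.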
Measurement block:
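A timing helper of this kind might be sketched as follows. The helper name `measure`, the use of `DispatchTime`, and the 10 000-element input are assumptions for illustration; the actual measurements may have been taken differently.

```swift
import Dispatch

// Hypothetical helper: runs `block` once and returns the elapsed time in seconds.
func measure(_ block: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    block()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000_000
}

// Example: time the rectifier, y = max(0, x), over an illustrative input.
let input = [Float](repeating: 0.5, count: 10_000)
let seconds = measure {
    let result: [Float] = input.map { max(0, $0) }
    _ = result
}
print("rectifier with map: \(seconds) s")
```

A single run like this is noisy, so averaging several runs per data size is usually the safer choice.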
Results
Data (in seconds)
Sigmoid:
Size | Metal | for loop | map |
---|---|---|---|
×1 | 0.0564465416 | 0.000349125 | 0.00016975 |
×10 | 0.0021658333 | 0.0000578 | 0.0000655 |
×100 | 0.0028236667 | 0.0000907 | 0.0000672 |
×1000 | 0.0021074583 | 0.0006860833 | 0.0006257084 |
×10000 | 0.0038219166 | 0.0063114166 | 0.0065755 |
×100000 | 0.0098142499 | 0.0634174166 | 0.0634255416 |
×1000000 | 0.06469075 | 0.63862575 | 0.6045135832 |
×10000000 | 0.7409762917 | 6.3516480417 | 6.35516875 |
Rectifier:
Size | Metal | for loop | map |
---|---|---|---|
×1 | 0.0049757083 | 0.00018075 | 0.0001782917 |
×10 | 0.0032745417 | 0.0001197084 | 0.0000144 |
×100 | 0.0020335417 | 0.0000966 | 0.0000674 |
×1000 | 0.003079125 | 0.0007750001 | 0.0006028333 |
×10000 | 0.0038103334 | 0.0072056249 | 0.0060474999 |
×100000 | 0.0098598334 | 0.0687627917 | 0.0607689167 |
×1000000 | 0.0700035417 | 0.6864432917 | 0.607680875 |
×10000000 | 0.7093120418 | 6.8916719584 | 6.093184 |
Hyperbolic tangent:
Size | Metal | for loop | map |
---|---|---|---|
×1 | 0.095888125 | 0.000841625 | 0.000172875 |
×10 | 0.0030728333 | 0.0000663 | 0.0000433 |
×100 | 0.0019839583 | 0.00016625 | 0.000088 |
×1000 | 0.0029159584 | 0.0006575417 | 0.0006404166 |
×10000 | 0.0037110833 | 0.0062489166 | 0.00606475 |
×100000 | 0.0097689166 | 0.0632590416 | 0.0606999584 |
×1000000 | 0.0587829167 | 0.63312125 | 0.6082397084 |
×10000000 | 0.5905967084 | 6.3275603751 | 6.08179275 |
Bar charts
The full height of each bar represents the total time spent by all three implementations (Metal, for loop and map). The height of each coloured segment shows the share of that total taken by the corresponding implementation.
It should be clear from the plot that if you’re operating on vectors of fewer than 10 000 elements, there is no point in using Metal. In the tables, Metal first overtakes both CPU versions at 10 000 elements (about 1.6 to 1.9 times faster) and is roughly 8.5 to 10.7 times faster at 10 000 000. So if your data is sufficiently big, there is no reason not to use Metal on iOS or the latest OS X.
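To make the crossover point concrete, here is a small sketch that computes the for loop / Metal time ratio from the sigmoid timings in the Results section and finds the first size at which Metal wins. It assumes the ×N row labels are element counts; the variable names are illustrative.

```swift
// Sigmoid timings in seconds, copied from the Results section.
// Assumption: the row label ×N is the number of elements in the vector.
let sigmoid: [(size: Int, metal: Double, loop: Double)] = [
    (1, 0.0564465416, 0.000349125),
    (10, 0.0021658333, 0.0000578),
    (100, 0.0028236667, 0.0000907),
    (1_000, 0.0021074583, 0.0006860833),
    (10_000, 0.0038219166, 0.0063114166),
    (100_000, 0.0098142499, 0.0634174166),
    (1_000_000, 0.06469075, 0.63862575),
    (10_000_000, 0.7409762917, 6.3516480417),
]

// A ratio above 1 means Metal was faster than the for loop at that size.
for row in sigmoid {
    let ratio = row.loop / row.metal
    print("×\(row.size): for loop / Metal = \(ratio)")
}

// First size at which Metal beats the for loop.
let crossover = sigmoid.first(where: { $0.metal < $0.loop })?.size
```

The same calculation can be repeated for the rectifier and hyperbolic tangent rows, or with the map column in place of the for loop column.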
R script to generate bar graphs
Useful links
- GPGPU Performance of Swift/Metal vs Accelerate on iPhone 6 & 5S, iPad Air and iPad Mini - this article is much more detailed than mine. Unfortunately, I found it only after I had already finished this post.
- Exploring GPGPU on iOS (with OpenGL ES).
- Data-Parallel Programming with Metal and Swift for iPhone/iPad GPU - some useful pieces of code.
- Calculate execution time for a block of Swift code - gist.