Deep Learning on Caffe-on-Spark

Distributed Deep Learning on Spark (Using Yahoo’s Caffe-on-Spark)

Caffe-on-Spark is a result of Yahoo’s early steps in bringing the Apache Hadoop ecosystem and deep learning together on the same heterogeneous (GPU+CPU) cluster; it may be open sourced depending on interest from the community.

To enable deep learning on these enhanced Hadoop clusters, we developed a comprehensive distributed solution based upon open source software libraries, Apache Spark and Caffe. One can now submit deep learning jobs onto a (Hadoop YARN) cluster of GPU nodes (using spark-submit).
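As an illustration, a CaffeOnSpark training job is launched through an ordinary spark-submit invocation along these lines. This is a hedged sketch, not a verbatim command: the jar path, prototxt file names, executor count, and output path below are placeholders, and the exact `-train`/`-conf`/`-model` options should be checked against the CaffeOnSpark README.

```shell
# Sketch of submitting a CaffeOnSpark training job to a Hadoop YARN cluster.
# Paths, file names, and executor counts are placeholders for illustration.
spark-submit --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --files lenet_memory_solver.prototxt,lenet_memory_train_test.prototxt \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -train \
    -conf lenet_memory_solver.prototxt \
    -model hdfs:///user/me/lenet.model
```

The point is that, from the user’s perspective, a deep learning job is just another Spark application: YARN handles resource allocation across the GPU nodes, while the Caffe solver configuration travels with the job as an ordinary `--files` attachment.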
Source:

Yahoo’s Caffe-on-Spark

To enable deep learning, Yahoo added GPU nodes into their Hadoop clusters, with each node having four Nvidia Tesla K80 cards and each card having two GK210 GPUs. These nodes have 10x the processing power of the traditional commodity CPU nodes they generally use in their Hadoop clusters.


Yahoo has progressively invested in building and scaling Apache Hadoop clusters with a current footprint of more than 40,000 servers and 600 petabytes of storage spread across 19 clusters.

Hadoop clusters are the preferred platform for large-scale machine learning at Yahoo. Deep learning is used in many of Yahoo’s products, such as Flickr’s Magic View feature which automatically tags all user photos, enabling Flickr end users to organize and find photos easily. Read “Picture This: NVIDIA GPUs Sort Through Tens of Millions of Flickr Photos” for more information on the feature.

To enable deep learning on these enhanced clusters, the Yahoo Big Data and Machine Learning team developed a comprehensive distributed solution based upon open source software libraries, Apache Spark and Caffe. Caffe-on-Spark enables multiple GPUs, and multiple machines to be used for deep learning.

Source:

Caffe on Spark for Deep Learning from Yahoo

Is Apache Spark a Good Framework for Implementing Deep Learning?

It Depends.
Yes, if your objectives are one or more of these:

  1. To quickly implement some aspect of DL using existing/emerging libraries, and you already have a Spark cluster handy. In that case, consider, e.g., guoding83128/OpenDL, Lightning-Fast Deep Learning on Spark, or Implementing a Distributed Deep Learning Network over Spark.
  2. To experiment with developing different ideas for distributed DL, e.g., variations on Downpour SGD or Parameter Server without having to learn a strange new compute model like CUDA.
  3. To experiment with potentially interesting architectures, e.g., using Spark CPUs to drive GPU coprocessors in a distributed context.
  4. For situational reasons, your need to horizontally scale over a huge dataset is satisfiable only with a Spark cluster and not, e.g., a single-chassis GPU box.
  5. Generally speaking, absolute speed (relative to other available approaches like GPU) is not the main concern.
  6. Etc.
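Point 2 above deserves a concrete illustration. The following is a minimal sketch in plain Python (no Spark dependency) of the synchronous parameter-averaging pattern that underlies ideas like Downpour SGD: each simulated partition runs local SGD on its shard of the data, and a "driver" step averages the resulting models. All names here are illustrative; this is not CaffeOnSpark’s actual API, and on a real cluster the map step would be a `mapPartitions` over an RDD.

```python
# Sketch of synchronous data-parallel SGD with parameter averaging.
# Partitions are simulated with plain lists; on Spark, local_sgd would
# run inside mapPartitions and train_round's averaging on the driver.
import random

def local_sgd(params, partition, lr=0.1, epochs=50):
    # Per-sample SGD on one partition for a 1-D linear model y = w*x + b.
    w, b = params
    for _ in range(epochs):
        for x, y in partition:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def train_round(params, partitions):
    # One synchronous round: "map" = local training, "reduce" = averaging.
    results = [local_sgd(params, p) for p in partitions]
    w = sum(r[0] for r in results) / len(results)
    b = sum(r[1] for r in results) / len(results)
    return w, b

random.seed(0)
# Synthetic data on the line y = 2x + 1, split across 4 simulated partitions.
data = [(x / 10.0, 2 * (x / 10.0) + 1) for x in range(40)]
random.shuffle(data)
partitions = [data[i::4] for i in range(4)]

params = (0.0, 0.0)
for _ in range(5):
    params = train_round(params, partitions)
```

After a few rounds the averaged parameters approach (2, 1). The appeal of experimenting at this level on Spark is exactly that the whole scheme is expressible with familiar map/reduce primitives, with no CUDA required; the cost is the per-round synchronization barrier that asynchronous schemes like Downpour SGD are designed to relax.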

No, if your objectives are one or more of these (the scenarios here are biased toward situations that warrant GPU-based implementations):

  1. You want the optimal high-performance/programming-flexibility sweet spot of GPUs and can learn (or already know) CUDA and a framework such as Caffe.
  2. You don’t have a particular affinity for Spark/Java/Scala/Python and prefer to invest your time/effort in what is becoming the dominant architecture for DL implementations (GPU/CUDA).
  3. You want to push the envelope even further (perhaps for reasons not very relevant today) and go to FPGAs/ASICs, and are willing to give up the flexibility of CPUs/GPUs.
  4. You want to work on leading-edge R&D and produce amazing refereed-journal results (e.g., “4 GPU DNN Beats Google’s 14,000-Machine Brain At Recognizing Cats and Jennifer Aniston”) on the most energy/cost efficient platforms today.
  5. Etc.

Source:

Apache Spark for Implementing Deep Learning