CaffeOnSpark (homepage)

Scalable deep learning running Caffe inside Spark executors with peer-to-peer communication

@yahoo / (1)

CaffeOnSpark brings deep learning to Spark clusters. By combining salient features of Caffe and Apache Spark, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers, with peer-to-peer communication over Ethernet or InfiniBand. It supports model training, testing, and feature extraction. Its CLI and API provide a simple way to invoke deep learning over distributed datasets. Caffe users can perform distributed learning using their existing LMDB data files and a slightly adjusted network configuration.

CaffeOnSpark was developed by Yahoo for large-scale distributed deep learning on Hadoop clusters. It works in both private and public clouds (e.g., AWS EC2).


  • machine learning
  • deep learning
  • caffe
  • Yahoo

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build it from source, or it may simply be a script. To use this Spark Package, follow the instructions in the README.


No releases yet.