Open source deep learning frameworks are coming of age. Several of them now provide advanced machine learning and artificial intelligence (A.I.) capabilities as an alternative to proprietary solutions. How do you determine which open source framework is best for you?
In “Big data – a road map for smarter data,” I describe a set of machine learning architectures that provide advanced capabilities, including image, handwriting, video, and speech recognition, natural language processing, and object recognition. No single deep learning framework will solve all your business problems. Hopefully, the table below and the accompanying descriptions will give you insight into which framework best fits your business problem.
The figure below, Deep Learning Frameworks, summarizes most of the popular open source deep learning repositories on GitHub, ranked by the number of stars awarded by developers. The numbers were compiled at the beginning of May 2017.
TensorFlow
Google’s TensorFlow grew out of an earlier Google library called DistBelief V2, a proprietary deep learning library developed as part of the Google Brain project. Some have described TensorFlow as a ground-up re-architecture of Theano.
When Google open sourced TensorFlow under the Apache 2.0 license on November 9, 2015, it immediately gained a large developer following. TensorFlow supports a broad set of capabilities, such as image, handwriting, and speech recognition, forecasting, and natural language processing.
Google announced TensorFlow version 1.0 on February 15, 2017. This release accumulates eight prior releases and addresses many of the incomplete core capabilities and performance issues that TensorFlow previously suffered from. Below is a list of features contributing to TensorFlow’s success.
TensorFlow provides these tools:
TensorBoard is a well-designed tool for visualizing network models and performance.
TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.
TensorFlow’s programming interfaces include Python and C++. With the version 1.0 announcement, alpha-stage APIs for Java, Go, R, and Haskell are also supported. Additionally, TensorFlow is supported in the Google and Amazon cloud environments.
TensorFlow supports Windows 7, 10, and Server 2016 as of the 0.12 release. Because it uses the C++ Eigen library, TensorFlow can be compiled and optimized for the ARM architecture. This means you can deploy trained models on a variety of servers or mobile devices without implementing a separate model decoder or loading a Python interpreter.
TensorFlow supports fine-grained network layers, which allow users to build new complex layer types without implementing them in a low-level language. Subgraph execution allows you to feed arbitrary data into, and fetch results from, any edge of the graph. This is extremely helpful for debugging complicated computational graphs.
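The feed/fetch idea behind subgraph execution can be sketched without TensorFlow itself. The toy graph below is a hypothetical, pure-Python illustration: the `Node` class and `run` function are invented for this example and are not TensorFlow APIs. `run` evaluates only the nodes that a fetched result depends on, and any named edge can be overridden with fed data. In TensorFlow 1.x, the analogous mechanism is the `fetches` and `feed_dict` arguments of `Session.run`.

```python
from typing import Callable, Dict, Optional, Sequence


class Node:
    """A node in a toy (hypothetical) dataflow graph: an op applied to input nodes."""

    def __init__(self, name: str, op: Callable[..., float], inputs: Sequence["Node"] = ()):
        self.name = name
        self.op = op
        self.inputs = list(inputs)


def run(fetch: Node, feed: Optional[Dict[str, float]] = None) -> float:
    """Evaluate only the subgraph that `fetch` depends on.

    Any node named in `feed` is replaced by the fed value, so data can be
    injected on any edge and the nodes upstream of it are never evaluated.
    """
    feed = feed or {}
    cache: Dict[str, float] = {}

    def evaluate(node: Node) -> float:
        if node.name in feed:
            return feed[node.name]
        if node.name not in cache:
            args = [evaluate(n) for n in node.inputs]
            cache[node.name] = node.op(*args)
        return cache[node.name]

    return evaluate(fetch)


# Build y = (x * w) + b as a graph of primitive operations.
x = Node("x", lambda: 3.0)
w = Node("w", lambda: 2.0)
b = Node("b", lambda: 1.0)
xw = Node("xw", lambda a, c: a * c, [x, w])
y = Node("y", lambda a, c: a + c, [xw, b])

print(run(y))                     # full graph: 3*2 + 1 = 7.0
print(run(xw))                    # fetch an intermediate edge: 6.0
print(run(y, feed={"xw": 10.0}))  # inject data on an inner edge: 10 + 1 = 11.0
```

Feeding an inner edge is what makes this useful for debugging: you can test the downstream half of a graph with known values without touching the upstream half.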
Distributed TensorFlow was introduced in version 0.8, allowing for model parallelism, in which different portions of a model are trained on different devices in parallel.
As of March 2016, the framework was being taught at Stanford University, the University of California, Berkeley, the University of Toronto, and Udacity, an online education provider offering free massive open online courses.
TensorFlow’s downsides are:
Every computational flow in TensorFlow must be constructed as a static graph, and TensorFlow lacks symbolic loops, which makes some computations difficult to express.
Early releases lacked 3-D convolution, which is useful for video recognition; tf.nn.conv3d has since been added.
Even though TensorFlow 1.0 reports up to a 58x speedup for distributed training of the Inception v3 model on 64 GPUs, it still lags its competitors in execution performance.
Caffe
Caffe is the brainchild of Yangqing Jia, who is now the lead engineer for Facebook’s AI platform. Started in late 2013, Caffe is perhaps the first mainstream, industry-grade deep learning toolkit. Thanks to its excellent convolutional model, it is one of the most popular toolkits in the computer vision community, and it won an ImageNet Challenge in 2014. Caffe is released under the BSD 2-Clause license.
Speed makes Caffe well suited to both research experiments and commercial deployment. Caffe can process over 60M images per day with a single Nvidia K40 GPU, which works out to 1 ms/image for inference and 4 ms/image for learning, and more recent library versions are faster still.
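The daily throughput figure can be converted into an average time per image with simple arithmetic. The calculation below (plain Python; the variable names are ours) shows that 60M images in the 86,400,000 ms of a day averages about 1.44 ms per image, or roughly 694 images per second, the same order of magnitude as the quoted 1 ms/image inference time.

```python
# Convert a daily throughput figure into average time per image.
IMAGES_PER_DAY = 60_000_000          # figure quoted for a single Nvidia K40 GPU
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400 s
MS_PER_DAY = SECONDS_PER_DAY * 1000  # 86,400,000 ms

ms_per_image = MS_PER_DAY / IMAGES_PER_DAY
images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY

print(f"{ms_per_image:.2f} ms/image")            # 1.44 ms/image
print(f"{images_per_second:.0f} images/second")  # 694 images/second
```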
Caffe is written in C++, so it can be compiled on a variety of devices. It is cross-platform and includes a port to Windows. Caffe provides C++, Matlab, and Python programming interfaces. Caffe has a large user community that contributes to its own repository of deep networks, known as the “Model Zoo.” AlexNet and GoogLeNet are two popular user-made networks available to the community.
Caffe is a popular deep learning framework for vision recognition. However, Caffe does not support fine-grained network layers like those found in TensorFlow, CNTK, and Theano; building complex layer types has to be done in a low-level language. Because of its legacy architecture, its support for recurrent networks, and for language modeling in general, is poor.
Yangqing and his team at Facebook are now working on Caffe 2, which Facebook open sourced under the BSD license on April 18, 2017. How is Caffe 2 different from Caffe? Caffe 2 focuses on modularity and on excelling at mobile and large-scale deployments. Like TensorFlow, Caffe 2 will support the ARM architecture using the C++ Eigen library.
Caffe models can easily be converted to Caffe 2 models with a utility script. Caffe’s design choices made it ideal for vision problems. Caffe 2 continues that strong support for vision but adds recurrent neural networks (RNN) and long short-term memory (LSTM) networks for natural language processing, handwriting recognition, and time series forecasting.
Expect to see Caffe 2 overtake Caffe in the near future as its announcement spreads through the deep learning community.
Microsoft Cognitive Toolkit
Microsoft Cognitive Toolkit (CNTK) is a deep learning toolkit that was initially developed to advance speech recognition. CNTK supports both RNN and CNN models, which makes it a good candidate for image, handwriting, and speech recognition problems. CNTK supports 64-bit Linux and Windows with either Python or C++ programming interfaces, and is released under the MIT license.
CNTK shares the same design as TensorFlow and Theano: a network is specified as a symbolic graph of vector operations, such as matrix addition, matrix multiplication, or convolution. Also like TensorFlow and Theano, CNTK builds network layers from fine-grained operations, which allows users to invent new complex layer types without implementing them in a low-level language (as Caffe requires).
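What fine-grained building blocks buy you can be sketched in a few lines: once primitives such as matrix multiply, addition, and an activation exist, a new layer type is just a new composition of them. The pure-Python sketch below is illustrative only; the function names (`matvec`, `add`, `relu`, `dense`, `residual_dense`) are hypothetical and are not CNTK, TensorFlow, or Theano APIs.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


# Primitive operations: the fine-grained building blocks.
def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector multiply."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


# A standard fully connected layer composed from primitives.
def dense(w: Matrix, b: Vector, x: Vector) -> Vector:
    return relu(add(matvec(w, x), b))


# A "new" layer type built from the same primitives, no low-level code needed:
# a residual block that adds its input back to the dense output.
def residual_dense(w: Matrix, b: Vector, x: Vector) -> Vector:
    return add(x, dense(w, b, x))


w = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [2.0, 1.0]
print(dense(w, b, x))           # [1.0, 0.5]
print(residual_dense(w, b, x))  # [3.0, 1.5]
```

In a real framework these primitives are nodes in a symbolic graph, so the framework can also differentiate and optimize the composed layer automatically.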
Like Caffe, CNTK is written in C++ and has cross-platform CPU/GPU support. CNTK on Azure GPU Lab offers the most efficient distributed computational performance. Currently, CNTK’s lack of support for the ARM architecture limits its use on mobile devices.
MXNet
MXNet supports deep learning architectures such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), including Long Short-Term Memory (LSTM) networks. It provides excellent capabilities for image, handwriting, and speech recognition, forecasting, and natural language processing. Some have called MXNet the world’s best image classifier.
MXNet offers powerful techniques for scaling, such as GPU parallelism and memory mirroring, along with fast development and portability. Additionally, MXNet integrates with Apache Hadoop YARN, a general-purpose, distributed application management framework, which makes MXNet a contender to TensorFlow.
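MXNet has the distinction of being one of the few deep learning frameworks that support the Generative Adversarial Network (GAN) model. A GAN trains two networks, a generator and a discriminator, against each other as a two-player game; the goal of training is a Nash equilibrium between them, a game-theory concept also used in economics.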
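For reference, the original GAN formulation (Goodfellow et al., 2014) trains a generator G and a discriminator D on the two-player minimax objective below, where p_data is the data distribution and p_z is the noise prior fed to the generator. The discriminator tries to maximize V by telling real samples from generated ones, while the generator tries to minimize it by fooling the discriminator.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the game's global optimum, the generator's distribution equals p_data and the discriminator outputs D(x) = 1/2 everywhere, so neither player can improve by changing strategy alone.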
Torch
Torch was developed by Ronan Collobert and Soumith Chintala of Facebook, Clement Farabet, formerly of Twitter and now at Nvidia, and Koray Kavukcuoglu of Google DeepMind. Torch’s main contributors are Facebook, Twitter, and Nvidia, and it is licensed under the BSD 3-Clause license. However, with its most recent announcement, Facebook is changing course and making Caffe 2 its primary deep learning framework so it can deploy deep learning on mobile devices.
Torch is implemented in the Lua programming language. Lua is not a mainstream language, which will reduce developer productivity until your staff is well versed in it.
Torch lacks the distributed application management framework of TensorFlow and the YARN support of MXNet and Deeplearning4J. Its limited number of programming language APIs also narrows its developer audience.
Deeplearning4J
Deeplearning4J (DL4J) is an Apache 2.0-licensed, open source, distributed neural net library written in Java and Scala. DL4J is the brainchild of Skymind’s Adam Gibson, and it is the only commercial-grade deep learning framework that integrates with Hadoop and Spark to orchestrate multiple host threads. DL4J is a unique deep learning framework in that it uses Map-Reduce to train the network while relying on other libraries to perform large matrix operations.
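A simplified picture of map-reduce training: each worker computes a gradient on its own shard of the data (map), and the partial gradients are combined into a single averaged update (reduce). The sketch below uses plain Python and a one-parameter linear model; the shard layout, learning rate, and function names are illustrative assumptions and do not reflect DL4J’s actual API or its Spark training internals.

```python
from functools import reduce
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def shard_gradient(w: float, shard: List[Sample]) -> Tuple[float, int]:
    """Map step: summed squared-error gradient on one shard, plus the shard size."""
    grad = sum(2.0 * (w * x - y) * x for x, y in shard)
    return grad, len(shard)


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: add gradients and sample counts from two workers."""
    return a[0] + b[0], a[1] + b[1]


def train(shards: List[List[Sample]], w: float = 0.0, lr: float = 0.05, steps: int = 100) -> float:
    for _ in range(steps):
        partials = [shard_gradient(w, s) for s in shards]  # one result per worker
        total_grad, count = reduce(combine, partials)
        w -= lr * total_grad / count  # step along the mean gradient over all data
    return w


# Data generated from y = 3x, split across three hypothetical workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(0.5, 1.5), (1.5, 4.5)]]
print(round(train(shards), 4))  # 3.0
```

Because the reduce step only sums gradients and counts, it is associative, which is what lets a cluster combine worker results in any order.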
The DL4J framework comes with built-in GPU support, which is important for the training process, and it supports YARN, Hadoop’s distributed application management framework. DL4J supports a rich set of deep network architectures: Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Recursive Neural Tensor Networks (RNTN), and Long Short-Term Memory (LSTM) networks. DL4J also includes support for a vectorization library called Canova.
DL4J is implemented in Java, which is generally faster than Python. It is as fast as Caffe for non-trivial image recognition tasks using multiple GPUs. The framework provides excellent capabilities for image recognition, fraud detection, and natural language processing.
Theano
Theano is actively maintained by the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal. Headed by Yoshua Bengio, MILA, the lab where Theano was created, is a huge contributor to deep learning research, boasting about 100 students and faculty. Theano supports rapid development of efficient machine learning algorithms and is released under a BSD license.
Theano’s architecture lacks the elegance of TensorFlow’s, but its symbolic API supports a looping control construct called scan, which makes implementing RNNs easy and efficient.
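Scan applies a step function across a sequence while carrying state from one step to the next, which is exactly the shape of a recurrent network. The sketch below re-creates those semantics in plain Python for a scalar RNN, h_t = tanh(w·h_{t-1} + u·x_t). The `scan` helper and the weights are illustrative assumptions; this is not the signature of Theano’s `theano.scan`, which builds a symbolic loop rather than running a Python one.

```python
import math
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # carried state
X = TypeVar("X")  # per-step input


def scan(step: Callable[[S, X], S], init: S, xs: List[X]) -> Tuple[S, List[S]]:
    """Apply `step` over `xs`, carrying state; return the final state and all states."""
    state = init
    history: List[S] = []
    for x in xs:
        state = step(state, x)
        history.append(state)
    return state, history


def rnn_step(h: float, x: float, w: float = 0.5, u: float = 1.0) -> float:
    """One step of a scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t)."""
    return math.tanh(w * h + u * x)


final, hidden = scan(rnn_step, 0.0, [1.0, 0.0, -1.0])
print(len(hidden), round(final, 4))
```

Expressing the recurrence as a single step function is what allows a symbolic framework to compile the whole loop and differentiate through it.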
Theano supports many types of convolution for handwriting and image classification, including medical images, and uses 3-D convolution and pooling for video classification. Theano can also be used for natural language processing tasks, including language understanding, translation, and generation. Theano also supports the Generative Adversarial Network (GAN), which was invented by a MILA student who is now at Google.
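3-D convolution slides a small kernel across time as well as height and width, which is why it suits video. The naive, pure-Python sketch below (valid padding, stride 1, computed as cross-correlation as most deep learning libraries do) is for illustration only; the `conv3d` function name and the nested-list `[time][height][width]` layout are our assumptions, not Theano’s API.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [time][height][width]


def conv3d(volume: Volume, kernel: Volume) -> Volume:
    """Naive 3-D cross-correlation with 'valid' padding and stride 1."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += volume[t + a][i + b][j + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# A tiny "video": 3 frames of 3x3 pixels, where every pixel of frame k equals k.
video = [[[float(k)] * 3 for _ in range(3)] for k in range(3)]
# A 2x2x2 kernel that subtracts one frame from the next: a crude motion detector.
kernel = [[[-1.0, -1.0], [-1.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]]
result = conv3d(video, kernel)
print(len(result), len(result[0]), len(result[0][0]))  # 2 2 2
print(result[0][0][0], result[1][0][0])                # 4.0 4.0
```

A 2-D convolution applied frame by frame could not produce this response, because the kernel needs to see two frames at once to measure change over time.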
Theano supports extensions for multi-GPU data parallelism and has a distributed framework for training models built in Theano. Theano supports only one programming language, Python, and is a great tool for academic research that runs more efficiently than TensorFlow. However, its lack of mobile platform support and of APIs for other programming languages presents challenges for developing and supporting enterprise-wide applications with Theano.
Open source vs. proprietary
As deep learning continues to mature, it is a foregone conclusion that you are going to witness a horse race between TensorFlow, Caffe 2, and MXNet, as software vendors develop products with advanced A.I. capabilities to help you get the most from your data. The risk: do you purchase products built on proprietary A.I. capabilities or on open source frameworks? With open source, you face the dilemma of determining which deep learning framework best fits your purpose. With the proprietary approach, what will your exit strategy be? Neither should be approached with a short-term view, as the A.I. payoff lies in the maturity of its learning capabilities.
Mitch DeFelice began his career serving six years in the U.S. Navy as part of the Naval Security Group tactical electronic support staff. Mitch’s military tours included serving with Fleet Air Reconnaissance Squadron (VQ-1) in Guam and on the support staff for Admiral Thomas B. Hayward, Commander-in-Chief, U.S. Pacific Fleet (CINCPACFLT), Honolulu, Hawaii.
Mitch is a TOGAF 9 Certified Enterprise Architect. His primary focus is working with key business stakeholders and technology executive leadership to develop technology solutions that support unstructured data, including content management, records management, enterprise search, and eDiscovery solutions. His passion lies in developing business solutions around Cognitive Computing capabilities.
Mitch is a frequent contributing author to trade magazines on unstructured data and cognitive computing related topics.
The opinions expressed in this blog are those of Mitch DeFelice and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.