Deep Learning in Applied Visual Systems

Artificial neural networks: the core technology behind image recognition.

Artificial neural networks (ANNs) are a key component of the BitRefine Heads platform. As neural networks have recently moved from research papers into commercial products, we'd like to share some key points about the technology.

Traditional image recognition methods have clearly hit their limits. In practice, it is impossible to hard-code all the parameters that describe a complex object, such as a person or a dog, so that a computer reliably recognizes it regardless of pose or viewing angle in a constantly changing real-world environment. In the paradigm of artificial neural networks, which are loosely inspired by the human brain, the developer doesn't hard-code the features of a dog or a person but instead "trains" the system to recognize the target object.

What is an artificial neuron?

In essence, an artificial neuron is a mathematical function that takes one or several input values, multiplies each of them by its own weight, sums the products (usually adding a bias term), and passes the result through a nonlinear activation function to produce its output. Each neuron has its own weights, so given the same input, different neurons return different values.
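A minimal sketch of a single neuron in Python. Each input is multiplied by its own weight, the products are summed together with a bias, and the sum is passed through an activation function. The sigmoid activation is an assumption chosen for illustration (it is one common choice among several), and the weights are hand-picked rather than trained.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of the inputs plus a bias, then activation."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Same input, two neurons with different weights -> different outputs.
x = [0.5, -1.0, 2.0]
print(neuron(x, [0.4, 0.1, 0.3], bias=0.0))   # weighted sum z = 0.2 - 0.1 + 0.6 = 0.7
print(neuron(x, [-0.2, 0.5, 0.0], bias=0.1))  # weighted sum z = -0.1 - 0.5 + 0.0 + 0.1 = -0.5
```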

This simple function becomes incredibly powerful when many neurons are connected to each other, so that the output of one neuron becomes an input to one or several other neurons. Every inbound connection of a neuron has its own weight.
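Connecting neurons into layers can be sketched as follows. This is an illustrative assumption of a small fully connected network with hypothetical, hand-picked weights: every neuron in a layer receives all outputs of the previous layer, and each of those connections carries its own weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: weights[j][i] is the weight on the connection
    from input i to neuron j, so every inbound connection has its own weight."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical 3 -> 2 -> 1 network.
network = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),  # hidden layer: 2 neurons
    ([[1.0, -1.0]], [0.0]),                               # output layer: 1 neuron
]
print(forward([1.0, 0.5, -1.5], network))
```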

The goal of training is to adjust these weights so that the value produced after passing through hundreds or thousands of neurons comes close to the target value. For example, we feed an image, which is a matrix of numbers, into the neural network. During training, we want each of the thousands of individual weights to take a value such that one particular neuron in the output layer has the maximum value whenever there is, say, a dog in the image.
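The idea of "updating weights until the output is close to the target" can be sketched with gradient descent on a single sigmoid neuron. This is a toy illustration under stated assumptions: the task is logical OR, the loss is log-loss (cross-entropy), and the learning rate and epoch count are arbitrary. Real networks apply the same principle to millions of weights via backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron by stochastic gradient descent on log-loss.

    samples: list of (inputs, target) pairs with target in {0, 1}.
    For log-loss, d(loss)/dz = prediction - target, so each weight moves
    against (prediction - target) * its input.
    """
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = y - t
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy task: logical OR.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron(data)
for x, t in data:
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    print(x, "target:", t, "output:", round(y, 3))
```

After training, outputs for the "1" cases move toward 1 and the output for [0, 0] moves toward 0.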

What is a neural network architecture?

The simplest model of an ANN is the perceptron. It has just two layers, input and output, and is used for binary predictions. More sophisticated than the perceptron are multi-layer network architectures, also called deep neural networks (DNNs). These networks contain one or more intermediate layers between input and output, so-called "hidden" layers. Modern ANN architectures can be very complex and may contain as many as a thousand hidden layers.
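A minimal sketch of the classic perceptron learning rule: a step activation gives a binary prediction, and weights change only when a prediction is wrong. The AND data set, learning rate and epoch limit are illustrative assumptions.

```python
def step(z):
    """Binary output: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def perceptron_train(samples, lr=1.0, max_epochs=20):
    """Perceptron rule: on a mistake, w += lr * (target - prediction) * x."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:
            y = predict(w, b, x)
            if y != t:
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b += lr * (t - y)
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return w, b


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = perceptron_train(AND)
print(w, b, [predict(w, b, x) for x, _ in AND])
```

The same loop never converges on XOR, because no single straight line separates its classes; this is one classic motivation for adding hidden layers.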

The more complex the network, the more it can do and, given enough training data, the higher the accuracy it can reach. The downside of high complexity is the computational power it requires. A simple network with 30 neurons can be trained on a regular PC. However, finding the weights for each of the millions of connections in a large architecture may take engineers months of running GPU-powered clusters.
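To see how quickly the number of weights grows, the sketch below counts weights plus biases in a fully connected network. Both layer configurations are hypothetical: a tiny 10-10-10 network, and a network that connects every pixel of a 224×224 RGB image to two 4096-unit hidden layers and 1000 outputs.

```python
def count_parameters(layer_sizes):
    """Weights + biases of a fully connected network with the given layer widths.

    Between layers of width n_in and n_out there are n_in * n_out connection
    weights plus one bias per receiving neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


print(count_parameters([10, 10, 10]))                    # 220
print(count_parameters([224 * 224 * 3, 4096, 4096, 1000]))  # 637445096
```

Hundreds of millions of weights for a fully connected model on raw pixels is one reason image networks use convolutional layers, which share a small set of weights across the whole image.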

What are Convolutional Neural Networks?

In image recognition, convolution is one of the most important concepts. Convolutional neural networks (CNNs) have a different architecture from regular neural networks: each neuron in a convolutional layer is connected not to all the neurons in the previous layer but only to a small region of it, and the same small set of weights (a filter, or kernel) is applied at every position of the image. This allows the network to convert an image into an abstract feature map, for example a map of object edges. With each further convolutional layer the representation becomes more abstract: edges are turned into edge features such as the sharpness of corners and the roundness of arcs, the next layer builds features from those features, and so on. Ultimately, the convolutional network converts the initial mass of colored pixels into a set of high-level abstractions associated with the image.
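A hand-written convolution makes the "small region" idea concrete. The sketch assumes a tiny grayscale image and a hand-picked 3×3 vertical-edge kernel; in a real CNN the kernel values are learned during training. Like most deep learning libraries, the code computes cross-correlation (no kernel flip), which is what CNN layers call convolution.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation) of a 2D list with a 2D kernel.

    Each output value looks only at a kh x kw neighbourhood of the input,
    and the same kernel weights are reused at every position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 5x5 image: dark left part, bright right part, i.e. a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Responds to brightness increasing from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

for row in conv2d(image, vertical_edge):
    print(row)  # [3, 3, 0]: strong response near the edge, none in the flat region
```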

Lastly, the final output is reduced to a single vector of probability scores, one for each target class.
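Turning a network's raw output scores into a vector of probabilities is commonly done with the softmax function. The source does not say which function BitRefine Heads uses, so treat this as an illustrative assumption; the class names and scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["person", "dog", "car"]  # hypothetical target classes
probs = softmax([2.0, 1.0, 0.1])
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

The highest raw score still gets the highest probability; softmax only rescales the scores so they can be read as a probability distribution.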

Can we recognize actions using neural networks?

Convolutional neural networks are very good at extracting objects from static images. There is, however, a next-level task: extracting actions from video. In other words, the computer should tell not just "what objects it sees" in each frame but also "what the objects are doing". Recent research papers show promising results in solving this kind of task with other neural network architectures.

There are several well-studied architectures capable of working with time series. Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) have achieved great success in processing sequential data, for example in text analysis and speech recognition. However, applying these concepts to video currently leads to very high model complexity and exceptionally high demands on computing hardware.
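What makes a recurrent network suitable for sequences is a hidden state that is carried from one time step to the next. The sketch below is a deliberately minimal scalar RNN cell with hypothetical weights, not an LSTM; in a video setting each input would be a feature vector extracted from a frame (for example by a CNN) rather than a single number.

```python
import math


def rnn_states(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Minimal scalar recurrent cell: h_t = tanh(w_x * x_t + w_h * h_prev + b).

    The hidden state h carries information from earlier time steps
    (e.g. earlier video frames) into the processing of later ones.
    """
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Same values in a different order give a different final state,
# so the network "notices" when something happened, not just whether it did.
early = rnn_states([1.0, 0.0, 0.0])
late = rnn_states([0.0, 0.0, 1.0])
print(round(early[-1], 3), round(late[-1], 3))
```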

What does "training a neural network" mean?

Building an effective neural network architecture is only one part of the task. To make a neural model recognize objects, it needs to be trained. Training consists of several steps. First, we prepare a set of images and manually mark all the required objects. Next, we feed these images to the neural network together with the marked "right answers" and run a special training algorithm. This algorithm finds weights for every connection in the network so that its outputs match the given "right answers" as closely as possible. After that, the neural network is validated on a separate set of images. If the accuracy is within the required range, the network is saved as a model that is later loaded into BitRefine Heads.
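The workflow described above (annotated data, training, validation on a held-out set, saving the model only if accuracy is acceptable) can be sketched on a toy problem. Everything here is a hypothetical stand-in: synthetic points labelled by a rule replace manually annotated images, logistic regression replaces a deep network, the 0.95 threshold is arbitrary, and JSON is used as a save format purely for illustration.

```python
import json
import math
import random

REQUIRED_ACCURACY = 0.95  # hypothetical acceptance threshold


def sigmoid(z):
    # Numerically stable for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_annotated_set(n, seed=0):
    """Stand-in for manual annotation: label is 1 if x0 + x1 > 1."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = [rng.random(), rng.random()]
        if abs(x[0] + x[1] - 1.0) < 0.1:
            continue  # leave a margin so the toy task is cleanly separable
        samples.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return samples


def train(samples, lr=0.5, epochs=200):
    """Logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, samples):
    w, b = model
    hits = sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == (t == 1)
               for x, t in samples)
    return hits / len(samples)


data = make_annotated_set(300)
train_set, val_set = data[:240], data[240:]  # hold out 20% for validation
model = train(train_set)
val_acc = accuracy(model, val_set)

# Keep the model only if it meets the required accuracy.
saved = json.dumps({"weights": model[0], "bias": model[1]}) if val_acc >= REQUIRED_ACCURACY else None
print(f"validation accuracy: {val_acc:.2f}", "-> saved" if saved else "-> rejected")
```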

Training is hard, both in terms of labor and in terms of computational resources. A training set includes anywhere from hundreds to thousands of manually annotated images. To optimize the annotation process and shorten the time it takes, specialists at BitRefine group had to build special tools.

Once the image set is prepared, it is loaded into the training process. On a regular PC, training some networks would take years. Therefore, special GPU-powered clusters are used to reduce computation time to days or weeks.

Do we need a GPU?

CPUs and GPUs work in different ways. A CPU consists of a few cores optimized for sequential, serial processing; it is designed to maximize the performance of a single task. A GPU uses thousands of smaller, more efficient cores in a massively parallel architecture designed to handle many operations at the same time.

Since deep learning algorithms are based on performing a huge number of simple operations in parallel, the GPU architecture gives a significant speed advantage. That is why most neural networks are trained on powerful servers with multiple high-end GPUs.
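A rough operation count shows why parallel hardware pays off. The sketch counts multiply-accumulate operations (MACs) for one convolutional layer; the layer shape (3×3 kernels, 64 input and 64 output channels, 224×224 output) is a hypothetical example similar in size to early layers of common image networks.

```python
def conv_layer_macs(out_h, out_w, in_ch, out_ch, k):
    """Count work in one k x k convolution layer.

    Every output value is an independent dot product of length k * k * in_ch,
    so all of them could in principle be computed at the same time.
    Returns (number of independent outputs, total multiply-accumulates).
    """
    independent_outputs = out_h * out_w * out_ch
    macs_per_output = k * k * in_ch
    return independent_outputs, independent_outputs * macs_per_output


outputs, macs = conv_layer_macs(224, 224, 64, 64, 3)
print(f"{outputs:,} independent outputs, {macs:,} multiply-adds")
# 3,211,264 independent outputs, 1,849,688,064 multiply-adds
```

Nearly two billion simple, independent operations for a single layer and a single image is exactly the kind of workload that thousands of GPU cores handle far better than a few CPU cores.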

Once a neural model is trained and ready to use, it can run on less powerful platforms. Simple models can work relatively fast even on mobile devices. However, if the network is complex and recognition needs to be fast, a GPU on board is still required.

Learn more:

Capabilities: What objects we detect

Traditional video analytics and machine vision solutions work well with standard objects against clear backgrounds. For example, if we deal with a machine part on an uncluttered production line, or with a person standing apart from others, standard detectors can reliably extract these objects, count them and perform other operations. But if this person simply sits down, a standard detector can no longer classify the object. The same happens when overlapping objects move around. This is where BitRefine Heads comes in.

Explanation: How BitRefine Heads works

The BitRefine Heads platform is built around a video processing pipeline and reporting tools. Similar to the visual pathways in the human brain, visual data first goes from image capture through a primary visual processing stage. It then enters the main deep neural network, which looks for objects of interest, and is passed to a post-processor that modifies the results according to the user's requirements. Finally, the data goes to external systems and to a local database, and can be analyzed through our reporting tool. Each stage of the pipeline is configured through the web interface.
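The staged design described above can be sketched as a chain of functions followed by output "sinks". This is not BitRefine Heads code: every function name, the fixed detections returned by the stand-in network, and the confidence-threshold rule are hypothetical, chosen only to show how capture, primary processing, the main network, post-processing and outputs fit together.

```python
from functools import partial


def capture():
    # Stand-in for a camera frame: a flat list of 8-bit pixel values.
    return {"frame": [0, 128, 255, 64]}


def primary_visual(data):
    # Stand-in primary processing: scale pixels to the range [0, 1].
    data["frame"] = [p / 255 for p in data["frame"]]
    return data


def main_dnn(data):
    # Stand-in for the deep neural network: fixed (label, confidence) pairs.
    data["detections"] = [("person", 0.91), ("dog", 0.42)]
    return data


def post_process(data, min_confidence):
    # A user-defined rule; here, drop low-confidence detections.
    data["detections"] = [d for d in data["detections"] if d[1] >= min_confidence]
    return data


def run_pipeline(stages, sinks):
    data = capture()
    for stage in stages:
        data = stage(data)
    for sink in sinks:  # e.g. external systems, local database, reporting
        sink(data["detections"])
    return data


local_db = []  # stand-in for the local database
settings = {"min_confidence": 0.5}  # e.g. a value adjusted through a web interface
stages = [primary_visual, main_dnn,
          partial(post_process, min_confidence=settings["min_confidence"])]
result = run_pipeline(stages, sinks=[local_db.extend, print])
```

Keeping each stage as an independent function means any one of them can be reconfigured or replaced without touching the others.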

Video Recognition Platform: Key Features

Business-centric video processing has become the norm, and video recognition software is now used more widely in organizations around the world. The multipurpose BitRefine Heads platform offers a number of key features that help clients solve their individual business use cases quickly. Here we present the most significant elements of the platform.
