In the field of artificial intelligence, there are two main approaches to training a programme: symbolic AI, which uses visual symbols, and connectionism, which uses connected information to form artificial neural networks.
Until now, these two methods have been used independently from one another, but researchers from MIT, IBM, and DeepMind have combined these two approaches to create the neuro-symbolic learner (NS-CL), which can learn “visual concepts, words, and semantic parsing of sentences” in a similar way to how a child would.
The program is based on the idea that humans learn visual concepts by understanding both vision and language.
According to MIT Technology Review, the program is made up of two neural networks that are trained in different ways. One neural network is trained on a series of scenes of a number of objects, in this case a number of 3D coloured shapes, while the other is trained on a series of text-based question and answers pairs about the scene. The two approaches are combined, meaning that the system is able to match the written questions to the visual scene it is presented with.
MIT: Robots of the future could learn “on the fly”
In other words, this network learns to map the natural language questions and use this to interpret a scene.
For example, the program is given questions such as “How many objects are right of the red object?” or “How many objects have the same material as the cube?” along with answers. It can then apply the information learnt to the scene, using both approaches to “jointly and incrementally” learn visual concepts with accuracy.
This addresses the shortcomings of both approaches, with neural networks requiring a large volume of data, and the issue of scalability associated with symbolism.
According to researchers this could lead to “meaningful future work toward robotic learning in complex interactive environments” as using both approaches at once means that less training data is needed. According to MIT, in the future robot systems could learn “on the fly”, rather than spend significant time training for each unique environment they’re in, making the process of training a system far simpler.