What if an artificial intelligence could automatically generate a caption for an image, describing its content? Microsoft researchers have been working on the problem and have just announced major progress in image analysis with their Azure AI system.
The video below shows how the algorithms have evolved since 2015, now producing captions as precise as those a human would write:
Examples from the video showing how the captions have evolved:
- “A street view of a city” becomes “A tram on a city street”
- “A clock on the side of a building” becomes “A statue on top of a building”
- “A group of baseball players on a grass field” becomes “A group of football players celebrating their victory”
- “A close-up of a cat” becomes “A gray cat with its eyes closed”
- “A close-up of a green door” becomes “A white switch on a green wall”
- “A close-up of food” becomes “A pile of coffee beans”
To achieve this result, Microsoft's teams trained their AI on a large set of images. They first “taught” the system the words corresponding to specific objects in the pictures. Then, using this visual vocabulary, the AI learned how to form sentences.
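The two-stage idea described above can be sketched in a toy form: a “visual vocabulary” of object words feeds a sentence-forming step. This is a minimal illustration, not Microsoft's actual pipeline; the object detector is faked with a lookup table, and all names and outputs here are hypothetical.

```python
# Toy sketch of two-stage captioning: (1) recognize object words,
# (2) assemble them into a sentence. Purely illustrative.

# Stage 1: the "visual vocabulary" -- object words the model has learned.
VISUAL_VOCABULARY = {"cat", "tram", "street", "statue", "building"}

def detect_objects(image_id: str) -> list[str]:
    """Stand-in for a real object detector (hypothetical outputs)."""
    fake_detections = {
        "photo_001": ["tram", "street"],
        "photo_002": ["cat"],
    }
    # Keep only words the model's vocabulary actually covers.
    return [w for w in fake_detections.get(image_id, []) if w in VISUAL_VOCABULARY]

def form_caption(words: list[str]) -> str:
    """Stage 2: turn the recognized words into a sentence (template-based toy)."""
    if not words:
        return "An image"
    if len(words) == 1:
        return f"A {words[0]}"
    return f"A {words[0]} in the {words[1]}"

print(form_caption(detect_objects("photo_001")))  # -> "A tram in the street"
```

In the real system, both stages are learned neural models rather than lookup tables and templates, but the division of labor is the same: name the objects first, then phrase the sentence.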
The system was then evaluated on the nocaps benchmark, which tests the AI on images it has never seen. In the end, the AI generated captions that were more accurate than those written by humans.
This automatic image-captioning system will be useful for blind and visually impaired people, for example. It can also speed up image search by making image content searchable. Microsoft plans to integrate the feature into its Word, Outlook and PowerPoint software later this year.
Source: Microsoft