In the era of Instagram and Snapchat, clicking a great photo is not enough. To make the likes flow in, it also needs a fitting caption. Now, with the help of Google's TensorFlow, thinking hard about captions may become a thing of the past.
The pesky first-world problem of coming up with a caption has a solution. Machine learning has come to the rescue, as the image captioning model in Google's TensorFlow can now study your image and come up with a description on its own.
And it's nothing less than amazing. Google calls the project 'Show and Tell', and it claims the technology has a 93.9 per cent accuracy rate, up from the 89.6 per cent and 91.8 per cent clocked by earlier versions. Even though the increase in accuracy seems small, for a classification task it has a large impact on usability.
To attain such accuracy, the scientists at the Google Brain team had to train both the vision and language components on captions created by real people. This enables the technology to go beyond simply naming the objects in the image.
Instead, the framework can generate a complete descriptive sentence about the image. The key is to take into account the way objects in an image relate to each other.
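At a high level, captioning models of this kind pair an image encoder with a language decoder that emits one word at a time. The sketch below is purely illustrative: the vocabulary, "image features", and transition scores are made-up stand-ins for what Show and Tell learns with a convolutional network and an LSTM trained on human-written captions.

```python
# Minimal sketch of encoder-decoder captioning with greedy decoding.
# All names, numbers, and scores here are hypothetical stand-ins for
# learned model weights; only the control flow mirrors the real idea.

VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "grass", "sofa"]

# Toy transition table standing in for learned decoder weights.
NEXT = {"<start>": "a", "a": "dog", "dog": "on", "on": "grass", "grass": "<end>"}

def encode_image(image_pixels):
    """Stand-in for a CNN encoder: reduce raw pixels to a feature vector."""
    mean = sum(image_pixels) / len(image_pixels)
    return [mean, max(image_pixels), min(image_pixels)]

def decoder_step(features, prev_word):
    """Stand-in for one decoder step: score every vocabulary word."""
    base = sum(features) * 1e-6  # image features would shift these scores
    return [base + (1.0 if w == NEXT.get(prev_word) else 0.0) for w in VOCAB]

def generate_caption(image_pixels, max_len=6):
    """Greedy decoding: pick the highest-scoring word at each step."""
    features = encode_image(image_pixels)
    word, caption = "<start>", []
    for _ in range(max_len):
        scores = decoder_step(features, word)
        word = VOCAB[scores.index(max(scores))]
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)

print(generate_caption([0.1, 0.5, 0.2]))  # prints "a dog on grass"
```

Because the decoder conditions each word on both the image features and the words already produced, the output is a full sentence rather than a bag of object labels, which is what the article means by relating objects to each other.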
While earlier versions took at least three seconds per task on an Nvidia G20 GPU, the current version, open-sourced today, can do the same in just 0.7 seconds, making it roughly four times faster than its predecessors.
The key strength of this framework is its ability to connect the objects in an image with their context. This comes in useful when the system needs to recognise a scene and differentiate it from similar scenarios.