Suppose a terrorist holding hostages at a secret location makes a video demanding ransom. Now imagine that law enforcement officials can take that video, process it and run it through a database that pinpoints the precise location where it was shot based on images and sounds in the video. Or perhaps a video containing important clues to a crime at an unknown location is uploaded to the Internet. New software could look at the video to determine the time and place of the crime.
These are just two potential uses of a nascent video-recognition technology in development at the International Computer Science Institute (ICSI) in Berkeley, California. I was lucky enough to attend an open house at ICSI last week, and I got to take a look at some exciting new research.
Researchers at ICSI are currently building a video database by analyzing videos downloaded from Flickr, says Gerald Friedland, who leads ICSI’s multimedia efforts. Data from videos taken at known locations is used to develop profiles of the respective locations. Data may include text data such as location tags, visual cues such as textures and colors, and sounds, such as bird song. The attributes of a test video are then compared against the profiles and its location is estimated.
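To make the profile-matching idea concrete, here is a minimal sketch in Python. It is not ICSI's code: the three-number feature vectors, the averaging step, the cosine-similarity comparison, and the function names are all assumptions chosen for illustration. The article only says that attributes of a test video are compared against location profiles.

```python
import math
from collections import defaultdict

# Hypothetical sketch: each video is reduced to a small feature vector
# (imagine scores for texture, color, and sound). None of this is ICSI's method.

def mean_vector(vectors):
    """Average a list of equal-length feature vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profiles(training_videos):
    """Group videos by their known location and average their features into one profile each."""
    grouped = defaultdict(list)
    for location, features in training_videos:
        grouped[location].append(features)
    return {location: mean_vector(vectors) for location, vectors in grouped.items()}

def estimate_location(profiles, features):
    """Pick the location whose profile is most similar to the test video's features."""
    return max(profiles, key=lambda loc: cosine_similarity(profiles[loc], features))

# Made-up training data: (known location, feature vector).
training_videos = [
    ("Tokyo", [0.9, 0.1, 0.8]),
    ("Tokyo", [0.8, 0.2, 0.9]),
    ("Berkeley", [0.1, 0.9, 0.2]),
    ("Berkeley", [0.2, 0.8, 0.1]),
]

profiles = build_profiles(training_videos)
guess = estimate_location(profiles, [0.85, 0.15, 0.7])
print(guess)
```

In this toy version, adding more tagged videos simply refines each location's averaged profile, which is one plausible reading of what "training" on more geotagged footage could mean.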
As more videos with embedded geographical information are downloaded, the researchers will use them to "train" the software to recognize more and more locations.
Unfortunately, only three to five percent of the video uploaded to the Internet contains geographical information that can be used to reveal the location where it was shot, which means it will take a long time to build a database with more than just selected test videos.
Even so, the system is remarkably accurate. By comparing the information in the database to some 5000 "wild, unfiltered" videos, researcher Jaeyoung Choi, who is developing the system, was able to pinpoint the location where 14 percent of the videos were shot to within 10 meters or about 33 feet.
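For a sense of scale, 14 percent of some 5000 videos works out to roughly 700. One way such a figure could be computed is sketched below, assuming each test video has an estimated and a true latitude/longitude pair. The haversine distance formula is standard, but the coordinates, function names, and the evaluation procedure itself are illustrative assumptions, not details reported by the researchers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def fraction_within(pairs, threshold_m=10.0):
    """Share of (estimated, true) coordinate pairs that lie within threshold_m of each other."""
    hits = sum(1 for est, true in pairs if haversine_m(*est, *true) <= threshold_m)
    return hits / len(pairs)

# Made-up coordinates for illustration only: (estimated, true).
pairs = [
    ((37.87000, -122.27000), (37.87005, -122.27000)),  # a few meters apart
    ((35.68000, 139.76000), (35.69000, 139.76000)),    # about a kilometer apart
]

accuracy = fraction_within(pairs)
print(f"{accuracy:.0%} of estimates fall within 10 meters")
```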
Even more startling is the system’s ability to pinpoint a location by analyzing sounds in a video. It can, for example, "listen" to a train whistle and know that it came from a train passing through Tokyo, says Friedland. And no, that’s not hypothetical. It’s already been done, and the software has been trained to recognize sounds from 32 cities around the world.
The same technology could be applied to photographs, which means that the huge trove of precise geographical data generated by Google Street View could be used to train a system much more extensive than the one currently in use at Berkeley.
I’m aware, of course, that video-recognition technology raises fears that Big Brother could find out many more details about us. Many of us have similar concerns about facial recognition technology. The ICSI researchers are also well aware of the potential dark side of their current research, and it’s not coincidental that funding for the project came initially from the National Geospatial-Intelligence Agency, says Friedland.
"The world is a very big place and this will never be 100 percent accurate," says Friedland. Some locations, particularly remote, barren areas, may never be charted. Still, the fact that software can now recognize a train whistle in Tokyo is amazing—and a little bit scary.
Image: Courtesy of ICSI.