An increasing number of recorded videos are being tagged with geographic properties of the camera scenes. This meta-data is of significant use for storing, indexing and searching large collections of videos. Our system implementation demonstrates a prototype of a georeferenced video search portal that utilizes an estimation model of a camera’s viewable scene for efficient spatio-temporal video search.


  • The GeoVid project explores the concept of sensor-rich video tagging. Specifically, recorded videos are tagged with a continuous stream of extended geographic properties that relate to the camera scenes.
  • This meta-data is then utilized for storing, indexing and searching large collections of community-generated videos. By considering video related meta-information, more relevant search results can be returned and advanced searches, such as directional and surround queries, can be executed.
  • To acquire video with the relevant properties, we provide smartphone and tablet apps that automatically annotate captured videos with their respective fields of view (FOVs).
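
To make the idea of a directional query concrete, here is a minimal sketch that keeps only video segments whose camera heading points roughly in a requested compass direction. The `Segment` type, its field names, and the default 45-degree tolerance are assumptions made for illustration; the page does not describe how GeoVid evaluates such queries.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record: one video segment with its compass heading."""
    video_id: str
    heading_deg: float  # degrees clockwise from north


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two compass bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def directional_query(segments, target_deg, tolerance_deg=45.0):
    """Return the segments whose heading lies within the tolerance of the target."""
    return [
        s for s in segments
        if angular_difference(s.heading_deg, target_deg) <= tolerance_deg
    ]


segments = [Segment("a", 5.0), Segment("b", 180.0), Segment("c", 350.0)]
north_facing = directional_query(segments, target_deg=0.0, tolerance_deg=20.0)
```

The wrap-around handling in `angular_difference` matters because headings of 350 and 10 degrees are only 20 degrees apart, even though their numeric difference is 340.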




We use a camera field of view (FOV) model to represent the area that is captured by sensor-rich mobile devices.
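
One common way to approximate a camera's viewable scene is a circular sector (a "pie slice") anchored at the camera position. The sketch below assumes that shape and tests whether a point falls inside it. The class name, the viewable-angle and visible-distance parameters, and the flat-earth distance approximation are all assumptions for illustration, not details confirmed by this page.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, metres


@dataclass
class FOV:
    """Hypothetical sector-shaped field of view."""
    lat: float             # camera latitude, degrees
    lon: float             # camera longitude, degrees
    heading_deg: float     # compass direction of the optical axis
    view_angle_deg: float  # full horizontal viewable angle (assumed parameter)
    max_dist_m: float      # maximum visible distance (assumed parameter)

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside the sector."""
        # Local equirectangular approximation; adequate for short distances.
        north = math.radians(lat - self.lat) * EARTH_RADIUS_M
        east = (math.radians(lon - self.lon) * EARTH_RADIUS_M
                * math.cos(math.radians(self.lat)))
        dist = math.hypot(east, north)
        if dist > self.max_dist_m:
            return False
        if dist == 0.0:
            return True  # the camera position itself
        bearing = math.degrees(math.atan2(east, north)) % 360.0
        diff = abs(bearing - self.heading_deg) % 360.0
        diff = min(diff, 360.0 - diff)
        return diff <= self.view_angle_deg / 2.0


# Camera facing north with a 60-degree viewable angle and 100 m range.
fov = FOV(lat=1.2950, lon=103.7730, heading_deg=0.0,
          view_angle_deg=60.0, max_dist_m=100.0)
in_front = fov.contains(1.29545, 103.7730)   # roughly 50 m north
behind = fov.contains(1.29455, 103.7730)     # roughly 50 m south
too_far = fov.contains(1.2968, 103.7730)     # roughly 200 m north
```

A sector is cheap to test and captures the two facts a position-only tag misses: which way the camera was pointing and how far it could see.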

Data Acquisition from Mobile Services
Sensor (meta) data

  • Video: H.264/AVC 720x480 resolution
  • Location: latitude/longitude from GPS
  • Direction: compass
  • Misc: Wi-Fi fingerprints, light level, etc.
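
Because sensor readings and video frames are sampled independently, a search system needs some way to associate a video timestamp with the nearest sensor reading. The sketch below shows one simple option, a nearest-timestamp lookup over a sorted list. The `SensorSample` fields and the `sample_at` helper are hypothetical names chosen for this example.

```python
import bisect
from dataclasses import dataclass


@dataclass
class SensorSample:
    """Hypothetical sensor reading recorded alongside the video."""
    t: float            # seconds since the start of the recording
    lat: float          # GPS latitude, degrees
    lon: float          # GPS longitude, degrees
    heading_deg: float  # compass direction, degrees clockwise from north


def sample_at(samples, t):
    """Return the sample closest in time to t; samples must be sorted by t."""
    if not samples:
        raise ValueError("no sensor samples available")
    times = [s.t for s in samples]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    return before if t - before.t <= after.t - t else after


samples = [
    SensorSample(0.0, 1.2950, 103.7730, 10.0),
    SensorSample(1.0, 1.2951, 103.7730, 12.0),
    SensorSample(2.0, 1.2952, 103.7731, 15.0),
]
nearest = sample_at(samples, 1.4)
```

Interpolating between the two neighbouring samples would be an alternative; nearest-sample lookup is used here only because it is the shortest correct illustration.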

GeoVid Web Services

  • Spatio-temporal indexing of FOVs
  • Video transcoding
  • On-demand image extraction
  • Sensor data mining
  • Automated tagging from 2D/3D models
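
As an illustration of spatio-temporal indexing, the sketch below uses a uniform grid over latitude and longitude and filters candidates by a time range. This is a deliberately simple stand-in, not the index structure GeoVid uses, which this page does not specify. It also indexes camera positions only rather than full FOV areas, and the class name, cell size, and method names are assumptions.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform-grid index over (lat, lon, time) samples."""

    def __init__(self, cell_deg: float = 0.001):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _key(self, lat: float, lon: float):
        # Map a coordinate to the integer (row, column) of its grid cell.
        return (math.floor(lat / self.cell_deg),
                math.floor(lon / self.cell_deg))

    def insert(self, video_id: str, lat: float, lon: float, t: float) -> None:
        self.cells[self._key(lat, lon)].append((lat, lon, t, video_id))

    def query(self, lat_min, lat_max, lon_min, lon_max, t_min, t_max):
        """Return ids of videos with a sample in the box and time range."""
        r0, c0 = self._key(lat_min, lon_min)
        r1, c1 = self._key(lat_max, lon_max)
        hits = set()
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                # .get avoids creating empty cells while scanning.
                for lat, lon, t, vid in self.cells.get((r, c), ()):
                    if (lat_min <= lat <= lat_max
                            and lon_min <= lon <= lon_max
                            and t_min <= t <= t_max):
                        hits.add(vid)
        return hits


index = GridIndex()
index.insert("v1", 1.2950, 103.7730, t=10.0)
index.insert("v2", 1.3000, 103.7800, t=20.0)
index.insert("v3", 1.2951, 103.7731, t=500.0)
result = index.query(1.2945, 1.2955, 103.7725, 103.7735, t_min=0.0, t_max=60.0)
```

The grid narrows the search to a few cells before the exact comparison runs, which is the basic benefit any spatial index provides over scanning every sample.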

Platform-agnostic Search and Browsing
Enabling various sensor-rich experiences

  • Synchronised video and trajectory rendering
  • Visibility-aware keyword search
  • Video summary from estimated popular places



  • The proposed model and the implemented system are employed by the MediaQ platform in collaboration with the Integrated Media Systems Center (IMSC) from USC’s Viterbi School of Engineering.
  • Our system has also been utilized by the School of Media Arts, Columbia College Chicago, during the NATO Summit 2012 in Chicago and during President Obama’s second inauguration.

Potential Benefits

  • Location-aware contextual advertisement
  • Accurate labeling for Augmented Reality applications 

Future Direction

  • Video sub-shot motion characterization from sensor data analysis
  • Multi-video summarization and skim generation
  • DASH-compatible efficient uploading and adaptive streaming
  • Computer-vision-assisted location correction



Research Team
PI (Faculty): Roger Zimmermann
Co-PI (Faculty): Seon Ho Kim (Integrated Media Systems Center)
Jia Hao, Guanfeng Wang, He Ma, Ying Zhang, Yifang Yin, Luming Zhang

Sakire Arslan Ay, Beomjoo Seo, Min Min Htoon, Lingyan Zhang, Shunkai Fang, Weiwei Cui, Zhijie Shen



Roger Zimmermann

School of Computing 
Department of Computer Science 
National University of Singapore 
Computing 1 
13 Computing Drive 
Singapore 117417 
