VILSS: How to tag billions of photos: The evolution of image auto-tagging from a technology to a global service

Dr Stavri Nikolov, Imagga Technologies, Co-founder and Research Director

Imagga (https://imagga.com/) is one of the world's pioneers in large-scale image tagging. Our cloud and on-premise software solutions have analysed and tagged billions of photos for clients around the world, ranging from telcos and cloud service providers to digital media companies, stock photo agencies, media sharing platforms, real estate and advertising agencies, and others. In this talk we shall present an overview of how, over the last decade, image auto-tagging evolved from a technology into a global service, and share our views on how it may develop in the future. We shall discuss the challenges, the client needs, and the solutions that have evolved, and showcase some interesting applications of image tagging and categorisation.

Biography

Dr Stavri Nikolov is co-founder and Research Director of Imagga Technologies Ltd (www.imagga.com), a company that develops and offers technologies, services, and online tools for large-scale image analysis, recognition, and tagging in the cloud and on-premise.

Dr Nikolov is also Founding Director of Digital Spaces Living Lab (DSLL) in Sofia, Bulgaria. DSLL (www.digitalspaces.info) is one of the leading Living Labs in Europe and a member of the European Network of Living Labs (ENoLL). DSLL develops and tests new technologies, services, and apps for digital media, wearables, lifelogging, museums, and smart cities.

In the past, Dr Nikolov was a Senior Scientist (Digital Identity and Information Search) at the Institute for Prospective Technological Studies (http://ipts.jrc.ec.europa.eu) of the European Commission in Seville, Spain (2009-2011), and a Senior Research Fellow in Image Processing at the University of Bristol, UK (1998-2007). His research interests over the years have spanned image analysis, image recognition, image fusion, image search, mobile search, new methods for data visualisation and navigation, gaze-tracking, HCI, VR, the construction of attentive and interactive information displays, video surveillance, digital identity and biometrics, location-based services, and new technologies for cultural heritage in museums and galleries.

In the last 20 years he has coordinated and participated in many large international and national research projects in Austria, Portugal, the UK, Spain, and Bulgaria. He has published more than 80 refereed or invited papers, including eight invited book chapters, as well as numerous technical reports in these areas, and has given many invited lectures around the world. He was the creator and coordinator of The Online Resource for Research in Image Fusion (www.imagefusion.org) and The Online Archive of Scanpath Data (www.scanpaths.org).

Dr Nikolov has been a member of the British Machine Vision Association, the Applied Vision Association, the International Society of Information Fusion, ACM SIGGRAPH, and IEEE. He served as a member of the Editorial Board of the Information Fusion journal published by Elsevier for nearly 10 years. Over the years, Dr Nikolov has been a technology mentor to various organisations, including Eleven (www.eleven.bg), a €12M Acceleration Fund; LAUNCHub (www.launchub.com), a €9M Seed and Acceleration Fund; the European Business Network (http://ebn.be); and the European Satellite Navigation Competition (http://www.esnc.info/). He was the founding director of Smart Fab Lab (https://www.facebook.com/smartfablab), the first fab lab (digital fabrication lab) in Bulgaria. Dr Nikolov was Working Group 5 (Industry and End Users) Leader and a Management Committee member of the European Network on Integrating Vision and Language (iV&L Net) COST Action (http://www.cost.eu/COST_Actions/ict/Actions/IC1307) in the first two years of the network.

VILSS: Human Action Recognition and Detection from Noisy 3D Skeleton Data

Mohamed Hussein, Egypt-Japan University of Science and Technology

Human action recognition and human action detection are two closely related problems. In human action recognition, the purpose is to determine the class of an action performed by a human subject from spatio-temporal measurements of the subject, cropped in the time dimension to include only the performed action. In human action detection, by contrast, the input is not cropped in time and may include multiple action instances, possibly from different classes, and the purpose is to determine the action class and the time period of each action instance in the input sequence. Recent years have witnessed a surge in research efforts on both problems when the measurements are noisy 3D skeleton data obtained from cheap consumer-level depth sensors, such as the Microsoft Kinect. In this talk, I will present our efforts in this domain. I will first describe our earlier work on designing fixed-length descriptors for human action recognition from 3D skeleton data. I will then introduce a direct deployment of these techniques on human action detection via multi-scale sliding window search, which runs in real time but can only process sequences offline. Finally, I will explain our most recent results on real-time online human action detection using a simple linear-time greedy search strategy that we call 'Efficient Linear Search', which overcomes the limitations of a more sophisticated dynamic programming strategy for this problem.
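
To give a rough sense of the two ingredients, here is a minimal sketch, not the talk's exact method: one way to obtain a fixed-length descriptor from a variable-length skeleton sequence is to take the covariance of the per-frame joint coordinates, and a multi-scale temporal sliding window then turns any such recognizer into an offline detector. The array layout, window scales, stride, and the `classify` callback below are all illustrative assumptions.

```python
import numpy as np

def covariance_descriptor(seq):
    """Fixed-length descriptor for a variable-length skeleton sequence.

    seq: (T, J, 3) array of T frames x J joints x 3D coordinates.
    The covariance of the per-frame joint-coordinate vectors has the
    same size for any T, so every sequence maps to one vector.
    """
    X = seq.reshape(seq.shape[0], -1)      # flatten joints: (T, 3J)
    C = np.cov(X, rowvar=False)            # (3J, 3J) covariance over time
    return C[np.triu_indices_from(C)]      # upper triangle (C is symmetric)

def sliding_window_detect(seq, classify, scales=(30, 60, 90), stride=10):
    """Multi-scale temporal sliding window over an untrimmed sequence.

    classify: callable mapping a descriptor to (label, score).
    Returns candidate detections as (start, end, label, score) tuples.
    """
    detections = []
    for w in scales:                       # one pass per temporal scale
        for s in range(0, len(seq) - w + 1, stride):
            label, score = classify(covariance_descriptor(seq[s:s + w]))
            detections.append((s, s + w, label, score))
    return detections
```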

VILSS: Upper body pose estimation for sign language and gesture recognition

James Charles, University of Leeds

In this talk I present methods for estimating the upper body pose of people performing gestures and sign language in long video sequences. Our methods are based on random forest classifiers and regressors, which have proved successful for inferring pose from depth data (Kinect). Here, I will show how we develop methods to: (1) achieve real-time 2D upper body pose estimation without depth data, (2) produce structured pose output from a mixture of random forest experts, (3) use more image context while keeping the learning problem tractable, and (4) incorporate temporal context using dense optical flow.
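
For a flavour of the regression component only (a toy sketch, not the talk's actual pipeline): a random forest regressor can map per-frame image features directly to 2D joint coordinates. The synthetic features, the feature dimension, the seven-joint layout, and the forest settings below are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins: each row of X is a feature vector for one frame;
# each row of Y holds the 2D positions of 7 upper-body joints (the exact
# joint set is an assumption), flattened to length 14.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 64))            # stand-in image features
Y_train = rng.uniform(0.0, 1.0, size=(500, 14))

forest = RandomForestRegressor(n_estimators=50, max_depth=12, random_state=0)
forest.fit(X_train, Y_train)                    # multi-output regression

X_test = rng.normal(size=(10, 64))
joints = forest.predict(X_test).reshape(-1, 7, 2)   # (frame, joint, xy)
print(joints.shape)
```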

VILSS: Intelligent signal processing and learning in imaging

Panagiotis Tsakalides, University of Crete

Modern technologies, including the proliferation of high-performance sensors and network connectivity, have revolutionized imaging systems used in applications ranging from medical and astronomical imaging to consumer photography. These applications demand ever higher speed, scale, and resolution, which are typically limited by specific imaging and processing components. While striving for more complex and expensive hardware is one path, an alternative approach involves the intelligent design of architectures that capitalize on advances in cutting-edge signal processing to achieve these goals. This talk will motivate the need for smart integration of hardware components, on one hand, and software-based recovery, on the other. The talk will showcase the benefits that stem from Compressed Sensing and Matrix Completion, two paradigm-shifting frameworks in signal processing and learning, in two imaging problems, namely range imaging and hyperspectral imaging. Challenges associated with properties of imaging data, such as complexity and volume, will also be presented and viewed through the prism of these algorithms.
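
As a generic toy illustration of the compressed sensing paradigm mentioned above (nothing specific to the talk): a sparse signal can be recovered from far fewer random linear measurements than its length via l1-regularized least squares. All problem sizes and the choice of solver below are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy compressed sensing: recover a k-sparse signal of length n from
# m << n random linear measurements y = A @ x via l1-regularized
# least squares (a Lasso stand-in for basis pursuit).
rng = np.random.default_rng(0)
n, m, k = 256, 64, 8

x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)        # random sensing matrix
y = A @ x_true                                  # compressed measurements

solver = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000)
solver.fit(A, y)
x_hat = solver.coef_

print("relative error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```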

VILSS: Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects

Zeeshan Zia, Imperial College London

Current systems for scene understanding typically represent objects as 2D or 3D bounding boxes. While these representations have proven robust in a variety of applications, they provide only coarse approximations to the true 2D and 3D extent of objects. As a result, object-object interactions, such as occlusions or supporting-plane contact, can be represented only superficially. We approach the problem of scene understanding from the perspective of 3D shape modeling, and design a 3D scene representation that reasons jointly about the 3D shape of multiple objects. This representation allows expressing 3D geometry and occlusion at the level of individual vertices of 3D wireframe models, and makes it possible to treat dependencies between objects, such as occlusion, in a deterministic way. The talk will further describe experiments that demonstrate the benefit of jointly estimating the 3D shape of multiple objects in a scene over working with coarse boxes.
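
To make the vertex-level occlusion idea concrete, here is a deliberately simplified sketch (not the authors' method): project the wireframe vertices of two objects into the image and mark a vertex occluded when a vertex of the other object lands nearby at smaller depth. The camera model, pixel radius, and function names are assumptions; real systems reason about surfaces rather than point sets.

```python
import numpy as np

def project(V, K):
    """Pinhole projection of camera-frame vertices V (N, 3) with intrinsics K."""
    p = (K @ V.T).T
    return p[:, :2] / p[:, 2:3], V[:, 2]       # pixel coords, depths

def occluded_vertices(V_a, V_b, K, radius=5.0):
    """Mark vertices of object A hidden by nearer vertices of object B.

    A vertex of A counts as occluded if some vertex of B projects within
    `radius` pixels of it at strictly smaller depth. This is a toy,
    deterministic check on point sets, not surfaces.
    """
    pa, za = project(V_a, K)
    pb, zb = project(V_b, K)
    occ = np.zeros(len(V_a), dtype=bool)
    for i in range(len(V_a)):
        near = np.linalg.norm(pb - pa[i], axis=1) < radius
        occ[i] = bool(np.any(zb[near] < za[i]))
    return occ
```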

VILSS: Wirewax – engineering vision algorithms for the wild

John Greenall, Wirewax, London

Wirewax is a platform for turning your videos into rich interactive experiences. Backing the website is a powerful suite of computer vision algorithms that run on a scalable cloud architecture. This talk will detail some of the experiences of training and deploying algorithms for use “in the wild”, including discussion of face detection, recognition and motion tracking.
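
As a minimal example of one building block named above (illustrative only, not Wirewax's production system): off-the-shelf face detection on video frames with OpenCV's bundled Haar cascade. The input path is a placeholder; systems deployed "in the wild" would use far more robust detectors plus tracking.

```python
import cv2

# Off-the-shelf face detection on video frames; the cascade file ships
# with the opencv-python package, and "input_video.mp4" is a placeholder.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("input_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:                 # one box per detected face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```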