**VILSS: Human Action Recognition and Detection from Noisy 3D Skeleton Data - 20/06/2016**
**Mohamed Hussein,** *Egypt-Japan University of Science and Technology*

Human action recognition and human action detection are two closely related problems. In human action recognition, the purpose is to determine the class of an action performed by a human subject from spatio-temporal measurements of the subject, which are cropped in the time dimension to include only the performed action. In human action detection, on the other hand, the input is not cropped in time and may include multiple action instances, possibly from different classes, and the purpose is to determine the action class and the time period of each action instance in the input sequence. Recent years have witnessed a surge in research efforts on the two problems when the measurements are noisy 3D skeleton data obtained from cheap consumer-level depth sensors, such as the Microsoft Kinect. In this talk, I will present our efforts in this domain. I will first describe our earlier work on designing fixed-length descriptors for human action recognition from 3D skeleton data. Then, I will introduce a direct deployment of these techniques on human action detection via multi-scale sliding-window search, which runs in real time but can only process sequences offline. Finally, I will explain our most recent results on real-time online human action detection using a simple linear-time greedy search strategy that we call ‘Efficient Linear Search’, which overcomes the limitations of a more sophisticated dynamic programming strategy in this problem.
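The abstract does not spell out the ‘Efficient Linear Search’ itself. As a rough illustration of what a linear-time greedy segment search over per-frame detection scores can look like, here is a Kadane-style sketch; the function name and the convention of positive/negative per-frame scores are assumptions for illustration, not the authors' algorithm:

```python
def best_segment(scores):
    """Linear-time greedy search for the contiguous frame range with the
    highest total detection score (Kadane's algorithm). Returns the
    (start, end) frame indices and the segment's total score."""
    best_sum, best_range = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:          # a negative prefix can never help: restart here
            cur_sum, cur_start = 0.0, i
        cur_sum += s
        if cur_sum > best_sum:    # keep the best window seen so far
            best_sum, best_range = cur_sum, (cur_start, i)
    return best_range, best_sum
```

A single pass over the sequence suffices, which is what makes such a strategy usable online, unlike a multi-scale sliding-window sweep.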

**VILSS: Upper body pose estimation for sign language and gesture recognition - 17/06/2016**
**James Charles,** *University of Leeds*

In this talk I present methods for estimating the upper body pose of people performing gestures and sign language in long video sequences. Our methods are based on random forest classifiers and regressors, which have proved successful for inferring pose from depth data (Kinect). Here, I will show how we develop methods to: (1) achieve real-time 2D upper body pose estimation without depth data, (2) produce structured pose output from a mixture of random forest experts, (3) use more image context while keeping the learning problem tractable, and (4) incorporate temporal context using dense optical flow.

**VILSS: Intelligent signal processing and learning in imaging - 17/06/2016**
**Panagiotis Tsakalides,** *University of Crete*

Modern technologies, including the proliferation of high-performance sensors and network connectivity, have revolutionized imaging systems used in applications ranging from medical and astronomical imaging to consumer photography. These applications demand ever higher speed, scale, and resolution, which are typically limited by specific imaging and processing components. While striving for more complex and expensive hardware is one path, an alternative approach involves the intelligent design of architectures that capitalize on advances in cutting-edge signal processing to achieve these goals. This talk will motivate the need for smart integration of hardware components, on one hand, and software-based recovery, on the other. The talk will showcase the benefits that stem from Compressed Sensing and Matrix Completion, two paradigm-shifting frameworks in signal processing and learning, in two imaging problems, namely range imaging and hyperspectral imaging. Challenges associated with properties of imaging data, such as complexity and volume, will also be presented and viewed through the prism of these algorithms.
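Compressed Sensing recovers a sparse signal from far fewer measurements than its ambient dimension. As a minimal sketch of one standard recovery routine, here is Orthogonal Matching Pursuit in NumPy; this is a generic textbook baseline, not necessarily the algorithm used in the work presented:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily estimate a k-sparse x with y ≈ A x.
    At each step, pick the dictionary column most correlated with the residual,
    then refit the coefficients on the selected support by least squares."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs        # explain away what's selected
    x = np.zeros(A.shape[1])
    x[support] = coeffs
    return x
```

With a well-conditioned measurement matrix and sufficiently sparse signals, the greedy support selection recovers the true nonzeros exactly.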

**VILSS: Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects - 17/06/2016**
**Zeeshan Zia,** *Imperial College London*

Current systems for scene understanding typically represent objects as 2D or 3D bounding boxes. While these representations have proven robust in a variety of applications, they provide only coarse approximations to the true 2D and 3D extent of objects. As a result, object-object interactions, such as occlusions or supporting-plane contact, can be represented only superficially. We approach the problem of scene understanding from the perspective of 3D shape modeling, and design a 3D scene representation that reasons jointly about the 3D shape of multiple objects. This representation allows expressing 3D geometry and occlusion at the fine detail level of individual vertices of 3D wireframe models, and makes it possible to treat dependencies between objects, such as occlusion reasoning, in a deterministic way. The talk will further describe experiments which demonstrate the benefit of jointly estimating the 3D shape of multiple objects in a scene over working with coarse boxes.
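Vertex-level occlusion reasoning between wireframe models can be illustrated with a toy depth test: a vertex of one object is occluded if another object's geometry projects to (nearly) the same image location at a smaller depth. Everything below (orthographic camera along +z, a fixed image-plane tolerance, testing against vertices rather than faces) is a simplifying assumption for illustration only:

```python
import numpy as np

def occluded_vertices(verts_a, verts_b, tol=0.05):
    """Flag each vertex of object A as occluded if some vertex of object B
    projects within `tol` of it in the image plane (orthographic camera
    looking down +z) and sits closer to the camera (smaller z)."""
    flags = []
    for v in verts_a:
        dxy = np.linalg.norm(verts_b[:, :2] - v[:2], axis=1)  # image-plane distance
        flags.append(bool(np.any((dxy < tol) & (verts_b[:, 2] < v[2]))))
    return flags
```

A per-vertex test like this is deterministic given the two shape hypotheses, which is the property the representation in the talk exploits.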

**VILSS: Wirewax – engineering vision algorithms for the wild - 17/06/2016**
**John Greenall,** *Wirewax, London*

Wirewax is a platform for turning your videos into rich interactive experiences. Backing the website is a powerful suite of computer vision algorithms that run on a scalable cloud architecture. This talk will detail some of the experiences of training and deploying algorithms for use “in the wild”, including discussion of face detection, recognition and motion tracking.

**VILSS: 2D Pairwise Geometry for Robust and Scalable Place Recognition - 17/06/2016**
**Edward Jones,** *Dyson Research Lab, Imperial College London*

In this talk, I will present an overview of my PhD research on extending recent trends in visual place recognition to offer robustness and scalability. The underlying theme of my work is the exploitation of 2D geometry between pairs of local image features, which is often overlooked in favour of stronger 3D constraints. I will show how 2D geometry can be effectively applied to complement the limitations of 3D geometry, or even replace it at a fraction of the computational cost. The talk is divided into three sections, each discussing one example of such a method. First, I will show how 2D pairwise geometry can help to eliminate false positive feature correspondences which arise from RANSAC-based 3D geometric constraints. Then, an inverted index consisting of pairwise geometries will be introduced, which makes scalable recognition with geometry possible. Finally, I will introduce a topological robot localisation system which aims to encode probability into place recognition attempts, making it suitable for visual SLAM frameworks.
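An inverted index over quantised pairwise geometries might be sketched as follows. The key design here (a visual-word pair plus the binned orientation of the displacement between the two features) is a hypothetical simplification for illustration, not the system described in the talk:

```python
import math
from collections import defaultdict

def pair_key(word_i, word_j, dx, dy, n_bins=8):
    """Quantise a feature pair: the two visual-word ids (order-normalised)
    plus the binned angle of the displacement vector between them."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    a_bin = int(angle / (2 * math.pi) * n_bins) % n_bins
    return (min(word_i, word_j), max(word_i, word_j), a_bin)

class PairwiseInvertedIndex:
    """Maps each quantised pairwise geometry to the set of images containing it,
    so a query image can vote for database images pair by pair."""
    def __init__(self):
        self.index = defaultdict(set)

    def add(self, image_id, pairs):
        for p in pairs:                     # p = (word_i, word_j, dx, dy)
            self.index[pair_key(*p)].add(image_id)

    def query(self, pairs):
        votes = defaultdict(int)
        for p in pairs:
            for image_id in self.index.get(pair_key(*p), ()):
                votes[image_id] += 1
        return sorted(votes.items(), key=lambda kv: -kv[1])
```

Because each lookup is a hash-table hit, geometric verification cost grows with the number of query pairs rather than the database size, which is what makes geometry scalable in this setting.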

**VILSS: Transductive Transfer Learning for Computer Vision - 17/06/2016**
**Teo de Campos,** *University of Surrey*

One of the ultimate goals of open-ended learning systems is to take advantage of previous experience when dealing with future problems. We focus on classification problems where labelled samples are available in a known problem (the source domain), but where, when the system is deployed on the target dataset, the distribution of samples is different. Although the number of classes and the feature extraction method remain the same, a change of domain happens because there is a difference between the typical distribution of source and target samples. This is a very common situation in computer vision applications, e.g., when a synthetic dataset is used for training but the system is applied to images “in the wild”. We assume that a set of unlabelled samples is available in the target domain. This constitutes a Transductive Transfer Learning problem, also known as Unsupervised Domain Adaptation. We propose to tackle this problem by adapting the feature space of the source domain samples so that their distribution becomes more similar to that of the target domain samples. A classifier re-trained on the updated source space can therefore give better results on the target samples. Our pipeline consists of three main components: (i) a method for global adaptation of the marginal distribution of the data using Maximum Mean Discrepancy; (ii) a sample-based adaptation method, which translates each source sample towards the distribution of the target samples; (iii) a class-based conditional distribution adaptation method. We conducted experiments on a range of image classification and action recognition datasets and showed that our method gives state-of-the-art results.
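Component (i) relies on Maximum Mean Discrepancy to measure how far apart the source and target distributions are. A minimal NumPy sketch of the squared MMD statistic with an RBF kernel follows; the kernel choice and bandwidth are illustrative assumptions, not the talk's exact configuration:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between sample
    sets X and Y (rows = samples) under an RBF kernel exp(-gamma * ||a-b||^2).
    Zero when the two empirical distributions coincide; larger when they differ."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

A global adaptation step can then seek a transformation of the source features that drives this statistic down, so that a classifier trained on the transformed source transfers better to the target.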

**VILSS: Ortho-diffusion decompositions of graph-based representations of images - 17/06/2016**
**Adrian Bors,** *University of York*

In this presentation I introduce the ortho-diffusion operator. I consider graph-based data representations where full data interconnectivity is modelled using probability transition matrices. Dimensionality reduction at multiple scales is used in order to extract meaningful data representations. The QR orthonormal decomposition algorithm, alternating with diffusion and data reduction stages, is applied recursively at each scale level to the given data representation. Columns in the ortho-diffusion representation matrix represent characteristic features of the data. Those columns that are not considered essential for the data representation are removed at each scale. The proposed methodology is used to model features extracted from images, which are then used for image matching and face recognition. Image matching is applied to optical flow estimation from image sequences. For the face recognition application I consider both global appearance models, based on either the correlation or the covariance of training sets, as well as semantic representations of biometric features. The proposed methodology is shown to be robust in face classification applications when considering image corruption by various noise statistics.
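One stage of the recursive procedure can be caricatured as follows. The exact diffusion schedule and column-selection rule are not specified in the abstract, so this sketch (squaring the transition matrix once, keeping the leading QR columns) is only an illustrative simplification:

```python
import numpy as np

def ortho_diffusion_step(W, keep):
    """One sketch stage: row-normalise an affinity matrix W into a probability
    transition matrix, diffuse by squaring it (two-step transitions),
    orthonormalise with QR, and retain the leading `keep` columns as features."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    P = P @ P                              # one diffusion step
    Q, _ = np.linalg.qr(P)                 # orthonormal decomposition
    return Q[:, :keep]                     # reduced ortho-diffusion representation
```

Applying such a stage recursively, with fewer retained columns at each scale, yields a multi-scale sequence of compact orthonormal feature matrices.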

**VILSS: Ultrasound imaging and inverse problems - 17/06/2016**
**Denis Kouame,** *Universite Paul Sabatier Toulouse*

Among all the medical imaging modalities, ultrasound imaging is the most widely used, due to its safety, cost-effectiveness, flexibility and real-time nature. However, compared to other medical imaging modalities such as Magnetic Resonance Imaging (MRI) or Computed Tomography (CT), ultrasound images suffer from the presence of speckle and have low resolution in most standard applications. Although most manufacturers of ultrasound scanners have developed many device-based routines to overcome these issues, many challenges in terms of signal and image processing remain. In this tutorial, we will review the basics of, and advanced topics in, ultrasound imaging, then focus on the current signal and image processing challenges and show some recent results.
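A classical example of ultrasound restoration posed as an inverse problem is deconvolving the system point-spread function from a radio-frequency line. The following 1-D Wiener-filter sketch assumes circular convolution and a known PSF, both simplifications chosen for illustration:

```python
import numpy as np

def wiener_deconvolve(y, h, noise_power=1e-2):
    """Frequency-domain Wiener deconvolution of a 1-D signal y blurred by
    point-spread function h (circular convolution assumed). The regulariser
    `noise_power` trades resolution against noise amplification."""
    n = len(y)
    H = np.fft.fft(h, n)
    G = np.conj(H) / (np.abs(H) ** 2 + noise_power)  # Wiener inverse filter
    return np.real(np.fft.ifft(G * np.fft.fft(y)))
```

With a small regulariser, isolated reflectors blurred by the PSF are restored to sharp spikes; a larger regulariser is needed as measurement noise grows.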

**VILSS: Global description of images. Application to robot mapping and localisation - 15/06/2016**
**Luis Payá,** *Miguel Hernández University, Spain*

Nowadays, the design of fully autonomous mobile robots is a key discipline. Building a robust model of the unknown environment is an important ability the robot must develop. Using this model, the robot must be able to estimate its current position and navigate to target points. Omnidirectional vision sensors are commonly used to solve these tasks. When using this source of information, the robot must extract relevant information from the scenes both to build the model and to estimate its position. Possible frameworks include the classical approach of extracting and describing local features, and working with the global appearance of the scenes, which has emerged as a conceptually simple and robust solution. In this talk, the role of global-appearance techniques in robot mapping and localisation is analysed.
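One well-known family of global-appearance descriptors for omnidirectional images is the Fourier signature: the magnitude spectrum of each panorama row is invariant to circular column shifts, i.e. to rotations of the robot about its vertical axis. A minimal sketch (descriptor length and nearest-neighbour matching are illustrative choices, not the talk's specific pipeline):

```python
import numpy as np

def fourier_signature(panorama, n_coeffs=8):
    """Global-appearance descriptor of a grayscale panorama: magnitudes of the
    first `n_coeffs` Fourier coefficients of each image row, concatenated.
    Invariant to horizontal (rotational) shifts of the panorama."""
    F = np.fft.fft(panorama, axis=1)
    return np.abs(F[:, :n_coeffs]).ravel()

def localise(query_desc, map_descs):
    """Return the index of the stored map descriptor closest to the query."""
    return int(np.argmin([np.linalg.norm(query_desc - m) for m in map_descs]))
```

Because the descriptor ignores the robot's heading, the same place is recognised regardless of the orientation at which the panorama was captured.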

**VILSS: Joint Tracking and Event Analysis for Carried Object Detection - 17/11/2015**
**Aryana Tavanai,** *University of Leeds*

Tracking and event analysis are areas of video analysis of great importance in robotics applications and automated surveillance. Although each has been studied extensively on its own, there has been little work on performing them jointly so that they mutually influence and improve each other. In this talk I will present our novel approach for jointly estimating the track of a moving object and recognising the events in which it participates. First, I will introduce our geometric carried object detector. Then I will present our tracklet building approach, which enforces spatial consistency between the carried objects and other pre-tracked entities in the scene. Finally, I will present our joint tracking and event analysis framework, posed as maximisation of a posterior probability defined over event sequences and temporally-disjoint subsets of tracklets. We evaluate our approach using tracklets from three state-of-the-art trackers and demonstrate improved tracking performance in each case as a result of jointly incorporating events, while also improving event recognition.
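Selecting a temporally-disjoint subset of tracklets that maximises a total score is, in its simplest form, weighted interval scheduling. The dynamic programme below is a simplified stand-in for the full posterior maximisation (the `(start, end, score)` tracklet format and additive scoring are assumptions made for this sketch, ignoring the event terms):

```python
import bisect

def best_disjoint_score(tracklets):
    """Weighted interval scheduling over (start, end, score) tracklets.
    Sort by end time; best[i] is the optimum achievable using the first i
    tracklets, choosing each one either skipped or combined with the best
    solution among tracklets that end before it starts."""
    ts = sorted(tracklets, key=lambda t: t[1])
    ends = [t[1] for t in ts]
    best = [0.0]
    for i, (start, end, score) in enumerate(ts):
        p = bisect.bisect_right(ends, start, 0, i)  # tracklets ending by `start`
        best.append(max(best[i], score + best[p]))
    return best[-1]
```

In the joint framework the scores themselves would also depend on how well the selected tracklets explain the hypothesised event sequence, coupling the two problems.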

**VILSS: Discriminative Feature Learning for Large-scale Data - 09/11/2015**
**Mengyang Yu,** *Northumbria University*

Computation over large-scale data spaces is involved in many active problems in computer vision and pattern recognition. In realistic applications, however, most existing algorithms are heavily restricted by the huge number and the high dimension of the feature descriptors. Generally speaking, there are two main ways to speed up such algorithms: (1) projecting features onto a lower-dimensional subspace; (2) embedding features into a Hamming space. In this talk, I will present our recent work on the dimensionality reduction and binarization of features for various applications. First, I will show a novel subspace learning algorithm which realizes discriminant analysis for large-scale local feature descriptors, and a generalized orthogonalization method leading to a more compact and less redundant subspace. Next, local-feature-based hashing for similarity search will be introduced. Most existing hashing methods for image search and retrieval are based on global representations, e.g., Fisher vectors and VLAD, which lack an analysis of the intrinsic geometric properties of local features and heavily limit the effectiveness of the hash code. Finally, I will present how to efficiently reduce very high-dimensional representations to medium-dimensional binary codes with a small memory cost and low coding complexity.
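Embedding features into a Hamming space can be illustrated with the simplest binarisation scheme, sign random projections. This is a generic locality-sensitive hashing baseline included only to show the idea of Hamming-space search, not the learned hashing methods discussed in the talk:

```python
import numpy as np

def sign_random_projection(X, n_bits, seed=0):
    """Binarise feature vectors (rows of X) by thresholding projections onto
    random hyperplanes: nearby vectors tend to fall on the same side of most
    hyperplanes, so similar inputs get codes with small Hamming distance."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))
    return (X @ R > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(a != b))
```

Each descriptor collapses to `n_bits` bits, so memory cost and comparison time drop by orders of magnitude relative to storing and comparing floating-point vectors.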