BVISS: Action localization without spatiotemporal supervision

Dr Cees Snoek – Faculty of Science, University of Amsterdam

Abstract

Understanding what activity is happening, where and when, in video content is crucial for video computing, communication and intelligence. In the literature, the common tactic for action localization is to learn a deep classifier on hard-to-obtain spatiotemporal annotations and to apply it at test time to an exhaustive set of spatiotemporal candidate locations. Annotating the spatiotemporal extent of an action in training video is not only cumbersome, tedious, and error-prone, it also does not scale beyond a handful of action categories. In this presentation, I will highlight recent work from my team at the University of Amsterdam on the challenging problem of action localization in video without the need for spatiotemporal supervision. We consider three possible solution paths: 1) the first relies on intuitive user interaction with points, 2) the second infers the relevant spatiotemporal location from an action class label, and 3) the third derives a spatiotemporal action location from off-the-shelf object detectors and text corpora only. I will discuss the benefits and drawbacks of these three solutions on common action localization datasets, compare with alternatives that depend on spatiotemporal supervision, and highlight the potential for future work.
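To give a flavour of the third path, the toy sketch below scores off-the-shelf object detections by the semantic similarity between their class names and the action label, using word vectors learned from a text corpus, and keeps the best box per frame. The detection format, the embedding source and the simple per-frame linking are hypothetical simplifications for illustration, not the method presented in the talk.

```python
# Illustrative only: rank object detections for an action label using
# word-vector similarity; assumes detections and word embeddings are given.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def action_tube(frame_detections, action_word, word_vectors):
    """Pick, per frame, the detected box whose class name best matches the action.

    frame_detections: list over frames of (class_name, detector_score, box) tuples
    action_word: the action class label, e.g. "cycling"
    word_vectors: dict mapping words to embedding vectors (e.g. word2vec / GloVe)
    """
    tube = []
    for detections in frame_detections:
        best = max(
            detections,
            key=lambda d: d[1] * cosine(word_vectors[d[0]], word_vectors[action_word]),
        )
        tube.append(best[2])  # the selected boxes, linked over frames, form the tube
    return tube
```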

Biography

 

Cees Snoek received the M.Sc. degree in business information systems in 2000 and the Ph.D. degree in computer science in 2005, both from the University of Amsterdam, The Netherlands. He is currently a director of the QUVA Lab, the joint research lab of Qualcomm and the University of Amsterdam on deep learning and computer vision. He is also a principal engineer/manager at Qualcomm Research Netherlands and an associate professor at the University of Amsterdam. His research interests focus on video and image recognition. He is the recipient of a Veni Talent Award, a Fulbright Junior Scholarship, a Vidi Talent Award, and The Netherlands Prize for Computer Science Research.

http://www.uva.nl/en/profile/s/n/c.g.m.snoek/c.g.m.snoek.html

BVISS: Learning to synthesize signals and images

Dr Sotirios Tsaftaris – School of Engineering, University of Edinburgh

Abstract: An increasing population and climate change put pressure on several societally important domains: health costs are rising and, at the same time, feeding the world is becoming a challenge. Imaging (and sensing) is central to furthering our understanding of biology, not only in its diagnostic capacity but also in phenotyping variation. This creates the need for several analysis tasks, such as detection, segmentation and classification, on the basis of static or dynamic imaging data. Evaluating and designing algorithms that address these tasks relies heavily on real annotated data of sufficient quality and quantity. Synthetically generating data with ground truth can be a useful alternative. In this seminar I will motivate this using two application domains: medical imaging and plant phenotyping. I will present solutions that learn, in a data-driven fashion, data distributions and mappings that generate or synthesize data using dictionaries or deep neural networks. Our approaches use structured learning and multiple modalities to learn representations with desirable invariance (and covariance) properties. Problems of cross-modal synthesis in MRI and CT are presented, as well as the ability to conditionally generate images of plants with a specific topological arrangement.
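As a minimal, hypothetical sketch of the kind of learned cross-modal mapping discussed above (e.g. predicting a CT slice from a co-registered MRI slice), the snippet below assumes paired 2-D slices and PyTorch. The tiny encoder-decoder, the L1 loss and all names are illustrative stand-ins, not the dictionary- or deep-network-based models presented in the talk.

```python
# Illustrative only: a toy cross-modal slice translator (e.g. MRI -> CT).
# Assumes paired, co-registered single-channel slices as input/target tensors.
import torch
import torch.nn as nn

class SliceTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(              # downsample to a latent feature map
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(              # upsample back to the input resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, mri):
        return self.decode(self.encode(mri))

model = SliceTranslator()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def train_step(mri_batch, ct_batch):
    """One supervised step on paired slices: predict CT intensities from MRI."""
    optimiser.zero_grad()
    loss = loss_fn(model(mri_batch), ct_batch)
    loss.backward()
    optimiser.step()
    return loss.item()
```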

Bio: Dr. Sotirios A. Tsaftaris obtained his PhD and MSc degrees in Electrical Engineering and Computer Science (EECS) from Northwestern University, USA, in 2006 and 2003, respectively. He obtained his Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, Greece. Currently, he is a Chancellor’s Fellow (Senior Lecturer grade) in the School of Engineering at the University of Edinburgh (UK). He is also a Turing Fellow with the Alan Turing Institute.

From 2006 to 2011, he was a research assistant professor with the Departments of EECS and Radiology, Northwestern University, USA. From 2011 to 2015, he was with the IMT Institute for Advanced Studies, Lucca, serving as Director of the Pattern Recognition and Image Analysis Unit.

He is an Associate Editor for the IEEE Journal of Biomedical and Health Informatics and for Digital Signal Processing – Journal (Elsevier). He has organized specialized workshops at ECCV (2014), BMVC (2015), ICCV (2017) and MICCAI (2016, 2017), and served as Area Chair for IEEE ICCV (2017) and VCIP (2015). He has also served as guest editor (Machine Vision and Applications; IEEE Transactions on Medical Imaging; and Digital Signal Processing – Software X).

He has twice received the Magna Cum Laude Award from the International Society for Magnetic Resonance in Medicine (ISMRM), in 2012 and 2014, and was a finalist for the Early Career Award from the Society for Cardiovascular Magnetic Resonance (SCMR) in 2011.

He has authored more than 100 journal and conference papers, particularly in interdisciplinary fields, and his work is (or has been) supported by the National Institutes of Health (USA), EPSRC & BBSRC (UK), the European Union, the Italian Government, and several non-profits and industrial partners.

His research interests are in machine learning, image analysis (medical image computing), image processing, and distributed computing.

Dr. Tsaftaris is a Murphy, Onassis, and Marie Curie Fellow. He is also a member of IEEE, ISMRM, SCMR, and IAPR.

Additional information:

http://tsaftaris.com or https://www.eng.ed.ac.uk/about/people/dr-sotirios-tsaftaris

BVISS: Augmenting vision, the easy and the hard way

Dr Stephen Hicks – Oxford University – Research Fellow in Neuroscience and Visual Prosthetics, Nuffield Department of Clinical Neurosciences

Mobile computing, augmented reality, deep learning. Consumer-grade devices are coming of age with a dazzling array of technologies and potential. While tech giants search for killer apps, there are sectors of society with well-defined needs that could be met with aspects of these technologies. In many high-profile cases, people with sensory or motor deficits have pioneered the use of mobile augmenting technologies that the rest of us are only just becoming aware of. Bionic limbs, cochlear implants and retinal prosthetics have moved from the highly experimental to the FDA-approved. The goal of my work has been to develop low-cost and non-invasive vision enhancement systems that not only provide functional benefits to those with poor sight, but are also good-looking enough to break through a social barrier often raised against enabling technologies. In my talk I will give an overview of relevant vision enhancement technologies and my group's work developing and validating smart glasses that not only boost an image, but can also provide autonomous and semi-intelligent descriptions of the world using machine learning.

Biography

Stephen wears two hats that look quite similar. He is a Research Lecturer in neuroscience and visual prosthetics at the University of Oxford and runs a small team developing and testing wearable displays to boost vision for people with severe visual impairments. He is also a Founding Director and Technical Lead at OxSight, a London-based startup, where he manages a small team developing commercially feasible smart glasses to boost the vision and quality of life of blind and partially sighted people. Stephen takes a multidisciplinary approach that combines machine learning and computer vision with novel cameras and displays to form images that are easy to see and understand for people with poor vision. He was an Enterprise Fellow of the Royal Academy of Engineering, received early career awards such as the Royal Society Brian Mercer Award for Innovation, and led the team that won the 2014 Google Global Impact Challenge. He resides in London and desperately holds onto his Australian accent.

BVI Seminar: Attentional selection of colour is determined by both cone-based and hue-based representations

Jasna Martinovic, University of Aberdeen

What is the nature of the representations that sustain attention to colour? In other words, is attention to colour predominantly determined by the low-level, cone-opponent chromatic mechanisms established at subcortical processing stages, or by the multiple narrowly tuned higher-level chromatic mechanisms established in the cortex? These questions remain unresolved in spite of decades of research. In an attempt to address this problem, we conducted a series of electroencephalographic (EEG) studies that examined cone-opponent and hue-based contributions to colour selection. We used a feature-based attention paradigm in which spatially overlapping, flickering random dot kinematograms (RDKs) of different colours are presented and participants are asked to selectively attend to one colour in order to detect brief, coherent motion intervals, ignoring any such events in the unattended colours. Each flickering colour drives a separate steady-state visual evoked potential (SSVEP), a response whose amplitude increases when that colour is attended. In our studies, behavioural performance and SSVEPs are thus taken as indicators of selective attention. The first study demonstrated that at least some of the observed cone-opponent attentional effects can be explained by asymmetric summation of signals from different cone-opponent channels with luminance at early cortical sites (V1-V3). This indicates that there might be cone-mechanism-specific optimal contrast ranges for combining colour and luminance signals. The second study demonstrated that hue-based contributions can also be observed concurrently with the aforementioned low-level, cone-opponent effects. Proximity and linear separability of targets and distractors in a hue-based colour space were shown to be further determinants of effective colour selection. In conclusion, attention to colour should be examined across the full range of chromoluminance space, task space and dependent-measure space. Current evidence indicates that multiple representations contribute to the selection of colour and that, depending on the stimulus attributes, task demands, and the attributes of the applied measures, it is possible to observe a spectrum of effects ranging from purely cone-opponent to largely hue-based.
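For readers unfamiliar with frequency tagging, the sketch below shows the basic SSVEP measurement behind this paradigm: each colour flickers at its own frequency, and attention to a colour is read out as the spectral amplitude at that frequency. The sampling rate and tagging frequencies in the usage comment are illustrative, not those used in the studies.

```python
# Illustrative only: estimate SSVEP amplitude at a colour's flicker frequency
# from one EEG epoch (single channel or trial average), via the FFT.
import numpy as np

def ssvep_amplitude(epoch, fs, tag_freq):
    """Return the single-sided spectral amplitude at the tagging frequency.

    epoch: 1-D array of EEG samples
    fs: sampling rate in Hz
    tag_freq: flicker frequency (Hz) of the colour of interest
    """
    spectrum = np.abs(np.fft.rfft(epoch)) / len(epoch)
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    idx = np.argmin(np.abs(freqs - tag_freq))   # nearest frequency bin
    return 2.0 * spectrum[idx]

# Hypothetical usage: the attentional effect is the amplitude at, say, a 10 Hz
# tag when that colour is attended versus when it is ignored.
# gain = ssvep_amplitude(attended_epoch, fs=500, tag_freq=10.0) / \
#        ssvep_amplitude(ignored_epoch, fs=500, tag_freq=10.0)
```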

Biography:
Dr Jasna Martinovic is a senior lecturer at the University of Aberdeen. She completed her PhD at the University of Leipzig, followed by postdoctoral work at the University of Liverpool. Her research investigates how colour and luminance signals feed into mid and higher-level stages of visual perception, as well as how they are sampled by visual attention.

BVI Seminar: Eye Movements in Low and Normally Sighted Vision

Brian Sullivan – University of Bristol, School of Experimental Psychology

I will present two studies examining human eye movements and discuss my role at the University of Bristol. The first study concerns patients with central vision loss, who often adopt a preferred retinal locus (PRL), a region in peripheral vision used for fixation as an alternative to the damaged fovea. A common clinical approach to assessing the PRL is to record monocular fixation of a small stimulus using a scanning laser ophthalmoscope. Using a combination of visual field tests and eye tracking, we tested how well this ‘fixational PRL’ generalizes to PRL use during a pointing task. Our results suggest that measures of the fixational PRL do not sufficiently capture the PRL variation exhibited while pointing, and they can inform patient therapy and future research. In the second study, eye movements from eight participants were recorded with a mobile eye tracker. Participants performed five everyday tasks: making a sandwich, transcribing a document, walking in an office, walking on a city street, and playing catch with a flying disc. Using only saccadic direction and amplitude time series data, we trained a hidden Markov model for each task and were then able to classify unlabeled data. Lastly, I will briefly describe my role in the GLANCE Project at the University of Bristol. We are an interdisciplinary group in the departments of Experimental Psychology and Computer Science, sponsored by the EPSRC to build a wearable assistive device that monitors behavior during tasks and presents short video clips to guide behavior in real time.
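As an illustration of the classification approach described in the second study, the sketch below fits one Gaussian hidden Markov model per task on saccade direction and amplitude features and labels an unseen recording with the best-scoring model. It assumes the hmmlearn library; the feature layout, number of hidden states and data organisation are illustrative, not the study's actual pipeline.

```python
# Illustrative only: per-task HMMs over [saccade_direction, saccade_amplitude]
# time series, used to classify which everyday task produced a recording.
import numpy as np
from hmmlearn import hmm

def train_task_models(sequences_by_task, n_states=4):
    """Fit one Gaussian HMM per task; each sequence is a (T, 2) feature array."""
    models = {}
    for task, sequences in sequences_by_task.items():
        X = np.vstack(sequences)                 # stack sequences for fitting
        lengths = [len(s) for s in sequences]    # sequence boundaries
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=100, random_state=0)
        model.fit(X, lengths)
        models[task] = model
    return models

def classify(sequence, models):
    """Label an unseen saccade sequence with the task whose HMM scores it highest."""
    return max(models, key=lambda task: models[task].score(sequence))
```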