
Research Seminars 2016

Recent Advances in Search Based Software Testing and Genetic Improvement

Search-based approaches to software testing formulate testing as an optimisation problem, which can be attacked using computational search techniques from the field of Search Based Software Engineering (SBSE). Test objectives find natural counterparts as the fitness functions used by SBSE to guide automated search, thereby facilitating SBSE formulations of many and diverse testing problems. This talk, which is an updated version of Harman's keynote at ICST 2015, will review achievements in Search Based Software Testing and will explore the question: "what could we do with software if software testing were to prove a sufficiently practical system equivalence relation?". Recent work that makes these kinds of assumption has produced breakthroughs in genetic improvement and program transplantation.
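As a minimal illustration of how a test objective becomes a fitness function, consider the sketch below. It is not Harman's tooling: the branch condition, the branch-distance fitness and the neighbourhood are invented for the example, and the search is a plain hill climb towards an input that covers a target branch.

    import random

    def branch_distance(x):
        # Hypothetical fitness: how far input x is from satisfying the
        # branch condition "x == 4242" in the program under test.
        return abs(x - 4242)

    def hill_climb_test_input(budget=100_000):
        """Search for an input that covers the target branch."""
        current = random.randint(-100_000, 100_000)
        for _ in range(budget):
            if branch_distance(current) == 0:
                return current                      # branch covered
            # Steepest-descent step over a small neighbourhood of the input.
            neighbour = min((current + d for d in (-100, -10, -1, 1, 10, 100)),
                            key=branch_distance)
            if branch_distance(neighbour) >= branch_distance(current):
                break                               # local optimum; a real tool would restart
            current = neighbour
        return current if branch_distance(current) == 0 else None

    print(hill_climb_test_input())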

Swarm Music


Swarm Music is a realisation of a Live Algorithm, that is to say, an autonomous artificial improviser that performs with people. Inspired by the dynamic patterning of animal swarms, Swarm Music generates free-flowing improvisations from the motion of particles in a music representation space. The particles - parameterisations of musical events - interact with each other and with a swarm of musical events emanating from other improvisers. The observations that music is patterning in time, and that improvised music is self-organising, provide a context for Swarm Music, and for Live Algorithms in general. The talk will illustrate these ideas with visual patterns, natural and artificial, the sonification of dynamical systems, and with Swarm Music recordings.
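The sketch below is a toy rendering of this idea rather than Blackwell's implementation: particles live in an assumed three-dimensional music space (pitch, loudness, inter-onset interval), are pulled towards their own centre of mass and towards an external attractor standing in for another improviser, and are read off as note events.

    import random

    DIMS = 3          # (pitch, loudness, inter-onset interval), each scaled to [0, 1]
    N = 10            # number of particles

    def step(swarm, attractor, pull=0.1, jitter=0.05):
        """Move every particle towards the swarm centre and the attractor, with noise."""
        centre = [sum(p[d] for p in swarm) / len(swarm) for d in range(DIMS)]
        new_swarm = []
        for p in swarm:
            new_p = []
            for d in range(DIMS):
                target = 0.5 * (centre[d] + attractor[d])    # cohesion + external pull
                v = pull * (target - p[d]) + random.uniform(-jitter, jitter)
                new_p.append(min(1.0, max(0.0, p[d] + v)))
            new_swarm.append(new_p)
        return new_swarm

    def to_note(p):
        """Map a particle to a (MIDI pitch, velocity, gap in seconds) event."""
        return (36 + int(p[0] * 48), int(30 + p[1] * 90), round(0.05 + p[2] * 0.5, 3))

    swarm = [[random.random() for _ in range(DIMS)] for _ in range(N)]
    attractor = [0.8, 0.6, 0.2]        # e.g. derived from another improviser's events
    for _ in range(4):
        swarm = step(swarm, attractor)
        print([to_note(p) for p in swarm[:3]])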

Inverse Privacy


Call an item of your personal information inversely private if some party has access to it but you do not. We analyze the provenance of inversely private information and its rise to dominance over other kinds of personal information. The inverse privacy problem is the unjustified inaccessibility to you of your inversely private information. Our analysis shows that the inverse privacy problem has a market-based solution. Your inversely private data will be hugely beneficial to you.

This is joint work with Efim Hudis and Jeannette Wing.

Multi-camera, multi-target tracking


In this talk I will present a multi-camera, multi-target tracking framework that we have developed over the last decade, which is composed of two core components. The first is the "Probabilistic Occupancy Map", which estimates the probabilities of presence of pedestrians in an individual time frame. This procedure relies on a generative model and estimates at every location the posterior probability of presence of a person by minimizing the KL divergence between the resulting product law and the true posterior under our model. The second is the "Multiple Tracked Paths" component, which computes a set of trajectories consistent with physical constraints and optimal given the probabilities of presence estimated in individual time frames. This procedure relies on a model of the target motions as a flow in a graph and solves a max-cost flow problem, which can be done efficiently and optimally.

This is joint work with (many) people from the CVLab at EPFL.
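A toy version of the trajectory-linking step, phrased as a minimum-cost flow over detections in two frames; the CVLab formulation is richer, and the graph, the costs and the use of networkx here are illustrative choices only.

    import networkx as nx

    # Two frames, two detections per frame, two targets to route from source S to sink T.
    G = nx.DiGraph()
    G.add_node("S", demand=-2)          # two units of flow = two targets
    G.add_node("T", demand=2)

    detections = {"a1": (0, 0), "a2": (5, 5),     # frame 1: (x, y) positions
                  "b1": (0, 1), "b2": (5, 4)}     # frame 2

    for d in ("a1", "a2"):
        G.add_edge("S", d, capacity=1, weight=0)
    for d in ("b1", "b2"):
        G.add_edge(d, "T", capacity=1, weight=0)

    # Transition cost = integer squared distance: low cost = physically plausible move.
    for d1 in ("a1", "a2"):
        for d2 in ("b1", "b2"):
            (x1, y1), (x2, y2) = detections[d1], detections[d2]
            G.add_edge(d1, d2, capacity=1, weight=(x1 - x2) ** 2 + (y1 - y2) ** 2)

    flow = nx.min_cost_flow(G)
    links = [(u, v) for u in flow for v, f in flow[u].items()
             if f > 0 and u != "S" and v != "T"]
    print(links)                        # [('a1', 'b1'), ('a2', 'b2')]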

Understanding Data Effectively with Visual Analytics and Interactive Surfaces


There is a proliferation of data being produced every day across many domains. Criminal activity occurs daily, and it is hard to determine trends and patterns in the resulting large data sets; more effective analysis techniques could potentially help reduce the number of crimes. During an emergency or disaster, so much data is produced that it is critical to be able to make informed decisions in a timely manner. People commute using many forms of transport, and if we knew more about how people commuted we could raise awareness and support better decisions around sustainable transportation infrastructure investment. Large amounts of software are continually being engineered, but analysing the code to improve its design quality is challenging. Understanding data from these domains for better sense making is a complex process, and there is currently a lack of appropriate tools for collaborative teams to analyse and explore the data. We propose using advanced visual analytics techniques and interactive surfaces to explore data from these domains. In this talk I discuss my experience designing techniques for visualizing data with large interactive surfaces, visualization walls, and domain-specific visual analytics tools.

Specialist experts for Prediction with Side Information


The talk will comprise:
a) a brief introduction to the theory of Prediction with Expert Advice and its techniques;
b) one (and, if we have time, two) examples of applying these methods to real-world data.
This area of machine learning deals with merging predictions from different sources, called experts in this context. I will introduce the basic notions and tools and then give an overview of the specialist experts scenario. Then I'll show how these techniques can be applied to the task of predicting students' performance at tests and (if there's enough time) to the task of predicting implied volatility.
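As a flavour of the basic machinery, here is a minimal exponentially weighted average forecaster for square loss; the specialist-experts setting, in which experts may abstain on some rounds, extends this scheme, and the data and learning rate below are made up.

    import math

    def weighted_average_forecaster(expert_preds, outcomes, eta=1.0):
        """Merge expert predictions by exponential weighting under square loss.

        expert_preds[t][i] is expert i's prediction at step t; outcomes[t] is the
        true value in [0, 1].  Here every expert predicts at every step.
        """
        weights = [1.0] * len(expert_preds[0])
        total_loss = 0.0
        for preds, y in zip(expert_preds, outcomes):
            forecast = sum(w * p for w, p in zip(weights, preds)) / sum(weights)
            total_loss += (forecast - y) ** 2
            # Exponential update: experts that suffered larger loss lose weight.
            weights = [w * math.exp(-eta * (p - y) ** 2)
                       for w, p in zip(weights, preds)]
        return total_loss

    # Two toy experts: one always predicts 0.9, one always 0.1; the truth is mostly high.
    preds = [[0.9, 0.1]] * 20
    ys = [1.0, 1.0, 1.0, 0.0] * 5
    print(weighted_average_forecaster(preds, ys))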

Parallel Metric Tree Embedding based on an Algebraic View on Moore-Bellman-Ford


A metric tree embedding of expected stretch s (randomly) maps a weighted n-node graph G to a weighted tree such that (i) the node set of G is included in the tree, (ii) distances in the tree are at least as large as in G, and (iii) the expected distances in the tree are at most s times larger than in G. Such embeddings are highly useful for designing fast approximation algorithms, as many hard problems are easy to solve on tree instances. We show how to achieve an asymptotically optimal expected stretch of O(log n) by a parallel algorithm of small (i.e., polylogarithmic) depth and near-linear work. Our main tool is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a variety of previous "Moore-Bellman-Ford-flavored" algorithms, to be of independent interest.
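For a concrete anchor, the classic special case of this algebraic view is Moore-Bellman-Ford expressed as repeated matrix-vector multiplication over the (min, +) semiring; the sketch below shows only this textbook instance, not the paper's generalisation.

    import math

    INF = math.inf

    def min_plus_matvec(W, dist):
        """One relaxation round: dist'[v] = min over u of (dist[u] + W[u][v])."""
        n = len(dist)
        return [min(dist[u] + W[u][v] for u in range(n)) for v in range(n)]

    def moore_bellman_ford(W, source):
        n = len(W)
        dist = [INF] * n
        dist[source] = 0
        for _ in range(n - 1):          # n-1 rounds suffice without negative cycles
            dist = min_plus_matvec(W, dist)
        return dist

    # Weighted digraph as a matrix: W[u][v] = edge weight, INF = no edge, 0 on the diagonal.
    W = [[0, 4, INF, INF],
         [INF, 0, 1, 7],
         [INF, INF, 0, 2],
         [INF, INF, INF, 0]]
    print(moore_bellman_ford(W, 0))     # [0, 4, 5, 7]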

Space complexity in Geometric and Algebraic proof systems

Intuitively the space required to prove a tautology (or refute a contradiction) is the amount of information we need to keep simultaneously in memory as we work through the proof and convince ourselves that the original formula is a tautology (or unsatisfiable). We present recent results showing techniques to prove space lower bounds for proof systems based on algebraic reasoning and on integer programming reasoning.

Stamping Out Concurrency Bugs


The shift to multi-core architectures in the past ten years pushed developers to write concurrent software to leverage hardware parallelism. The transition to multi-core hardware happened at a more rapid pace than the evolution of associated programming techniques and tools, which made it difficult to write concurrent programs that are both efficient and correct. Failures due to concurrency bugs are often hard to reproduce and fix, and can cause significant losses.


In this talk, I will first give an overview of the techniques we developed for the detection, root cause diagnosis, and classification of concurrency bugs. Then, I will discuss how the techniques we developed have been adopted at Microsoft and Intel. I will then discuss in detail Gist, a technique for the root cause diagnosis of failures. Gist uses hybrid static-dynamic program analysis and gathers information from real user executions to isolate root causes of failures. Gist is highly accurate and efficient, even for failures that rarely occur in production. Finally, I will close by describing future work I plan to do toward solving the challenges posed to software systems by emerging technology trends.
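A minimal, generic illustration (not an example from the talk) of why such failures are elusive: the unsynchronised counter update below races, and whether any updates are actually lost depends on the interpreter and on the thread schedule of a particular run.

    import threading

    counter = 0

    def worker(n):
        global counter
        for _ in range(n):
            counter += 1     # read-modify-write without a lock: a data race

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # May print less than 400000 when increments interleave and overwrite each other;
    # guarding the update with a threading.Lock removes the race.
    print(counter)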

Robust Algorithms and Models for Wireless Networks


Real environments are notoriously challenging for wireless communication, and future ones will become no easier, with increasing volumes and heterogeneity. Smart algorithms and models are needed that are robust to fading and other environmental effects, arbitrary placement, unreliability and temporal variability, and inter-technology interference. We outline recent progress on adding more realism to algorithmic models and in analyzing algorithms for fundamental scheduling problems. We also identify and discuss some of the challenges ahead.

Reliable region predictions for Automated Valuation Models

Accurate property valuation is important for property purchasers and banks to assess credit risk in the mortgage market. Traditional property valuation using a surveyor is expensive and may not be accurate or entirely objective. Therefore, automated valuation models (AVMs) are being developed to provide cheaper, objective valuations that allow dynamic updating of property values over the term of a mortgage. A useful feature of automated valuations is that they should give a range of plausible price estimates for each individual property, rather than just a single point estimate. This would allow mortgage providers to include conservatism in their credit risk assessments. In this study, Conformal Predictors (CP) are used to provide such region predictions, whilst strictly controlling for predictive accuracy. We show how an AVM can be constructed using a CP, based on an underlying k-nearest neighbours approach. We deal with the time trend in house prices by assuming a systematic effect over time and specifying the CP on a latent price variable. The AVM is tested on a large data set of London house prices. We show that the region predictions are reliable and also investigate the efficiency (i.e. region width) of house price predictions. In particular, we consider how the uncertainty in house price prediction is linked to property characteristics.
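The sketch below conveys the construction with a split (inductive) conformal predictor on top of k-nearest neighbours and synthetic data; the AVM in the talk differs in detail (full conformal prediction, a latent price variable for the time trend), so this is only an outline of how a region prediction is obtained.

    import numpy as np

    def knn_predict(X_train, y_train, x, k=5):
        """Plain k-nearest-neighbours point prediction (Euclidean distance)."""
        d = np.linalg.norm(X_train - x, axis=1)
        return y_train[np.argsort(d)[:k]].mean()

    def split_conformal_interval(X_train, y_train, X_cal, y_cal, x, alpha=0.1, k=5):
        """Region prediction at confidence 1 - alpha via absolute-residual scores."""
        scores = np.array([abs(y - knn_predict(X_train, y_train, xc, k))
                           for xc, y in zip(X_cal, y_cal)])
        n = len(scores)
        q = np.sort(scores)[int(np.ceil((1 - alpha) * (n + 1))) - 1]   # conformal quantile
        centre = knn_predict(X_train, y_train, x, k)
        return centre - q, centre + q

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(400, 2))                 # two toy property features
    y = 50_000 + 20_000 * X[:, 0] + 5_000 * X[:, 1] + rng.normal(0, 10_000, 400)
    lo, hi = split_conformal_interval(X[:200], y[:200], X[200:], y[200:],
                                      np.array([5.0, 5.0]))
    print(round(lo), round(hi))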

SCOT modeling, parallel training and statistical inference


A Stochastic Context Tree (abbreviated as SCOT) is an m-Markov chain in which every state of a string is independent of the symbols in its past more remote than a context whose length is determined by the preceding symbols of that state. SCOT has also appeared in other fields under various names (VLMC, PST, CTW) for compression applications. Consistency of algorithms for training SCOT has been derived for stationary time series with mixing. We survey recent advances in SCOT modeling, parallel training and statistical inference described in chapter 3 of B. Ryabko, J. Astola and M. Malyutov, 'Compression-Based Methods of Statistical Analysis and Prediction of Time Series', Springer, which is to appear shortly.
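The following toy predictor conveys only the context-tree flavour: it counts next-symbol frequencies for contexts up to a fixed depth and predicts from the longest context with sufficient support; the actual SCOT training algorithms and their consistency guarantees are considerably more involved.

    from collections import defaultdict, Counter

    def build_context_counts(seq, max_depth=3):
        """Count next-symbol frequencies for every context of length 0..max_depth."""
        counts = defaultdict(Counter)
        for t in range(len(seq)):
            for d in range(max_depth + 1):
                if t - d < 0:
                    break
                counts[seq[t - d:t]][seq[t]] += 1
        return counts

    def predict_next(counts, history, max_depth=3, min_support=5):
        """Predict from the longest suffix of the history seen often enough."""
        for d in range(min(max_depth, len(history)), -1, -1):
            ctx = history[len(history) - d:]
            total = sum(counts[ctx].values())
            if total >= min_support:
                return {s: n / total for s, n in counts[ctx].items()}
        return {}

    seq = "abababababcababab"
    counts = build_context_counts(seq)
    print(predict_next(counts, "ab"))   # distribution of the symbol following "ab"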

Chupja–PHY Covert Channels: Can you see the Idles?

Network covert timing channels embed secret messages in legitimate packets by modulating interpacket delays. Such channels are normally implemented in higher network layers (layer 3 or above), are often fairly slow, and can be easily detected or prevented. In this talk, I will present a new approach, Chupja (Korean for spy), which is a very effective covert timing channel that works over the Internet. It is implemented in the physical layer of the network stack and is many orders of magnitude faster than prior art while being very robust and virtually invisible to software end hosts. Key to our approach is software access to, and real-time control over, every bit in the physical layer of a 10 Gigabit network stack (a bit is 100 picoseconds wide at 10 gigabits per second), which allows us to modulate and interpret interpacket spacings at sub-microsecond scale. In the talk, I will discuss when and how a timing channel in the physical layer works, how hard it is to detect such a channel, and what is required to do so.
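A toy model of the modulation idea, with invented numbers (Chupja itself manipulates idle characters in the 10 Gigabit physical layer at sub-microsecond precision): encode one bit per interpacket gap by nudging the gap above or below its nominal value, and decode by thresholding the observed gaps.

    NOMINAL_GAP_US = 10.0   # nominal interpacket gap in microseconds (illustrative)
    EPSILON_US = 0.5        # modulation amplitude (illustrative)

    def encode(bits):
        """Turn a bit string into a sequence of interpacket gaps."""
        return [NOMINAL_GAP_US + (EPSILON_US if b == "1" else -EPSILON_US)
                for b in bits]

    def decode(gaps):
        """Recover bits by comparing each observed gap to the nominal gap."""
        return "".join("1" if g > NOMINAL_GAP_US else "0" for g in gaps)

    message = "1011001110001011"
    gaps = encode(message)
    # In reality the gaps are perturbed by network jitter before being observed;
    # robustness to that noise is a central concern of the talk.
    observed = [g + 0.1 * ((-1) ** i) for i, g in enumerate(gaps)]
    print(decode(observed) == message)   # True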

Characterising structural sparseness by neighbourhood complexity


If your neighbourhood is boring, you probably live in a sparse region. Structurally sparse graphs have always had a special place in algorithm theory: almost any problem becomes much more tractable if we restrict ourselves to nice graph classes like trees, planar graphs, or graphs excluding a minor. On the other hand, algorithms on such classes are only of limited use in real-world settings---hardly any real-world network exhibits these stricter notions of structural sparseness. In this talk we will explore a notion of sparseness (so-called bounded expansion classes introduced by Nesetril and Ossona de Mendez) that is general enough to hold in real-world settings while still providing us with a rich algorithmic toolkit. In particular, we will see a novel characterisation of bounded expansion classes by `neighbourhood complexity' that adds to the already diverse framework of structurally sparse graphs. The only prerequisite for this talk is basic knowledge of graph theory.

Version Spaces for Reliable Classification


In this talk I will propose to consider version spaces as an alternative approach to reliable classification. I will first explain how we can reason with version spaces using a consistency test. Then I will show how reliable classification with version spaces can be realized (e.g., using SVMs). Finally, I will provide experimental results and a comparison with conformal prediction.
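To make the abstention logic concrete, here is a toy version space over one-dimensional threshold classifiers; the talk realises the consistency test with SVMs, and the hypothesis class and data below are invented. A label is returned only when every hypothesis consistent with the training data agrees; otherwise the classifier abstains.

    def consistent(threshold, data):
        """Does the threshold classifier h(x) = (x >= threshold) fit every example?"""
        return all((x >= threshold) == y for x, y in data)

    def version_space_classify(data, x_new, candidate_thresholds):
        """Return 1 or 0 if the whole version space agrees on x_new, else None."""
        version_space = [t for t in candidate_thresholds if consistent(t, data)]
        labels = {x_new >= t for t in version_space}
        if len(labels) == 1:
            return int(labels.pop())    # reliable: every consistent hypothesis agrees
        return None                     # abstain: the prediction would be unreliable

    data = [(1.0, False), (2.0, False), (5.0, True), (6.0, True)]
    grid = [i / 10 for i in range(80)]                  # candidate thresholds 0.0 .. 7.9
    print(version_space_classify(data, 7.0, grid))      # 1
    print(version_space_classify(data, 3.5, grid))      # None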

Bayesian segmentation with hidden Markov models
We consider a time-homogeneous hidden Markov model (HMM) with a finite state space. The segmentation problem is to find the unobserved (hidden) realization of the underlying Markov chain. The most common approach is to return the Viterbi path, which can be easily found via the celebrated Viterbi algorithm. The Viterbi algorithm assumes the model is fully known. In practice the model is often known up to a parametrization only. A standard approach in this case would be to estimate the model parameters first and then the hidden path. However, if the primary goal is segmentation rather than parameter estimation, then it is meaningful to regard the true path as the parameter of interest and focus on the segmentation. In this talk, we consider a Bayesian setup, where the model parameters are assumed to have some prior. Then the model is a mixture of HMMs that, unfortunately, is not necessarily an HMM any more. Therefore the Viterbi algorithm is not applicable. We focus on several iterative non-stochastic algorithms for finding the Viterbi path. These are: 1) estimate parameters first by (Bayesian) EM, then apply the Viterbi algorithm; 2) segmentation EM, where the path is the parameter of interest; 3) segmentation MM, where the expectation step in the EM algorithm is replaced by maximization; 4) the variational Bayes approach; 5) iterated conditional modes (ICM). We consider the model where the rows of the transition matrix are independent draws from a Dirichlet distribution, and the emission densities are normal with conjugate (NIX) priors. The emission and transition parameters are independent. In this case all the listed methods are applicable, and a numerical study to compare their performance is carried out.
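For reference, the classical Viterbi algorithm with fully known parameters looks as follows (the point of the talk is that it is not directly applicable in the Bayesian mixture setting); the two-state Gaussian example is invented.

    import numpy as np

    def viterbi(obs, pi, A, log_emission):
        """Most probable hidden path given initial distribution pi, transition
        matrix A and a log-emission function log_emission(state, observation)."""
        K, T = len(pi), len(obs)
        delta = np.full((T, K), -np.inf)     # best log-score ending in each state
        psi = np.zeros((T, K), dtype=int)    # backpointers
        for k in range(K):
            delta[0, k] = np.log(pi[k]) + log_emission(k, obs[0])
        for t in range(1, T):
            for k in range(K):
                scores = delta[t - 1] + np.log(A[:, k])
                psi[t, k] = np.argmax(scores)
                delta[t, k] = scores[psi[t, k]] + log_emission(k, obs[t])
        path = [int(np.argmax(delta[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t, path[-1]]))
        return path[::-1]

    # Two states with Gaussian emissions of means 0 and 3 (unit variance).
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    log_em = lambda k, y: -0.5 * (y - [0.0, 3.0][k]) ** 2
    print(viterbi([0.1, -0.2, 2.9, 3.1, 0.0], [0.5, 0.5], A, log_em))   # [0, 0, 1, 1, 0]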

Complexity of Quantified Boolean Formulas


The main aim in proof complexity is to understand the complexity of theorem proving. Arguably, what is even more important is to establish techniques for lower bounds, and the recent history of computational complexity speaks volumes on how difficult it is to develop general lower bound techniques. Understanding the size of proofs is important for at least two reasons. The first is its tight relation to the separation of complexity classes: NP vs. coNP for propositional proofs, and NP vs. PSPACE in the case of proof systems for quantified Boolean formulas (QBF). The second reason to study lower bounds for proofs is the analysis of SAT and QBF solvers: powerful algorithms that efficiently solve the classically hard problems of SAT and QBF for large classes of practically relevant formulas. In this talk we give an overview of the relatively young field of QBF proof complexity. We explain the main resolution-based proof systems for QBF, modelling CDCL and expansion-based solving. In the main part of the talk we will give an overview of current lower bound techniques (and their limitations) for QBF systems. In particular, we exhibit a new and elegant proof technique for showing lower bounds in QBF proof systems based on strategy extraction. This technique provides a direct transfer of circuit lower bounds to lower bounds on proof length.

An upper bound for competitive on-line regression


This is the story of a chilling rivalry and race to tighten a bound. The talk will focus on a loss upper bound for on-line regression I presented at ALT 2016 two weeks ago. The bound improves on earlier results and enables one to predict nearly as well as any sequence of slowly changing linear functions. The talk will start with a general discussion of theoretical guarantees in machine learning and an introduction to the competitive prediction framework. For MSc students it may serve as a preview of ideas that will be further developed in CS5200 On-line Machine Learning.

Computational creativity and embedded evaluation


"Embedded evaluation" is an important part of creativity in economic, ludic, artistic, and scientific domains. After introducing this concept, the first part of the talk examines a case study in "artificial cosmology" using several simple simulations based on Cellular Automata (CAs). CAs exhibit a range of interesting behaviour, and certain CA rule sets have even been proved to be Turing complete. How might such "intelligent" behaviour emerge in an otherwise disordered computational universe? In the second part of the talk, I will present preliminary work showing how similar ideas can be applied to automatically generate functional programs via a service-oriented architecture.

Challenges of Data Integration: the Impact of Digital Technologies on Biomedical Research


This talk examines current efforts to disseminate, integrate, visualise and use large biological and biomedical datasets, on the basis of extensive historiographic and ethnographic engagement in the development of existing data infrastructures and their use to support scientific discovery (www.datastudies.eu). I focus on the opportunities provided by extensive integration of “big data” from a variety of disciplines and sources, and the challenges confronted by data scientists engaged in facilitating these efforts. While increasing automation and sophisticated technologies play a crucial role in supporting these efforts, human judgment and experimental/field testing and validation of results remain of paramount importance. This analysis supports a philosophical interpretation of the extent to which digital technology is transforming scientific research, and of the role that human agency needs to play alongside increasingly powerful artificial intelligence.

Money, Power and Drugs - Perspectives of Cyber Security Risk and Strategy From a Practitioner


Robert will be sharing the cyber security strategy and context for GSK, including examples of incidents and intelligence, and how he has created the strategy, what is changing and the team he has built to manage the risk.

Challenges in the Preservation of Software-based Art: A View from the Tate


Artists have used software to create art since the 1960s. From the initial algorithmic drawings created through a few lines of code, the use of software has evolved to support complex interactive installations and unique online environments. This now intrinsic part of our culture has been making its way into museums, and that will happen more and more. Since 2003, when Tate acquired Michael Craig-Martin's Becoming, its first software-based artwork, conservators have been developing strategies to preserve this type of work.

Time-based media conservators specifically deal with artworks using media like film, video, audio and software. The aim is to be able to display these works in perpetuity. Unlike painting conservators we don't have much use for brushes or solvents, but just like a painting conservator we do need to understand how a software-based artwork is created, what technologies are involved and what their meaning is in relation to the artwork itself.


The conservation of software-based artworks is a new field, and conservators are having to learn through practice how to care for these works. We are doing this by collaborating among ourselves but also, very importantly, with external experts in fields like programming, emulation, systems engineering, or metadata and documentation. This is relevant to understanding the artworks themselves, but also to finding and developing tools that will allow us to do our work better.
At Tate, conservators specialising in the conservation of time-based media have been developing strategies to analyse artworks, collaborate with artists and artist's programmers, emulate and migrate an artwork's software and document different aspects of an artwork, from its composition to the processes needed to display it correctly in a museum gallery.
In this seminar we will discuss the research happening at Tate around the preservation of our software-based artworks and the topics that we would like to tackle next.
