Data Science

The Data Science Group focuses on extracting meaning from large-scale, heterogeneous, and variable data. Our research spans several disciplines, including machine learning and data mining, computer vision, natural language processing, social computing, databases, data provenance, climate data analysis, and intelligent systems.

Our research addresses the challenges of 'big data' and 'data streams': transforming digital records, fingerprints, traces, and signals into human-understandable and actionable knowledge, without being limited to a specific structure or mathematical framework of representation (e.g. probabilistic, possibilistic, linguistic, graph-based visual, etc.). We study, develop, and validate new methods, algorithms, and software, and use hardware, for applications that range from automated analysis of unstructured text and intelligent real-time video analytics to studies in bio-medicine and social communications and interactions. Our research results in high-quality peer-reviewed publications, including journal and conference papers, as well as patents, research monographs, edited books, algorithms, and software. Some of these are award-winning; others have attracted hundreds of citations. We collaborate with leading industrial partners and are funded by UK Research Councils, the EU, the MoD, industry, and other agencies such as GCHQ.

Members and Interests

Prof. Plamen Angelov (Group Leader)

From (Big) Data (Streams) to Knowledge in Real Time: Innovative and pioneering research into new methodologies, approaches, algorithms, and applications that allow human-interpretable knowledge to be extracted from (big) data (streams). Pioneered the concept of evolving fuzzy rule-based models and classifiers; world leader in evolving systems in general, which do not have a fixed structure but self-develop dynamically from the data pattern.

Intelligent Systems and Applications: Innovative studies in collaborative machine learning, dynamically evolving clustering and anomaly detection, autonomous systems, and fault detection and identification. Application areas include, but are not limited to, intelligent video analytics, climate research, biomedical and security applications, the advanced process industry, and evolving user behaviour modelling.

Dr. Paul Rayson

My research interests are based on applications of corpus-based natural language processing to address significant challenges in a number of different areas: child protection in online social networks, better understanding of the language of extremism and counter-extremism, text mining for conceptual history studies, the quality of the corporate financial information environment, and the use of metaphorical language in end-of-life care. My methodological contributions are in the areas of key semantic domains, the analysis of spelling variation in historical texts and online language, and corpus analysis software.

Dr. John Mariani

I have been involved in the development of new tools for workers in information-heavy environments, with a partial emphasis on shared or collaborative scenarios.

I am particularly interested in the end-user aspects: providing simple interfaces and information visualisations to support the exploration and navigation of large data spaces.

Dr. Matthew Rowe

Social Media Engagement: Examining how and why users engage with one another on social media, within communities and social networks, and how such engagement evolves over time. This line of work involves the creation of novel user modelling approaches that incorporate behaviour evolution and influence dynamics (i.e. how social norms within a system affect user development).

Data Mining Methods: Investigating a range of computational methods to mine knowledge from data, including automated prediction and detection techniques that function over evolving models and are based on objective optimisation methods, and disambiguation and coreference resolution techniques. All examined methods are implemented and deployed in a parallelised fashion, running over a distributed computational framework of clustered servers.

Dr. Jun Zhao

Provenance data management, modelling, querying, and mining: Researching how to ground provenance data modelling and querying upon existing graph data mining techniques (such as frequent graph pattern identification, graph diff, and isomorphism), how to apply reasoning-based techniques to address identity reconciliation issues, and how to support provenance queries using information retrieval and semantic query techniques.

Supporting new forms of scholarly communication: Developing new techniques for the assessment and justification of scientific reproducibility using provenance information, i.e. explaining why the same results were not reproduced and tracing causalities.
