Dr Alistair Baron
Faculty Fellow - Security Lancaster
Research Overview
My primary research areas are Natural Language Processing (NLP) and Cyber Security, with a particular focus on developing solutions to the problems associated with the vast amounts of textual data in online settings; for example, techniques for detecting deception and multiple personae, which help counter the use of fake profiles for nefarious purposes. The noisy characteristics of online texts, e.g. the abundance of irregular language and its multilingual nature, pose significant barriers to many NLP methods. A primary aim of my research is to build robust NLP tools that can cope with, and take advantage of, these features.
Semantic tagging and early modern collocates
Alexander, M., Baron, A., Dallachy, F., Piao, S., Rayson, P., Wattam, S. 22/07/2015
Abstract
Large-scale time-sensitive semantic analysis of historical corpora
Rayson, P., Baron, A., Piao, S., Wattam, S. 27/05/2015
Abstract
Analysis and recommendations for standardisation in penetration testing and vulnerability assessment: penetration testing market survey
Knowles, W., Baron, A., McGarr, T. 26/05/2015 BSI. 20 p.
Commissioned report
Automatically analysing large texts in a GIS environment: the Registrar General’s reports and cholera in the nineteenth century
Murrieta-Flores, P., Baron, A., Gregory, I., Hardie, A., Rayson, P. 04/2015 In: Transactions in GIS. 19, 2, p. 296-320. 25 p.
Journal article
Metaphor, popular science and semantic tagging: distant reading with the Historical Thesaurus of English
Alexander, M., Dallachy, F., Piao, S., Baron, A., Rayson, P. 2015 In: Digital Scholarship in the Humanities.
Journal article
Normalising the corpus of English dialogues (1560-1760) using VARD2: decisions and justifications
Archer, D., Kytö, M., Baron, A., Rayson, P. 04/05/2014
Conference paper
Developing the Historical Thesaurus Semantic Tagger
Piao, S., Dallachy, F., Baron, A., Rayson, P., Alexander, M. 2014
Abstract
Who am I? Analysing Digital Personas in Cybercrime Investigations
Rashid, A., Baron, A., Rayson, P., May-Chahal, C., Greenwood, P., Walkerdine, J. 04/2013 In: Computer. 46, 4, p. 54-61. 8 p.
Journal article
Customising geoparsing and georeferencing for historical texts
Rupp, C.J., Rayson, P., Baron, A., Donaldson, C., Gregory, I., Hardie, A., Murrieta-Flores, P. 2013 In: Proceedings of the 2013 IEEE International Conference on Big Data. IEEE p. 59-62. 4 p.
Paper
Prerequisites to a corpus-based analysis of EEBO-TCP
Baron, A., Hardie, A. 09/2012
Abstract
Which 'Lancaster' do you mean? Disambiguation challenges in extracting place names for Spatial Humanities
Rayson, P., Baron, A., Hardie, A. 09/2012
Abstract
Technological solutions to offending
Rashid, A., Greenwood, P., Walkerdine, J., Baron, A., Rayson, P. 03/2012 In: Understanding and preventing online sexual exploitation of children. London : Willan p. 228-243.
Chapter (peer-reviewed)
"i didn't spel that wrong did i. Oops": Analysis and normalisation of SMS spelling variation
Tagg, C., Baron, A., Rayson, P. 2012 In: Lingvisticæ Investigationes. 35, 2, p. 367-388. 22 p.
Journal article
Children Online: A survey of child language and CMC corpora
Baron, A., Rayson, P., Greenwood, P., Walkerdine, J., Rashid, A. 2012 In: International Journal of Corpus Linguistics. 17, 4, p. 443-481. 39 p.
Journal article
Using verifiable author data: Gender and spelling differences in Twitter and SMS
Baron, A., Tagg, C., Rayson, P., Greenwood, P., Walkerdine, J., Rashid, A. 06/2011
Conference paper
Automatic error tagging of spelling mistakes in learner corpora
Rayson, P., Baron, A. 2011 In: A Taste for Corpora. Amsterdam : John Benjamins p. 109-126. 28 p. ISBN: 978 90 272 0350 2. Electronic ISBN: 978 90 272 8708 3.
Chapter
Innovators of Early Modern English spelling change: Using DICER to investigate spelling variation trends
Baron, A., Rayson, P., Archer, D. 2011
Abstract
Quantifying Early Modern English spelling variation: change over time and genre
Baron, A., Rayson, P., Archer, D. 2011
Abstract
"I didn't spel that wrong did i. Oops": Analysis and standardisation of SMS spelling variation
Tagg, C., Baron, A., Rayson, P. 27/05/2010
Conference paper
Improving the precision of corpus methods: The standardized version of Early Modern English Medical Texts
Lehto, A., Baron, A., Ratia, M., Rayson, P. 2010 In: Early Modern English Medical Texts. Amsterdam : John Benjamins p. 279-289. 11 p. ISBN: 978 90 272 1177 4.
Chapter
Automatic Standardization of Spelling for Historical Text Mining
Baron, A., Rayson, P., Archer, D. 06/2009
Poster
The extent of spelling variation in Early Modern English
Baron, A., Rayson, P., Archer, D. 05/2009
Conference paper
Automatic standardisation of texts containing spelling variation: How much training data do you need?
Baron, A., Rayson, P. 2009 In: Proceedings of the Corpus Linguistics Conference. Lancaster : Lancaster University 25 p.
Paper
Word frequency and key word statistics in historical corpus linguistics
Baron, A., Rayson, P., Archer, D. 2009 In: Anglistik. 20, 1, p. 41-67. 27 p.
Journal article
VARD2: a tool for dealing with spelling variation in historical corpora
Baron, A., Rayson, P. 05/2008
Conference paper
Travelling through time with corpus annotation software
Rayson, P., Archer, D., Baron, A., Smith, N. 2008 In: Corpus linguistics, computer tools, and applications - state of the art. Frankfurt am Main : Peter Lang p. 29-46. 18 p. ISBN: 978-3-631-58311-1.
Chapter
Conceptual Glossary and Index to the Vulgate Translation of the Gospel according to Mark.
Wilson, A., Baron, A., Worth, C. 2007 Hildesheim : Olms-Weidmann. 824 p. ISBN: 9783487134208.
Book
Tagging historical corpora - the problem of spelling variation
Rayson, P., Archer, D., Baron, A., Smith, N. 2007
Conference paper
Tagging the Bard: Evaluating the Accuracy of a Modern POS Tagger on Early Modern English Corpora
Rayson, P., Archer, D., Baron, A., Culpeper, J., Smith, N. 2007 In: Proceedings of the Corpus Linguistics Conference. UCREL 14 p.
Paper
Early Detection of Insider Threats by Autonomous Analysis of User Behaviour Evolution
01/10/2014 → 31/03/2018
Projects
Corpus Research in Early Modern English
01/10/2011 → …
Research
CRESTx Lancaster
Participation in conference
Faculty of Science and Technology Research Fellowship (Security Lancaster)
Fellowship awarded competitively
VARD 2, DICER, historical spelling variation and modern ‘noisy’ data
Invited talk
UCREL Corpus Research Seminar
Participation in workshop, seminar, course
Word frequency and key word statistics in historical corpus linguistics
Invited talk