Alejandro Mosquera | Computer Scientist

About

Online safety expert and computer scientist at Broadcom developing systems and methods for automatic Threat Hunting: malicious network traffic analysis, malware analysis, unknown threat categorization, messaging abuse filters, APT detection and attack chain inference based on Machine Learning.

Research

I enjoy developing software with great colleagues, and I've been fortunate to have worked with many wonderful and talented people. As a researcher, my job usually involves:

Applying quantitative methods to solve complex data problems involving risk scoring, entity and user behaviour analysis, etc.
Translating product ideas into data science problems, and solving them.
Prototyping tools and data pipelines to extract meaningful insight from innovative sources of data.

Other research areas of interest are Natural Language Processing, procedural generation and Trustworthy AI.

In particular:

Identifying and investigating failure modes for AI systems, and building solutions to address them.
Conducting empirical or theoretical research into technical safety and security mechanisms for AI systems.
Evaluating AutoNLP techniques for the accurate and efficient detection of unsafe content.

Personal

Lover of coffee, Earl Grey and Lego (in alphabetical order). Sometimes I blog. You can also find me participating in competitive Machine Learning challenges during my spare time.

Highlights

2024 - 🏆3rd place (out of 50 teams) at SemEval-2024 Task 6: SHROOM - a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

2023 - 🥇1st place at CSCML CTF: The International Symposium on Cyber Security, Cryptology and Machine Learning

2023 - 19th place (out of 2681 teams) at HackAPrompt (AICrowd, FlanT5-XXL only): A prompt hacking competition to outsmart LLMs and evade prompt injection defenses

2023 - Top 10% at Stable Diffusion - Image to Prompts (Kaggle): Evaluating Prompt Stealing Attacks Against Text-to-Image Generation Models

2023 - 15th place (out of 84 teams) at SemEval-2023 Task 10: Pretrained Models with Adversarial Training for Online Sexism Detection (EDOS)

2022 - 11th place (out of 18, beating the MNTD baseline with a 43% higher AUC) at Trojan Detection Challenge @ NeurIPS 2022

2022 - 56th place / 2nd best score (out of 676 teams) at AI Village CTF @ DEFCON 30

2022 - 🥇1st place (out of 10 teams) at KONVENS-2022 Task 1: Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation

2022 - 🏆3rd place (out of 20 teams) at IberLEF-2022: Towards Robust Spanish Author Profiling and Lessons Learned from Adversarial Attacks

2021 - 🏆2nd place (in both defender and attacker tracks) at MLSEC-2021: Thwarting Adversarial Malware Evasion with a Defense-in-Depth

2021 - 6th place (out of 31 teams) at IberLEF-2021 Task 1: Deep Learning Approaches to Toxicity Detection in Spanish Social Media Texts

2021 - Granted USPTO anti-ransomware patent: Detecting and protecting against computing breaches based on lateral movement of a computer file within an enterprise

2021 - 🏆3rd place (out of 48 teams) at SemEval-2021 Task 1: Exploring Sentence and Word Features for Lexical Complexity Prediction

2020 - 45th place (out of 6351 teams) at Kaggle IEEE-CIS Fraud Detection

2020 - 10th place (out of 82 teams) at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features

Software

Spanish Metaphone

Metaphone is a phonetic algorithm published in 1990 for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names which sound similar. As with Soundex, similar sounding words should share the same keys.

This is an adaptation for the Spanish language, implemented in Python.

For example (input word, metaphone):

waterpolo -> UTRPL
aquino -> AKN
rebosar -> RVSR
rebozar -> RVZR
grajea -> GRJ
gragea -> GRJ
encima -> ENZM
enzima -> ENZM
alhamar -> ALAMR
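As a rough illustration of why shared keys are useful, the sketch below groups words by their key to surface likely homophones. It does not implement the Spanish Metaphone algorithm itself: the (word, key) pairs are copied verbatim from the examples above, and the helper names `group_by_key` and `homophones` are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# (word, key) pairs copied from the examples above; the keys are NOT computed here.
EXAMPLES = [
    ("waterpolo", "UTRPL"),
    ("aquino", "AKN"),
    ("rebosar", "RVSR"),
    ("rebozar", "RVZR"),
    ("grajea", "GRJ"),
    ("gragea", "GRJ"),
    ("encima", "ENZM"),
    ("enzima", "ENZM"),
    ("alhamar", "ALAMR"),
]


def group_by_key(pairs):
    """Map each phonetic key to the list of words that produced it."""
    groups = defaultdict(list)
    for word, key in pairs:
        groups[key].append(word)
    return dict(groups)


def homophones(pairs):
    """Keep only the keys shared by more than one word."""
    return {key: words for key, words in group_by_key(pairs).items() if len(words) > 1}


print(homophones(EXAMPLES))
# {'GRJ': ['grajea', 'gragea'], 'ENZM': ['encima', 'enzima']}
```

In these examples rebosar and rebozar keep distinct keys (RVSR vs RVZR), while grajea/gragea and encima/enzima collapse to the same key.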

NaiveSumm

NaiveSumm is a naive summarization approach based on Luhn's 1958 work "The Automatic Creation of Literature Abstracts". It uses the frequencies of words in the document to score and extract the sentences that contain the most frequent words.
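The idea can be sketched in a few lines. This is a simplified illustration of frequency-based sentence extraction, not the NaiveSumm code itself: the stop-word list, the regex-based sentence splitter and the function names are assumptions made for this example, and Luhn's original method additionally looks at clusters of significant words within each sentence.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one would be larger).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for"}


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace; crude but dependency-free."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    # Score = sum of the document-wide frequencies of the sentence's words.
    scores = [sum(freq[w] for w in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


text = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(text, n_sentences=1))
# ['Cats chase mice and cats climb trees.']
```

Summing raw frequencies favours long sentences; dividing each score by the sentence length is a common adjustment if that bias matters.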

Presentations

2014 - 5th Workshop on Language Analysis for Social Media (LASM), Sweden: Mining Lexical Variants from Microblogs: An Unsupervised Multilingual Approach

2013 - Conference and Labs of the Evaluation Forum (CLEF), Spain: DLSI-Volvam at RepLab 2013: Polarity Classification on Twitter Data

2013 - Tweet Normalization Workshop co-located with 29th Conference of the Spanish Society for Natural Language Processing (SEPLN), Spain: DLSI en Tweet-Norm 2013: Normalización de Tweets en Español

2012 - TSD: Text, Speech and Dialogue, Czech Republic: TENOR: A Lexical Normalisation Tool for Spanish Web 2.0 Texts

2012 - NLDB: Natural Language Processing and Information Systems, the Netherlands: The Study of Informality as a Framework for Evaluating the Normalisation of Web 2.0 Texts

2012 - Real-Time Analysis and Mining of Social Media Streams (RAMSS), Ireland: SMILE: An Informality Classification Tool for Helping to Assess Quality and Credibility in Web 2.0 Texts

2012 - Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Turkey: Towards Facilitating the Accessibility of Web 2.0 Texts through Text Normalisation

2012 - @NLP can u tag #user_generated_content? (NLP4UGC), Turkey: A Qualitative Analysis of Informality Levels In Web 2.0 Texts: The Facebook Case Study

2011 - Symposium in Information and Human Language Technology (STIL), Brazil: The Use of Metrics for Measuring Informality Levels in Web 2.0 Texts

2011 - 3rd Language Technology Conference (LTC), Poland: Enhancing the Discovery of Informality Levels in Web 2.0 texts

Media mentions

2023 - Recognized as a Webometrics ambassador: A ranking of Spanish researchers working abroad according to their Google Scholar Citations public profiles

2021 - 🏆Winner interview for the 3rd Machine Learning Security Evasion Competition (MLSEC) sponsored by CUJO AI, Microsoft, VMRay, MRG Effitas and NVIDIA. As a 2x prize winner, I had the opportunity to publish my findings about the Adversarial Threat Landscape for Artificial-Intelligence Systems

2016 - 🏆Kaggle winner's interview for 3rd place at the Allen AI Science Challenge. I was also interviewed by Cade Metz for Wired magazine and mentioned in the Allen AI final report: Moving Beyond the Turing Test with the Allen AI Science Challenge

2014 - Coverage of our work defending SMS networks while working at Symantec: Security rEsrchRs find nu way 2 spot TXT spam

Procedural generation

JS1k submission using L-systems.

Chromanin.js, a procedural texture generation library.

Procedural audio using L-systems based on Ville-Matias Heikkilä (2011). Discovering novel computer music techniques by exploring the space of short computer programs.

SuperShapes, SuperFormula based on Johan Gielis (2003). A generic geometric transformation that unifies a wide range of natural and abstract shapes.
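For reference, the superformula gives a radius in polar coordinates as r(φ) = (|cos(mφ/4)/a|^n2 + |sin(mφ/4)/b|^n3)^(-1/n1). The sketch below samples that curve in Python; it is a minimal illustration and not the SuperShapes code, and the function names and the `samples` parameter are assumptions made for this example.

```python
import math


def superformula(phi, m, n1, n2, n3, a=1.0, b=1.0):
    """Radius r(phi) of the Gielis superformula in polar coordinates."""
    t1 = abs(math.cos(m * phi / 4.0) / a) ** n2
    t2 = abs(math.sin(m * phi / 4.0) / b) ** n3
    return (t1 + t2) ** (-1.0 / n1)


def supershape_points(m, n1, n2, n3, samples=360):
    """Sample the closed curve as a list of (x, y) points."""
    points = []
    for i in range(samples):
        phi = 2.0 * math.pi * i / samples
        r = superformula(phi, m, n1, n2, n3)
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


# With m=4 and n1=n2=n3=2 the formula reduces to cos^2 + sin^2 = 1, i.e. a unit circle.
circle = supershape_points(m=4, n1=2, n2=2, n3=2)
```

Varying m changes the rotational symmetry, while n1, n2 and n3 control how pinched or rounded the lobes are.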

Contact

My social accounts are linked below:

Alejandro Mosquera López