Alejandro Mosquera

R&D Software Engineer at Broadcom developing systems and methods for automatic threat hunting: malicious network traffic analysis, malware analysis, unknown threat categorization, messaging abuse filters, APT detection and attack chain inference based on machine learning.

These usually involve:

  • Applying quantitative methods to solve complex data problems involving risk scoring, entity and user behaviour analysis, etc.
  • Translating product ideas into data science problems, and solving them.
  • Prototyping tools and data pipelines to extract meaningful insight from innovative sources of data.

Other research interests include Natural Language Processing, procedural generation, concept drift, adversarial machine learning and their applications to the cybersecurity domain.

Highlights


2022 - 3rd place (out of 20 teams) at IberLEF 2022 - PoliticEs: Spanish Author Profiling for Political Ideology

2021 - 2nd place (in both defender and attacker tracks) at MLSEC-2021: Thwarting Adversarial Malware Evasion with a Defense-in-Depth

2021 - 6th place (out of 31 teams) at DETOXIS-2021 Task 1: Deep Learning Approaches to Toxicity Detection in Spanish Social Media Texts

2021 - Granted USPTO patent: Detecting and protecting against computing breaches based on lateral movement of a computer file within an enterprise

2021 - 3rd place (out of 48 teams) at SemEval-2021 Task 1: Exploring Sentence and Word Features for Lexical Complexity Prediction

2020 - 45th place (out of 6351 teams) at Kaggle IEEE-CIS Fraud Detection

2020 - 10th place (out of 82 teams) at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features

Software


Spanish Metaphone

Metaphone is a phonetic algorithm published in 1990 for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar. As with Soundex, similar-sounding words should share the same keys.

This is an adaptation of Metaphone to Spanish, implemented in Python.

For example (input word -> Metaphone key):

waterpolo -> UTRPL
aquino -> AKN
rebosar -> RVSR
rebozar -> RVZR
grajea -> GRJ
gragea -> GRJ
encima -> ENZM
enzima -> ENZM
alhamar -> ALAMR
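The library's exact rule set is not reproduced here. As a rough illustration of how such keys are built, the Python sketch below applies a small, hypothetical subset of Spanish-style rules chosen so that it reproduces the nine examples above: only a word-initial vowel is kept, B and V merge into V, soft C and G become Z and J, QU becomes K, and a silent H passes on the vowel that follows it. The function names and the CH -> X rule are assumptions, not the library's API.

```python
import unicodedata

VOWELS = frozenset("AEIOU")


def _normalize(word: str) -> str:
    """Uppercase and strip accents (Á -> A, Ü -> U) but keep Ñ as its own letter."""
    out = []
    for ch in word.upper():
        if ch == "Ñ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(d for d in decomposed if not unicodedata.combining(d)))
    return "".join(out)


def spanish_phonetic_key(word: str) -> str:
    """Hypothetical, simplified Spanish phonetic key in the spirit of Metaphone."""
    w = _normalize(word)
    key = []
    i = 0
    while i < len(w):
        c = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if not c.isalpha():
            # Ignore spaces, hyphens and other non-letters.
            i += 1
            continue
        if c in VOWELS:
            # Keep only a word-initial vowel; interior vowels are dropped.
            if i == 0:
                key.append(c)
            i += 1
        elif c == "H":
            # H is silent; the vowel after it is kept ("alhamar" -> "ALAMR").
            if nxt in VOWELS:
                key.append(nxt)
                i += 2
            else:
                i += 1
        elif c == "C" and nxt == "H":
            key.append("X")  # assumption: the CH digraph is encoded as X
            i += 2
        elif c == "Q" and nxt == "U":
            key.append("K")  # QU sounds like K ("aquino")
            i += 2
        elif c == "C":
            # Soft C before E/I is encoded like Z ("encima" ~ "enzima").
            key.append("Z" if nxt in ("E", "I") else "K")
            i += 1
        elif c == "G":
            # Soft G before E/I is encoded like J ("gragea" ~ "grajea").
            key.append("J" if nxt in ("E", "I") else "G")
            i += 1
        elif c in ("B", "V"):
            key.append("V")  # B and V are pronounced alike in Spanish
            i += 1
        elif c == "W":
            key.append("U")  # "waterpolo" -> "UTRPL"
            i += 1
        else:
            key.append(c)  # remaining consonants (including S and Z) map to themselves
            i += 1
    return "".join(key)
```

Note that, as in the examples, S and Z stay distinct, so "rebosar" and "rebozar" get different keys while B/V, C/Z and G/J homophones collapse together.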

NaiveSumm

NaiveSumm is a naive summarization approach based on Luhn's 1958 work "The Automatic Creation of Literature Abstracts". It uses the frequencies of words in the document to score sentences and extract those that contain the most frequent words.
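A minimal Python sketch of the frequency-based idea (not NaiveSumm's actual code): split the text into sentences, treat the most frequent non-stopword terms as significant, and keep the sentences that contain the most of them. The stopword list, the `naive_summary` name and its parameters are illustrative assumptions. Luhn's paper also weighs how closely significant words cluster within a sentence, which this sketch deliberately skips.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "the", "is", "of", "to", "in", "on", "it"})


def _words(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-záéíóúüñ]+", sentence.lower()) if w not in STOPWORDS]


def naive_summary(text: str, n_sentences: int = 2, n_keywords: int = 10) -> list[str]:
    """Frequency-based extractive summary in the spirit of Luhn (1958)."""
    # Split on whitespace that follows sentence-final punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide word frequencies; the most frequent words count as significant.
    freq = Counter(w for s in sentences for w in _words(s))
    keywords = {w for w, _ in freq.most_common(n_keywords)}
    # Score each sentence by how many significant-word occurrences it contains.
    scores = [sum(1 for w in _words(s) if w in keywords) for s in sentences]
    # Highest scores first; ties go to the earlier sentence.
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:n_sentences]
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(best)]
```

For example, `naive_summary("Cats sleep a lot. Dogs bark. Cats and dogs play. The sun is hot.", n_sentences=2, n_keywords=2)` picks "cats" and "dogs" as keywords and keeps the first and third sentences. Returning sentences in document order, rather than by score, keeps the extract readable.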

Presentations


2014 - 5th Workshop on Language Analysis for Social Media (LASM), Sweden: Mining Lexical Variants from Microblogs: An Unsupervised Multilingual Approach

2013 - Conference and Labs of the Evaluation Forum (CLEF), Spain: DLSI-Volvam at RepLab 2013: Polarity Classification on Twitter Data

2013 - Tweet Normalization Workshop co-located with the 29th Conference of the Spanish Society for Natural Language Processing (SEPLN), Spain: DLSI en Tweet-Norm 2013: Normalización de Tweets en Español (DLSI at Tweet-Norm 2013: Normalization of Tweets in Spanish)

2012 - TSD: Text, Speech and Dialogue, Czech Republic: TENOR: A Lexical Normalisation Tool for Spanish Web 2.0 Texts

2012 - NLDB: Natural Language Processing and Information Systems, the Netherlands: The Study of Informality as a Framework for Evaluating the Normalisation of Web 2.0 Texts

2012 - Real-Time Analysis and Mining of Social Media Streams (RAMSS), Ireland: SMILE: An Informality Classification Tool for Helping to Assess Quality and Credibility in Web 2.0 Texts

2012 - Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Turkey: Towards Facilitating the Accessibility of Web 2.0 Texts through Text Normalisation

2012 - @NLP can u tag #user_generated_content? (NLP4UGC), Turkey: A Qualitative Analysis of Informality Levels In Web 2.0 Texts: The Facebook Case Study

2011 - Symposium in Information and Human Language Technology (STIL), Brazil: The Use of Metrics for Measuring Informality Levels in Web 2.0 Texts

2011 - 3rd Language Technology Conference (LTC), Poland: Enhancing the Discovery of Informality Levels in Web 2.0 texts

Media mentions


2016 - I was a prize winner in the first Allen Institute for AI (AI2) Science Challenge: Moving Beyond the Turing Test with the Allen AI Science Challenge

2014 - Coverage of our work defending SMS networks while working at Symantec: Security rEsrchRs find nu way 2 spot TXT spam

Procedural generation


JS1k submission using L-systems.

Chromanin.js: a procedural texture generation library.

Procedural audio using L-systems, based on Ville-Matias Heikkilä (2011), "Discovering novel computer music techniques by exploring the space of short computer programs".

SuperShapes and the SuperFormula, based on Johan Gielis (2003), "A generic geometric transformation that unifies a wide range of natural and abstract shapes".
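Both generative ideas in this section fit in a few lines. The Python sketch below is illustrative only and is not the code behind the projects listed: `expand_lsystem` rewrites every symbol of a string in parallel using production rules (shown with Lindenmayer's algae system, A -> AB, B -> A), and `superformula` evaluates Gielis' radius function r(φ) = (|cos(mφ/4)/a|^n2 + |sin(mφ/4)/b|^n3)^(-1/n1), which `supershape` samples into a closed outline. All function names and parameter defaults are hypothetical.

```python
import math


def expand_lsystem(axiom: str, rules: dict[str, str], iterations: int) -> str:
    """Apply L-system production rules to every symbol in parallel, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        # Symbols without a rule (e.g. "+" or "-" turtle commands) are copied unchanged.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


def superformula(phi: float, m: float, n1: float, n2: float, n3: float,
                 a: float = 1.0, b: float = 1.0) -> float:
    """Gielis superformula: radius r at polar angle phi."""
    t1 = abs(math.cos(m * phi / 4) / a) ** n2
    t2 = abs(math.sin(m * phi / 4) / b) ** n3
    return (t1 + t2) ** (-1 / n1)


def supershape(m: float, n1: float, n2: float, n3: float, samples: int = 360,
               a: float = 1.0, b: float = 1.0) -> list[tuple[float, float]]:
    """Sample a supershape outline as (x, y) points over one full turn."""
    points = []
    for k in range(samples):
        phi = 2 * math.pi * k / samples
        r = superformula(phi, m, n1, n2, n3, a, b)
        # Convert the polar point (r, phi) to Cartesian coordinates.
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


# Usage: the algae system grows in Fibonacci lengths; m=4, n1=n2=n3=2 yields a unit circle.
print(expand_lsystem("A", {"A": "AB", "B": "A"}, 4))  # ABAABABA
outline = supershape(5, 0.3, 0.3, 0.3, samples=100)
```

In a rendering or audio setting, the expanded string would then be interpreted symbol by symbol (for example as turtle-graphics moves or as note events); that interpretation step is not shown.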

Contact


My social accounts are linked below: