Emil Biju

emilbiju@stanford.edu

Stanford University

About me

I am an MS student in Electrical Engineering at Stanford University, specializing in AI/Machine Learning. Previously, I worked at Microsoft for two years as a Data and Applied Scientist on the Cybersecurity research team. Before that, I completed my Bachelor's degree in Electrical Engineering with a minor in Deep Learning at the Indian Institute of Technology (IIT) Madras, where I pursued research at the intersection of NLP and deep learning that led to publications at ACL, COLING, and ALENEX. My current research interests include interpretable machine learning, compression of LLMs, and LLM agents.

  Interests: Interpretable/Efficient ML, NLP, LLM Agents
  Download my résumé here
  • B.Tech (Hons.), Electrical Engg., IIT Madras (2017 - 2021)
  • Data Scientist Intern, General Electric (Summer 2019)
  • Data and Applied Scientist, Microsoft R&D (2021 - 2023)
  • M.S., Electrical Engineering, Stanford University (Sept. 2023 onwards)

    Publications

    2024

  • AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
    Emil Biju, Anirudh Sriram, Mert Pilanci
    arXiv preprint

    Abstract PDF BibTeX

    While large transformer-based models have exhibited remarkable performance in speaker-independent speech recognition, their large size and computational requirements make them expensive or impractical to use in resource-constrained settings. In this work, we propose a low-rank adaptive compression technique called AdaPTwin that jointly compresses product-dependent pairs of weight matrices in the transformer attention layer. Our approach can prioritize the compressed model's performance on a specific speaker while maintaining generalizability to new speakers and acoustic conditions. Notably, our technique requires only 8 hours of speech data for fine-tuning, which can be accomplished in under 20 minutes, making it highly cost-effective compared to other compression methods. We demonstrate the efficacy of our approach by compressing the Whisper and Distil-Whisper models by up to 45% while incurring less than a 2% increase in word error rate. 
    
    @misc{biju2024adaptwinlowcostadaptivecompression,
    	title={AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers}, 
    	author={Emil Biju and Anirudh Sriram and Mert Pilanci},
    	year={2024},
    	eprint={2406.08904},
    	archivePrefix={arXiv},
    	primaryClass={cs.LG},
    	url={https://arxiv.org/abs/2406.08904}, 
    }
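
    To make the joint low-rank idea concrete, here is a minimal, hedged sketch in Python. It is not the AdaPTwin implementation; the helper name, shapes, and rank below are illustrative. It factorizes a product-dependent pair such as (W_Q, W_K), which the attention scores only use through the product W_Q W_K^T:

    # Illustrative sketch, not the AdaPTwin implementation.
    import numpy as np

    def compress_product_pair(W_q, W_k, rank):
        """Return (A, B) with A @ B.T ~ W_q @ W_k.T, using fewer parameters when rank is small.

        W_q, W_k: (d_model, d_head) projections that attention only uses as a product.
        """
        product = W_q @ W_k.T                     # the only quantity the attention scores depend on
        U, S, Vt = np.linalg.svd(product, full_matrices=False)
        A = U[:, :rank] * np.sqrt(S[:rank])       # new query-side factor, (d_model, rank)
        B = Vt[:rank, :].T * np.sqrt(S[:rank])    # new key-side factor, (d_model, rank)
        return A, B

    # Toy usage: replace a 512x64 projection pair with a rank-16 pair.
    rng = np.random.default_rng(0)
    W_q, W_k = rng.normal(size=(512, 64)), rng.normal(size=(512, 64))
    A, B = compress_product_pair(W_q, W_k, rank=16)
    rel_err = np.linalg.norm(W_q @ W_k.T - A @ B.T) / np.linalg.norm(W_q @ W_k.T)
    print(f"relative reconstruction error: {rel_err:.3f}")

    The fine-tuning step mentioned in the abstract (on roughly 8 hours of speech) is not shown here; the SVD above only illustrates why the product, rather than each matrix separately, is the natural compression target.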
    

    2022

  • Input-specific Attention Subnetworks for Adversarial Detection
    Emil Biju, Anirudh Sriram, Pratyush Kumar, Mitesh M. Khapra
    Findings of the Association for Computational Linguistics (ACL 2022)

    Abstract PDF Talk Poster Code BibTeX

    Self-attention heads are characteristic of Transformer models and have been well studied for interpretability and pruning. In this work, we demonstrate an altogether different utility of attention heads, namely for adversarial detection. Specifically, we propose a method to construct input-specific attention subnetworks (IAS) from which we extract three features to discriminate between authentic and adversarial inputs. The resultant detector significantly improves (by over 7.5%) the state-of-the-art adversarial detection accuracy for the BERT encoder on 10 NLU datasets with 11 different adversarial attack types. We also demonstrate that our method (a) is more accurate for larger models which are likely to have more spurious correlations and thus vulnerable to adversarial attack, and (b) performs well even with modest training sets of adversarial examples.
    
    @inproceedings{biju-etal-2022-input,
    title = "Input-specific Attention Subnetworks for Adversarial Detection",
    author = "Biju, Emil  and
    	Sriram, Anirudh  and
    	Kumar, Pratyush  and
    	Khapra, Mitesh",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.4",
    doi = "10.18653/v1/2022.findings-acl.4",
    pages = "31--44"
    }
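
    As a hedged illustration of the overall recipe (the thresholding rule and features below are hypothetical, not the three features used in the paper): per-input attention-head activity is turned into an input-specific subnetwork mask, featurized, and fed to a lightweight detector that separates authentic from adversarial inputs.

    # Illustrative sketch only; the feature choices here are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def head_mask_features(attn, keep_frac=0.5):
        """attn: (layers, heads) importance scores for one input, e.g. mean attention mass.
        Returns features built from an input-specific binary head mask."""
        flat = attn.ravel()
        thresh = np.quantile(flat, 1.0 - keep_frac)     # keep the most active heads
        mask = (flat >= thresh).astype(float)           # input-specific subnetwork
        return np.concatenate([mask, [flat.mean(), flat.std()]])

    # Toy data: 200 inputs for a 12-layer, 12-head encoder (label 1 = adversarial).
    # In practice the scores would come from the model and labels from generated attacks.
    rng = np.random.default_rng(0)
    scores = rng.random((200, 12, 12))
    labels = rng.integers(0, 2, size=200)
    X = np.stack([head_mask_features(s) for s in scores])
    detector = LogisticRegression(max_iter=1000).fit(X, labels)
    print("train accuracy:", detector.score(X, labels))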
    

  • Perturbation Analysis of Practical Algorithms for the Maximum Scatter TSP
    Emil Biju, Sundar Raman P.
    Symposium on Algorithm Engineering and Experiments (ALENEX 2022)

    Abstract PDF Webpage Code BibTeX

    The Maximum Scatter Traveling Salesman Problem (MSTSP) is a variant of the famous Traveling Salesman Problem (TSP) and finds its use in several real-world applications including manufacturing, imaging and laser melting processes. The objective of this NP-hard problem is to maximize the cost of the least cost edge in a tour of an input graph. While several approximation algorithms have been proposed for this problem, many of them suffer from bad worst-case complexities and present challenges in scalability and practical use. Besides, these algorithms have often been designed and evaluated with a sole focus on theoretical approximation quality, while practical applications require detailed experimental evaluations to study the stability, quality and runtime over a large and diverse set of inputs. In this work, we describe six algorithms for MSTSP inspired by prior work in the area, along with improved formulations that enhance their utility in real-world scenarios. Further, we perform experimental studies motivated by smoothed analysis to comprehensively evaluate these algorithms on various performance metrics. We demonstrate that despite having bad worst-case complexities, certain algorithms perform exceedingly well in practical use cases. Our experiments reveal trade-offs among the runtime, quality and stability of different algorithms that must be considered when making a design choice depending on the objectives and constraints associated with the use of the algorithm.
    
    @inbook{doi:10.1137/1.9781611977042.13,
    	author = {Emil Biju and Sundar Raman P.},
    	title = {Perturbation Analysis of Practical Algorithms for the Maximum Scatter Travelling Salesman Problem},
    	booktitle = {2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX)},
    	chapter = {},
    	pages = {158-168},
    	doi = {10.1137/1.9781611977042.13},
    	URL = {https://epubs.siam.org/doi/abs/10.1137/1.9781611977042.13},
    	eprint = {https://epubs.siam.org/doi/pdf/10.1137/1.9781611977042.13}
    }
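
    For readers unfamiliar with the objective, a minimal sketch of what MSTSP optimizes; the brute-force solver is only feasible for tiny instances and is not one of the six algorithms studied in the paper:

    # Minimal sketch of the MSTSP objective: maximize the length of the shortest tour edge.
    import math
    from itertools import permutations

    def scatter_of_tour(points, tour):
        """Return the minimum edge length of the closed tour visiting points in `tour` order."""
        edges = zip(tour, tour[1:] + tour[:1])          # the last city connects back to the first
        return min(math.dist(points[a], points[b]) for a, b in edges)

    def brute_force_mstsp(points):
        """Exact solver by enumeration, shown only to make the objective concrete."""
        cities = list(range(len(points)))
        best = max(permutations(cities[1:]), key=lambda p: scatter_of_tour(points, [0, *p]))
        return [0, *best]

    points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0.5)]
    tour = brute_force_mstsp(points)
    print(tour, scatter_of_tour(points, tour))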
    	

    2021

  • Vocabulary-constrained Question Generation with Rare Word Masking and Dual Attention
    Emil Biju
    ACM India Joint International Conference on Data Science & Management of Data (ACM CODS-COMAD 2021)

    Abstract PDF Talk Slides BibTeX

    Question generation is the task of generating questions from a text passage that can be answered using information available in the passage. Known models for question generation are trained to predict words from a large, predefined vocabulary. However, a large vocabulary increases memory usage, training and inference times and a predefined vocabulary may not include context-specific words from the input passage. In this paper, we propose a neural question generation framework that generates semantically accurate and context-specific questions using a small-size vocabulary. We break the question generation task into two subtasks namely, generating the skeletal structure of a question using common words from the vocabulary and pointing to rare words from the input passage to complete the question.
    
    @inproceedings{10.1145/3430984.3431074,
    	author = {Biju, Emil},
    	title = {Vocabulary-Constrained Question Generation with Rare Word Masking and Dual Attention},
    	year = {2021},
    	isbn = {9781450388177},
    	publisher = {Association for Computing Machinery},
    	address = {New York, NY, USA},
    	url = {https://doi.org/10.1145/3430984.3431074},
    	doi = {10.1145/3430984.3431074},
    	booktitle = {8th ACM IKDD CODS and 26th COMAD},
    	pages = {431},
    	numpages = {1},
    	location = {Bangalore, India},
    	series = {CODS COMAD 2021}
    	}
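
    A hedged sketch of the two-subtask split described in the abstract; the toy vocabulary, the [RARE] placeholder, and the helper below are illustrative rather than the paper's actual preprocessing:

    # Illustrative only: build a question "skeleton" over a small common vocabulary and
    # record pointer targets for rare words that must be copied from the passage.
    COMMON_VOCAB = {"what", "year", "was", "the", "founded", "in", "who", "is", "?"}

    def to_skeleton(question_tokens, passage_tokens):
        """Return (skeleton, copy_slots): skeleton uses only common words plus [RARE] slots;
        copy_slots maps each slot position to the rare word's index in the passage."""
        skeleton, copy_slots = [], {}
        for tok in question_tokens:
            if tok.lower() in COMMON_VOCAB:
                skeleton.append(tok)
            else:
                copy_slots[len(skeleton)] = passage_tokens.index(tok)   # pointer target
                skeleton.append("[RARE]")
        return skeleton, copy_slots

    passage = "Stanford University was founded in 1885 .".split()
    question = "What year was Stanford founded ?".split()
    print(to_skeleton(question, passage))
    # (['What', 'year', 'was', '[RARE]', 'founded', '?'], {3: 0})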
    

  • Sample-specific Attention Masks for Model Transparency and Adversarial Detection
    Emil Biju | Guide: Prof. Mitesh Khapra
    Bachelor's Thesis

    Abstract PDF Slides Code

    Multi-head self-attention is a vital component of the Transformer, a neural network architecture that has achieved state-of-the-art performance on various sequence learning tasks. In this work, we propose a method to induce accurate sample-specific attention masks for Transformer-based networks like BERT. We show that such masks are discriminative of the samples’ output class for various Natural Language Understanding (NLU) tasks. We leverage this property to improve model transparency and classify between authentic and adversarial samples. Further, we find two other properties which contribute to this classification. First, selectively mutating the masks leads to contrasting model outputs based on sample authenticity. Second, the consistency of auxiliary layer-wise outputs varies based on sample authenticity. By combining these observations, we propose a sample-efficient and generalized scheme for adversarial detection. We perform experiments on 8 NLU datasets with 11 different adversarial attack types and report state-of-the-art accuracy ranging from 80 to 90%. In summary, our work introduces an entirely new and promising approach to interpret and analyze large self-attention based networks for NLP.
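
    As a hedged illustration of one of the signals above, layer-wise output consistency, here is a toy aggregation; the agreement fraction and shapes are illustrative choices, not the thesis implementation:

    # Illustrative only: score how consistently auxiliary layer-wise predictions
    # agree with the final layer's prediction for a single input.
    import numpy as np

    def layerwise_consistency(layer_logits):
        """layer_logits: (num_layers, num_classes) auxiliary logits for one input.
        Returns the fraction of layers whose argmax matches the final layer's."""
        preds = layer_logits.argmax(axis=1)
        return float(np.mean(preds == preds[-1]))

    # Toy example: 12 layers, 3 classes; a stable input versus a noisy one.
    rng = np.random.default_rng(0)
    stable = np.tile([[0.1, 0.2, 0.7]], (12, 1)) + 0.01 * rng.normal(size=(12, 3))
    noisy = rng.normal(size=(12, 3))
    print(layerwise_consistency(stable), layerwise_consistency(noisy))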
    

    2020

  • Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages
    Emil Biju, Anirudh Sriram, Mitesh M. Khapra, Pratyush Kumar
    International Conference on Computational Linguistics (COLING 2020)

    Abstract PDF Poster Webpage Code BibTeX

    Gesture typing is a method of typing words on a touch-based keyboard by creating a continuous trace passing through the relevant keys. This work is aimed at developing a keyboard that supports gesture typing in Indic languages. We begin by noting that when dealing with Indic languages, one needs to cater to two different sets of users: (i) users who prefer to type in the native Indic script (Devanagari, Bengali, etc.) and (ii) users who prefer to type in the English script but want the transliterated output in the native script. In both cases, we need a model that takes a trace as input and maps it to the intended word. To enable the development of these models, we create and release two datasets. First, we create a dataset containing keyboard traces for 193,658 words from 7 Indic languages. Second, we curate 104,412 English-Indic transliteration pairs from Wikidata across these languages. Using these datasets we build a model that performs path decoding, transliteration and transliteration correction. Unlike prior approaches, our proposed model does not make co-character independence assumptions during decoding. The overall accuracy of our model across the 7 languages varies from 70-95%.
    
    @inproceedings{biju-etal-2020-joint,
    	title = "Joint Transformer/{RNN} Architecture for Gesture Typing in Indic Languages",
    	author = "Biju, Emil  and
    		Sriram, Anirudh  and
    		Khapra, Mitesh M.  and
    		Kumar, Pratyush",
    	booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    	month = dec,
    	year = "2020",
    	address = "Barcelona, Spain (Online)",
    	publisher = "International Committee on Computational Linguistics",
    	url = "https://aclanthology.org/2020.coling-main.87",
    	doi = "10.18653/v1/2020.coling-main.87",
    	pages = "999--1010"
    	}
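
    To illustrate what "path decoding" takes in and produces, here is a naive nearest-key baseline; the key coordinates are toy values and this is not the paper's model, which is a joint Transformer/RNN that avoids co-character independence assumptions:

    # Toy illustration: map each point of a gesture trace to its nearest key and collapse repeats.
    KEY_POSITIONS = {  # toy (x, y) coordinates for a few keys; a real layout has all keys
        "c": (2.5, 2), "a": (0.5, 1), "t": (4, 0), "r": (3, 0), "s": (1.5, 1), "e": (2, 0),
    }

    def nearest_key(point):
        return min(KEY_POSITIONS, key=lambda k: (KEY_POSITIONS[k][0] - point[0]) ** 2
                                                + (KEY_POSITIONS[k][1] - point[1]) ** 2)

    def decode_trace(trace):
        """Collapse a sequence of touch points into the key sequence the trace passes through."""
        keys = [nearest_key(p) for p in trace]
        return "".join(k for i, k in enumerate(keys) if i == 0 or k != keys[i - 1])

    # A trace sweeping roughly from 'c' through 'a' to 't' prints "csaet": it also passes
    # over 's' and 'e', which is exactly why a learned decoder is needed to recover "cat".
    print(decode_trace([(2.5, 2.1), (1.6, 1.6), (0.6, 1.1), (2.1, 0.6), (3.9, 0.1)]))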
    

    Patents

    2023

  • Graph-AI based Methods and Solutions for Detecting Malicious Applications
    Emil Biju and 7 co-inventors

    Patent filed in Jan 2023

    Education

    Sept. 2023 onwards

  • Stanford University
    M.S. in Electrical Engineering. Focus area: AI/ML
    • Coursework and research in machine learning with applications in NLP/LLMs.
    2017 - 2021

  • Indian Institute of Technology Madras
    B.Tech (Honors) in Electrical Engineering, Minor in Deep Learning
    CGPA: 9.70/10
    • Graduated with the second-highest rank in the Electrical Engineering department
    • Recipient of the Silver Medal (Dr. Dilip Veeraraghavan Memorial Award)
    • Received the top grade (S) in all Computer Science, Mathematics and Humanities courses

    Work Experience

  • Microsoft R&D
    Data Scientist Intern
    June - September, 2024

    Developed a cost-efficient LLM pipeline to analyze responses from Microsoft Copilot and derive insights about common failure patterns.
    Data and Applied Scientist
    2021-2023

    Worked as a researcher at the intersection of data science and cybersecurity, with a focus on OAuth cloud app security. Built ML and graph-based algorithms that contributed to the detection of cyberattack campaigns and the takedown of over 2,000 malicious OAuth apps.
    Data and Applied Scientist Intern
    May - July, 2020

    Interned in the Azure Global Engineering team analyzing multi-spectral satellite image data for applications in the agricultural and oil industries.
  • General Electric
    Data Science Intern
    May - July, 2019

    Developed ML models for keyword clustering and topic ranking of service records to identify common failure modes in healthcare machines.
