Akanksha Dara

Software Engineer

Stony Brook University | Apple | BITS Pilani

About Me

I'm a Computer Science Master's student at Stony Brook University. My research interests include Natural Language Processing and Information Extraction.

Experience

Apple Inc.

Software Engineer

- Java Development: Implemented a Solr cluster solution for efficient management of test cases. Delivered business logic in back-end APIs and an interactive web application.

- Automated the publishing of code coverage reports by running JaCoCo (Java Code Coverage) as an agent across all servers and generating the build files dynamically. Analysed the coverage for reflection classes in the source code and used the results to identify redundancy in code execution.

Memory and Language Lab, University of Melbourne

Research Intern

- This internship was a part of my undergraduate research thesis (Advisor: Prof. Simon Dennis, director of Memory and Language Lab).

- Built a computational model of language processing that characterized sentence processing and learning as an interaction of three memory systems (lexical, syntactic, and relational) that operate on distributed instance-based knowledge representations.

- Conducted several experiments to optimize the model with different input feature sets, objective functions, corpora and tokenization schemes, using stochastic gradient descent for learning. These gave insight into the operation of the model, in particular the impact of interference on memory-based algorithms.

- Worked on a high-performance computing platform to optimize, parallelize and scale up the model to a corpus on the order of 70M. Achieved a perplexity of 42 on a vocabulary size of 65,536.

Education

Stony Brook University

January 2020 - December 2021

Master's in Computer Science

Birla Institute of Technology and Science, Pilani

August 2014 - May 2018

Bachelor of Engineering (Hons.) Computer Science

Projects

CNN + word2vec for Sentence Classification

Trained a simple CNN with one convolutional layer on top of word vectors obtained from an unsupervised neural language model.

Semantic Textual Similarity (STS) of Clinical Notes

Data Visualization Dashboard for Airbnb

Created an interactive dashboard for visualizing Airbnb listings data. The visualizations used to analyze the data are a histogram, word cloud, cartogram, donut chart and scatterplot matrix.

We analyzed the neighborhoods of Hawaii to gain insight into their prices, types of accommodation and popular tourist spots.

Sarcasm Detection: Behavioral Modelling Approach

Implemented sarcasm detection as a binary classification problem using a user behavior modelling approach. Identified sarcasm as three different forms of expression: a contrast of sentiments, a means of conveying emotion, and a function of familiarity.

Leveraged users' historical information from past tweets and translated the three forms into feature sets for training various supervised learning algorithms. Added further enhancements, such as effective handling of emoticons and complex hashtags, to improve the results.

Compiler Construction

Implemented a compiler for the language ERPLAG. The project was done in a pipelined manner with its various stages being: Lexer - Parser - Abstract Syntax Tree generation - Type-checking, Semantic checking and Assembly Code Generation.

Text Summarization using Audio Retrieval

Generated a textual summary for a noise-free audio dataset using audio frequency and amplitude together with a modified LexRank algorithm. The algorithm treats sentences as vertices of a graph and takes into account the term frequencies of keywords and the frequencies of spoken words as they appear in the generated audio waveform.

Implemented the seq2seq model in TensorFlow and trained it on the CNN/Daily Mail dataset to obtain a ROUGE-1 score of 0.39, and compared its performance to that of the LexRank algorithm (ROUGE-1 score of 0.59).

HMV - Medical Decision Support Framework

Implemented an ensemble framework using hierarchical majority voting and multi-layer classification for disease classification and prediction using data mining techniques.

The model overcomes conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers.

Skills

Graduate Coursework


CSE 512 - Machine Learning
CSE 544 - Probability & Statistics for Data Scientists
CSE 564 - Visualization
CSE 519 - Data Science Fundamentals
CSE 538 - Natural Language Processing


Courses Completed Online


CS231n - Convolutional Neural Networks for Visual Recognition (Stanford)
CS224n - Natural Language Processing with Deep Learning (Stanford)
