I'm a Computer Science Master's student at Stony Brook University. My research interests include Natural Language Processing and Information Extraction.
- Java Development: Implemented a Solr cluster solution to facilitate efficient management of the test cases. Delivered business logic in back-end APIs and an interactive web application.
- Automated the process of publishing code coverage reports using JaCoCo (Java Code Coverage) as an agent across all the servers by generating the build files dynamically; analysed the coverage for reflection classes in the source code. Used these results to identify redundancy in code execution.
- This internship was part of my undergraduate research thesis (Advisor: Prof. Simon Dennis, director of the Memory and Language Lab).
- Built a computational model of language processing that characterized sentence processing and learning as an interaction of three memory systems (lexical, syntactic, and relational) that operate on distributed instance-based knowledge representations.
- Conducted several experiments to optimize the model with different input feature sets, objective functions, corpora and tokenization schemes. This gave insight into the operation of the model, in particular the impact of interference on memory-based algorithms. Learning used stochastic gradient descent.
- Worked on a high-performance computing platform to optimize, parallelize and scale up the model to a corpus of the order of 70M. Achieved a perplexity of 42 on a vocabulary size of 65,536.
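For context on the perplexity figure above: perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below is a minimal illustration, not the thesis code; the function name and the example probabilities are hypothetical.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the given tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities (illustrative only, not model output).
example_probs = [0.05, 0.01, 0.2, 0.02]
print(perplexity(example_probs))

# A model that guesses uniformly over a 65,536-word vocabulary has a
# perplexity equal to the vocabulary size, which is the baseline a
# trained model is compared against.
print(perplexity([1 / 65536] * 10))
```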
In this project, we trained a simple CNN with one layer of convolution on top of word vectors obtained from an unsupervised neural language model.
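The core operation of such a model can be sketched without any framework: one filter slides over windows of consecutive word vectors, and the strongest activation is kept (max-over-time pooling). This is a simplified illustration under stated assumptions, not the project's code; the function name, the ReLU activation, the two-dimensional vectors and the filter weights are all hypothetical, and a real model would learn many such filters.

```python
def conv_max_pool(word_vectors, filt, bias=0.0):
    """Apply one convolution filter over word windows, then max-pool.

    word_vectors: list of equal-length float lists, one per word.
    filt: list of weight rows; len(filt) is the window size in words.
    """
    window = len(filt)
    if len(word_vectors) < window:
        raise ValueError("sentence is shorter than the filter window")
    activations = []
    for start in range(len(word_vectors) - window + 1):
        total = bias
        for offset in range(window):
            # Dot product of one filter row with one word vector.
            for w, x in zip(filt[offset], word_vectors[start + offset]):
                total += w * x
        activations.append(max(0.0, total))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling


# Hypothetical 2-dimensional "word vectors" for a four-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical filter spanning two words.
bigram_filter = [[1.0, -1.0], [0.5, 0.5]]
feature = conv_max_pool(sentence, bigram_filter)
print(feature)
```

Each filter yields one feature per sentence, so the sentence length does not affect the size of the feature vector passed to the classifier.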
Created an interactive dashboard for visualizing Airbnb listings data. The basic visualizations used to analyze the data are a histogram, word cloud, cartogram, donut chart and scatterplot matrix.
We analyzed the various neighborhoods of Hawaii to gain insight into their prices, types of accommodation and popular tourist spots.
Implemented sarcasm detection as a binary classification problem using a user behavior modelling approach. Identified sarcasm as three different forms of expression.
The approach leverages users’ historical tweets and identifies sarcasm as a contrast of sentiments, as a means of conveying emotion, and as a function of familiarity. I translated these forms into feature sets for training various supervised learning algorithms, and improved the results further by handling emoticons and complex hashtags effectively.
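As a rough illustration of how "contrast of sentiments" might become a feature, the sketch below flags a tweet that mixes positive and negative words. It is an assumption about the general idea, not the project's feature extractor; the function name and the tiny word lists are hypothetical stand-ins for a real sentiment lexicon.

```python
import string

# Hypothetical mini-lexicons; a real system would use a full sentiment lexicon.
POSITIVE = {"love", "great", "wonderful"}
NEGATIVE = {"hate", "terrible", "awful", "stuck"}


def sentiment_contrast(tweet):
    """Count positive and negative words and flag tweets containing both."""
    tokens = [t.strip(string.punctuation) for t in tweet.lower().split()]
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return {"positive": pos, "negative": neg, "contrast": int(pos > 0 and neg > 0)}


features = sentiment_contrast("I love being stuck in this awful traffic!")
print(features)
```

The resulting dictionary can be appended to other per-tweet features before training a binary classifier.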
Implemented a compiler for the language ERPLAG. The project was built as a pipeline with the following stages: lexer, parser, abstract syntax tree generation, type checking, semantic checking and assembly code generation.
Generated a textual summary of a noise-free audio dataset using audio frequency and amplitude together with a modified LexRank algorithm. The algorithm treats sentences as vertices of a graph and takes into account both the term frequencies of keywords and the frequencies of the spoken words as they appear in the generated audio waveform.
Implemented a seq2seq model in TensorFlow and trained it on the CNN/Daily Mail dataset, obtaining a ROUGE-1 score of 0.39, and compared its performance with that of the LexRank algorithm (ROUGE-1 score of 0.59).
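ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below shows the basic computation; it is a simplified illustration with whitespace tokenization and hypothetical example sentences, and it does not indicate which variant (precision, recall or F1) the scores above refer to.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return unigram-overlap precision, recall and F1 for two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical summaries for illustration.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat today")
print(scores)
```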
Implemented an ensemble framework using hierarchical majority voting and multi-layer classification for disease classification and prediction using data mining techniques.
The model works around the performance limits of individual conventional classifiers by combining an ensemble of seven heterogeneous classifiers.
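The voting step can be sketched as follows. This is one plausible reading of hierarchical majority voting, not the project's implementation: classifiers are split into groups, each group votes, and the group winners vote again. The function names, the grouping of seven predictions into 3, 3 and 1, and the labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label (tie-breaking is left to Counter)."""
    return Counter(labels).most_common(1)[0][0]


def hierarchical_vote(groups):
    """Vote within each group of classifiers, then across group winners."""
    group_winners = [majority_vote(group) for group in groups]
    return majority_vote(group_winners)


# Hypothetical predictions from seven classifiers for one patient record.
predictions = [
    ["disease", "disease", "healthy"],
    ["healthy", "healthy", "disease"],
    ["disease"],
]
print(hierarchical_vote(predictions))
```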