I work as a Data Scientist at Infor Dynamic Science Labs, located in Cambridge, Massachusetts.
I was a graduate student in Computer Science at the University of Florida, Gainesville. I am passionate about Data Science, and every day I find some reason to get excited by its pervasive use in almost any domain. Is there any area where one can't apply Data Science? I am eager to make a tangible impact on the world by developing data products.
I am also a strong advocate of the role of open source in aiding future technology. I open-source my projects and look to contribute (at least in a minimal way) to other open-source projects when I find time.
I am a part of the Dynamic Science Labs team where we have worked on building science features for enterprise software and implementing customized solutions for our customers.
I worked with the Data Science Engineering and the Optimization team to improve the optimization of advertising campaigns by developing a diagnostic tool.
It was fun learning the basics of Linear Optimization, especially infeasibility in linear programs. Python was the primary programming language I used during the development phase. I enjoyed using IPython Notebooks for any supporting analysis. Ansible was one cool open-source tool I got to work with while automating the tool. Amazon Web Services such as S3, DataPipelines and EC2, and CI services like Jenkins, were among the other technologies I learnt and enjoyed using.
I worked as a student researcher under Dr. Parisa Rashidi. The project I worked on aimed to assist patients by predicting their cognitive distortions from the patients' personal text data.
I helped develop a sentiment classifier to classify text containing cognitive distortions into various categories such as 'black and white', 'all-or-nothing', 'overgeneralisation' etc. Training on imbalanced datasets and working with out-of-domain sentiment analysis were some of the aspects I enjoyed working on.
During grad school, I took courses related to Data Science such as Machine Learning, Pattern Recognition and Math for Intelligent Systems.
The last of these is a math-heavy course covering topics in Linear Algebra and Probability Theory. Machine Learning under Prof. Banerjee is one course I particularly enjoyed. Because it was quite theoretical, I got a good grasp of the underlying differences between machine learning algorithms, which helped when implementing them.
Working on personal projects gives me the freedom to experiment with new technologies and learn concepts outside the coursework. I work on multiple hobby projects to improve my statistics, Python development and machine learning chops.
The project used the statistical concept of Survival Analysis to analyze the careers of cricketers. The scraped and cleaned player data was used to estimate the survival curves of the players, which were then compared across different player cohorts. Survival regression models such as Cox proportional hazards (Cox PH) and Aalen's additive model were used to predict the survival curves and hazard rates of current players.
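As a rough illustration of how a survival curve can be estimated from career data, here is a minimal Kaplan-Meier sketch in plain Python. It is not the project's actual code: the function name `kaplan_meier` and the career lengths are hypothetical, and the estimator is assumed here only because it is the standard non-parametric way to estimate a survival curve. Players who are still active are treated as censored observations.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    durations: career length for each player (hypothetical units: years)
    observed:  True if the career ended (event), False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk players who survive past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up data: observed=False means the player is still active (censored).
durations = [2, 3, 3, 5, 8, 8, 10, 12]
observed = [True, True, False, True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored players still count toward the risk set until they drop out, which is what distinguishes this estimate from simply computing the fraction of careers longer than t.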
Update: I got an edited version of the survival analysis published on the yHat data science blog ☺. yHat is a tech company that provides tools and a platform for industrial data science applications.
Here, I modelled the batting averages using beta distributions. The simulation of the batting averages was done using animated plots.
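One common way to model a batting average with a beta distribution is a conjugate beta-binomial update, sketched below. This is an assumption about the approach, not the project's code: it treats the average as a success proportion (hits out of at-bats), and the prior parameters, the function names `posterior_params` and `beta_mean`, and the counts are all hypothetical.

```python
import random


def posterior_params(alpha, beta, hits, at_bats):
    """Update a Beta(alpha, beta) prior with observed hits out of at_bats."""
    return alpha + hits, beta + (at_bats - hits)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior centred on 0.27, then 100 hits in 300 at-bats.
prior_alpha, prior_beta = 81, 219
a, b = posterior_params(prior_alpha, prior_beta, hits=100, at_bats=300)
print("posterior:", (a, b), "mean:", round(beta_mean(a, b), 4))

# Draw a few plausible "true" averages from the posterior.
rng = random.Random(0)
samples = [rng.betavariate(a, b) for _ in range(5)]
print(samples)
```

Repeating the update as more at-bats arrive and redrawing the density at each step is one way such a simulation could be animated.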
Built a predictive model for the Global Terrorism Database, involving data preprocessing, feature selection and modelling.
Kaggle competitions gave me a chance to implement different machine learning algorithms like Random Forests, AdaBoost, ExtraTrees etc. on real-life datasets.
During grad school, it was great fun working on projects related to Machine Learning and Distributed Operating Systems.
A content-based Recommender System that provides the optimal product to the user was developed. Sentiment Analysis was performed on Amazon customer reviews. Naïve Bayes classifiers, SVMs and Logistic Regression were implemented using RapidMiner and Python. Accuracy achieved: 0.7735.
A distributed system using the Akka actor model in Scala was developed. Bitcoins were mined using the SHA-256 algorithm with the required number of leading zeros. CPU performance was measured and analyzed.
An application that simulates Twitter functionalities using Scala and the Akka actor model was built. Functionalities similar to REST APIs, such as Post Tweet and Get Tweet, were developed using Spray-Can. Hashtag search and peak-hour simulation were also implemented.
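The core search that each worker performs can be sketched in a few lines. The project itself was written in Scala with Akka actors; the single-process Python version below is only an illustration, and the function name `mine`, the `prefix` string and the choice to count leading zeros in the hexadecimal digest are all assumptions.

```python
import hashlib


def mine(prefix, leading_zeros, max_tries=1_000_000):
    """Search for a nonce whose SHA-256 hex digest starts with the given zeros.

    Returns (candidate_string, digest), or None if nothing is found
    within max_tries attempts.
    """
    target = "0" * leading_zeros
    for nonce in range(max_tries):
        candidate = f"{prefix}{nonce}"
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return candidate, digest
    return None


result = mine("demo", leading_zeros=3)
if result is not None:
    candidate, digest = result
    print(candidate, digest)
```

In a distributed setup, each worker would be handed a disjoint nonce range so that no work is duplicated; each extra leading hex zero multiplies the expected number of attempts by 16.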
Python, Java, C++, Linux, Scala (Intermediate) and R (Basic)
Git, Pandas, Plotly, Scikit-learn, IPython Notebooks, Ansible, Jenkins, RapidMiner and LaTeX
DataPipelines, S3 and EC2
HTML, CSS, PHP, JavaScript (Basic)
MySQL
Facilitated international exchange for students through AIESEC internships. Monitored a team that recruited students for AIESEC internships, which included administering applications, conducting interviews with applicants and coaching them in finding a suitable internship.