Work Experience

Infor Cambridge, MA

Data Scientist • Sep 2016 – Current

I am part of the Dynamic Science Labs team, where we build data science features for enterprise software and implement customized solutions for our customers.

  • Infor CRM Sales Intelligence
    • Designed and developed New Prospect Scoring and Cross-Sell Scoring models and a Recommender System, built on transaction and engagement data using various machine learning techniques.
    • Product adoption resulted in a 2x increase in new-prospect conversion performance, with more than 70% of conversions coming from just 20% of the leads.
    • Cross-sell recommendations also helped customers diversify their business, contributing a ~6% increase in sales revenue from less popular products.
    • Led the design, development and implementation of the data pipeline that powers the clients' R Shiny dashboard.
    • Served as the primary collaborator with the Engineering team for the migration to production.
  • Internal Customer Churn Scoring
    • Led the design and development of an ML-based internal solution that identifies customers at high risk of churning and surfaces the customer-specific reasons driving that risk. Collaborated closely with the Subscription Services team to improve the model's predictions iteratively.
    • During back-testing, the model captured 75% of customer churn, a 5x uplift over the existing manual churn-risk evaluation, and identified the individual business metrics impacting churn for each customer.
    • Currently implementing a closed-loop process to alert Subscription Services managers to customers at high risk of churning.
  • Distributor Van Timeliness Prediction
    • Designed and developed a model that identifies delivery vans at high risk of arriving late.
    • Used historical sales, GPS data, van routes and schedules to build the model and to analyze the main factors behind late deliveries.
    • During back-testing, the model correctly flagged 89% of late deliveries, which drew very positive feedback from the client.
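Figures such as "70% of conversions from 20% of the leads" or "75% of churn captured" are the kind of number a cumulative-gains calculation produces. The following is a minimal sketch, assuming binary outcome labels and one model score per record; the function names and the toy data are hypothetical, not taken from the production systems. Note that it measures lift over random selection, whereas the 5x uplift quoted above is relative to the manual process.

```python
def capture_rate(scores, labels, top_fraction):
    """Fraction of all positives (e.g. conversions or churners) found in the
    top `top_fraction` of records when ranked by model score."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    captured = sum(label for _, label in ranked[:k])
    return captured / total_positives


def lift_over_random(scores, labels, top_fraction):
    """Capture rate divided by the share of records contacted.
    A value of 1.0 means the model is no better than picking records at random."""
    k = max(1, round(len(scores) * top_fraction))
    return capture_rate(scores, labels, top_fraction) / (k / len(scores))


# Hypothetical example: 10 leads, 3 of which converted.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(capture_rate(scores, labels, 0.2))      # top 20% holds 2 of the 3 conversions
print(lift_over_random(scores, labels, 0.2))
```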

DataXu Boston, MA

Data Science Engineering Co-op • Jun 2015 – Dec 2015

I worked with the Data Science Engineering and Optimization teams to improve the optimization of advertising campaigns by developing a diagnostic tool.

  • Researched and implemented IIS (Irreducible Infeasible Subsystem) algorithms to efficiently diagnose infeasible linear optimization problems.
  • Designed and developed campaign-specific algorithms to identify the infeasible constraints in each optimization problem.
  • Automated the diagnosis and designed a dashboard that displays, for each campaign, the incorrect metrics causing infeasibility.

It was fun learning the basics of linear optimization, especially infeasibility in linear programs. Python was my primary programming language during development, and I enjoyed using IPython Notebooks for supporting analysis. Ansible was one cool open-source tool I got to work with while automating the diagnosis. Amazon Web Services such as S3, Data Pipeline and EC2, along with CI services like Jenkins, were among the other technologies I learned and enjoyed using.
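One standard way to extract an IIS is the deletion filter: drop each constraint in turn, and keep it out only if the remaining set is still infeasible. What survives is an irreducible infeasible subset. The sketch below is illustrative and is not the DataXu implementation. In place of a real LP solver it uses a hypothetical toy feasibility oracle over single-variable bound constraints; any function that answers "is this constraint set feasible?" can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy constraint format on one variable x:
# ("ge", a) means x >= a, ("le", b) means x <= b.
Constraint = Tuple[str, float]


def is_feasible_1d(constraints: Sequence[Constraint]) -> bool:
    """Toy feasibility oracle standing in for an LP solver."""
    lower = max((v for kind, v in constraints if kind == "ge"), default=float("-inf"))
    upper = min((v for kind, v in constraints if kind == "le"), default=float("inf"))
    return lower <= upper


def deletion_filter(
    constraints: Sequence[Constraint],
    is_feasible: Callable[[Sequence[Constraint]], bool],
) -> List[Constraint]:
    """Return an irreducible infeasible subset of `constraints`."""
    if is_feasible(constraints):
        raise ValueError("constraint set is feasible; there is no IIS to find")
    iis = list(constraints)
    i = 0
    while i < len(iis):
        trial = iis[:i] + iis[i + 1:]
        if not is_feasible(trial):
            iis = trial  # still infeasible without constraint i: drop it for good
        else:
            i += 1       # removing it restores feasibility: it belongs to the IIS
    return iis


constraints = [("ge", 0), ("le", 10), ("ge", 5), ("le", 3), ("le", 8)]
print(deletion_filter(constraints, is_feasible_1d))
```

The filter needs one feasibility check per constraint, so its cost is dominated by the solver calls; that is why faster variants (for example, testing groups of constraints at once) are commonly used on large problems.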

intelligent Health Lab (i-Heal), University of Florida

Research Volunteer • Feb 2015 – Apr 2015

I worked as a student researcher under Dr. Parisa Rashidi on a project that aimed to assist patients by identifying cognitive distortions in their personal text data.
I helped develop a sentiment classifier that sorts text containing cognitive distortions into categories such as 'black and white', 'all-or-nothing' and 'overgeneralisation'. Training on imbalanced datasets and working with out-of-domain sentiment analysis were among the aspects I enjoyed most.
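A common remedy for imbalanced training data is to weight each class inversely to its frequency, so that every class contributes equally to the loss. The sketch below uses the same formula as scikit-learn's "balanced" heuristic, n_samples / (n_classes * count). The label names and counts are hypothetical, and this is not necessarily the approach taken in the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily skewed label distribution.
labels = ["none"] * 6 + ["overgeneralisation"] * 3 + ["all-or-nothing"] * 1
weights = balanced_class_weights(labels)
print(weights)  # the rarest class receives the largest weight
```

With these weights, each class's total weight (weight times count) is the same, n_samples / n_classes, which is what keeps the majority class from dominating training.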

Education

University of Florida

Master's in Computer Science • Aug 2014 – present

In grad school, I have taken courses related to data science, such as Machine Learning, Pattern Recognition and Math for Intelligent Systems.
The last of these is a math-heavy course covering topics in linear algebra and probability theory. Machine Learning under Prof. Banerjee is one course I particularly enjoyed: because it was quite theoretical, I got a good grasp of the underlying differences between machine learning algorithms, which helped when implementing them.

Personal Projects

Working on personal projects gives me the freedom to experiment with new technologies and learn concepts outside the coursework. I work on multiple hobby projects to improve my statistics, Python development and machine learning chops.

Applying Survival Analysis to the career lifetimes of 1,725 cricketers and predicting the hazard rates of current players

The project applied the statistical concept of Survival Analysis to the careers of cricketers. The scraped and cleaned player data were used to estimate the players' survival curves, which were then compared across different player cohorts. Survival regression models such as Cox proportional hazards (Cox PH) and Aalen's additive model were used to predict the survival curves and hazard rates of current players.

Update: an edited version of this survival analysis was published on the yHat data science blog ☺. yHat is a tech company that provides tools and a platform for industrial data science applications.

survival_analysis
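A standard way to estimate survival curves like these is the Kaplan-Meier estimator, which multiplies the conditional survival probabilities at each event time. Here, players still active are right-censored. The following is a from-scratch sketch with made-up career lengths, treating retirement as the event. It is not the project's code, which may have relied on a library such as lifelines.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: career length for each player
    observed:  1 if the player retired (event), 0 if still playing (censored)
    Returns [(t, S(t))] at each distinct time where at least one event occurred.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical careers in years; 0 marks a player who has not yet retired.
durations = [2, 3, 3, 5, 8, 10]
observed = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```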

Here, I modelled batting averages using beta distributions and visualized simulations of those averages with animated plots.

anim_beta
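The Beta distribution fits proportion-style averages (successes out of attempts, such as hits per at-bat) because it is conjugate to the binomial: observing h hits in n attempts turns a Beta(α, β) prior into a Beta(α + h, β + n − h) posterior. The following is a minimal sketch under that assumption. The prior Beta(81, 219), with mean 0.27, and the simulation helper are illustrative choices, not the project's code. Each entry of the returned history could serve as one frame of an animated plot.

```python
import random


def update_beta(alpha, beta, hits, at_bats):
    """Conjugate Beta-Binomial update of a Beta(alpha, beta) prior."""
    if not 0 <= hits <= at_bats:
        raise ValueError("need 0 <= hits <= at_bats")
    return alpha + hits, beta + (at_bats - hits)


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def simulate_season(alpha, beta, games, at_bats_per_game, seed=0):
    """Draw a 'true' average from the prior, simulate games,
    and record the posterior mean after each game."""
    rng = random.Random(seed)
    true_avg = rng.betavariate(alpha, beta)
    a, b = alpha, beta
    history = []
    for _ in range(games):
        hits = sum(rng.random() < true_avg for _ in range(at_bats_per_game))
        a, b = update_beta(a, b, hits, at_bats_per_game)
        history.append(beta_mean(a, b))
    return true_avg, history


true_avg, history = simulate_season(81, 219, games=50, at_bats_per_game=4, seed=42)
print(round(true_avg, 3), [round(m, 3) for m in history[:5]])
```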

Built a predictive model for the Global Terrorism Database, covering data preprocessing, feature selection and modelling.

gtd_analysis

Kaggle competitions gave me the chance to apply machine learning algorithms such as Random Forests, AdaBoost and Extra Trees to real-life datasets.

Academic Projects

During grad school, it was great fun working on projects related to Machine Learning and Distributed Operating Systems.

Developed a content-based recommender system that recommends the optimal product to the user, and performed sentiment analysis on Amazon customer reviews. Implemented Naïve Bayes classifiers, SVMs and logistic regression in RapidMiner and Python, achieving an accuracy of 0.7735.
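At its core, a content-based recommender builds a profile from the items a user already liked and ranks the remaining items by similarity to that profile. Below is a minimal sketch using bag-of-words vectors and cosine similarity. The product descriptions and function names are hypothetical, and the project's actual features may have been different (for example, derived from review sentiment).

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a product description."""
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(catalog, liked_ids, top_n=1):
    """Rank items the user has not liked by cosine similarity to the user's profile."""
    profile = Counter()
    for item_id in liked_ids:
        profile.update(vectorize(catalog[item_id]))
    candidates = [
        (cosine(profile, vectorize(desc)), item_id)
        for item_id, desc in catalog.items()
        if item_id not in liked_ids
    ]
    # Highest similarity first; ties broken by item id for deterministic output.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in candidates[:top_n]]


catalog = {
    "p1": "wireless noise cancelling headphones",
    "p2": "wired earbuds with microphone",
    "p3": "bluetooth wireless headphones with long battery",
    "p4": "stainless steel kitchen knife set",
}
print(recommend(catalog, {"p1"}))
```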

Developed a distributed system in Scala using the Akka actor model. Bitcoins were mined by searching for SHA-256 hashes with the required number of leading zeros, and CPU performance was measured and analyzed.
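The mining step is a brute-force search: append a nonce to a fixed prefix, hash the result with SHA-256, and stop when the hex digest starts with the required number of zeros. The following is a single-process Python sketch, not the original Scala/Akka code. The prefix is hypothetical, and the `start`/`stop` range parameters are an assumed way to give each actor a disjoint slice of the nonce space.

```python
import hashlib
from itertools import count


def mine(prefix, leading_zeros, start=0, stop=None):
    """Search nonces in [start, stop) for a SHA-256 hex digest that begins
    with `leading_zeros` zeros. Returns (input string, digest) or None."""
    target = "0" * leading_zeros
    nonces = count(start) if stop is None else range(start, stop)
    for nonce in nonces:
        candidate = f"{prefix};{nonce}"
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return candidate, digest
    return None


# Each worker could be handed its own range, e.g. worker i searches [i * N, (i + 1) * N).
result = mine("ufid", 2, start=0, stop=200_000)
print(result)
```

Every extra leading hex zero makes a match 16 times rarer on average, which is why splitting the nonce space across workers scales the search.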

Skills

Programming Skills

Python, Java, C++, Linux, Scala (Intermediate) and R (Basic)

Tools and Technologies

Git, Pandas, Plotly, Scikit-learn, IPython Notebooks, Ansible, Jenkins, RapidMiner and LaTeX

Amazon Web Services

Data Pipeline, S3 and EC2

Web Programming

HTML, CSS, PHP and JavaScript (Basic)

Databases

MySQL

Associations

AIESEC

Outgoing Exchange (OGX) Team Leader • 2011 – 2013

Facilitated international student exchange through AIESEC internships. Supervised a team that recruited students for AIESEC internships, which involved administering applications, interviewing applicants and coaching them toward a suitable internship.

Influences and Favourite Quotes