Data Analyst/Scientist with over a decade of experience in data science, software engineering, and project management. Proficient in transforming complex business problems into manageable, actionable solutions. Adept at developing strong client relationships, managing stakeholders, and executing successful change management. Passionate about using data-driven insights to drive business outcomes and solve challenging problems.
Data Science Projects
>> Machine Learning Model Comparison
Developed and evaluated five machine learning models (Logistic Regression, KNN, Random Forest, Linear SVC, SVM) using scikit-learn to classify blue tarps and locate earthquake survivors. Automated selection of the decision threshold that maximizes F1 score, based on precision-recall trade-off analysis. Conducted extensive hyperparameter tuning and cross-validation using pipelines and grid search to optimize model performance. Compared accuracy, log-loss, precision, recall, and runtime to select the most effective algorithm.
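As an illustration of the threshold-selection step, here is a minimal standard-library sketch. It is not the project's code: the function name `best_f1_threshold` and the toy labels and scores are hypothetical, and the original work used scikit-learn rather than a hand-rolled loop.

```python
def best_f1_threshold(y_true, scores):
    """Return the (threshold, F1) pair with the highest F1.

    Every distinct score is tried as a candidate threshold; a sample is
    predicted positive when its score is >= the threshold.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: 1 = blue tarp, 0 = other
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
threshold, f1 = best_f1_threshold(y_true, scores)
```

The same loop can be run once per model so that each classifier is compared at its own best operating point.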
>> Python Network Analysis
Collaborated on an NSF grant aimed at evaluating the impact of open-source software. Used network-based statistics, including centrality measures such as degree and eigenvector centrality, to identify key players such as packages, developers, and countries. Analyzed software development cost and its correlation with the impact of Python projects, providing a valuable framework for conducting similar analyses.
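To show what a degree-centrality ranking looks like, here is a small standard-library sketch. The edge list and package names are hypothetical, and the helper `degree_centrality` is written for illustration only. It does not reproduce the project's tooling and does not cover eigenvector centrality.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph.

    Each node's score is its number of distinct neighbors divided by
    (n - 1), where n is the number of nodes. Self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical dependency links between packages
edges = [
    ("pkg_a", "pkg_b"),
    ("pkg_a", "pkg_c"),
    ("pkg_a", "pkg_d"),
    ("pkg_b", "pkg_c"),
]
centrality = degree_centrality(edges)
top = max(centrality, key=centrality.get)  # most connected node
```

The same ranking applies unchanged to developer or country networks once their edges are defined.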
>> Predicting Education Level Using US Census Data
Developed a Random Forest model in PySpark to predict the education level of residents from household characteristics. Used Principal Component Analysis (PCA) to reduce dimensionality and identify the most significant predictors. Optimized model performance by incorporating graph-based cross-validation methods in PySpark, resulting in efficient runtime.
>> Course Recommendation System (Using Neural Collaborative Filtering, not public)
Designed a deep learning recommendation model based on Neural Collaborative Filtering to help administrators suggest relevant elective courses to students, providing personalized recommendations and improving student success.
Tools and Certifications:
>> Python (Pandas, Scikit-Learn, TensorFlow, NLP), PySpark, SQL, Power BI, Qlik Sense
>> Certifications: Equity-minded evidence-based decision making, Management Essentials for a Developing Leader, Project Management