
Joshua Ewer's Data Science Portfolio

Project Descriptions

Retail Customer Segmentation

Uses supervised learning (Ridge, MLP) to forecast inventory demand and unsupervised learning (K-means, RFM) to segment customers into meaningful groups.
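As an illustrative sketch of the segmentation step (not the project's actual code), the snippet below clusters a synthetic RFM table with K-means; the column ranges and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM table: recency (days), frequency (orders), monetary (total spend).
rng = np.random.default_rng(42)
rfm = np.column_stack([
    rng.integers(1, 365, 200),   # recency
    rng.integers(1, 50, 200),    # frequency
    rng.uniform(10, 5000, 200),  # monetary
])

# Scale the features so no single column dominates the distance metric.
X = StandardScaler().fit_transform(rfm)

# Segment customers into 4 groups.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

Scaling first matters because K-means uses Euclidean distance, so an unscaled monetary column would otherwise dominate the clusters.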

For more detail, click here


Predicting Presence of Heart Disease Through Clinical Results

This project evaluates whether routinely collected clinical data can be used to predict the presence of heart disease, using a public dataset of anonymized patient records. It demonstrates careful data preparation, feature selection, and ethical considerations around false negatives in healthcare.

This project compares a Logistic Regression baseline to a Gradient Boosting model, which achieves significantly higher performance, showing how non-linear models can improve predictive accuracy.
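A minimal sketch of that comparison, using synthetic data in place of the clinical records (the feature counts and split are assumptions; on real data the boosted model's advantage depends on the dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical dataset.
X, y = make_classification(n_samples=500, n_features=13,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the linear baseline and the non-linear ensemble on the same split.
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
boosted = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

base_acc = baseline.score(X_te, y_te)
boost_acc = boosted.score(X_te, y_te)
```

In a healthcare setting, accuracy alone is not enough; recall on the positive class matters most when false negatives are costly, as the project description notes.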

For more detail, click here


Life Expectancy Analysis and Predictions

This project analyzes World Health Organization life expectancy data to identify which health, economic, and social factors are most strongly associated with differences in life expectancy across countries. Using exploratory analysis and predictive models including linear regression, Random Forest, and XGBoost, the work emphasizes interpretability, ethical use of global health data, and insights that can support evidence-based public health decision-making.
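A minimal sketch of the Random Forest piece, with a synthetic regression problem standing in for the WHO data; feature importances are one of the interpretability tools such a model offers:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the WHO table: numeric predictors of life expectancy.
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Feature importances indicate which predictors the forest relied on most;
# they sum to 1, and sorting them gives a ranking of the factors.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
```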

For more detail, click here


Time-series Retail Analysis

This notebook demonstrates a time-series forecasting workflow using real-world U.S. retail sales data. It compares SARIMA and Holt-Winters models to identify long-term trends and seasonal patterns.

For more detail, click here


Childcare Costs

This project analyzes U.S. childcare cost data from 2008–2018 to show how childcare—especially infant care—has become more expensive over time and varies widely by state and county. Using clear visualizations and a data storytelling approach, it highlights how rising childcare costs create a growing financial burden for families and frames the issue as a policy problem rather than an individual one.

For more detail, click here


Optical Recognition with Tensorflow/Keras

Using Keras and TensorFlow, this project builds an optical-recognition model that identifies digits from handwritten samples.
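A minimal sketch of such a model in Keras; the architecture and layer sizes are assumptions, and the forward pass below uses random input only to confirm the output shape.

```python
import numpy as np
from tensorflow import keras

# A small fully connected classifier for 28x28 handwritten digits (MNIST-style).
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one probability per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Untrained forward pass on random input: each row is a probability
# distribution over the ten digits.
probs = model.predict(np.random.rand(4, 28, 28), verbose=0)
```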

For more detail, click here


Grid Search for Hyperparameter Tuning

Grid search is a helpful method for finding the best settings (hyperparameters) for a machine learning model. This notebook demonstrates using grid search to find the best hyperparameters for a selection of algorithms.
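A minimal example of the idea with scikit-learn's GridSearchCV, using the built-in iris dataset and an SVM (the candidate grid values are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively try every combination of these candidate hyperparameters,
# scoring each one with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best = search.best_params_  # the combination with the best CV score
```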

For more detail, click here


Reducing Dimensionality with PCA

When a dataset has a high number of variables that overlap or are highly correlated, Principal Component Analysis (PCA) is a useful technique for reducing the number of features without removing useful information. This notebook demonstrates using PCA on a dataset of house sales to reduce the number of features without impacting the quality of the trained model.
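A minimal sketch of the technique, using scikit-learn's built-in diabetes dataset in place of the house-sales data:

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_diabetes(return_X_y=True)  # 10 partly correlated numeric features

# Standardize first so each feature contributes on the same scale, then keep
# just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(StandardScaler().fit_transform(X))
```

Passing a fraction to `n_components` lets PCA choose the component count for you, which is often more robust than hard-coding a number.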

For more detail, click here


Comparing Regression Algorithms

There are many regression algorithms to choose from when building a predictive model. This notebook compares a selection of them and demonstrates using R² and RMSE to measure model performance.
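A minimal sketch of such a comparison, with synthetic data and two representative algorithms standing in for the notebook's selection:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each model on the same split and score it with both metrics:
# R^2 (fraction of variance explained) and RMSE (error in target units).
results = {}
for name, model in [("linear", LinearRegression()),
                    ("forest", RandomForestRegressor(random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    results[name] = {"r2": r2_score(y_te, pred), "rmse": rmse}
```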

For more detail, click here


Using Collaborative Filtering to Build a Recommender

Collaborative filtering is a technique used in recommender systems that makes predictions about a user’s interests by analyzing patterns of behavior or preferences from many users. This notebook demonstrates building a recommender for movies using cosine similarity based on a centered rating from all users.
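A compact sketch of centered (adjusted) cosine similarity on a toy rating matrix; the matrix values are made up for illustration:

```python
import numpy as np

# Tiny user-item rating matrix (0 = unrated); rows are users, columns are movies.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Center each user's ratings around their own mean (ignoring unrated items),
# which removes per-user rating bias before comparing items.
mask = R > 0
means = R.sum(axis=1) / mask.sum(axis=1)
C = np.where(mask, R - means[:, None], 0.0)

# Item-item cosine similarity on the centered ratings: each entry sim[i, j]
# measures how similarly users rated movies i and j.
norms = np.linalg.norm(C, axis=0)
sim = C.T @ C / np.outer(norms, norms)
```

From here, a recommender predicts a user's rating for an unseen movie as a similarity-weighted average of that user's ratings for similar movies.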

For more detail, click here