I helped design, prototype and iterate on a product being developed by a startup water fountain company.
This project allowed me to exercise my whole range of fabrication and design experience and skills, including CNC work, CAD, carpentry, and project management.
I helped re-engineer this table from a previous design. This included designing a new connection point for the legs and rethinking the CNC and slab gluing operations.
I worked out the construction and installation of this banquette from the designer's 3D model.
This included the internal structure, part sourcing, and the order of operations from CNC milling to getting the perfect natural finish.
A weekend project to design and build a shelf with planks left over from a previous project.
This project also gave me the opportunity to prototype a design for a larger set of shelves I am going to build for our household storage area.
A weekend project to make myself a nice bedside table from some oak flooring scraps.
It has a shelf that pulls out, 3d printed brackets that angle the hair-pin legs in a nice way, and I'm really happy with the finish. Just oil, and I think I probably put a clear poly over it at the end.
At the time I tried to carry out this project it was a huge struggle. I learned a ton about various object detection models, about writing custom models, layers, and loss functions with TensorFlow and Keras, and also that it's sometimes better to use an off-the-shelf model than to try to code it all yourself!
The goal of this project is to use a dataset of photographs to train a model to detect and identify an assortment of screws and nuts. The dataset was taken from the MVTec Screws Dataset. Object detection is a challenging problem in machine learning and continues to be a very active area of research, with new algorithms and processes being developed regularly.
The dataset for this project consists of several hundred images containing around 4,500 pieces of construction hardware belonging to 13 different classes. They mostly represent types of common screws, but the images also contain nuts of varying sizes. Each size and type of hardware belongs to its own class.
Exploratory Data Analysis
The data is structured in a format similar to the COCO dataset standard, derived from a proprietary format belonging to the dataset originator, MVTec. The images are very standardized, each depicting a different configuration of randomly assorted pieces of hardware. There is broadly no major class imbalance, though individual images can contain anywhere from 10 to 20 parts across 5 to 10 classes.
Data Cleaning/Preparation
The data presented several challenges when preparing it for training. The bounding-box format was similar to the COCO format but did not match what most plug-and-play toolchains expected, which meant reworking the data multiple times and in a number of different ways.
Data manipulation led to the writing of several custom tools to handle various aspects of preparing the data. These included:
- Tools to create coordinate points for bounding box corners from the row, col, width, height, angle format in the data (sketched after this list).
- Tools to create compatible label formats for training.
- Tools to calculate IoU and GIoU for unaligned bounding boxes (also sketched below).
- Tools to provide necessary calculations from the label data to the loss function of a YOLO-like model.
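As an illustration of the first and third items above, here is a minimal sketch, assuming the label angle is in radians and that (row, col) gives the box centre in pixel coordinates; the exact MVTec label conventions may differ, and this is not the project's actual code.

```python
import numpy as np
from shapely.geometry import Polygon

def obb_corners(row, col, width, height, angle):
    """Four (x, y) corners of an oriented bounding box.

    Assumes (row, col) is the box centre and `angle` is in radians --
    both assumptions about the label convention.
    """
    dx, dy = width / 2.0, height / 2.0
    local = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # Rotate the local corners, then translate to image coordinates (x = col, y = row).
    return local @ rot.T + np.array([col, row])

def rotated_iou(corners_a, corners_b):
    """IoU for two unaligned (rotated) boxes via polygon intersection."""
    a, b = Polygon(corners_a), Polygon(corners_b)
    union = a.union(b).area
    return a.intersection(b).area / union if union else 0.0
```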
Final Model
For my final object detection attempt I used a YOLO implementation written with Darknet. Darknet is a neural network framework written in C that provides a Python API and many model variants with pretrained weights. I used the YOLOv3 implementation in Darknet and trained it using a set of weights pretrained on COCO along with my own data.
It was relatively easy to set up and get training. There were small adjustments I needed to make to my dataset, plus simple scripts to translate my labels into a format Darknet will accept. In order to use this framework I removed the angle parameter from the labels and recalculated the points to match what Darknet expects, roughly as sketched below.
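As a rough illustration of that translation step (not the exact script I used), dropping the angle amounts to taking the axis-aligned box that encloses the rotated corners and normalising it the way Darknet/YOLO label files expect:

```python
def to_darknet_label(class_id, corners, img_w, img_h):
    """Build a Darknet/YOLO label line: class x_center y_center width height,
    all normalised to [0, 1] relative to the image size."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```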
Conclusion
Although this project has been difficult to get to a strong place performance-wise, I am extremely satisfied with the progress and experience I have gained while working towards its completion. There is still a little way to go, but I am excited to bring the knowledge and skills I have learned thus far to tackling the final stretch. The idiosyncrasies in the dataset have complicated most of my attempts at finding the most direct path to completion, but thanks to that process I feel that I have gained real experience and knowledge in an area of machine learning that I am deeply interested in pursuing further.
This was my first dive into the world of autoencoders. The model I ended up with used an encoder/decoder network to attempt to detect manufacturing defects in photos of screws.
I didn't really know what I was doing, but this experience would prove very helpful when I moved on to object detection models with deep convolutional architectures like YOLO.
Introduction
This is the GitHub repo for my project on anomaly detection. I used a dataset of ~500 images of screws to train a neural network to identify anomalous (damaged) screws in the test set.
Business Understanding
The purpose of this project is to look for ways to improve or automate quality control procedures in industrial manufacturing. The idea is to train a model to identify damaged or faulty products, in this case screws, and remove them from the production line.
Data Understanding
The dataset for this project was graciously provided by MVTec as part of their Anomaly Detection image set.
Data
The data is split into a training set and a test set, with subcategories within the test set representing the anomaly category of each image. This made it fairly easy to load the data into a notebook and get going with various autoencoding strategies.
Modeling
A visualization of the output from my first model is pictured below:
The first model was a simple autoencoder with a series of Dense layers that compressed the data from each image into a relatively tiny tensor and then attempted to reconstruct the original image from that compressed version. As you can see from the image above the model had difficulty abstracting the position of the screw in each image so the output images are blurry and distorted around the center point of the screws.
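A minimal sketch of a dense autoencoder along these lines is below; the 128x128 grayscale input size and the layer widths are assumptions for illustration, not the exact architecture I used.

```python
from tensorflow.keras import layers, models

IMG_SIZE = 128  # assumed input resolution

encoder = models.Sequential([
    layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 1)),
    layers.Dense(512, activation="relu"),
    layers.Dense(64, activation="relu"),  # the "relatively tiny" bottleneck
])

decoder = models.Sequential([
    layers.Dense(512, activation="relu", input_shape=(64,)),
    layers.Dense(IMG_SIZE * IMG_SIZE, activation="sigmoid"),
    layers.Reshape((IMG_SIZE, IMG_SIZE, 1)),
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=20, batch_size=32)
```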
Final Model
The final model ended up being a multilayered convolutional network. The layers did a better job of abstracting the orientation of the screw and its structure in the image.
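A comparable sketch of a convolutional autoencoder is below; the filter counts and depth are again illustrative assumptions rather than the project's exact layers.

```python
from tensorflow.keras import layers, models

IMG_SIZE = 128  # assumed input resolution

conv_autoencoder = models.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2DTranspose(64, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2D(1, 3, activation="sigmoid", padding="same"),
])
conv_autoencoder.compile(optimizer="adam", loss="mse")
```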
With this model I was able to achieve an average accuracy of 86% when classifying anomalous screws. As you can see from the graph above, once the model was trained I used the test set to measure the mean squared error between the model's inputs and outputs. This resulted in two loss distributions: one for non-anomalous (normal) input data and one for anomalous input data.
Using these distributions I calculated a classification threshold and then wrote a function to return the accuracy of that threshold.
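In outline, that thresholding step looks something like the sketch below; the 1/0 label encoding and the percentile-based threshold choice are illustrative assumptions, not the project's exact code.

```python
import numpy as np

def reconstruction_errors(model, images):
    """Per-image mean squared reconstruction error."""
    recon = model.predict(images, verbose=0)
    return np.mean((images - recon) ** 2, axis=(1, 2, 3))

def threshold_accuracy(errors, labels, threshold):
    """Accuracy when error > threshold is classified as anomalous (label 1)."""
    preds = (errors > threshold).astype(int)
    return float(np.mean(preds == labels))

# One simple way to pick a threshold: a high percentile of the normal-only errors.
# threshold = np.percentile(reconstruction_errors(model, x_test_normal), 95)
```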
Moving Forward
I am continuing to investigate ways of improving this process's accuracy and consistency. I am going to try different loss functions and see whether applying preprocessing methods like Sobel filtering or other simple convolutional filters could improve performance and focus the model on specific features.
Citations:
Markus Ulrich, Patrick Follmann, Jan-Hendrik Neudeck: A comparison of shape-based matching with deep-learning-based object detection; in: Technisches Messen, 2019, DOI: 10.1515/teme-2019-0076.
This was a really fun project to work on, involving time-series data collected from sensors on a hydraulic pump rig. I got experience writing algorithms to programmatically tune model parameters, evaluating various methods of analyzing the time series, and formatting outputs for human readability.
Data Understanding
This data comes from a set of sensor measurements taken during 2,205 sixty-second cycles of a hydraulic pump testing rig. During the testing, the pump's maintenance status was recorded. These various metrics of the test rig's physical condition will be the target variables for our tests. The sensor data will be the predictors.
The goal will be to use sensor data (such as temperature, tank pressure, vibration magnitude, etc.) to predict the state of the hydraulic pump.
The data is split between sensors. Each sensor has a specific sample rate, which corresponds to the columns in its table. For example, TS1.txt contains temperature readings from one sensor. Its sample rate was 1 Hz over each 60-second pump cycle, so the TS1.txt file contains 60 columns and 2,205 rows of data.
Each row represents one full cycle and each column represents one sample (in this case one second) of readings from the temperature sensor. To create features from this data we will need to come up with methods for aggregating each row of sensor data into a single column, as sketched below.
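A small sketch of that aggregation, assuming TS1.txt is tab-separated with one 60-sample cycle per row; the feature names here are my own, not the project's.

```python
import pandas as pd

ts1 = pd.read_csv("TS1.txt", sep="\t", header=None)  # 2205 rows x 60 columns

features = pd.DataFrame({
    "ts1_mean": ts1.mean(axis=1),                    # average over the full cycle
    "ts1_std": ts1.std(axis=1),                      # spread over the full cycle
    "ts1_delta": ts1.iloc[:, -1] - ts1.iloc[:, 0],   # change from start to end of cycle
})
```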
Modeling
As previously mentioned, the dataset includes five target variables – cooler condition, valve condition, internal pump leakage, hydraulic accumulator (hydraulic pressure), and stable flag (stable condition). We determined that each of these target variables was vital to the stakeholder and would likely impact our final recommendation. As a result, several models were created, roughly two per target variable. Depending on the variable, the features used included simple averages of the 60-second cycle, the average change over the course of the cycle, the average and change over each 20-second third of the cycle, and the standard deviation of both the full 60-second cycle and each 20-second third. To begin, we used a simple logistic regression model. Given the data and the stakeholder's business problem, it made the most sense to run a grid search on several different model types to determine which produces the highest accuracy.
First Model:
Feature, Target Variable: Simple Average, Valve Condition
For our first model, we are using a grid search to evaluate Valve Condition as our target variable, with the average metrics of each cycle (simple average) as our features. To begin, we will evaluate six different models – a logistic regression model, a decision tree model, a random forest model, a K-nearest neighbors (KNN) model, a support vector machine model, and an XGBoost model. We will run a grid search for each of these models to find the hyperparameters that produce the highest accuracy scores, as sketched below. As a reminder, Valve Condition, measured as a percentage, includes four classifications – 100 meaning the pump was functioning with optimal switching behavior, 90 meaning there was a small lag, 80 meaning there was a severe lag, and 73 meaning the pump was close to total failure.
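A hedged sketch of that grid-search setup, shown for two of the six model types; the parameter grids are illustrative, not the ones we actually searched. X is the simple-average feature table and y is the raw Valve Condition labels (100/90/80/73).

```python
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def run_grid_searches(X, y):
    """Grid-search a couple of candidate models and report the best CV score each."""
    # XGBoost expects classes 0..n-1, so encode the raw condition codes first.
    y_enc = LabelEncoder().fit_transform(y)
    candidates = {
        "random_forest": (
            RandomForestClassifier(random_state=42),
            {"n_estimators": [100, 300], "max_depth": [None, 10]},
        ),
        "xgboost": (
            XGBClassifier(eval_metric="mlogloss"),
            {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]},
        ),
    }
    results = {}
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
        search.fit(X, y_enc)
        results[name] = (search.best_score_, search.best_params_)
    return results
```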
Evaluation
After running all of the above models and inspecting their output we determined that XGBoost was the best model to iterate one more time. We were also able to determine which feature/target pairings resulted in the best predictions.
Below we set up a test to see which of the sensors had the highest feature importance on average. A series of functions in helpers.py were chained together, along with some search and result parsing, to extract the relevant statistics; a simplified sketch follows.
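A simplified stand-in for that roll-up (the real versions live in helpers.py, whose interface I'm not reproducing here); the sensor-prefixed feature names, e.g. "ts1_mean", are an assumption.

```python
import pandas as pd

def mean_importance_by_sensor(model, feature_names):
    """Average a fitted tree model's feature importances per sensor prefix."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    sensor = imp.index.str.split("_").str[0]
    return imp.groupby(sensor).mean().sort_values(ascending=False)
```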
Top 8 Sensors
Below are the 8 top performing sensors in predicting the state of the hydraulic pump test rig.
Final Results
Below we have the metrics from our final models.
XGBoost
We decided to go with the XGBoost model for our final iteration, using the average of cycle thirds as the feature set for each target variable except Cooler Condition. For Cooler Condition we used the standard deviation of cycle thirds as the feature, due to its consistently high score along all of our metric axes.
Conclusion:
Recommendation:
Considering all of the above analysis, we recommend the stakeholder adopt an XGBoost predictive model. Across the numerous models and iterations we ran, XGBoost was consistently the best, most accurate model. To use it most effectively, we recommend applying it to predicting a pump's cooler condition and internal pump leakage; based on our analysis, these predictive models generated the highest accuracy scores (99%+). While the accuracy scores of these models are high, there are reasons the model may not fully solve the business problem. The data we used was ultimately collected from a single test rig, meaning the environment in which the data was produced was carefully controlled by the test coordinators. Therefore, there could be situations that cause leaks or other faults with the pumps that were not accounted for, such as human error or other extreme conditions.
Next Steps:
Further criteria and analyses could yield additional insights to further inform the stakeholder by:
Reviewing other testing data. The stakeholder should consider using a dataset in addition to the one that was analyzed. As previously mentioned, although the dataset included 2,200+ records of testing data, it was all collected from a single test rig. Data from another test rig would help re-check the accuracy of our final model and show whether our findings are consistent.
Collecting real-world data. Another factor the stakeholder should consider is collecting real-world data. It is known that the stakeholder uses specific water pumps with their irrigation systems. As such, the stakeholder should consider setting up a system to collect daily data similar to the dataset we used. By doing so, the stakeholder could apply the final model to data collected from their own irrigation systems.
My second data science project where I joined a real estate dataset with data from the census to analyze investment opportunities in the Seattle area.
Stretching the project to include census data allowed me to get experience building workflows that incorporate multiple datasets with differing formats and features.
Business Understanding
The client is an NGO based in the Seattle area working in community outreach and development. They are interested in gaining deeper insight into the communities in and around King County with the hopes of better focusing investment for maximum impact.
The NGO board has identified life expectancy as the primary metric for locating and understanding areas around the county that require investment. Further, they are interested in local property markets as an indicator of communities' general economic health and of the effect size they can expect their investments to have on those communities.
The Data
To investigate this topic I used a primary dataset collected by the King County Assessor's Office, and a secondary dataset compiled as part of the Land Conservation Initiative (LCI) opportunity area analysis.
The King County property sales data was collected from 2014 through 2015.
The LCI dataset is a combination of several datasets from Public Health, the American Community Survey, and localization data from King County's Open Data program. The date range on the data combined in this set is 2014 - 2019.
King County Tree Cover Percentage
Breaking Things Down
The first thing I want to do is identify data types and start to parse out how things are related. In this section I will discuss the ways that I approached this: the parts that worked and the parts that didn't.
- I used pairplots and common sense to identify possible collinearities between the data columns.
- I wrote a series of functions that helped reorganize and rationalize the data toward my use cases.
- e.g. I wrote a latitude and longitude binning function that let me merge the two datasets of different sizes and data types into a single dataframe (see the sketch after this list).
- I wrote a function that cleaned and expanded my categorical data from the LCI set so that when it was grouped I was able to maintain a record of that data in a column that could then be expanded after the merge. Essentially, I think this was similar to a column expansion operation.
- I wrote a lot of boilerplate code and am refactoring it into reusable blocks.
- I wrote a helper function for visualizing a large dataframe (also sketched below). It takes the dataframe and a list of the column names you want to graph, then dynamically creates a 2-D array of axes objects, automatically determining the best size grid to fit them.
- I want to expand this into a more general solution: one that can be given any dataframe and determine, based on the datatypes in each column, the best way to return that column as a graph.
- It seems like it would be fun and useful to have a tool similar to pd.DataFrame.info() or describe(), but for graphing.
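Hedged sketches of the binning and plotting helpers mentioned above; the column names, bin size, and figure sizing are assumptions, not the project's actual code.

```python
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def add_latlong_bins(df, lat_col="lat", long_col="long", bin_size=0.01):
    """Add coarse lat/long bin columns so two datasets with different
    granularities can be merged on a shared grid cell."""
    out = df.copy()
    out["lat_bin"] = (out[lat_col] / bin_size).round().astype(int)
    out["long_bin"] = (out[long_col] / bin_size).round().astype(int)
    return out

def grid_plot(df, columns, ncols=3):
    """Plot histograms of the named columns on an automatically sized grid."""
    nrows = math.ceil(len(columns) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows))
    for ax, col in zip(np.ravel(axes), columns):
        ax.hist(df[col].dropna(), bins=30)
        ax.set_title(col)
    plt.tight_layout()
    return fig

# sales = add_latlong_bins(sales_df)
# lci = add_latlong_bins(lci_df)
# merged = sales.merge(lci, on=["lat_bin", "long_bin"], how="left")
```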
Dealing with Outliers
Below is a little bit of the process I used to take care of outliers. It resulted in a much more normal distribution for most of the numerical categories. I wasted most of today investigating the sqft_lot and sqft_lot15 columns and why they retain so much skew after normalization.
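A minimal sketch of IQR-based trimming of the kind described; the 1.5x multiplier and the column list are assumptions, and my actual notebook steps differed in the details.

```python
import pandas as pd

def trim_iqr(df, columns, k=1.5):
    """Drop rows whose values fall outside [Q1 - k*IQR, Q3 + k*IQR] for any column."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out = out[(out[col] >= q1 - k * iqr) & (out[col] <= q3 + k * iqr)]
    return out

# trimmed = trim_iqr(sales_df, ["price", "sqft_lot", "sqft_lot15"])
```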
I want to figure out how to take something like a derivative of the distribution to test how the features are actually structured. Made some progress toward that but it is now 2am.
See more at my GitHub repo!
A simple project with a well-known public dataset to get my feet wet in data science. I used Python/pandas for the exploratory step and the statistical analysis.
This project was a self-directed EDA and data processing project using the Python pandas library and a collection of datasets from IMDB (the Internet Movie Database) and a few other similar sources. The intention of the project is to discover, analyze, and use patterns and relationships found in the provided datasets to make business recommendations according to the following prompt.
Business Understanding
The client, Microsoft, has decided to explore getting involved in the entertainment industry. Specifically, they are interested in making movies. I have been tasked with examining the provided data and making three recommendations for how to successfully enter the entertainment industry as a large, established company currently operating in a different industry.
Some questions I attempted to answer with this dataset were:
- What types of movies are successful in today's market?
- What patterns and trends can be gleaned from the dataset?
- Given the observable business environment, what practical advice would I provide a client about entering the industry?
Data Understanding
The source for the data was a publicly licensed SQL database provided by the website IMDB. The database contained 8 tables.
I started the EDA process by pulling these tables out of the SQL database and storing them in pandas DataFrame objects organized within a Python dict. I was then able to start observing the structure of and relationships between the tables and begin analyzing their meaning, as sketched below.
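A minimal sketch of that loading step; the database filename "im.db" is an assumption for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("im.db")
table_names = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table';", conn
)["name"]

# One DataFrame per table, keyed by table name.
tables = {name: pd.read_sql(f"SELECT * FROM {name};", conn) for name in table_names}
conn.close()
```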
Visualizations
I started by visualizing the total box office grosses for each of the top genres.
Total Box Office by Genre
Then I decided to compare the median production budget by each of these genres.
Median Production Budgets
and the distribution within each genre of ratings vs budget:
Ratings per Million Spent
After identifying that Action and Drama commanded the largest market share among the various genres, I wanted to compare the distribution of that market share within each genre respectively.
Action Genre Market Distribution
Drama Genre Market Distribution
Hello, this is my wedding arch.
This is the arch for my wedding. I wanted it to be simple but beautiful.
The arch halves slot together and are fastened at the top. Getting the curves just right and setting up a fixturing method for this thing was a fun challenge.
This was a long term project that let me help develop and iterate on a manufacturing process.
Some of the pieces we made were one offs (the reception desk), or built for a specific space (the steel-perf wall), but we made several hundred of the rolling racks.
I was involved in figuring out and executing every step of the process: prototyping, designing permanent fixtures, welding, finishing, CNC, delivery, and installations.
Reception Counter
Process shots of building a reception counter. It was a relatively complicated build, as the counter needed to sleeve over an existing cooler unit. I worked out the design of the tubing and component parts for this build in addition to leading the physical build. It was finished by cladding it with steel sheet, and then the whole thing was powder coated.
Installations
I made these rounded shelves for my wife's plants to sit on. The goal was to not have any hard edges anywhere.
Round things are cute, and I think these turned out pretty well.
I built this cart to hold my 3d printers. It was designed to fit in a very small space when I lived in a much smaller apartment.
It gave me an opportunity to try out that style of drawer pull, and to work with mixed materials. The bottom drawers hold printing filament and tools specific to my printers.
An idea I had for a slatted bench. The base was made out of steel bar and angle.
It was fun to make the bending jigs and forms to get the rounded edges of the shoe rack. It still looks great after several years of sitting by our front door.