At the beginning of the pandemic, I took multiple Udacity courses. This is a summary of the AI for Healthcare course and what I learned from it.
AI for Healthcare
In this course, I learned how to apply machine learning techniques to health care data.
The first project was a CNN that used chest X-rays to diagnose pneumonia. Through this project, I learned a lot about how machine learning can be applied for health care purposes. It covered the types of error, the steps involved in getting a model approved by the FDA, and how to properly handle sensitive medical data. The model itself was not particularly powerful; however, with continued work it could be made much more robust.
A visualization of the model’s performance
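For context, the core of such a classifier is a fairly standard convolutional network. Below is a minimal sketch of my own, not the course's actual architecture; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# Minimal binary classifier for 224x224 grayscale chest X-rays.
# Layer sizes are illustrative, not the course's actual architecture.
class PneumoniaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 1),  # single logit: pneumonia vs. no pneumonia
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PneumoniaCNN()
logits = model(torch.randn(4, 1, 224, 224))  # batch of 4 fake X-rays
print(logits.shape)  # torch.Size([4, 1])
```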
The next project used MRI scans for hippocampus segmentation. This was my first experience with three-dimensional data, or with MRI scans in general. The UNet-based model was written in PyTorch, a library I had very little familiarity with at the time. The whole project was very new to me, and I learned a significant amount. In the future, I want to experiment more with the UNet architecture and explore what other problems I can apply it to.
A visualization of the segmentation model’s training performance
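The heart of a UNet is an encoder-decoder with skip connections. As a rough sketch of the idea, here is a simplified 2D, one-level version of my own, not the project's full 3D model; the three output classes assume background plus two structures:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two conv+ReLU blocks, the basic building unit of a UNet
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    """One-level UNet: downsample once, upsample once, one skip connection."""
    def __init__(self, in_ch=1, num_classes=3):
        super().__init__()
        self.enc = double_conv(in_ch, 16)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = double_conv(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = double_conv(32, 16)  # 32 = 16 upsampled + 16 from the skip
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        u = self.up(b)
        # Skip connection: concatenate encoder features with decoder features
        return self.head(self.dec(torch.cat([u, e], dim=1)))

out = TinyUNet()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64]) — one class score per pixel
```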
The next project was to build a regression model that predicted expected hospitalization time for a patient given an experimental diabetes drug; the prediction was then used to decide whether or not the patient should be included in the clinical trial. The dataset was synthetic, and the real challenge was preprocessing and analyzing it rather than building the model. Working with a medical dataset was difficult, requiring a considerable amount of preprocessing, feature engineering, and filtering. I also learned how to identify model bias across demographic groups and how to mitigate it. This project was very informative on how to work with medical data and how to analyze a model trained on it.
A visualization of the model’s biases
The model shows significant bias toward Caucasians and African-Americans, as well as a bias toward women over men.
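Checking for this kind of bias mostly comes down to computing the same metric separately for each demographic group. A rough sketch of the idea with pandas; the data and column names here are made up for illustration, not the project's dataset:

```python
import pandas as pd

# Hypothetical results table: one row per patient with the model's
# predicted and actual hospitalization time plus demographic attributes.
df = pd.DataFrame({
    "actual":    [5, 7, 3, 6, 4, 8],
    "predicted": [6, 7, 2, 9, 4, 5],
    "race":      ["Caucasian", "African-American", "Caucasian",
                  "African-American", "Asian", "Asian"],
    "gender":    ["F", "M", "F", "F", "M", "M"],
})
df["abs_error"] = (df["actual"] - df["predicted"]).abs()

# If mean absolute error differs noticeably between groups, the model
# is performing unevenly across demographics.
print(df.groupby("race")["abs_error"].mean())
print(df.groupby("gender")["abs_error"].mean())
```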
The final project used data from wearables, in this case heart rate. This was a type of data I had very little experience handling, and preprocessing and analyzing it was difficult. Still, I was able to complete the project, and I learned a lot from the experience.
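To give a sense of why the preprocessing was hard: raw wearable signals are noisy, so a typical first step is band-pass filtering to the plausible heart-rate range before estimating beats per minute. A sketch with SciPy; the sampling rate, cutoffs, and synthetic signal are all illustrative, not the project's actual pipeline:

```python
import numpy as np
from scipy import signal

fs = 125.0  # sampling rate in Hz (illustrative)

# Fake noisy pulse-like signal: a 1.2 Hz beat (72 BPM) buried in noise
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.random.randn(t.size)

# Band-pass to the plausible heart-rate band, roughly 40-240 BPM
b, a = signal.butter(3, [40 / 60, 240 / 60], btype="bandpass", fs=fs)
filtered = signal.filtfilt(b, a, raw)

# Estimate heart rate from the dominant frequency of the filtered signal
freqs, psd = signal.periodogram(filtered, fs=fs)
print(f"Estimated heart rate: {freqs[np.argmax(psd)] * 60:.0f} BPM")
```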
Overall, this course introduced me to many new data types and to the possible applications of machine learning in the health care industry. I also learned how to handle sensitive health care data, the impact of demographic model bias, the intricacies of the FDA approval process, and the types of errors and metrics to consider in a health care setting.
During my winter break, I took on the Cassava Leaf Detection competition on Kaggle, which I thought would be a quick project to keep me occupied. I was greatly mistaken: what I expected to be a quick few-week project ended up taking almost three months. I started at the beginning of my winter break, December 23, and made my last submission on the day the competition ended, February 11.
Initially, I hoped I could build a simple model based on the ResNet architecture, since I had had success with it before. Those hopes were crushed when I found I could get no more than 70% accuracy. I decided I needed a more complex model and turned to transfer learning, starting with the pre-trained ResNet50. To my surprise, this performed worse than my initial model. I then went back and fine-tuned my initial model, reaching 75% before overfitting.
Next I tried both the pre-trained Inception and MobileNet architectures. The Inception model could only hit 70% without any fine-tuning, so I quickly abandoned it. The MobileNet architecture reached 80% with a lot of fine-tuning, but I dropped it because I didn't think I could get any more out of it. So, late into the competition, I was scrambling to find an architecture that could work.
I finally stumbled onto the pre-trained EfficientNet model. In my first test, I was able to hit 83% before overfitting. The only problem was that, for a reason I have yet to discern, the model would sometimes simply fail to learn, wasting a training session along with my precious, limited GPU hours. In the final week of the competition, furiously rushing, I managed to tune the model to 86%. With my GPU hours gone and the competition coming to a close, I submitted my final model with a score of 0.8611. I finished 2854th out of 3900 teams; the top team scored 0.9132.
A visualization of the model
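The transfer-learning setup itself is simple; the tuning is what ate the time. For reference, here is a sketch of the pattern using torchvision's pre-trained EfficientNet-B0. This is not my actual competition notebook, just one way to set up the same idea; the competition had five classes:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained EfficientNet-B0 and freeze the backbone
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# Swap the final layer for a 5-way head (the competition had 5 classes)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 5)

# Train only the new head first; unfreezing later backbone layers
# afterwards is the usual fine-tuning step
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```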
This was my first Kaggle competition, and I learned a lot. If I could redo it, I would not spend as much time fine-tuning the MobileNet model, since most of that work didn't lead anywhere. Overall, I am happy with this project and think I learned quite a bit.
I meant to get this post out during February or March, but I was still revamping the website and didn't finish that until March 19th, the end of my spring break. With the end of the school year approaching, I wasn't able to work on any personal projects, and I could only return to this post at the start of summer. I also have a recap of the extended (thanks to quarantine) summer of 2020 that I need to post, and I hope to do that soon.
I have updated the website again to make it easier to navigate and better looking. I returned to Jekyll after switching to Hugo, using the Type-on-Strap theme to revamp the site. In my opinion, Jekyll has better-looking themes than Hugo; however, Hugo is a lot simpler, and the Jekyll site took longer to set up. Both use Markdown for posts, so moving the old posts over was fairly simple. The most time-consuming part was changing how images were imported, because of the two systems' different folder structures.
I did have some trouble with the audio embed from the previous post and had to rework how I embedded it.
I also took the opportunity to improve some of the previous posts and redesign my logo. I had time to revamp the portfolio section as well, creating some quick designs for each of my projects.
In the future I want to expand the project descriptions; however, I don't know how to do that without making the blog section redundant. With the full website redesign, I took the time to create a simple graphic for the front page using Krita and to establish a color palette for the rest of the website. This may be subject to change.
I hope this is the last time I have to completely redesign the website, though I do plan on refining some elements.
I took on a small project this year. I wanted to work more with neural networks, so I attempted to create one that generates music. I used a dataset of MIDI versions of Mozart's music.
To create the algorithm, I followed this tutorial, which helped me understand how to work with MIDI data, something I had never worked with before.
I used a modified version of the WaveNet architecture for my model. One problem I ran into was that the model tended to return a long run of the same note. As a result, it only generated a good song about half the time. In the future, I want to make the model more complex and train it with more data.
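The gist of the approach is to parse the MIDI files into a sequence of notes and train the model to predict the next note from the previous ones. A sketch of the parsing half using music21; the file path is a placeholder, and this is my simplified take, not the tutorial's exact code:

```python
from music21 import converter, note, chord

def midi_to_notes(path):
    """Parse one MIDI file into a flat sequence of pitch strings."""
    notes = []
    for element in converter.parse(path).flat.notes:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))  # e.g. "C4"
        elif isinstance(element, chord.Chord):
            # Represent a chord as its pitch classes joined by dots
            notes.append(".".join(str(n) for n in element.normalOrder))
    return notes

# "mozart.mid" is a placeholder path
print(midi_to_notes("mozart.mid")[:10])
```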
Here is one of the better pieces of generated music:
thumbnail and header photo by Scott Kelley from Unsplash
I have not been good about updating my blog; this year I want to do better. Throughout the year, I have been taking on small projects. During the school year, I attempted to complete two courses, both by Brandon Rohrer, whose courses you can find here.
The first was a decision tree that predicted subway arrival times in Boston.
An early version of the decision tree
In the end, I achieved good results with a fairly accurate model. I did run into one problem: because I live in the Central time zone and the data was in Eastern time, the model initially didn't learn properly. That early, broken version is the decision tree pictured above.
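That timezone bug is an easy one to hit: time-of-day features silently shift by an hour if the timestamps are interpreted in the wrong zone. A sketch of the fix with pandas; the column names and timestamps are made up:

```python
import pandas as pd

# Hypothetical arrival timestamps, recorded naively in Eastern time
arrivals = pd.DataFrame({
    "arrival_time": pd.to_datetime(["2020-01-06 08:15", "2020-01-06 17:40"])
})

# Wrong: letting my machine assume Central time skews the hour-of-day
# feature, so the tree learns the wrong schedule.
# Right: localize to the data's actual zone before deriving features.
eastern = arrivals["arrival_time"].dt.tz_localize("America/New_York")
arrivals["hour"] = eastern.dt.hour
arrivals["weekday"] = eastern.dt.dayofweek
print(arrivals)
```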
The second course built a polynomial classifier that, given the name of a dog breed, returns similarly sized breeds.
A visualization of the classifier
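I won't reproduce the course's polynomial method here, but the task itself (given a breed, return the breeds closest in size) can be illustrated with a much simpler nearest-size lookup over toy data; the weights below are rough placeholders, not the course's dataset:

```python
# Toy average weights in kg; real values would come from a breed dataset
breeds = {
    "Chihuahua": 2.5, "Beagle": 10.0, "Border Collie": 17.0,
    "Labrador Retriever": 30.0, "Great Dane": 60.0,
}

def similar_breeds(name, k=2):
    """Return the k breeds closest in size to the named breed."""
    target = breeds[name]
    others = [(b, abs(w - target)) for b, w in breeds.items() if b != name]
    return [b for b, _ in sorted(others, key=lambda pair: pair[1])[:k]]

print(similar_breeds("Beagle"))  # ['Border Collie', 'Chihuahua']
```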