Overview
This academic year I have been part of a machine learning lab at Tufts under Professor Michael Hughes. Our team of three has been developing a machine learning suite for medical image classification using semi-supervised and self-supervised ML techniques. The goal of this investigation is to see how well we can train models to generalize across different hospitals. Specifically, if we have a large amount of data from one site (or location) and train our models on it, how well can we expect them to perform when making predictions on a population from a different site?
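To make the cross-site question concrete, here is a minimal sketch of the kind of evaluation it implies: score one trained model on held-out data from every site, then report how far each other site falls below the site the model was trained on. This is not code from our repository. The names `accuracy`, `evaluate_across_sites`, `site_a`, and `site_b` are hypothetical, the data is a toy example, and plain accuracy is used here only for simplicity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One labeled example: (feature vector, integer class label).
Example = Tuple[Sequence[float], int]
Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not examples:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def evaluate_across_sites(
    predict: Predictor,
    held_out_by_site: Dict[str, List[Example]],
    source_site: str,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Score one model on every site and measure the drop from the source site.

    Returns (scores, gaps): scores maps each site to its accuracy, and gaps
    maps each non-source site to (source accuracy - that site's accuracy).
    """
    scores = {site: accuracy(predict, data) for site, data in held_out_by_site.items()}
    gaps = {
        site: scores[source_site] - score
        for site, score in scores.items()
        if site != source_site
    }
    return scores, gaps


# Toy stand-in for a trained model: a fixed threshold on the first feature.
def toy_predict(features: Sequence[float]) -> int:
    return int(features[0] > 0.5)


# Hypothetical held-out sets; site_b has a different feature/label relationship.
site_a = [([0.9], 1), ([0.1], 0), ([0.8], 1), ([0.2], 0)]
site_b = [([0.6], 0), ([0.4], 0), ([0.7], 1), ([0.3], 1)]

scores, gaps = evaluate_across_sites(
    toy_predict, {"site_a": site_a, "site_b": site_b}, source_site="site_a"
)
print(scores, gaps)
```

A large positive gap for a site means the model generalizes poorly to that site compared with the one it was trained on.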
Working on this project over the year, and contrasting the experience with my time at Stripe, my biggest takeaway was the difference in process and, as a result, in speed. At Stripe, we had clear frameworks for how and when deliverables should be finished, and a vision for the coming week, two weeks, and so on. The lab had less of that external structure, but I think its absence made me more motivated to create it for myself and my teammates. I began to take charge of creating a timeline and helping set tasks and goals so that we could reach the results we were aiming for.
We now have a promising pipeline that covers everything from preprocessing to training and evaluation, and we hope to draft a paper soon.
Code
If you're curious, the current state of our work can be found here: SSL-vs-SSL Benchmark GitHub Repo.