How a machine learning engineer thinks (before ChatGPT)
Part three of a four-part series. Before we get to how a Product Manager thinks about using machine learning in products, we need to understand how a Machine Learning engineer thinks.
When they say machine learning, they mean creating a model of something. Not a physical model like a toy car, but a model of how something behaves. For example, we all carry a model that when the sun sets, it must next come up and bring back light.
A machine learning engineer creates (aka trains) a model based on the patterns and relationships it finds in the data you give it. Give it data as an input, and it predicts an output. Give a model a time of day, and it'll tell you whether the sun is up or down.
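To make that concrete, here's a minimal sketch of that sun-up-or-down model, assuming Python with scikit-learn. The hours and the 6am-to-6pm daylight window are made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Training information: hour of day (input) -> sun up or down (output).
# The roughly 6am-6pm daylight window is an illustrative assumption.
hours = [[0], [3], [5], [7], [10], [12], [15], [18], [21], [23]]
sun_is_up = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # 1 = up, 0 = down

# "Training" finds the pattern (up between roughly 6 and 18) in the data.
model = DecisionTreeClassifier().fit(hours, sun_is_up)

# Give the model a time of day, and it predicts whether the sun is up.
print(model.predict([[13]]))  # -> [1], sun is up
print(model.predict([[2]]))   # -> [0], sun is down
```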
Before you can create a model of your information, remember to do steps 1-4 in How a data scientist thinks:
1. Understand your situation, challenge, and the outcome if you solve it
2. Prepare information (aka data) related to it
3. Understand your information: how it's spread out, where it's concentrated, potential issues
4. Iteratively fix the bad and vague information
Now you're ready to start the modelling part: "Answer your Questions through analyzing and modelling your information"
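As a quick refresher on step 3 above, here's a minimal sketch of "understanding your information", assuming Python with pandas and an illustrative house-listings table:

```python
import pandas as pd

# Illustrative data: a few house listings, one with a missing price.
listings = pd.DataFrame({
    "size_sqft": [700, 950, 1200, 5000, 800],
    "price": [210_000, 280_000, 350_000, 1_900_000, None],
})

print(listings.describe())    # how values are spread out and concentrated
print(listings.isna().sum())  # potential issues: missing values per column
```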
Steps involved in creating a model of your information
Part Art: the Product Manager with the Data Scientist
1. Choose the most relevant information to what you're trying to predict
Part Art & Science
2. Leverage an existing model or create a new one for each new question you want to answer
3. Select how you'll model it
The structure and process your information will flow through
4. Split your information into three (for training, validating, and testing)
You’ll check your model each time you train it, including the final one, against information it hasn’t seen yet
5. Train your model (using the gradient descent technique) on your training information
6. Measure how close it predicts what you expect using your validation information
7. Tweak what you chose in steps 1 and 3 if its predictions are not good enough, and retrain your model in step 5
8. You have your final model: measure how close it predicts what you expect using your testing information
You'll again tweak what you chose in steps 1 and 3 if its predictions are not good enough, and retrain your model in step 5
9. Deploy your model
Here’s how you do each step
Part Art: the Product Manager with the Data Scientist
1. Choose the most relevant information to what you're trying to predict
- Assess which information (e.g. time, price, location, aka features)
  - has the most influence on what you're trying to predict
  - helps separate highly different outcomes (e.g. cat, not cat)
- Start with more features than you think you'll need
  - Include enough that a human expert in the domain (e.g. finance) could confidently predict the outcome when given only those features
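One common way to assess which features have the most influence, as a minimal sketch assuming scikit-learn, illustrative data, and made-up feature names:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Illustrative data: 200 examples, 4 candidate features, 1 binary outcome.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=1, random_state=0)
features = ["time", "price", "location", "size"]  # made-up names

# Mutual information scores how much each feature tells you about the outcome.
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(features, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")  # higher score = more influence
```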
Part Art & Science
2. Leverage an existing model or create a new one for each question you want to answer
- Before building a new model from scratch, check whether an existing model can be adapted to the question you're trying to answer
- Consider using transfer learning or model adaptation techniques, as in the sketch below
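A minimal sketch of transfer learning, assuming PyTorch and a recent torchvision: reuse a model someone already trained on millions of images, and only retrain its final layer for your own question (here, the cat / not-cat example):

```python
import torch.nn as nn
from torchvision import models

# Start from a model already trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the existing layers so their learned patterns are kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace only the final layer with one for your two outcomes (cat, not cat).
model.fc = nn.Linear(model.fc.in_features, 2)
# Training now only updates this last layer: far less data and time needed.
```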
3. Select how you'll model it
- Choose your model architecture (e.g. CNN, RNN), i.e. how many huge matrices your information will run through during training
- Choose your loss function, i.e. the algorithm that calculates how close the model's predictions are to the outputs you expect
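A minimal sketch of those two choices, assuming PyTorch: a small architecture (the matrices your information flows through) plus a loss function that scores predictions against expected outputs. The layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# Architecture choice: two matrices (linear layers) the input flows through.
model = nn.Sequential(
    nn.Linear(4, 16),  # 4 input features -> 16 intermediate values
    nn.ReLU(),
    nn.Linear(16, 2),  # -> scores for 2 outcomes (e.g. cat, not cat)
)

# Loss function choice: how far predictions are from the expected outputs.
loss_fn = nn.CrossEntropyLoss()

example_input = torch.randn(1, 4)    # one example with 4 features
expected_output = torch.tensor([1])  # the outcome we expect
loss = loss_fn(model(example_input), expected_output)
print(loss.item())  # a single number: lower means closer to what you expect
```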
4. Split your information into three (for training, validating, and testing)
- One part for training the model, say 60%
- One part for validating the model's predictions on each training run, say 20%
  - Use it to tune your model's parameters and select the best-performing version
  - Why not train on it? Because it tells you how well your model generalizes to information it hasn't seen yet, and keeps it from memorizing your training information instead of learning generalizable patterns
- One part for testing your final model's predictions, say 20%
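A minimal sketch of that 60/20/20 split, assuming scikit-learn: two successive splits, since train_test_split only splits in two at a time:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(100, 1), np.arange(100)  # illustrative data

# First carve off 40% to share between validation and testing...
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
# ...then split that 40% in half: 20% validation, 20% testing.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```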
5. Train your model (using the gradient descent technique) on your training information
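A minimal sketch of gradient descent itself, assuming plain NumPy and a one-feature linear model; the underlying pattern (y = 3x) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 3 * x + rng.normal(0, 0.1, 50)  # data whose hidden pattern is y = 3x

w = 0.0             # the model: a single weight, starting from a guess
learning_rate = 0.1

for epoch in range(200):               # training runs (epochs)
    predictions = w * x
    error = predictions - y
    gradient = 2 * np.mean(error * x)  # slope of the loss with respect to w
    w -= learning_rate * gradient      # step downhill against the gradient

print(w)  # close to 3: the model has learned the pattern from the data
```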
6. Measure how close it predicts what you expect using your validation information
- Manually examine the errors on examples in your validation information, and try to spot a trend in where most of the errors were made
- Get a single numerical value for how close your model comes to what you expect on data it hasn't seen
  - Use statistical measures like accuracy, precision, recall, F1 score, etc. (sketched below)
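A minimal sketch of those measures, assuming scikit-learn and made-up validation labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative validation results: what you expected vs. what the model said.
expected  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy: ", accuracy_score(expected, predicted))   # overall correctness
print("precision:", precision_score(expected, predicted))  # when it says "cat", how often is it right?
print("recall:   ", recall_score(expected, predicted))     # of all actual cats, how many did it find?
print("f1:       ", f1_score(expected, predicted))         # balance of precision and recall
```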
7. Tweak what you chose in steps 1 and 3 if its predictions are not good enough, and retrain your model in step 5
- Reduce the # of features (price, size, location, etc.)
- Adjust:
  - Learning rate: think of it as a tuning knob. A higher learning rate can train faster but is less stable, and the model may never settle on good predictions. A lower learning rate is more stable but slower, and training may stop before the model learns anything useful
  - Number of training runs (aka epochs): the number of complete passes through the training information. More passes let your model learn more complex patterns, but can also lead to overfitting, i.e. memorizing the training information and predicting badly on information it hasn't seen
  - Regularization: it helps prevent the model from memorizing the training information and encourages learning patterns that generalize to new data
  - Loss function: start with a simple one you can implement quickly
- Go back to step 5
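A minimal sketch of that tweak-and-retrain loop, assuming scikit-learn's GridSearchCV: it tries several regularization strengths and keeps the one that measures best (cross-validation stands in for a fixed validation set here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # illustrative data

# Candidate tweaks: C is regularization strength (smaller = more regularized).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,  # cross-validation: repeated train/validate splits
)
search.fit(X, y)  # trains and re-measures the model for each tweak

print(search.best_params_, search.best_score_)  # the best tweak and its score
```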
8. You have your final model: measure how close it predicts what you expect using your testing information
- Go back to step 6 if its predictions are not good enough
9. Deploy your model
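Finally, a minimal sketch of one common deployment path, assuming scikit-learn and joblib: save the trained model as a file, then load it wherever predictions are served:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)  # illustrative data
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")  # ship this file with your service

served_model = joblib.load("model.joblib")  # inside the serving code
print(served_model.predict(X[:1]))          # input in -> prediction out
```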
Credits
- My learning from: Coursera Machine Learning, Coursera Machine Learning Foundations for Product Managers, Situation Challenge Questions Answers (SCQA framework)
- My editors fixing gaps in my understanding: ChatGPT, Gemini