rigélblu

Mar 06, 2024

How a machine learning engineer thinks (before ChatGPT)

Tom Hosiawa • 4 min read

Part three of a four-part series. Before we get to how a Product Manager thinks about using machine learning in products, we need to understand how a Machine Learning engineer thinks.

When they say machine learning, they mean creating a model of something. Not a physical model like a toy car, but a model of how something behaves. For example, we all carry a model that after the sun sets, it must come up again and we get light.

A machine learning engineer creates, aka trains, a model based on the patterns and relationships it finds in the data you give it. Give it data as an input, and it predicts an output. Give a model a time of day, and it’ll tell you if the sun is up or down.
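The sun example can be sketched as a tiny trained model. A minimal, hand-rolled sketch in Python; the hours and labels below are invented for illustration:

```python
# Training data: (hour of day, is the sun up?) — invented examples
data = [(0, False), (3, False), (5, False), (7, True), (10, True),
        (12, True), (15, True), (18, True), (20, False), (23, False)]

def train(data):
    """Learn the sunrise/sunset hours that separate the examples."""
    up_hours = [hour for hour, up in data if up]
    return min(up_hours), max(up_hours)  # the learned "parameters"

def predict(model, hour):
    """Given a time of day, predict whether the sun is up."""
    sunrise, sunset = model
    return sunrise <= hour <= sunset

model = train(data)
print(predict(model, 13))  # True: midday
print(predict(model, 2))   # False: middle of the night
```

Real models learn far more parameters from far more data, but the shape is the same: find patterns in examples, then use them to predict new inputs.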

Before you can create a model of your information, remember to

Do steps 1-4 in How a data scientist thinks

  1. Understand your situation, challenge, and outcome if you solve it
  2. Prepare information (aka data) related to it
  3. Understand your information — how it’s spread out, where it’s concentrated, potential issues
  4. Iteratively fix the bad and vague information
  5. Now you’re ready to start the modelling part “Answer your Questions through analyzing and modelling your information”

Steps involved in creating a model of your information (aka feature engineering)

Part Art, the Product Manager with the Data Scientist

  1. Choose the most relevant information to what you’re trying to predict

Part Art & Science

  1. Leverage an existing model or create a new one for each new question you want to answer
  2. Select how you’ll model it

    The structure and process your information will flow through

  3. Split your information into three (for training, validating, and testing)

    You’ll check your model each time you train it, including the final one, against information it hasn’t seen yet

  4. Train your model (using the gradient descent technique) on your training information
  5. Measure how close it predicts what you expect using your validation information
  6. Tweak what you chose in 1 and 2 if its predictions are not good enough, then retrain (step 4) and re-measure (step 5)
  7. You have your final model; measure how close it predicts what you expect using your testing information

    You’ll again tweak what you chose in 1 and 2 if its predictions are not good enough, then retrain (step 4) and re-measure (step 5)

  8. Deploy your model
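Steps 3 through 7 can be sketched end-to-end in a few lines. A minimal sketch with invented data (y = 2x + 1 plus noise) and a hand-rolled gradient descent, not a production setup:

```python
import random

random.seed(0)

# Invented data: one feature x, target y = 2x + 1 with a little noise
xs = [i / 100 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0, 0.1) for x in xs]

# Step 3: split into training / validation / testing (60% / 20% / 20%)
pairs = list(zip(xs, ys))
random.shuffle(pairs)
train_set, val_set, test_set = pairs[:60], pairs[60:80], pairs[80:]

def mse(w, b, data):
    """Loss function: mean squared error between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Step 4: train with gradient descent on the training information only
w, b, learning_rate = 0.0, 0.0, 0.5
for epoch in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train_set) / len(train_set)
    grad_b = sum(2 * (w * x + b - y) for x, y in train_set) / len(train_set)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 5: measure on validation information the model hasn't seen
print(f"validation MSE: {mse(w, b, val_set):.4f}")

# Step 7: final check on the held-out testing information
print(f"test MSE: {mse(w, b, test_set):.4f}")
```

The model recovers weights close to the true 2 and 1, and the validation and test errors stay near the noise level because it never trained on those splits.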

Here’s how you do each step

Part Art — the Product Manager with the Data Scientist

  1. Choose the most relevant information to what you’re trying to predict

    • Assess which information (e.g. time, price, location — aka features)
      • has the most influence on what you’re trying to predict
      • helps separate highly different outcomes (e.g. cat vs. not cat)
    • Start with more features rather than fewer
    • Include enough that a human expert in the domain (e.g. finance) could confidently predict the outcome given only those features
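One simple way to assess which features have the most influence is to measure how strongly each correlates with what you’re trying to predict. A minimal sketch; the feature names and numbers are invented:

```python
def correlation(xs, ys):
    """Pearson correlation between a candidate feature and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Target: house price. Candidate features, with invented values:
price    = [300, 350, 420, 500, 560]
size     = [80, 95, 120, 150, 170]   # strongly related to price
distance = [12, 10, 8, 5, 4]         # inversely related (farther = cheaper)
house_no = [50, 3, 88, 12, 40]       # irrelevant noise

for name, feature in [("size", size), ("distance", distance), ("house_no", house_no)]:
    print(f"{name}: {correlation(feature, price):+.2f}")
```

A score near +1 or −1 suggests a feature worth keeping; near 0 suggests it carries little signal. Correlation only catches simple relationships, but it’s a quick first filter.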

Part Art & Science

  1. Leverage an existing model or create a new one for each question you want to answer

    • Before building a new model from scratch, check if an existing model can be adapted to the question you’re trying to answer
    • Consider using transfer learning or model adaptation techniques
  2. Select how you’ll model it

    • Choose your model architecture (e.g. CNN, RNN), i.e. how many huge matrices your training will run through
    • Choose your loss function, i.e. the algorithm that calculates how close what the model predicted is to the output you expected
  3. Split your information into three (for training, validating, and testing)

    • One for training the model, say 60%
    • One for validating the model’s predictions on each training run, say 20%
      • Tune your model’s parameters and select the best performing version
      • Why not train on this? It tells you how your model generalizes to information it hasn’t seen yet, so you avoid memorizing your training information instead of learning generalizable patterns
    • One for testing your final model’s predictions
  4. Train your model (using the gradient descent technique) on your training information

  5. Measure how close it predicts what you expect using your validation information

    • Manually examine the errors on examples in your validation information, and try to spot a trend where most of the errors were made
    • Get a single, numerical value to how close your model gives what you expect with data it hasn’t seen
    • Use statistical techniques like: accuracy, precision, recall, F1 score, etc
  6. Tweak what you chose in 1 and 2 if its predictions are not good enough, then retrain (step 4) and re-measure (step 5)

    1. Reduce # of features (price, size, location, etc.)
    2. Adjust:
      • Learning rate: think of it as a tuning knob for how big a step each training update takes. A higher learning rate can be faster but unstable, overshooting good solutions. A lower learning rate is more stable but slower, and can stall before learning anything useful
      • Number of training runs (aka epochs): is the number of complete passes through the training information. Increasing runs allows your model to learn more complex patterns but can also lead to overfitting
      • Regularize your model: it helps prevent memorizing the training information and encourages learning patterns that generalize to new data
      • Loss function algorithm: start with a simple one, implement it quickly
    3. Go back to step 5
  7. You have your final model; measure how close it predicts what you expect using your testing information

    • Go back to step 6 if its predictions are not good enough
  8. Deploy your model
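The measurements from step 5 (accuracy, precision, recall, F1) can be computed by hand on a toy cat / not-cat example. The labels and predictions below are invented:

```python
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = cat, 0 = not cat
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # the model's guesses

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # missed cats
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives

accuracy  = (tp + tn) / len(actual)               # how often it's right overall
precision = tp / (tp + fp)                        # of predicted cats, how many were cats
recall    = tp / (tp + fn)                        # of real cats, how many it caught
f1        = 2 * precision * recall / (precision + recall)  # balance of the two

print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75
```

Accuracy alone can mislead when outcomes are imbalanced (a model predicting "not cat" every time scores 60% here), which is why precision, recall, and F1 are worth checking too.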


Credits

  • My learning from: Coursera Machine Learning, Coursera Machine Learning Foundations for Product Managers, Situation Challenge Questions Answers (SCQA framework)
  • My editors fixing gaps in my understanding: ChatGPT, Gemini