Training is how a model learns from data. It repeatedly predicts the next token, compares the prediction to the correct answer (the loss), and updates its internal numbers (parameters) so the predictions improve.
Training loop (simplified)
1. Data: text (or pairs) the model learns from
2. Tokenize: turn text into token IDs
3. Forward: the model predicts the next token
4. Loss: compare the prediction to the correct answer
5. Backward: compute gradients
6. Update: adjust billions of parameters
Repeat over huge datasets for many steps. "Parameters" are the numbers being updated; more parameters means more capacity to learn patterns.
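The loop above can be sketched with a toy model. This is an illustrative sketch only: instead of billions of parameters it uses a tiny bigram table (one logit per pair of tokens), and the "tokenizer" is just a word-to-ID map, but the tokenize / forward / loss / backward / update structure is the same.

```python
import math

# Toy corpus and "tokenizer": map each word to an integer ID.
corpus = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
ids = [stoi[w] for w in corpus]
V = len(vocab)

# Parameters: a V x V table of logits, W[prev][next], initialized to zero.
W = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.5
losses = []
for step in range(100):
    total = 0.0
    for prev, nxt in zip(ids, ids[1:]):
        # Forward: predict a probability for every possible next token.
        probs = softmax(W[prev])
        # Loss: negative log-probability of the correct next token.
        total += -math.log(probs[nxt])
        # Backward: gradient of the loss w.r.t. the logits
        # (for softmax + cross-entropy this is probs, minus 1 at the target).
        grad = probs[:]
        grad[nxt] -= 1.0
        # Update: nudge the parameters against the gradient.
        for j in range(V):
            W[prev][j] -= lr * grad[j]
    losses.append(total / (len(ids) - 1))

print(f"loss at step 0: {losses[0]:.3f}, at step 99: {losses[-1]:.3f}")
```

The loss falls over the 100 steps as the table learns which word tends to follow which. Real training is this same loop, scaled up: a neural network instead of a table, an automatic-differentiation library for the backward pass, and vastly more data and steps.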
In practice
Training from scratch needs huge datasets, lots of compute (GPUs), and many steps, so you don't usually do it yourself; instead, you fine-tune an existing model on your data.
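Fine-tuning is the same loop, just started from existing parameters instead of from scratch, and usually run with a smaller learning rate on a smaller dataset. A minimal sketch, reusing the toy bigram setup from above (the "pretrained" weights here are random stand-ins, not a real model, and the token IDs are made up):

```python
import math
import random

random.seed(0)
V = 5  # toy vocabulary size (assumption for illustration)

# Stand-in for pretrained parameters (a real model would load a checkpoint).
W = [[random.gauss(0, 1) for _ in range(V)] for _ in range(V)]

# Token IDs from *your* (small) dataset.
new_ids = [0, 1, 2, 1, 2, 3, 0, 1]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def avg_loss(W, ids):
    total = sum(-math.log(softmax(W[p])[n]) for p, n in zip(ids, ids[1:]))
    return total / (len(ids) - 1)

before = avg_loss(W, new_ids)

lr = 0.1  # smaller step than pretraining, to adapt without forgetting
for _ in range(50):
    for p, n in zip(new_ids, new_ids[1:]):
        probs = softmax(W[p])
        grad = probs[:]
        grad[n] -= 1.0
        for j in range(V):
            W[p][j] -= lr * grad[j]

after = avg_loss(W, new_ids)
print(f"loss on your data before: {before:.3f}, after: {after:.3f}")
```

The point is that nothing structurally new happens: fine-tuning continues the update loop from where pretraining left off, so the model keeps its general knowledge while adapting to the new data.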