Inference

How models work Last reviewed 2026-07-01

Definition: The stage at which a trained AI model is put to work, taking an input and producing an output. Every chatbot answer is an act of inference; training, by contrast, is the earlier stage in which the model learned.

In more depth

During inference the model's weights stay fixed: the system runs the input through the network to compute a response, which a language model produces one token at a time. For most commercial tools, inference happens on the provider's servers, which is why confidentiality review focuses on where prompts and documents are sent, processed, and stored. Inference costs scale with usage, which is why many providers meter and bill by token.
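
As a rough illustration of the points above, here is a minimal Python sketch of token-by-token generation and per-token metering. It is a toy, not how any real provider's system is built: WEIGHTS is a hypothetical stand-in for a model's parameters, and generate, estimate_cost, and the prices are illustrative assumptions.

```python
from types import MappingProxyType

# Hypothetical "weights": a read-only table mapping the last token to the next.
# A real model has billions of numeric parameters; this stand-in only shows
# that they are read, never updated, during inference.
WEIGHTS = MappingProxyType({
    "the": "court",
    "court": "ruled",
    "ruled": "today",
    "today": "<end>",
})


def generate(prompt_tokens, max_new_tokens=10):
    """Produce output tokens one at a time using the fixed WEIGHTS."""
    tokens = list(prompt_tokens)
    output = []
    if not tokens:
        return output
    for _ in range(max_new_tokens):
        # Each step looks only at what has been produced so far.
        next_token = WEIGHTS.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)
        output.append(next_token)
    return output


def estimate_cost(input_tokens, output_tokens, price_in, price_out):
    """Assumed metering scheme: a flat price per input and per output token."""
    return len(input_tokens) * price_in + len(output_tokens) * price_out


prompt = ["the"]
completion = generate(prompt)
print(completion)  # ['court', 'ruled', 'today']

# Prices are made-up whole-number units per token, not any provider's rates.
print(estimate_cost(prompt, completion, price_in=2, price_out=6))  # 20
```

The table is wrapped in a read-only view so that nothing in the generation loop can alter it, mirroring the idea that inference uses the model without changing it. The cost grows with the number of tokens sent and received, which is the sense in which inference costs scale with usage.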

About the editor: MHSB Solutions, Research desk. MHSB Solutions is not a law firm. This glossary is educational information, not legal advice.

AI terminology and tools change quickly; definitions reflect usage as of the last-reviewed date. For what bar associations and courts actually require of lawyers using AI, see legalaicompliance.help and consult a licensed attorney in your jurisdiction.