Training data
In more depth
For large language models, training data typically includes web pages, books, code, and licensed datasets collected up to a cutoff date. The model has no knowledge of anything after that date unless new information is supplied to it, for example in the prompt. Errors, gaps, and biases in the data surface as errors and biases in the model's output. Where training data came from is the subject of ongoing copyright litigation against AI developers. Whether user inputs are used as training data is a central confidentiality question when evaluating any AI tool.
Further reading: Wikipedia.
Educational information, not legal advice. AI terminology and tools change quickly; definitions reflect usage as of the last-updated date. For what bar associations and courts actually require of lawyers using AI, see legalaicompliance.help and consult a licensed attorney in your jurisdiction.