words by jm

Essay · writing

Wu-Tang AI

I think there’s an old Wu-Tang jam about AI models: “D.R.E.A.M.” That is, Data Rules Everything Around Models.

As practitioners have started implementing LLMs in their tech stacks, I’ve seen a lot of chatter and confusion about what these models’ limitations actually are. When you fiddle with ChatGPT for a while, it can seem like some sort of black magic, so bumping up against its limits can be disorienting. Why can it understand my questions about Miley Cyrus’s dining preferences as expressed in her memoir “Miles To Go” but can’t seem to parse this seemingly simple question about traffic laws in Milwaukee? Why is it telling me over and over about the growth patterns of eggplants when I asked about the inhumane egg farming plants that I saw in a documentary?

The thing is, artificial intelligence isn’t “intelligent” as we colloquially describe it. We tend to use “IQ” to measure someone’s perceived intelligence, and AI models currently are not very good at the things we use to determine that. Someone actually did a study (https://lnkd.in/eRszr5Ft), and the results weren’t great (spoiler: some models barely reached “average,” most were slightly below average). AI models currently cannot establish new ground truths or innovate newly reasoned facts. They can combine and remix their existing knowledge in new ways, but their training and fine-tuning data determine everything they know or can possibly know without augmentation. Think of them more like really good remix artists rather than lyrical virtuosos.

This fact also goes for relationships between data. You can supply a giant corpus, but if you ask about a correlation that the training data never makes explicit (that is, nothing in the data actually draws a line between fact A and fact B, or between the kinds of data they represent), the model is very unlikely to make that connection for you. At best, it will produce a very shaky understanding of it.
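To make that concrete, here’s a minimal sketch of what “drawing the line explicitly” looks like as fine-tuning data. The field names and prompt wording are my own invention, loosely following the one-JSON-object-per-line (JSONL) shape that several fine-tuning APIs accept; the point is just that the artist-to-memoir relationship appears as a literal training pair rather than something the model is left to infer:

```python
import json

# Each record states a relationship explicitly. The model won't
# reliably connect two facts that merely coexist in its corpus;
# the training pair has to draw the line for it.
relationships = [
    {"artist": "Miley Cyrus", "memoir": "Miles To Go"},
    {"artist": "Andre Agassi", "memoir": "Open"},
]

def to_finetune_record(rel):
    """Turn one explicit relationship into a prompt/completion
    pair (a common JSONL fine-tuning shape; details vary by API)."""
    return {
        "prompt": f"Which memoir did {rel['artist']} write?",
        "completion": f" {rel['memoir']}",
    }

records = [to_finetune_record(r) for r in relationships]

# One JSON object per line, ready to ship as training data.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

A handful of pairs like this won’t teach the general concept, of course; in practice you’d want many varied phrasings of the same relationship so the model learns the link rather than the template.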

Side note: that’s why having a base model trained on a corpus that fits your needs is important. Yes, you can fine-tune, but unless you need really general capabilities, it’s probably better to find/build/train a more specialized model. For example: sure, the big LLMs handle code pretty well, but why not deploy something geared specifically toward understanding code, and cut down on the amount of fine-tuning and augmentation needed to get the results you’re seeking?