From Providing Data to Explaining Why: How Large Language Models are Changing the Role of Data Practitioners

Unlocking the power of Large Language Models in data practice: From SQL troubleshooting to natural language querying with LlamaIndex.

Andrew Crider
7 min read · Apr 3, 2023
Image available to the author via unique access to Midjourney; the author assumes responsibility for its authenticity.

This article was originally published at StreamSets.com.

Over the past few months, there has been a lot of talk about how ChatGPT and other Large Language Models (LLMs) will change the world. To data professionals, the idea that a machine can provide answers to questions, including business questions, might seem to be an existential threat. But is it?

Large Language Models can be outstanding at providing answers about what is (or what the LLM thinks is true), but by the nature of their training and construction, they are not very good at explaining why something is. LLMs don’t “know” anything about context. They’re really just guessing based on their training.

What are Data Engineers Really Doing?

According to this Monte Carlo survey, data engineers spend 40% of their workday on bad data. Not only that, but:

  • An average organization has 61 data incidents a month, taking an average of 13 hours to…
