Use natural language to talk to your data lake

Chatting with your data enables huge benefits

GM, GM 🤙

It's holiday season, I got some creative inspiration going, and I want to try something different today.

Maybe you know that I have been an IT analyst for some years and have written about topics like cloud, AI, digital strategy, etc.

So today I wrote an article on how you can work with your company's data the way you would with ChatGPT, but entirely in your own cloud environment. And why you definitely should do this.

I need your feedback:
👍 Do you like such articles?
👎 Should I go back to the regular program with the Use Cases?
🙌 Should I mix it up?

👉 Hit reply and let me know with the emojis above!

And now to the article.

Most industrial companies have a data lake these days. Those who don't: hurry up 😉.

But typically, only trained engineers can access the data and extract relevant information from it, for example with a Jupyter notebook.

Wouldn't it be great if you could use plain English to talk to your datasets (like ChatGPT)?

Well, your dreams have been heard!

Felix Lopez from AWS showed how to do this in a video.

But first, let's talk about why you should build such a framework for your data lake.

Data engineers are critical for accessing information from a data lake - for now

Sven Matuschzik of the Brose Group and his team put Felix's advice to the test.

The German automotive supplier faces problems that any company with a large data lake faces:

  • Data stored in a data lake can only be explored using cloud and technical knowledge

  • Only a technical query language can be used to extract insights

  • Identifying data sources to generate on-the-fly answers is not possible at scale

But Felix promises that generative AI will make it easier to work with the data lake, including:

  • Natural language capabilities that allow non-technical users to query data using conversational English rather than complex SQL

  • Asking fact-based questions without knowing the underlying data channels
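
The idea behind these two points can be sketched in a few lines of Python. This is a minimal illustration, not the code from Felix's video: the `orders` table, the helpers `build_prompt` and `answer_question`, and the `fake_llm` stand-in are all assumptions made for this example. In a real setup, `fake_llm` would be replaced by a call to whichever hosted model you use, and SQLite by the query engine of your data lake.

```python
import sqlite3
from typing import Callable

# Hypothetical schema description that is handed to the model as context.
SCHEMA = "orders(order_id INTEGER, customer TEXT, country TEXT, order_month TEXT)"


def build_prompt(question: str, schema: str = SCHEMA) -> str:
    """Combine the table schema and the user's question into one prompt."""
    return (
        "You translate questions into SQLite SQL.\n"
        f"Table schema: {schema}\n"
        f"Question: {question}\n"
        "Answer with a single SELECT statement only."
    )


def answer_question(question: str, llm: Callable[[str], str],
                    conn: sqlite3.Connection) -> list:
    """Ask the model for SQL, refuse anything but SELECT, and run it."""
    sql = llm(build_prompt(question)).strip().rstrip(";")
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); always returns one query."""
    return ("SELECT COUNT(DISTINCT customer) FROM orders "
            "WHERE country = 'Canada' AND order_month = '2023-11';")


def demo_connection() -> sqlite3.Connection:
    """Create a tiny in-memory table with made-up rows so the sketch runs anywhere."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, "
                 "country TEXT, order_month TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "Maple Corp", "Canada", "2023-11"),
         (2, "Maple Corp", "Canada", "2023-11"),
         (3, "Nordlicht GmbH", "Germany", "2023-11"),
         (4, "Yukon Ltd", "Canada", "2023-11"),
         (5, "Yukon Ltd", "Canada", "2023-10")],
    )
    return conn


if __name__ == "__main__":
    rows = answer_question(
        "How many customers from Canada placed an order in November 2023?",
        fake_llm, demo_connection())
    print(rows)
```

The user only ever sees the question and the answer. The schema in the prompt is what lets the model write a query without the user knowing how the data is stored, and the SELECT-only check is a very rough guard that a production system would need to replace with proper permissions.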

Benefits of accessing your data lake through natural language

Using natural language to talk to your data lake changes the game for your entire organization.

Now everyone can use the data and leverage that information. Gone are the days of having to bring in a data engineer to answer specific questions about your domain.

  • How many customers from Canada placed an order last month?

  • What is the distribution of the individual product variants over the year?

  • Could deliveries to the same region have been consolidated to save on transportation expenses?

  • Which supplier had the highest error rate last month?

  • Which product had the most frequent service cases?

  • What was the pressure for machine X on a certain date and time?

These questions are just the tip of the iceberg.

Any employee can type them into a chat window to get the answers directly. Any non-developer can now access the data, understand it, interpret it, and use it in their daily work.

And that's exactly why we collect the data, so that we can make sense of it in our day-to-day work.
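
To make the gap concrete, here is roughly what someone has to write today to answer just one of the questions above ("Which supplier had the highest error rate last month?"). Treat it as a sketch under assumptions: the `deliveries` table, its columns, the made-up numbers, and the definition of error rate as defective parts divided by delivered parts are all invented for this example.

```python
import sqlite3

# The query a data engineer writes by hand today, and that a model would
# have to generate from the plain-English question.
WORST_SUPPLIER_SQL = """
SELECT supplier,
       1.0 * SUM(parts_defective) / SUM(parts_delivered) AS error_rate
FROM deliveries
WHERE delivery_month = ?
GROUP BY supplier
ORDER BY error_rate DESC
LIMIT 1
"""


def worst_supplier(conn: sqlite3.Connection, month: str):
    """Return (supplier, error_rate) for the given month, or None if no data."""
    return conn.execute(WORST_SUPPLIER_SQL, (month,)).fetchone()


def demo_deliveries() -> sqlite3.Connection:
    """Build a tiny in-memory table with made-up numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deliveries (supplier TEXT, delivery_month TEXT, "
                 "parts_delivered INTEGER, parts_defective INTEGER)")
    conn.executemany(
        "INSERT INTO deliveries VALUES (?, ?, ?, ?)",
        [("Supplier A", "2023-11", 1000, 10),
         ("Supplier A", "2023-11", 500, 20),
         ("Supplier B", "2023-11", 200, 8),
         ("Supplier C", "2023-10", 100, 50)],
    )
    return conn


if __name__ == "__main__":
    print(worst_supplier(demo_deliveries(), "2023-11"))
```

Grouping, aggregating, sorting, and picking the right month are routine for an engineer but a real barrier for everyone else. That is the part a natural-language layer takes off the user's plate.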

Your Data Lake and Generative AI

Needless to say, this assumes that the data lake is connected to a Large Language Model (LLM). In this case, the model is hosted on Amazon SageMaker.

The other hyperscalers, like Microsoft Azure and Google Cloud, have their own models and applications, but in this example Felix uses the AWS tech stack (of course 🙃).

To get an overview of the architecture, he prepared the following slide.

If you want to dive deeper, be sure to watch the short video where he walks you through all the steps with example code. So forward it to your lead developers and architects.

The use cases are infinite

It doesn't have to stop there.

There are many more options you can incorporate.

For example, you can build a chatbot or virtual assistant for each department in your company.

And if a conversational bot (like Alexa) works better in some of your environments, no problem: you can use that too.

This is just an example to give you an idea of what is possible.

But as always when working with your company's sensitive data, check the terms of service of your LLM provider and consider restricting certain data to certain user groups.

People in production may not need access to marketing data and vice versa.
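
One way to picture such a restriction is a per-group allow list that is enforced by the database itself, so a generated query cannot read a table the user's group is not cleared for. The sketch below uses SQLite's authorizer hook (`set_authorizer` in Python's `sqlite3` module) as a stand-in. In a real data lake you would rely on your cloud provider's access controls instead. The group names, table names, and the helpers `connection_for_group` and `demo_lake` are assumptions for this example.

```python
import sqlite3

# Hypothetical mapping of user groups to the tables they may read.
TABLE_ACCESS = {
    "production": {"machine_readings"},
    "marketing": {"campaigns"},
}


def connection_for_group(conn: sqlite3.Connection, group: str) -> sqlite3.Connection:
    """Install an authorizer that denies reads on tables outside the group's list."""
    allowed = TABLE_ACCESS.get(group, set())

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and arg1 not in allowed:
            return sqlite3.SQLITE_DENY
        # This sketch only blocks reads; everything else is let through.
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    return conn


def demo_lake() -> sqlite3.Connection:
    """Create two small in-memory tables with made-up values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE machine_readings (machine TEXT, pressure_bar REAL)")
    conn.execute("CREATE TABLE campaigns (name TEXT, budget_eur REAL)")
    conn.execute("INSERT INTO machine_readings VALUES ('X', 4.2)")
    conn.execute("INSERT INTO campaigns VALUES ('Spring launch', 50000)")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = connection_for_group(demo_lake(), "production")
    print(conn.execute("SELECT pressure_bar FROM machine_readings").fetchall())
    try:
        conn.execute("SELECT budget_eur FROM campaigns")
    except sqlite3.DatabaseError as err:
        print("Blocked:", err)
```

The point of doing the check in the data layer rather than in the prompt is that it still holds when the model generates a query nobody anticipated.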

Now I want to know: where do you see a ChatGPT-like application in your organization?

And remember, please give me feedback. 😃