Use ChatGPT to do Data Science for you

Jeffrey

Jan 3, 2023

ChatGPT can do a lot of things. I mean a lot of things. You’ve seen it write stories, jokes, full-on blog posts, and even build a website.

But today, I wanna give you a no-bullshit demo of how to use this tool to do some data science and basic statistics.

I’ll be working with the wine quality data set from UCI’s Machine Learning Repository.

ChatGPT can only take in so much text at once, so we’ll just work with the first 40 rows.
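If you want to grab the same 40 rows yourself, here's a minimal sketch using only Python's standard library. It assumes you downloaded the red-wine file, `winequality-red.csv`, from the UCI page (the post doesn't say whether the red or white file was used). The UCI wine files separate columns with semicolons, and the sketch leaves the text exactly as-is so it can be pasted straight into the chat. `first_n_rows` is a hypothetical helper name.

```python
from pathlib import Path


def first_n_rows(path, n=40):
    """Return the header line plus the first n data lines of a CSV file.

    The text is not parsed or re-formatted (semicolons and quotes stay),
    so the result can be pasted into a ChatGPT prompt unchanged.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    header, data = lines[0], lines[1 : n + 1]
    return "\n".join([header, *data])


# Assumption: the UCI red-wine file sits in the working directory.
# print(first_n_rows("winequality-red.csv"))
```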

This is a bit of an advanced note. If you haven't read the other two ChatGPT notes yet, or haven't played around with ChatGPT before, check those out first.

Alright, let's get started.

ChatGPT can read and analyze raw CSV data.

This was the first thing that blew my mind. I copied and pasted the data set without changing anything. The AI not only read it, but it understood that it was data on wine quality! 🤯

ChatGPT acted like that wasn’t a big deal and proceeded to tell me what else it could do.

👀

Checkpoint

Find a dataset to play with and feed it to ChatGPT. Share your journey!

ChatGPT can do basic descriptive statistics and provide the Python code for you.

Next, I asked ChatGPT if it could give me a summary of the wine quality ratings.

I also asked it for the Python code so I could verify the numbers myself.

Check out the output: it’s even formatted the way many data science folks are used to seeing it.
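Whatever code ChatGPT hands you, it's worth having an independent check. Here's a minimal sketch, standard library only, of the kind of summary pandas' `describe()` prints (count, mean, sample standard deviation, min, quartiles, max), plus a tally of how many wines got each quality score. The function names, and the assumption that the `quality` column arrives as a plain list of numbers, are illustrative only.

```python
import statistics
from collections import Counter


def describe(values):
    """Summary statistics in the spirit of pandas' DataFrame.describe().

    Uses the sample standard deviation (n - 1 denominator) and the
    'inclusive' quantile method, which interpolates linearly between
    data points the way pandas does by default.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


def rating_counts(values):
    """How many wines received each quality score, lowest score first."""
    return dict(sorted(Counter(values).items()))
```

Because quality is a small set of integer scores, `rating_counts` is often more telling than the mean alone.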

ChatGPT can answer basic questions about how variables relate to each other.

Before I dropped out of graduate school for statistics, I often consulted for non-technical researchers in the social sciences. It was always a pain for them to run analyses on their datasets themselves just to get answers to their questions.

In our example, you might be interested in a simple question: “What’s the relationship between pH levels of wine and the quality ratings?”

This was exactly what I asked ChatGPT, and it was able to give me an answer that was pretty spot on!

ChatGPT even thought to look at a scatter plot for answers! Before checking the results myself, I asked the AI for the actual correlation between the two variables. I also asked it to generate the scatter plot for me.

ChatGPT hit a limitation here. It couldn’t draw the plot, but it still came through with the Python code.
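ChatGPT can't draw the chart, but you can. Here's a minimal sketch with matplotlib (assumed to be installed; the code ChatGPT gave may have looked different) that plots pH against quality and saves it as a PNG. `save_scatter` and the default output filename are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to a file; no display window needed
import matplotlib.pyplot as plt


def save_scatter(ph, quality, out_path="ph_vs_quality.png"):
    """Scatter pH (x-axis) against quality rating (y-axis) and save as PNG."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(ph, quality, alpha=0.6)  # transparency shows overlapping points
    ax.set_xlabel("pH")
    ax.set_ylabel("quality")
    ax.set_title("pH vs. quality")
    fig.tight_layout()
    fig.savefig(out_path, dpi=100)
    plt.close(fig)  # release the figure's memory
    return out_path
```

Since quality is an integer score, the points stack in horizontal bands; the `alpha` setting makes the pile-ups visible.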

Finally, it gave me the Python code to calculate the correlation between the two variables.

I double-checked the results and the Python code it provided, and it seems we’re in good hands.
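For your own double-check, here's Pearson's correlation coefficient written out from its definition in plain Python: the sum of (x − mean x)(y − mean y), divided by the square root of the product of the summed squared deviations. The post doesn't say which correlation ChatGPT computed; Pearson is assumed because it's the usual default (pandas' `corr()` uses it unless told otherwise). `pearson_r` is a hypothetical helper name.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson's r: summed co-deviations over the root of summed squared deviations.

    The (n - 1) factors in the sample covariance and the two sample standard
    deviations cancel, so they are left out entirely.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_dev = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("one of the variables is constant; r is undefined")
    return co_dev / spread


# ph and quality would be the two columns as lists of numbers:
# print(pearson_r(ph, quality))
```

On Python 3.10+, `statistics.correlation(ph, quality)` should give the same number, which makes a handy second opinion.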

Running basic machine learning libraries like scikit-learn within ChatGPT

It’s not a surprise that this AI can run AI tools too 😂.

So, I decided to ask it to run the most basic analysis of all — linear regression.

Can you use scikit-learn to run a linear regression analysis on "citric acid" and "volatile acidity"

Check out the results.

By this point, your mind should be churning with ideas about all the potential this tool has. It’s especially helpful for someone like me: I’m more interested in the analysis than the coding. It’s easy to see how ChatGPT could quickly be put to work handling the grunt work.

Let’s end this note with another question for ChatGPT:

What machine learning can you do with the variables in this data set?

Yeah…this tool is here to stay fo sho. 🤖

Now it’s up to you — go explore ChatGPT and find some crazy patterns in the data all around you.
Good luck!

Till next time,
Jeffrey