# Exploring and Visualizing Data

Shan-Hung Wu & DataLab
Fall 2022

## Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an important and recommended first step in machine learning, performed before training the models that are more commonly seen in research papers. EDA alternates between exploration and exploitation steps. In the exploration step, you "explore" the data, usually by visualizing it in different ways, to discover some of its characteristics. Then, in the exploitation step, you use the identified characteristics to decide what to explore next. You repeat these two steps until you are satisfied with what you have learned from the data. Data visualization plays an important role in EDA. Next, we use the Wine dataset from the UCI Machine Learning Repository as an example and show some common and useful plots.

## Visualizing the Important Characteristics of a Dataset

NOTE: The pd.read_csv() function returns a pandas.DataFrame object. A pandas DataFrame is a useful "two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes".
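To make the terms in that definition concrete, here is a minimal sketch. It assumes a tiny, made-up CSV table held in memory; the column names and values are purely illustrative and are not rows from the Wine dataset. It shows that pd.read_csv() returns a DataFrame, that columns may hold different types, that rows and columns are addressed by label, and that the table can grow after creation.

```python
import io

import pandas as pd

# Hypothetical toy table for illustration only (NOT taken from the Wine dataset).
csv_text = """label,alcohol,color
1,13.5,red
2,12.1,white
3,12.9,red
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))   # a pandas DataFrame
print(df.shape)   # two-dimensional: (rows, columns)
print(df.dtypes)  # heterogeneous: each column has its own dtype

# Labeled axes: select by row label and column name.
print(df.columns.tolist())
print(df.loc[0, "alcohol"])

# Size-mutable: a new column can be added in place.
df["alcohol_rounded"] = df["alcohol"].round()
print(df.shape)
```

Calling `df.head()` or `df.describe()` on the result is often a reasonable first exploration step before plotting anything.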