1.5. Reading in Data With Pandas

One of the most common formats for storing data is the comma-separated values (CSV) file. One way to read data from a CSV file into Python is to use the pandas library. To import this library we use:

import pandas as pd

Any time we want to use a function from this library, we write:

pd.function

To read in a CSV file, we use the pandas function read_csv(), like this:

pd.read_csv(name_of_file)

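This reads our CSV file into a pandas DataFrame. You can think of a DataFrame as a table of data with rows and columns. Here is an example where we read in a file called study.csv, which is stored in the folder course. Because we give only the file name, pandas looks for study.csv in the current working directory, so the code needs to be run from the folder that contains the file.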

import pandas as pd

data = pd.read_csv("study.csv")
print(data)
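If you don't have study.csv to hand, here is a minimal sketch that reads a small CSV held in a string instead. The values below are made up for illustration and are not the real contents of study.csv. io.StringIO (from Python's standard library) makes the string behave like a file, which read_csv accepts. We then look at the DataFrame's shape, column names and first few rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for study.csv: these numbers are made up
# for illustration and are not the real file's values.
csv_text = """Time Spent Studying (hours),Exam Mark (%)
1,52
2,58
3,65
4,71
5,80
"""

# read_csv accepts any file-like object, so io.StringIO lets us read
# the CSV text as if it were a file on disk.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (rows, columns)
print(list(data.columns))  # column names taken from the header row
print(data.head())         # the first five rows
```

The first line of the CSV becomes the column names, and each following line becomes one row of the DataFrame.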

Here is a copy of study.csv.

study.csv

To extract a single column, we can use:

DataFrame[column_name]
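The column name has to match the header in the file exactly, including spaces, capital letters and the units in brackets. This sketch uses a small made-up CSV with the same header as study.csv (the values are hypothetical) to show a correct lookup and what happens when the name is slightly off.

```python
import io

import pandas as pd

# Hypothetical two-row CSV with the same header as study.csv.
csv_text = "Time Spent Studying (hours),Exam Mark (%)\n1,52\n2,58\n"
data = pd.read_csv(io.StringIO(csv_text))

# The name must match the header text exactly,
# including spaces, capitals and the units in brackets.
marks = data["Exam Mark (%)"]
print(marks)

# A name that differs slightly (here, missing the units) raises KeyError.
try:
    data["Exam Mark"]
except KeyError as err:
    print("KeyError:", err)
```

If you get a KeyError, printing data.columns is a quick way to check the exact column names pandas found.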

You may have noticed that study.csv has two columns:

Time Spent Studying (hours)
Exam Mark (%)

We can extract these into x, our independent variable, and y, our dependent variable. x and y are pandas Series objects, which are like 1-dimensional arrays.

import pandas as pd

data = pd.read_csv("study.csv")

x = data["Time Spent Studying (hours)"]
y = data["Exam Mark (%)"]

print(x)
print(y)
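To see how a Series behaves like a 1-dimensional array, here is a short sketch using a few made-up values in the same layout as study.csv. It checks the type, the length, a single value and the mean.

```python
import io

import pandas as pd

# Hypothetical data in the same layout as study.csv.
csv_text = """Time Spent Studying (hours),Exam Mark (%)
1,52
2,58
3,65
"""
data = pd.read_csv(io.StringIO(csv_text))

x = data["Time Spent Studying (hours)"]

print(isinstance(x, pd.Series))  # True: a single column is a Series
print(len(x))    # number of values, like len() of a list or 1D array
print(x[0])      # the value with index label 0 (the first row here)
print(x.mean())  # Series also provide summary methods such as mean()
```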

To convert these to NumPy arrays, we use:

series.to_numpy()

import pandas as pd

data = pd.read_csv("study.csv")

x = data["Time Spent Studying (hours)"].to_numpy()
y = data["Exam Mark (%)"].to_numpy()

print(x)
print(y)
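After calling .to_numpy(), we have a plain NumPy array rather than a Series. This sketch, again with made-up values standing in for study.csv, checks the type and the shape. A shape of (3,) means a 1D array with three elements.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data in the same layout as study.csv.
csv_text = """Time Spent Studying (hours),Exam Mark (%)
1,52
2,58
3,65
"""
data = pd.read_csv(io.StringIO(csv_text))

x = data["Time Spent Studying (hours)"].to_numpy()

print(isinstance(x, np.ndarray))  # True: no longer a pandas Series
print(x.shape)  # (3,): a 1D array with 3 elements
print(x.ndim)   # 1: the number of dimensions
```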

We use NumPy arrays because it’s easy to change their dimensions using

.reshape(rows, columns)

This lets you quickly turn a 1D vector into a 2D column or row vector. You’ll see why this is useful later.

import numpy as np

array = np.array([1, 2, 3, 4, 5])  # 1D array
print(array.reshape(5, 1))  # 2D array with 5 rows and 1 column
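The same idea gives a row vector, and NumPy also lets you write -1 for one dimension so that its size is worked out from the number of elements. A short sketch building on the example above:

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])  # 1D array, shape (5,)

row = array.reshape(1, 5)      # 2D row vector: 1 row, 5 columns
column = array.reshape(-1, 1)  # -1 tells NumPy to work out the number of rows

print(array.shape)   # (5,)
print(row.shape)     # (1, 5)
print(column.shape)  # (5, 1)

# The new shape must hold exactly the same number of elements:
# array.reshape(2, 3) would raise ValueError because 2 * 3 != 5.
```

Using -1 is handy when you don't want to hard-code the number of rows, for example when the length of a column read from a CSV file can change.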

You would have seen this in Multi-Dimensional Arrays in Year 11 > Python Fundamentals > Data structures.