How to read a CSV file in pandas: An advanced guide

CSV (comma-separated values) files are one of the most common formats for storing data, and there are several ways to read them in Python for analysis. The pandas library simplifies this task.

Let’s get started with this helpful guide.

The simplest way to load a CSV file into a data frame using pandas is:

import pandas
df = pandas.read_csv('data.csv')
print(df.to_string())

Note that we call df.to_string() inside print(). This renders every row of the data frame as text; without it, pandas prints only the first and last few rows of a large data frame.

You might be looking for a guide to kick-start your data analysis tasks using pandas in Python. That’s exactly what we’re about to dig into.
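To see the difference, here is a small sketch that uses a made-up 200-row data frame instead of data.csv (the column name value is only an illustration). It assumes pandas' default display settings, under which a frame this long is shortened when printed.

```python
import pandas

# Hypothetical data standing in for data.csv: 200 rows, one column.
df = pandas.DataFrame({"value": range(200)})

short_text = str(df)        # the text print(df) would show (shortened by default)
full_text = df.to_string()  # every row rendered as text

print(len(short_text.splitlines()))
print(len(full_text.splitlines()))  # one header line plus 200 data rows
```

For very large files, printing everything can be slow, so the shortened view is often the more practical one.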

Pandas read_csv to DataFrame

pandas.read_csv is a function that loads a CSV file into a DataFrame. Call it as pandas.read_csv('filename.csv'). Make sure the file name, including its .csv extension, matches the file on disk exactly. Otherwise you will get an error such as `FileNotFoundError: File b'filename.CSV' does not exist` (the exact wording depends on your pandas version).

Example:

import pandas
df = pandas.read_csv('data.csv')
print(df)

In the above example, df is the data frame, and print(df) displays it. You can pass additional arguments to read_csv to customize how the data frame is built.
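If you do not have a data.csv at hand, the following sketch creates one in a temporary folder first. The column names Duration and Pulse and the file name missing.csv are assumptions made for this illustration. It also shows what happens when the file name is wrong.

```python
import os
import tempfile

import pandas

# Create a small CSV file in a temporary folder so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.csv")
with open(path, "w") as f:
    f.write("Duration,Pulse\n60,110\n45,117\n")

df = pandas.read_csv(path)
print(df)

# Reading a file that does not exist raises FileNotFoundError.
try:
    pandas.read_csv(os.path.join(folder, "missing.csv"))
    found = True
except FileNotFoundError:
    found = False
print(found)
```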

DataFrame to CSV

You can save a pandas data frame to a CSV file with df.to_csv, where df is our data frame variable.

Example:

Here is example code that saves a data frame to a CSV file using the pandas function. We build an example data frame, then write it to disk.

import pandas

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

df = pandas.DataFrame(cars, columns=['Brand', 'Price'])
df.to_csv(r'C:\Users\Ron\Desktop\export_dataframe.csv', header=True)
print(df)

Keep in mind that the stored file will contain the index, because pandas writes it by default. The next example shows how to store a file without an index.

Pandas to CSV without index

If you want to save a data frame without the index, pass the argument index=False to to_csv.

Example:

Here is the full call you’ll need.

df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index=False)
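To compare the two variants without touching the disk, here is a minimal sketch that relies on to_csv returning the CSV text when no path is given. The two-row cars data is a shortened version of the example above.

```python
import pandas

cars = {"Brand": ["Honda Civic", "Audi A4"], "Price": [22000, 35000]}
df = pandas.DataFrame(cars)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the index values 0 and 1
print(without_index)  # only the Brand and Price columns
```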

Pandas read CSV header

Once you have imported a CSV file into a DataFrame, you can preview it together with its column headers using head(). Call print(df.head(n)), where n is the number of rows to show; with no argument, head() returns the first 5 rows. This is a great way to inspect big data sets because it limits how much is printed.

Example:

import pandas

df = pandas.read_csv('data.csv')
print(df.head())
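As a quick sketch of how head() behaves, the code below reads made-up CSV text from memory instead of data.csv (the Duration and Pulse columns are assumptions). It also uses df.columns, which gives only the header names.

```python
from io import StringIO

import pandas

# Made-up CSV text standing in for data.csv: one header line and 7 data rows.
csv_text = "Duration,Pulse\n60,110\n60,117\n45,109\n45,117\n30,103\n60,102\n30,110\n"
df = pandas.read_csv(StringIO(csv_text))

first_three = df.head(3)         # first 3 rows
default_head = df.head()         # first 5 rows by default
column_names = list(df.columns)  # just the header names

print(first_three)
print(column_names)
```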

Read CSV file without header

Passing header=None to read_csv tells pandas that the file has no header row. For example: df = pandas.read_csv('data.csv', header=None), then print(df). The first line of the file is then treated as data, and the columns are labeled with integers (0, 1, 2, ...) instead of names.

Example:

import pandas

df = pandas.read_csv('data.csv', header=None)
print(df)
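The sketch below shows the effect on made-up CSV text that has no header line. It also uses the names argument of read_csv to supply column names; the names Duration and Pulse are assumptions for this illustration.

```python
from io import StringIO

import pandas

# Made-up CSV text with no header line.
csv_text = "60,110\n45,117\n30,103\n"

df = pandas.read_csv(StringIO(csv_text), header=None)
print(list(df.columns))  # integer column labels

# Optionally supply your own column names.
named = pandas.read_csv(StringIO(csv_text), header=None, names=["Duration", "Pulse"])
print(named)
```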

Read CSV column to list

You can read a specific column of a CSV file with print(df.Columnname), or equivalently print(df['Columnname']). It shows the values that column contains, which is useful when analyzing data that has many columns.

Below is the command you’ll need to execute. Make sure you have already loaded the CSV file into a DataFrame; df below is that DataFrame.

Example:

In the example code below, we attach the name of the only column we want to list when printing.

import pandas
df = pandas.read_csv('data.csv')
print(df.Pulse) 
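Selecting a column this way gives a pandas Series. If you need a plain Python list, tolist() converts it. Here is a minimal sketch on made-up data with a Pulse column:

```python
from io import StringIO

import pandas

# Made-up CSV text standing in for data.csv.
csv_text = "Duration,Pulse\n60,110\n45,117\n30,103\n"
df = pandas.read_csv(StringIO(csv_text))

pulse_column = df["Pulse"]          # same column as df.Pulse, as a Series
pulse_list = pulse_column.tolist()  # plain Python list
print(pulse_list)
```

The bracket form df["Pulse"] also works for column names that contain spaces or clash with DataFrame attributes, where df.Pulse style access would not.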

Pandas read_csv skip rows

You can skip rows by passing skiprows=N to read_csv. It is a handy way to drop unwanted lines at the top of a lengthy data sheet.

Example:

In skiprows=N, N is the number of lines to skip from the start of the file. The count includes the header line, so the first line that is not skipped becomes the header.

import pandas
df = pandas.read_csv('data.csv', skiprows=2)
print(df)
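Here is a sketch of two ways to use skiprows, on made-up CSV text. The first file is assumed to start with two note lines before the real header. The second call passes a list of 0-based line numbers instead of a count, which skips only those lines.

```python
from io import StringIO

import pandas

# Two note lines come before the real header line.
noisy_text = "exported by a tool\nsecond note line\nDuration,Pulse\n60,110\n45,117\n30,103\n"
df = pandas.read_csv(StringIO(noisy_text), skiprows=2)
print(df)

# A list skips specific lines: line 0 is the header, line 1 is the first data row.
clean_text = "Duration,Pulse\n60,110\n45,117\n30,103\n"
df_without_first_row = pandas.read_csv(StringIO(clean_text), skiprows=[1])
print(df_without_first_row)
```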

Pandas read_csv delimiter

Use the sep argument of read_csv to set the separator. delimiter is an alias for sep, so leave it at its default of None when you pass sep. By default, sep is ",". If your file uses a different separator, pass it as sep='custom sep'. Separators longer than one character (other than '\s+') are treated as regular expressions and need engine='python'.

In short, this argument lets you load data with different separators into a DataFrame.

Example:

The sep argument accepts many values. Here is some example code.

import pandas
df = pandas.read_csv('data.csv', sep='__', engine='python')
print(df)

We passed sep='__' on the assumption that our data uses __ as the separator.

Suppose the values are separated by whitespace. Then pass sep='\s+'.

Similarly, if the data mixes several separator characters, put them all in a regular-expression character class, like this: sep='[:,|_]', together with engine='python'.
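The three separator cases can be tried out on made-up in-memory data, as in the sketch below. All column names and values are assumptions for illustration. Multi-character and character-class separators are treated as regular expressions, which is why the Python parsing engine is requested for them.

```python
from io import StringIO

import pandas

# Double underscore as the separator.
underscore = pandas.read_csv(StringIO("col1__col2\n1__2\n3__4\n"), sep="__", engine="python")

# One or more whitespace characters as the separator.
spaces = pandas.read_csv(StringIO("col1   col2\n1  2\n3 4\n"), sep=r"\s+")

# Any single one of the characters : , | _ as the separator.
mixed = pandas.read_csv(StringIO("a:b,c|d_e\n1:2,3|4_5\n"), sep="[:,|_]", engine="python")

print(underscore)
print(spaces)
print(mixed)
```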

Read CSV string

To build a DataFrame from a Python string, wrap the string in io.StringIO (Python 3) and pass the resulting object to pandas.read_csv. pandas then reads it just as it would read a CSV file.

Example:

Let’s take a look at the example below, which shows how to turn string data into a pandas DataFrame.

from io import StringIO

import pandas

# Wrap the CSV text in a file-like object that read_csv can consume.
pandasdata = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")
df = pandas.read_csv(pandasdata, sep=";")
print(df)

One interesting thing to note here: we used a custom sep=";" because our data is separated by semicolons.

Pandas read_csv index

If you want one of the columns to serve as the index, use index_col. The column you pass is used as the row labels of the data frame. The default value is None; you can pass False, an int (the column position), or the name of the column as a string. Let’s see an example:

import pandas
pandasdata.seek(0)  # rewind the StringIO buffer from the previous example
df_with_column_index = pandas.read_csv(pandasdata, sep=";", index_col=1)  # passing the int position of the column
pandasdata.seek(0)  # rewind again before the second read
df_with_column_name = pandas.read_csv(pandasdata, sep=";", index_col='col1')  # passing the name of the column
print(df_with_column_index)
print(df_with_column_name)
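For a version that runs on its own, here is a sketch using the same kind of semicolon-separated data, recreated from a string for each read so that no buffer has to be rewound. The column names col1, col2, and col3 follow the earlier string example.

```python
from io import StringIO

import pandas

csv_text = "col1;col2;col3\n1;4.4;99\n2;4.5;200\n3;4.7;65\n"

# Use the column at position 1 (col2) as the row labels.
by_position = pandas.read_csv(StringIO(csv_text), sep=";", index_col=1)

# Use the column named col1 as the row labels.
by_name = pandas.read_csv(StringIO(csv_text), sep=";", index_col="col1")

print(by_position)
print(by_name)
```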

Read CSV without index

As mentioned above, you can set index_col to False to tell pandas that you don’t want to use the first column as the index. If you don’t pass index_col, pandas uses the default value, None. Here is an example:

import pandas
pandasdata.seek(0)  # rewind the StringIO buffer from the earlier example
df_without_first_column_as_index = pandas.read_csv(pandasdata, sep=";", index_col=False)  # False forces pandas not to use the first column as the index
print(df_without_first_column_as_index)
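index_col=False matters most for files where every data line ends with an extra delimiter. The sketch below uses made-up data of that shape. With the default settings, pandas sees one more field per row than there are header names and treats the first column as the index; with index_col=False it keeps the first column as data and drops the empty trailing field.

```python
from io import StringIO

import pandas

# Made-up data: each data line has a trailing comma, the header does not.
csv_text = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

implicit = pandas.read_csv(StringIO(csv_text))                 # first column becomes the index
forced = pandas.read_csv(StringIO(csv_text), index_col=False)  # first column stays a data column

print(implicit)
print(forced)
```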

Some tips

Many people face difficulties while importing a CSV file into pandas. A common cause is a wrong file name, extension, or path.

Always double-check that you’re calling the file with the .csv extension.

There are a few extensions you should know about; each has its own purpose.

.txt - This extension usually marks a plain text document. Many people mistakenly try to import this kind of file in place of a CSV file. read_csv can parse such a file only if its contents are delimited rows, whatever the extension says.

.img - This extension usually marks an image file. Obviously, you should avoid this type of file too.

.csv - Files with this extension are the ones the commands above are meant for.

If you are unsure what a CSV file is, think of it as plain text holding comma-separated values. These files usually contain data in tabular form, so when you import them into pandas they turn into a table, which makes them easier to work with.
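One way to catch these mistakes early is to check the path before calling read_csv. The helper below, check_csv_path, is hypothetical and not part of pandas; it is only a sketch of such a sanity check using the standard pathlib module, and the example paths are made up.

```python
from pathlib import Path


def check_csv_path(path_text):
    """Hypothetical helper: report why a path may fail before calling read_csv."""
    path = Path(path_text)
    if path.suffix.lower() != ".csv":
        return "unexpected extension: " + repr(path.suffix)
    if not path.exists():
        return "file not found"
    return "ok"


print(check_csv_path("photo.img"))
print(check_csv_path("no_such_folder_for_this_example/data.csv"))
```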
