How to read a CSV file in pandas: An advanced guide

CSV (Comma Separated Values) files are one of the most widely used formats for storing data, and there are several ways to read them in Python for analysis. The pandas library simplifies the job.

Let’s get started with this helpful guide.

The simplest way to load a CSV file into a data frame using pandas is:

import pandas
df = pandas.read_csv('data.csv')
print(df.to_string())
  • Note that print() is given df.to_string() rather than df. to_string() renders every row of the data frame; for a large frame, plain print(df) shows only the first and last few rows.
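To see the difference between the two ways of printing, here is a minimal sketch. The 100-row frame and its column name are made up for illustration, and it assumes the pandas display options are at their defaults (so frames longer than 60 rows are shortened when printed).

```python
import pandas

# Hypothetical frame with 100 rows, longer than the default display limit
df = pandas.DataFrame({'value': range(100)})

short = str(df)          # the text that print(df) would show
full = df.to_string()    # every row, regardless of display options

# Compare how many lines each rendering produces
print(len(short.splitlines()))
print(len(full.splitlines()))
```

The second count should be noticeably larger, since to_string() emits one line per row plus the header.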

You might be looking for a guide to kick-start your data analysis with pandas in Python. That’s exactly what we’re about to dig into.

Pandas read_csv to dataframe

read_csv is the pandas function that loads a CSV file into a DataFrame.

  • Call pandas.read_csv('filename.csv'). Make sure the path points to the right file and that the name, including the .csv extension, matches exactly.
  • Otherwise pandas raises an error such as `FileNotFoundError: File b'filename.CSV' does not exist` (the exact wording depends on the pandas version).

Example:

import pandas
df = pandas.read_csv('data.csv')
print(df)

In the example above, df is the data frame, and print(df) displays it. You can pass additional arguments to read_csv to customize how the data frame is built.
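If the file might be missing, the error mentioned earlier can be caught rather than left to stop the script. The sketch below is one possible way to do that; the helper name load_csv_or_none and the file name are hypothetical.

```python
import pandas

def load_csv_or_none(path):
    """Return a DataFrame for path, or None if the file does not exist."""
    try:
        return pandas.read_csv(path)
    except FileNotFoundError as err:
        # Report the problem and let the caller decide what to do next
        print(f"Could not open {path!r}: {err}")
        return None

# Hypothetical file name that is assumed not to exist
result = load_csv_or_none('no_such_file.csv')
```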

DataFrame to CSV

  • You can save a pandas data frame to a CSV file with df.to_csv, where df is the data frame variable.

Example:

Here is example code that saves a data frame to a CSV file with this pandas function. We build an example data frame, then write it to disk.

import pandas
cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }
df = pandas.DataFrame(cars, columns=['Brand', 'Price'])
df.to_csv(r'C:\Users\Ron\Desktop\export_dataframe.csv', header=True)
print(df)

Keep in mind that the stored file will contain the index, because pandas writes it by default. The next example shows how to store a file without the index.
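One visible effect of the stored index shows up when the file is read back: the index comes back as an extra, unnamed column. The sketch below illustrates this with an in-memory buffer standing in for a file on disk; the shortened two-row frame is only for illustration.

```python
import pandas
from io import StringIO

df = pandas.DataFrame({'Brand': ['Honda Civic', 'Audi A4'],
                       'Price': [22000, 35000]})

buffer = StringIO()   # in-memory stand-in for a file on disk
df.to_csv(buffer)     # the index is written as a first column with no name
buffer.seek(0)        # rewind before reading

round_trip = pandas.read_csv(buffer)
print(list(round_trip.columns))
```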

Pandas to CSV without index

  • To save a data frame without the index, pass index=False to to_csv.

Example:

Here is the full code that you’ll need to pass.

df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index = False)
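To compare the two forms without touching the disk, the sketch below relies on to_csv returning the CSV text when no path is given. The small frame is made up for illustration.

```python
import pandas

df = pandas.DataFrame({'Brand': ['Honda Civic', 'Audi A4'],
                       'Price': [22000, 35000]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Print only the header line of each version
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

The header of the first version starts with an empty field for the index column; the second does not.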

Pandas Read CSV header

  • Once a CSV file is loaded into a DataFrame, df.head() returns the first rows together with the column headers. Pass the number of rows you want, as in print(df.head(10)); with no argument, head() returns the first 5 rows.

This is a handy way to inspect a large data set, because it limits how many rows are displayed.

Example:

import pandas
df = pandas.read_csv('data.csv')
print(df.head())
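A short sketch of limiting rows follows. The CSV text is invented for illustration, and the nrows argument of read_csv is an addition not covered in the text above: it limits how many rows are parsed in the first place, whereas head() limits how many rows of an already loaded frame are returned.

```python
import pandas
from io import StringIO

# Hypothetical CSV text with a header and ten data rows
csv_text = "id,score\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

df = pandas.read_csv(StringIO(csv_text))
first_five = df.head()     # default: first 5 rows
first_two = df.head(2)     # explicit row count

# nrows stops parsing after the given number of data rows
partial = pandas.read_csv(StringIO(csv_text), nrows=3)

print(first_two)
print(len(partial))
```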

Read CSV file without header

  • Passing header=None to read_csv tells pandas that the file has no header row. For example: df = pandas.read_csv('data.csv', header=None), then print(df). The first line of the file is then treated as data, and the columns are labelled 0, 1, 2, and so on.

Example:

import pandas
df = pandas.read_csv('data.csv', header=None)
print(df)
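The sketch below shows what the column labels look like for a file with no header row, and how the names argument of read_csv can supply labels. The data and the column names are invented for illustration.

```python
import pandas
from io import StringIO

# Hypothetical CSV text that has no header line
csv_text = "1,4.4,99\n2,4.5,200\n"

# header=None: every line is data, columns are numbered from 0
no_header = pandas.read_csv(StringIO(csv_text), header=None)

# names= supplies column labels for a file that has none (names are made up)
named = pandas.read_csv(StringIO(csv_text), header=None,
                        names=['id', 'ratio', 'count'])

print(list(no_header.columns))
print(list(named.columns))
```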

Read CSV column to list

  • You can read a specific column of a CSV file with print(df.Columnname), which shows only the values in that column. This is useful for data with many columns. To get a plain Python list, call df.Columnname.tolist().
  • Below is the command you’ll need. Make sure the CSV has already been loaded into a DataFrame; df below is that DataFrame.

Example:

In the example code below, the name of the column we want is attached to df when printing.

import pandas
df = pandas.read_csv('data.csv')
print(df.Pulse) 
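As a sketch of turning a column into a list, the code below uses bracket notation (which also works for column names containing spaces) and tolist(). The usecols argument is an extra shown for illustration; it loads only the listed columns. The sample data is made up.

```python
import pandas
from io import StringIO

# Hypothetical CSV text with a Pulse column
csv_text = "Duration,Pulse\n60,110\n60,117\n45,109\n"
df = pandas.read_csv(StringIO(csv_text))

# Select one column and convert it to a plain Python list
pulse_list = df['Pulse'].tolist()

# Alternatively, load only the columns that are needed
only_pulse = pandas.read_csv(StringIO(csv_text), usecols=['Pulse'])

print(pulse_list)
```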

Pandas read_csv skip rows

  • To skip rows, pass skiprows=N to read_csv. This is useful when a long file starts with lines you do not need: skiprows=N ignores the first N lines of the file, and parsing starts at the line after them.

Example:

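In skiprows=N, N is the number of lines to skip from the top of the file. The count includes the header line, so with skiprows=2 the third line of the file supplies the column names.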

import pandas
df = pandas.read_csv('data.csv', skiprows=2)
print(df)
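Here is a small sketch of both forms of skiprows: an integer, and a list of 0-based line numbers (the list form is an addition to what is described above). The file contents are invented for illustration.

```python
import pandas
from io import StringIO

# Hypothetical file that starts with two lines of notes before the header
noisy_text = "exported by some tool\nsecond note line\nname,score\nann,1\nbob,2\n"

# Integer: skip the first two lines, so "name,score" becomes the header
df = pandas.read_csv(StringIO(noisy_text), skiprows=2)

# List: skip specific lines by 0-based position (here the first data row)
clean_text = "name,score\nann,1\nbob,2\n"
trimmed = pandas.read_csv(StringIO(clean_text), skiprows=[1])

print(df)
print(trimmed)
```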

Pandas read csv delimiter

  • By default, sep is ",". To read a file with a different separator, pass sep='custom sep' to read_csv; delimiter is an alias for sep and can stay at its default of None. A separator longer than one character (other than '\s+') is treated as a regular expression, which needs engine='python'.

In short, the sep argument lets you load data with other separators into a DataFrame.

Example:

The sep argument accepts several kinds of values. Let’s look at an example.

import pandas
# '__' is a multi-character separator, so the python engine is requested explicitly
df = pandas.read_csv('data.csv', sep='__', engine='python')
print(df)

We passed sep='__' on the assumption that the fields in our data are separated by __.

  • If the values are separated by whitespace, pass sep='\s+'.
  • Similarly, if the data mixes several separator characters, put them all in a regular-expression character class, like this: sep='[:,|_]', engine='python'.
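The three kinds of separator discussed in this section can be tried on small in-memory samples. This is a sketch with made-up data; the regular expression in the last call is one reading of the mixed-separator case, matching any single ":", ",", "|" or "_".

```python
import pandas
from io import StringIO

# Double-underscore separator (multi-character, so the python engine is used)
underscored = pandas.read_csv(StringIO("a__b\n1__2\n"),
                              sep='__', engine='python')

# Any run of whitespace (spaces or tabs) as the separator
spaced = pandas.read_csv(StringIO("a  b\tc\n1 2   3\n"), sep=r'\s+')

# Regex character class: any one of ":", ",", "|" or "_" separates fields
mixed = pandas.read_csv(StringIO("a:b,c|d\n1:2,3|4\n"),
                        sep='[:,|_]', engine='python')

print(underscored)
print(spaced)
print(mixed)
```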

Read CSV string

  • To build a DataFrame from a Python string, wrap the string in io.StringIO (Python 3) and pass the resulting object to pandas.read_csv. read_csv then treats the string as if it were the contents of a CSV file.

Example:

  • Let’s look at the example below, which turns the data in a string into a pandas DataFrame.
import pandas
from io import StringIO  # Python 3 location of StringIO

# CSV content held in a string, with ";" as the field separator.
# The data lines are not indented, so no stray spaces end up in the fields.
pandasdata = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")
df = pandas.read_csv(pandasdata, sep=";")
print(df)

One thing to note here: we used a custom sep=";" because the fields in our data are separated by semicolons.

Pandas read_csv index

  • To choose which column supplies the row labels, use index_col. The column you pass becomes the index of the data frame. The default is None; you can also pass False, an int (the column’s position), or the column’s name as a string. Let’s see an example:
import pandas
# pandasdata is the StringIO object from the previous example; rewind it before each read
pandasdata.seek(0)
df_with_column_index = pandas.read_csv(pandasdata, sep=";", index_col=1)  # int: position of the column
pandasdata.seek(0)
df_with_column_name = pandas.read_csv(pandasdata, sep=";", index_col='col1')  # str: name of the column
print(df_with_column_index)
print(df_with_column_name)
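Once a column is used as the index, rows can be looked up by its values. The following self-contained sketch, with made-up data, shows that passing the position or the name of the same column gives the same result, and then looks up one cell by its row label.

```python
import pandas
from io import StringIO

# Hypothetical semicolon-separated data
csv_text = "col1;col2;col3\n1;4.4;99\n2;4.5;200\n"

# The first column, chosen by position and by name
by_position = pandas.read_csv(StringIO(csv_text), sep=';', index_col=0)
by_name = pandas.read_csv(StringIO(csv_text), sep=';', index_col='col1')

# Look up the col2 value in the row whose col1 label is 2
value = by_name.loc[2, 'col2']
print(value)
```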

Read CSV without index

  • As mentioned above, you can set index_col=False to tell pandas not to use the first column as the index. If you leave index_col out, pandas uses the default, None. Here is an example:
import pandas
pandasdata.seek(0)  # rewind the StringIO object from the earlier example
df_without_first_column_as_index = pandas.read_csv(pandasdata, sep=";", index_col=False)  # False: do not use the first column as the index
print(df_without_first_column_as_index)
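One situation where index_col=False makes a visible difference is a file whose data lines end with an extra delimiter. The sketch below uses made-up data to illustrate that case; it is one example of the option's effect, not the only use for it.

```python
import pandas
from io import StringIO

# Hypothetical data: three header fields, but each data line ends with a comma
csv_text = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default: the extra field makes pandas treat the first column as the index
default = pandas.read_csv(StringIO(csv_text))

# index_col=False: the first column stays an ordinary column named "a"
forced = pandas.read_csv(StringIO(csv_text), index_col=False)

print(default)
print(forced)
```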

About Moiz Rajput

Moiz here, he is a Blogger, a Creative Content Writer, and an SEO intern at the same time. He is passionate about what he does, energetic, incentive, with a positive attitude towards sociology.