Padhai Time

Reading Collected Data from a file

The data may be collected by the analyst, or another team may already have collected it. It may be available as an Excel file or stored in a database table. To simplify your day-to-day work, it is worth learning at least a few ways to read collected data so that you can analyze it for decision making.

Your data can exist in the form of:

1) File

- CSV (Comma-Separated Values File)

- TSV (Tab-Separated Values File)

- XLSX (Excel File)

2) Or a Database Table

3) Or any other source

Before moving forward, note that if the data is small, you can simply explore it in a spreadsheet tool such as Microsoft Excel, Google Sheets, or LibreOffice Calc. If the data is large, it becomes very hard to analyze in a spreadsheet. In that case, you need a programming language and an IDE (Integrated Development Environment) to import the data and analyze it further.

Let us go through these one by one.

1) Reading Data From File:

Suppose we have the Titanic dataset in the form of an Excel file (see https://www.kaggle.com/c/titanic/data for more details about the dataset).


It contains information on 891 passengers, with 12 properties for each of them. Since it is a very small dataset, the analysis can be done in the Excel file itself.

But suppose you have a file with 10,000 rows and 100 columns. Analyzing that much data in an Excel file is very difficult, so we need some interface through which we can read the file.
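As a rough sketch of what such an interface can look like, the snippet below uses Python with the pandas library (the combination this tutorial adopts). It builds a stand-in table of 10,000 rows and 100 columns in memory, since no such file accompanies this article, then previews the first few rows and walks through the rest in chunks. The column names, the zero values, and the chunk size of 2,000 are assumptions made purely for illustration.

```python
import io

import pandas as pd

# Build a stand-in CSV in memory: 10,000 rows x 100 columns of zeros.
# (Hypothetical data; in practice you would pass the path of your own file.)
n_rows, n_cols = 10_000, 100
header = ",".join(f"col_{i}" for i in range(n_cols))
row = ",".join("0" for _ in range(n_cols))
big_csv = header + "\n" + (row + "\n") * n_rows

# Preview only the first 5 rows instead of loading everything.
preview = pd.read_csv(io.StringIO(big_csv), nrows=5)
print(preview.shape)  # (5, 100)

# Read the data in chunks of 2,000 rows and count the rows seen.
total_rows = 0
for chunk in pd.read_csv(io.StringIO(big_csv), chunksize=2_000):
    total_rows += len(chunk)
print(total_rows)  # 10000
```

Reading in chunks keeps only one piece of the file in memory at a time, which helps once a file is too large to open comfortably in a spreadsheet.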

Multiple IDEs allow you to do this, and there are multiple languages in which you can write the analysis.

Example IDEs to work with:

 

  • Jupyter Notebook
  • RStudio
  • SAS
  • Spyder
  • PyCharm
  • Microsoft Visual Studio

Example Languages to work with:

  • Python
  • R
  • JavaScript

For this tutorial, we use the Jupyter Notebook IDE and the Python language to work with datasets.

Note: if you don’t have Python and Jupyter Notebook installed on your system, please refer to the article "Jupyter NoteBook Setup" in the "Data Handling" course and complete the installation.

1 - a) Reading data from CSV File:

Files with the extension “.csv” are called comma-separated files; the stored values are separated by a comma ‘,’.


Python Code to read .csv file:
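A minimal sketch of such code, assuming pandas is installed. The file name titanic_sample.csv and its rows are hypothetical (made-up values in a few Titanic-style columns), written to a temporary folder so the example runs on its own. With your own data you would pass the path of your file instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows (made-up values) so the example runs
# without downloading the real dataset.
sample_csv = (
    "PassengerId,Survived,Pclass,Age\n"
    "1,0,3,30\n"
    "2,1,1,45\n"
    "3,1,2,28\n"
)

# Write the sample to a temporary .csv file.
csv_path = Path(tempfile.mkdtemp()) / "titanic_sample.csv"
csv_path.write_text(sample_csv)

# Read the file; sep="," is the default for read_csv and is
# spelled out here only for clarity.
data = pd.read_csv(csv_path, sep=",")

# Show the first rows of the table.
print(data.head())
```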



Reading data in Jupyter Notebook is very easy. First, import the pandas library. Second, read the CSV by providing the file name and the separator type (for a CSV, the separator is ‘,’).

1 - b) Reading data from TSV File:

A TSV file contains data separated by the tab character ( ‘\t’ ). A tab is a single character, even though editors often display it as wide as several spaces.


Python Code to read .tsv file:

import pandas as pd

data = pd.read_csv("titanic_data.tsv", sep='\t')

data.head()
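To see why the sep argument matters, here is a small sketch that reads the same tab-separated text twice: once with the default comma separator and once with sep='\t'. The three columns and their values are made up for illustration, and the text is read from memory through io.StringIO rather than from a file.

```python
import io

import pandas as pd

# Hypothetical tab-separated text with three columns and two rows.
tsv_text = "PassengerId\tSurvived\tPclass\n1\t0\t3\n2\t1\t1\n"

# Default separator (comma): each line is treated as one value,
# so everything lands in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)  # (2, 1)

# Tab separator: the columns are split as intended.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(right.shape)  # (2, 3)
```

If a freshly loaded table shows only one column, a mismatched separator is a likely cause.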

1 - c) Reading data from Excel File:

Python Code to read .xlsx file:

import pandas as pd

data = pd.read_excel("titanic.xlsx", engine='openpyxl')

data.head()

Once you have imported the data into your IDE, you are ready to do whatever type of analysis interests you. Python and R provide a vast range of libraries that simplify analysis.
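As one possible first step after importing, the sketch below takes a quick look at a loaded table: its size, the type of each column, and how many values are missing. The rows are made up for illustration (including one deliberately empty Age value) and are read from memory so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical rows with a few Titanic-style columns; the last Age is blank.
sample_csv = (
    "PassengerId,Survived,Pclass,Age\n"
    "1,0,3,30\n"
    "2,1,1,45\n"
    "3,1,2,\n"
)
data = pd.read_csv(io.StringIO(sample_csv))

# Number of rows and columns.
print(data.shape)

# Data type pandas inferred for each column.
print(data.dtypes)

# Count of missing values per column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Summary statistics for the numeric columns.
print(data.describe())
```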

Bengaluru, India
contact.padhaitime@gmail.com