The data may be collected by the analyst personally, or another team may already have collected it. It may be available as an Excel file, or it may be stored in a database as a table. To simplify your day-to-day work, it is recommended that you learn at least a few ways to read the collected data and analyze it for decision-making.
Your data can exist in the form of:
1) A file, such as:
- CSV (comma-separated values file)
- TSV (tab-separated values file)
- XLSX (Excel file)
2) Or a database table
3) Or any other source
Before moving forward, note that if the data is small, you can simply explore it in Microsoft Excel, Google Sheets, or LibreOffice Calc. If the data is large, however, it becomes very hard to analyze in spreadsheet tools. In that case, you need a programming language and an IDE (Integrated Development Environment) to import the data and analyze it further.
Let us go through these sources one by one.
1) Reading Data From File:
The Titanic dataset used in this tutorial contains information on 891 passengers, with 12 properties for each passenger. Since it is a very small dataset, the analysis can be done in the Excel file itself.
But suppose you have a file with 10,000 rows and 100 columns. Analyzing that data in an Excel file is very difficult, so we need some interface through which we can read the file.
Multiple IDEs are available that allow you to do this, and there are multiple languages in which you can write the program.
Example IDEs to work with:
Example Languages to work with:
For this tutorial, we use the Jupyter Notebook IDE and the Python language to work with datasets.
Note: if you don’t have Python and Jupyter Notebook installed on your system, please refer to the article "Jupyter NoteBook Setup" in the "Data Handling" course and complete the installation.
1 - a) Reading data from CSV File:
Files with the extension “.csv” are called comma-separated files. The values stored in them are separated by a comma (‘,’).
Python Code to read .csv file:
Reading data in Jupyter Notebook is very easy. First, import the pandas library. Second, read the CSV file by providing the file name and the separator type (for a CSV file, the separator is ‘,’).
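The two steps described here can be sketched as follows. The file name `sample_passengers.csv` and its contents are made up for illustration; the file is written first only so that the example runs on its own. With your own data, you would skip that part and pass your file name to `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sample file, created here only so the example is self-contained.
sample_text = "PassengerId,Name,Age\n1,Alice,29\n2,Bob,35\n"
with open("sample_passengers.csv", "w") as f:
    f.write(sample_text)

# Step 1 was the import above; step 2 reads the file with a comma separator.
data = pd.read_csv("sample_passengers.csv", sep=",")

print(data.shape)   # (number of rows, number of columns)
print(data.head())  # first few rows of the table
```

The comma is the default separator of `pd.read_csv`, so `sep=","` could be omitted; it is written out here to mirror the description.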
1 - b) Reading data from TSV File:
A TSV file contains data separated by a tab character (‘\t’). A tab is a single character, not a run of four spaces.
Python Code to read .tsv file:
import pandas as pd
data = pd.read_csv("titanic_data.tsv", sep='\t')
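To see why the separator matters, the sketch below reads the same tab-separated text twice: once with the default comma separator and once with `sep="\t"`. The sample rows are made up, and `io.StringIO` stands in for a file on disk so that the example runs on its own.

```python
import io
import pandas as pd

# Made-up sample rows; fields are separated by a single tab character.
tsv_text = "PassengerId\tName\tAge\n1\tAlice\t29\n2\tBob\t35\n"

# With the default comma separator, each line is read as one single field.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With a tab separator, each line is split into its three fields.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)          # one column only
print(right.shape)          # three columns
print(list(right.columns))
```

If a freshly loaded table shows only one column, a wrong separator is a likely cause.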
1 - c) Reading data from Excel File:
Python Code to read .xlsx file:
import pandas as pd
# engine='openpyxl' requires the openpyxl package to be installed
data = pd.read_excel("titanic.xlsx", engine='openpyxl')
Once you have imported the data into your IDE, you are ready to do any type of analysis you are interested in. Python and R provide a vast range of libraries that simplify the analysis.
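As one possible first step after importing, the sketch below inspects a table with a few common pandas calls. The small DataFrame is a hypothetical stand-in for an imported dataset, and its column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for an imported dataset; the values are made up.
data = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, 38.0, None, 35.0],
    "Survived": [0, 1, 1, 1],
})

print(data.shape)          # (rows, columns)
print(data.head())         # first rows of the table
print(data.isna().sum())   # count of missing values per column
print(data.describe())     # summary statistics for numeric columns

# Mean of a single column; missing values are skipped by default.
mean_age = data["Age"].mean()
print(mean_age)
```

Checking the shape and the missing-value counts early tends to reveal import problems before any deeper analysis.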