To do data analysis in Python, you should know a little about the main packages used for it. A Python library is a collection of functions and methods that lets you perform many actions without writing the underlying code yourself. Libraries usually contain built-in modules that provide different functionality you can use directly, and some are extensive, offering a broad range of facilities.
Scientific Computing Libraries
i. Pandas offers data structures and tools for effective data manipulation and analysis.
It provides fast access to structured data. The primary instrument of Pandas is the DataFrame,
a two-dimensional table with row and column labels. It is designed to
provide easy indexing functionality.
ii. The NumPy library uses arrays for its inputs and outputs. It can be extended to objects for
matrices, and with minor coding changes, developers can perform fast array processing.
iii. SciPy includes functions for advanced math problems, such as integration, optimization, and interpolation.
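As a rough illustration of the three libraries above, the sketch below builds a small DataFrame, indexes it by label, runs element-wise arithmetic on a NumPy array, and integrates a function with SciPy. The table contents (`make`, `price`, and the row labels `a`, `b`, `c`) are invented for this example and are not taken from any real dataset.

```python
import numpy as np
import pandas as pd
from scipy import integrate

# pandas: a DataFrame is a two-dimensional table with row and column labels.
# The column names and values below are made up for illustration.
cars = pd.DataFrame(
    {"make": ["audi", "bmw", "volvo"], "price": [13950.0, 16430.0, 12940.0]},
    index=["a", "b", "c"],
)
bmw_price = cars.loc["b", "price"]      # label-based indexing: row "b", column "price"
cheap = cars[cars["price"] < 14000]     # keep only the rows that satisfy a condition

# NumPy: arrays in, arrays out; arithmetic is applied element-wise.
prices = cars["price"].to_numpy()
discounted = prices * 0.9               # 10% off every price, no explicit loop
mean_price = prices.mean()

# SciPy: numerical integration of x**2 over [0, 1] (the exact value is 1/3).
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

print(bmw_price, list(cheap["make"]), discounted, mean_price, area)
```

Note how the NumPy step needs no loop: multiplying the array by a scalar applies the operation to every element at once, which is where the fast array processing comes from.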
Data visualization is one of the most effective ways to communicate with others and show them
the meaningful results of an analysis.
Libraries to create graphs, charts and maps
i. The Matplotlib package is the best-known library for data visualization. It is well suited
to making graphs and plots, and the graphs are highly customizable.
ii. Seaborn is a high-level visualization library based on Matplotlib. It makes it easy to generate various plots such as heat maps, time series, and violin plots.
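To give a feel for how customizable a Matplotlib plot is, here is a minimal sketch that draws one line chart and sets its color, marker, line style, title, axis labels, and ticks. The `years` and `sales` values are made up, and the figure is written to an in-memory buffer only so the script runs without a display; passing a filename such as `"sales.png"` to `savefig` would write a file instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Made-up data for illustration
years = [2018, 2019, 2020, 2021]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, sales, color="tab:blue", marker="o", linestyle="--", label="sales")

# Each visual element can be adjusted individually
ax.set_title("Yearly sales")
ax.set_xlabel("Year")
ax.set_ylabel("Units sold")
ax.set_xticks(years)
ax.legend()

# Render the figure as PNG bytes
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
```

Seaborn plotting functions such as `seaborn.heatmap` and `seaborn.violinplot` draw onto Matplotlib axes, so the same title and label calls can be used to adjust a Seaborn plot.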
Machine Learning and Statistics Libraries
i. The Scikit-learn library contains tools for statistical modeling, including regression,
classification, and clustering. This library is built on NumPy, SciPy, and Matplotlib.
ii. StatsModels is a Python module that lets users explore data, estimate statistical models, and perform statistical tests.
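A regression with Scikit-learn can be sketched in a few lines. The example below fits a straight line to four made-up points that lie exactly on y = 2x + 1, then predicts a new value. The data is an assumption chosen so that the fitted slope and intercept are easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)                              # estimate slope and intercept

slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[5.0]]))[0]  # predicted y for x = 5

print(slope, intercept, prediction)
```

Most Scikit-learn estimators follow this same `fit` and `predict` pattern, so swapping in a classifier usually changes little more than the class name. For the statistical-test side, StatsModels can fit the same kind of model with `statsmodels.api.OLS`, whose fitted result offers a `summary()` report.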