Python Packages for Data Science

To do data analysis in Python, you should know a little about the main packages relevant to analysis. A Python library is a collection of functions and methods that lets you perform many actions without writing the code yourself. Libraries usually contain built-in modules providing different functionality, which you can use directly, and some are extensive, offering a broad range of facilities. Infographic...

Statistical Tools to Kick Start with Data Science

Some of the most common and convenient statistical tools for getting started with data science include correlation, regression, the box plot, and the line of best fit. Histogram: Histograms are a statistical way of representing the frequencies of data values in particular intervals. The more traditional description is that a histogram is a plot of a frequency table in which the height of each bar tells us how many data points are in that interval. Box...
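To make the frequency-table description concrete, here is a minimal sketch in plain Python. The function name `frequency_table`, the bin width, and the sample values are all assumptions chosen for illustration; they do not come from the original post.

```python
from collections import Counter

def frequency_table(values, bin_width, start):
    """Count how many values fall in each interval [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        # Index of the interval that contains v
        index = int((v - start) // bin_width)
        counts[index] += 1
    return {
        (start + i * bin_width, start + (i + 1) * bin_width): counts[i]
        for i in sorted(counts)
    }

# Hypothetical data, for illustration only
data = [2, 3, 5, 7, 8, 11, 12, 12, 14, 19]
table = frequency_table(data, bin_width=5, start=0)

# Text histogram: the bar length is the number of points in the interval
for (lo, hi), n in table.items():
    print(f"[{lo:>2}, {hi:>2})  {'#' * n}  ({n})")
```

Each bar's length equals the count for its interval, which is what the bar heights in a drawn histogram encode.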

Adobe Analytics Connector - Microsoft Power BI Desktop

In December 2017, Microsoft Power BI is releasing a new connector for Adobe Analytics, i.e., Omniture. This new connector will allow you to import and analyze your Adobe Analytics data within Power BI. Because it is a Beta version, you must first enable the connector under Options and Settings >> Options >> Preview Features to see the Adobe connector under Online services. This Adobe Analytics connector can be found in the...

Skewness and Kurtosis in Statistics

The average and the measures of dispersion can describe a distribution, but they are not sufficient to describe its nature. For this purpose we use two other concepts, known as skewness and kurtosis. The symmetrical and skewed distributions are shown by curves. Skewness: Skewness means lack of symmetry. A distribution is said to be symmetrical when the values are evenly distributed on both sides of the mean. For example, the following distribution is...
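One common way to quantify these two ideas is through central moments. The sketch below assumes the moment-based definitions (skewness as m3 / m2^1.5, kurtosis as m4 / m2^2); the excerpt is truncated, so the original post may define them differently. The sample values are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetrical data set."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis (not excess kurtosis); about 3 for normal data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]      # hypothetical, mirrored around the mean
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

A positive skewness signals a longer right tail, a negative one a longer left tail.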

Measures of Dispersion in Statistics

We know that averages are representative of a frequency distribution, but they fail to give a complete picture of it. They tell us nothing about the scatter of observations within the distribution. Suppose that we have the yields (kg per plot) of two paddy varieties from 5 plots each. The distribution may be as follows: Variety I          45       ...
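The point that an average hides the scatter can be sketched with two small data sets. The yields below are hypothetical values invented for illustration; they are not the figures from the truncated table in the post. The sketch uses the range and the population standard deviation as two possible measures of dispersion.

```python
from statistics import mean, pstdev

# Hypothetical yields (kg per plot), NOT the values from the original table
variety_a = [40, 41, 42, 43, 44]
variety_b = [30, 36, 42, 48, 54]

def value_range(values):
    """Range: the difference between the largest and smallest observation."""
    return max(values) - min(values)

for name, yields in (("A", variety_a), ("B", variety_b)):
    print(
        f"Variety {name}: mean={mean(yields)}, "
        f"range={value_range(yields)}, sd={pstdev(yields):.2f}"
    )
```

Both varieties have the same mean, yet the second is far more scattered, which only the dispersion measures reveal.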

How to Choose Sample Size for a Simple Random Sample

Before we proceed to the concept, consider the following problem. You are conducting a survey. The sampling method is simple random sampling, without replacement. You want your survey to provide a specified level of precision. To choose the right sample size for a simple random sample, you need to define the following inputs. Specify the desired margin of error ME. This is your measure of precision. Specify alpha. For a hypothesis test, alpha...

Introduction to Normal Distribution in Statistics

A continuous random variable has an infinite number of values that can be represented by an interval on the number line. Its probability distribution is called a continuous probability distribution. In this article, we look at the most important continuous probability distribution in statistics, the normal distribution. A normal distribution is a continuous probability distribution for a random variable,...

Storytelling with Data - Web Analysis

Why Storytelling? Visual analysis means exploring data visually. A story unfolds as you navigate from one visual summary into another. You and your team have sorted through and analyzed a dense data set, made industry-relevant discoveries and created data visualizations that allow you to share those insights with others—whether other team members, current or potential clients, or the community at large. Before you present your work, think about...

Statistical p-values

When the results of studies or research are reported, important decisions are made on the basis of these results. For example, new varieties are often tested against standard varieties to determine whether the new varieties are more effective. Several methods of manufacturing may be compared to select the best technique for manufacturing the best product. Several pieces of evidence may be examined to determine if there is a possible link between an activity and a result....

Cautions about Regression and Correlation

Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you must be aware of their limitations. ■ Correlation and regression lines describe only linear relationships. You can do the calculations for any relationship between two quantitative variables, but the results are useful only if the scatterplot shows a linear pattern. ■ Correlation and least-squares regression lines...
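The first caution, that correlation describes only linear relationships, can be illustrated with a short sketch. The helper `pearson_r` and the data are assumptions written for this example; the curved data follow y = x² exactly, yet their correlation is zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # perfectly linear relationship
curved = [x ** 2 for x in xs]      # perfectly determined, but not linear

print(pearson_r(xs, linear))
print(pearson_r(xs, curved))
```

The second value shows why a scatterplot should be checked first: y depends entirely on x, but r gives no hint of it because the pattern is not a straight line.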