Showing posts with label Statistics. Show all posts

Python Packages for Data Science

To do data analysis in Python, you should know a little about the main packages relevant to analysis. A Python library is a collection of functions and methods that lets you perform many actions without writing the underlying code yourself. Libraries usually contain built-in modules providing different functionality, which you can use directly. There are also extensive libraries offering a broad range of facilities. Infographic...

Statistical Tools to Kick Start with Data Science

Some of the most common and convenient statistical tools to get started with data science include correlation, regression, the box plot, and the line of best fit. Histogram: Histograms are a statistical way of representing the frequencies of data values in particular intervals. The more traditional description is that a histogram is a plot of a frequency table in which the height of each bar tells us how many data points are in each interval. Box...
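The frequency-table view of a histogram described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `frequency_table`, its parameters, and the sample data are hypothetical and do not come from the original post.

```python
from collections import Counter


def frequency_table(values, bin_width, start):
    """Count how many values fall in each interval [lower, upper)."""
    # Integer-divide each offset by the bin width to get the bin index.
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }


# Hypothetical data, chosen only for illustration.
data = [1, 2, 2, 3, 5, 6, 6, 6, 9]
table = frequency_table(data, bin_width=3, start=0)

# A text "histogram": the bar length is the count in each interval.
for (lower, upper), n in table.items():
    print(f"[{lower}, {upper}): {'#' * n}")
```

Note that this sketch skips empty intervals; a plotting library would normally draw them as bars of height zero.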

Skewness and Kurtosis in Statistics

The average and the measures of dispersion describe a distribution, but they are not sufficient to describe its nature. For this purpose we use two other concepts, skewness and kurtosis. The symmetrical and skewed distributions are shown by curves. Skewness: Skewness means lack of symmetry. A distribution is said to be symmetrical when the values are distributed evenly on both sides of the mean. For example, the following distribution is...
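As a rough illustration of the two concepts, the sketch below computes the moment-based coefficients of skewness and kurtosis. It assumes the population (divide-by-n) definitions, which may differ from the formulas used in the full post; the function names and sample data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2^2 (not excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]

print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric))
```

A symmetric sample gives a skewness of zero, while a long right tail gives a positive value.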

Measures of Dispersion in Statistics

We know that averages are representative of a frequency distribution, but they fail to give a complete picture of it. They tell us nothing about the scatter of observations within the distribution. Suppose that we have the distribution of the yields (kg per plot) of two paddy varieties from 5 plots each. The distribution may be as follows: Variety I          45       ...
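To see why an average alone is not enough, here is a small sketch with two made-up samples that share the same mean but differ in spread. The numbers are hypothetical and are not the paddy yields from the post; the range and the population standard deviation stand in for the measures of dispersion.

```python
from statistics import mean, pstdev

# Hypothetical samples with equal means but very different scatter.
sample_a = [48, 49, 50, 51, 52]
sample_b = [30, 40, 50, 60, 70]


def value_range(values):
    """Range: the difference between the largest and smallest value."""
    return max(values) - min(values)


for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, mean(sample), value_range(sample), round(pstdev(sample), 2))
```

Both samples average 50, yet sample B is far more scattered, which only the dispersion measures reveal.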

How to Choose Sample Size for a Simple Random Sample

Before we proceed to the concept, consider the following problem. You are conducting a survey. The sampling method is simple random sampling, without replacement. You want your survey to provide a specified level of precision. To choose the right sample size for a simple random sample, you need to define the following inputs. Specify the desired margin of error ME. This is your measure of precision. Specify alpha. For a hypothesis test, alpha...
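For one common case, estimating a proportion, the inputs listed above can be combined as in the sketch below. This is an assumption about the setting, not necessarily the formula the full post derives: it uses the normal critical value for a two-sided alpha, a conservative proportion of 0.5 by default, and the usual finite population correction for sampling without replacement. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, alpha, population=None, proportion=0.5):
    """Sample size for estimating a proportion under simple random sampling."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Size needed for an effectively infinite population.
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction (sampling without replacement).
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


n_large = sample_size(margin_of_error=0.05, alpha=0.05)
n_finite = sample_size(margin_of_error=0.05, alpha=0.05, population=10_000)
print(n_large, n_finite)
```

The correction matters only when the sample is a noticeable fraction of the population; for very large populations the two results coincide.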

Introduction to Normal Distribution in Statistics

A continuous random variable has an infinite number of values that can be represented by an interval on the number line. Its probability distribution is called a continuous probability distribution. In this article, we look at the most important continuous probability distribution in statistics, the normal distribution. A normal distribution is a continuous probability distribution for a random variable,...

Statistical p-values

When the results of studies or research are reported, important decisions are made on the basis of these results. For example, new varieties are often tested against standard varieties to determine whether the new varieties are more effective. Several methods of manufacturing may be compared to select the best technique for manufacturing the best product. Several pieces of evidence may be examined to determine whether there is a possible link between one activity and a result....

Cautions about Regression and Correlation

Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you must be aware of their limitations. ■ Correlation and regression lines describe only linear relationships. You can do the calculations for any relationship between two quantitative variables, but the results are useful only if the scatterplot shows a linear pattern. ■ Correlation and least-squares regression lines...
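The first caution, that correlation describes only linear relationships, can be illustrated with a small sketch. The `pearson_r` helper and the data are hypothetical: one data set follows a straight line, the other a perfect parabola.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # exact straight line
parabola = [x ** 2 for x in xs]    # exact, but nonlinear, relationship

r_linear = pearson_r(xs, linear)
r_parabola = pearson_r(xs, parabola)
print(r_linear, r_parabola)
```

The parabola is a perfect relationship, yet its correlation is zero, which is why the scatterplot should be checked before trusting the coefficient.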

Statistical Terminology - Explained

We often make use of statistical techniques in the analysis of findings. Some are highly sophisticated and complex, but those most often used are easy to understand. The most common are measures of central tendency (ways of calculating averages) and correlation coefficients (measures of the degree to which one variable relates consistently to another). There are three methods of calculating averages, each of which has certain advantages and shortcomings....

Multicollinearity

The use and interpretation of a multiple regression model depends implicitly on the assumption that the explanatory variables are not strongly interrelated. In most regression applications the explanatory variables are not orthogonal. Usually the lack of orthogonality is not serious enough to affect the analysis. However, in some situations the explanatory variables are so strongly interrelated that the regression results are ambiguous. Typically, it is impossible to estimate the unique effects of individual variables in the regression equation....
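One common way to quantify how strongly explanatory variables are interrelated is the variance inflation factor (VIF); the excerpt above does not name a diagnostic, so this choice is an assumption. The sketch covers only the special case of two explanatory variables, where the VIF reduces to 1 / (1 - r^2) with r their correlation. The helper names and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two explanatory variables."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]
x_orthogonal = [2, -1, -2, -1, 2]          # uncorrelated with x1
x_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1

vif_orthogonal = vif_two_predictors(x1, x_orthogonal)
vif_collinear = vif_two_predictors(x1, x_collinear)
print(vif_orthogonal, vif_collinear)
```

Orthogonal variables give a VIF of 1, while nearly collinear ones give a very large value, signalling that their individual effects cannot be separated reliably. With more than two explanatory variables, each VIF would instead come from regressing one variable on all the others.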

Diagnostics and Remedial Measures

The interpretation of data based on analysis of variance (ANOVA) is valid only when the following assumptions are satisfied: 1. Additive Effects: Treatment effects and block (environmental) effects are additive. 2. Independence of errors: Experimental errors are independent. 3. Homogeneity of Variances: Errors have common variance. 4. Normal Distribution: Errors follow a normal distribution. Also the statistical tests t, F, z, etc. are valid under the assumption of independence of errors and normality of errors. The departures from these assumptions...