Sometimes, the outliers on a scatterplot and the reasons for them matter significantly. For example, an outlying data point may represent the input from your most critical supplier or your highest selling product.
The nature of a regression line, however, tempts you to ignore these outliers. Determining the right sample size is equally important for accuracy. Using proportion and standard deviation methods, you can determine the sample size you need for your data collection to be statistically meaningful. When studying a new, untested variable in a population, however, your proportion equations may have to rely on assumptions, and if those assumptions are inaccurate, the error is passed along to your sample size determination and from there into the rest of your statistical analysis.
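The proportion-based sample size calculation mentioned above can be sketched in a few lines. This is a minimal illustration; the function name and defaults are my own, not taken from the text:

```python
import math

def sample_size_proportion(p_hat, margin_of_error, z=1.96):
    """Minimum n to estimate a proportion within the given margin of error.

    p_hat: assumed population proportion; with no prior information,
           0.5 is the conservative choice (it maximises p * (1 - p)).
    z: z-score for the desired confidence level (1.96 ~ 95%).
    """
    n = (z ** 2) * p_hat * (1 - p_hat) / margin_of_error ** 2
    return math.ceil(n)

# With no prior estimate (p = 0.5) and a 5% margin at 95% confidence,
# this gives the familiar minimum of 385 respondents.
print(sample_size_proportion(0.5, 0.05))
```

Note how the result depends entirely on the assumed proportion: if that assumption is wrong, the computed n is wrong too, which is exactly how the error described above propagates.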
Often carried out as a t-test, hypothesis testing assesses whether a certain premise is actually true for your data set or population. Hypothesis tests are used in everything from science and research to business and economics. To be rigorous, hypothesis testing needs to guard against common errors.
For example, the placebo effect occurs when participants falsely expect a certain result and then perceive or actually attain that result. Another common error is the Hawthorne effect (or observer effect), which happens when participants skew results because they know they are being studied. Whichever method you use, avoiding its common pitfalls is just as important as applying it correctly.
The standard deviation, often represented by the Greek letter sigma, is a measure of the spread of data around the mean. Regression models the relationships between dependent and explanatory variables, which are usually charted on a scatterplot. In an ANOVA, two estimates of variance are compared using the F-test. A repeated-measures ANOVA, by contrast, is used when all variables of a sample are measured under different conditions or at different points in time.
As the variables are measured from a sample at different points in time, the measurement of the dependent variable is repeated.
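The variance comparison behind a standard (one-way) ANOVA can be sketched from first principles. This is a minimal illustration of the F statistic only, not the repeated-measures variant, and omits the p-value lookup:

```python
def one_way_anova_f(*groups):
    """F statistic: between-group variance estimate / within-group estimate."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares: how far group means sit from the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))
```

A large F means the group means differ by more than the within-group noise would suggest; this is the "two estimates of variances" comparison mentioned above.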
Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures. When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric (distribution-free) tests are used in such situations, as they do not require the normality assumption.
That is, they usually have less power. As with the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic, and the null hypothesis is accepted or rejected. The non-parametric analysis techniques and their corresponding parametric techniques are delineated in Table 5. The sign test and Wilcoxon's signed-rank test are used as median tests for one sample.
These tests examine whether one instance of sample data is greater or smaller than the median reference value.
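The sign test itself is simple enough to sketch directly: drop ties, count values above and below the hypothesised median, and refer the smaller count to a binomial distribution with p = 0.5. A minimal illustration, not production code:

```python
from math import comb

def sign_test_p(data, median0):
    """Two-sided sign test of H0: the population median equals median0."""
    above = sum(1 for x in data if x > median0)
    below = sum(1 for x in data if x < median0)
    n = above + below                  # ties with median0 are dropped
    k = min(above, below)
    # Probability of a count at least as extreme as k under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Eight values, all above the hypothesised median of 0:
print(sign_test_p([1, 2, 3, 4, 5, 6, 7, 8], 0))
```

Because only the direction of each comparison enters the calculation, the test needs no distributional assumptions at all.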
The sign test, which uses only the direction of each comparison, is therefore useful when the values themselves are difficult to measure. Wilcoxon's rank-sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums. It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other. The Mann–Whitney test compares each value xi in the X group with each value yi in the Y group and estimates the probability of xi being greater than yi. The two-sample Kolmogorov–Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution.
The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves. The Kruskal–Wallis test is a non-parametric analysis of variance. The data values are ranked in increasing order, the rank sums are calculated, and the test statistic is then computed. In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.
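The two statistics just described can be computed directly from their definitions. The following sketch mirrors the pairwise comparison behind the Mann–Whitney U and the maximum CDF gap of the KS test; it is illustrative only and omits the p-value lookup:

```python
def mann_whitney_u(x, y):
    """U = number of (xi, yi) pairs with xi > yi, counting ties as 0.5.

    U / (len(x) * len(y)) estimates the probability that xi > yi."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

def ks_distance(x, y):
    """KS statistic: maximum absolute gap between the two empirical CDFs."""
    d = 0.0
    for t in sorted(set(x) | set(y)):
        fx = sum(1 for v in x if v <= t) / len(x)
        fy = sum(1 for v in y if v <= t) / len(y)
        d = max(d, abs(fx - fy))
    return d
```

In practice a library routine would convert each statistic into a p-value against its sampling distribution, as described above.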
The Friedman test is a non-parametric test for testing the difference between several related samples.
The Friedman test is an alternative to the repeated-measures ANOVA and is used when the same parameter has been measured under different conditions on the same subjects. The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares frequencies and tests whether the observed data differ significantly from the data that would be expected if there were no differences between the groups (i.e., under the null hypothesis).
It is calculated as the sum, over all cells, of the squared difference between the observed (O) and expected (E) data (the deviation, d) divided by the expected data: χ² = Σ (O − E)² / E. A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine whether there are non-random associations between two categorical variables.
It does not assume random sampling; instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data; the null hypothesis is that the paired proportions are equal. The Mantel–Haenszel Chi-square test is a multivariate test, as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, logistic regression is used.
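The chi-square statistic above, and McNemar's statistic for paired data, can each be written out in a few lines. A minimal sketch with illustrative names, omitting the p-value lookup and the Yates correction:

```python
def chi_square_stat(observed):
    """Pearson chi-square: sum of (O - E)^2 / E over all cells.

    observed: contingency table as a list of rows, e.g. [[10, 20], [30, 40]].
    Expected counts are derived from the row and column totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            chi2 += (o - e) ** 2 / e
    return chi2

def mcnemar_stat(b, c):
    """McNemar chi-square for paired nominal data, where b and c are the
    two discordant-pair counts from a 2x2 table."""
    return (b - c) ** 2 / (b + c)
```

A table whose observed counts exactly match the expected counts yields a chi-square of zero; the larger the statistic, the stronger the evidence against independence.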
Numerous statistical software systems are currently available, and a number of web resources address statistical power analyses. It is important that a researcher knows the concepts of the basic statistical methods used to conduct a research study. This will help in conducting an appropriately designed study that leads to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice.
Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge of the basic statistical methods will go a long way in improving research designs and producing quality medical research that can be utilised to formulate evidence-based guidelines.
Indian J Anaesth. Zulfiqar Ali and S Bala Bhaskar. Abstract: Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings.
Keywords: basic statistical tools, degree of dispersion, measures of central tendency, parametric tests and non-parametric tests, variables, variance. Quantitative variables Quantitative or numerical data are subdivided into discrete and continuous measurements. Table 1 Example of descriptive and inferential statistics. Descriptive statistics The extent to which the observations cluster around a central location is described by the central tendency, and the spread towards the extremes is described by the degree of dispersion.
Measures of central tendency The measures of central tendency are mean, median and mode. The variance of a population is defined by the formula σ² = Σ(x − μ)² / N, where μ is the population mean and N the population size. The variance of a sample is defined by the slightly different formula s² = Σ(x − x̄)² / (n − 1), where x̄ is the sample mean and n the sample size. The SD of a sample is the square root of the sample variance: s = √[Σ(x − x̄)² / (n − 1)]. Table 2 Example of mean, variance, standard deviation.
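These formulas translate directly into code. A small sketch, cross-checked against Python's built-in statistics module:

```python
import statistics
from math import sqrt

def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1), with Bessel's correction"""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """s = square root of the sample variance"""
    return sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
assert population_variance(data) == statistics.pvariance(data)
assert abs(sample_variance(data) - statistics.variance(data)) < 1e-12
```

The only difference between the two variance formulas is the divisor: N for a whole population, n − 1 for a sample, which corrects the bias of estimating the mean from the same data.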
Normal distribution or Gaussian distribution Most of the biological variables usually cluster around a central value, with symmetrical positive and negative deviations about this point. Skewed distribution It is a distribution with an asymmetry of the variables about its mean. Curves showing negatively skewed and positively skewed distribution. Inferential statistics In inferential statistics, data are analysed from a sample to make inferences about the larger collection of the population.
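Asymmetry about the mean can be quantified from the sample's second and third central moments. A minimal sketch of the moment coefficient of skewness (one of several conventions in use):

```python
def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive for a right (positive) skew, negative for a left skew,
    and zero for a perfectly symmetric sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3]))      # symmetric sample
print(skewness([1, 1, 1, 5]))   # long right tail, positive skew
```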
Table 3 P values with interpretation. Table 4 Illustration for null hypothesis. The assumption of normality which specifies that the means of the sample group are normally distributed. The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.
Parametric tests The parametric tests assume that the data are on a quantitative numerical scale, with a normal distribution of the underlying population. Student's t -test Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances: To test if a sample mean, as an estimate of a population mean, differs significantly from a given population mean (this is a one-sample t -test).
The formula for the one-sample t -test is t = (x̄ − μ) / (s / √n), where x̄ is the sample mean, μ the hypothesised population mean, s the sample SD and n the sample size. To test if the population means estimated by two independent samples differ significantly (the unpaired t -test). The formula for the unpaired t -test, assuming equal variances, is t = (x̄₁ − x̄₂) / (sₚ √(1/n₁ + 1/n₂)), where sₚ is the pooled SD of the two samples. To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for the paired t -test is when measurements are made on the same subjects before and after a treatment.
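The three variants share a single template. A sketch of the test statistics from the formulas above; the p-values are omitted, as they would require the t distribution:

```python
from math import sqrt

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n))"""
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu0) / (s / sqrt(n))

def unpaired_t(xs, ys):
    """Pooled-variance two-sample t statistic (assumes equal variances)."""
    nx, ny = len(xs), len(ys)
    xbar, ybar = sum(xs) / nx, sum(ys) / ny
    ssx = sum((x - xbar) ** 2 for x in xs)
    ssy = sum((y - ybar) ** 2 for y in ys)
    sp = sqrt((ssx + ssy) / (nx + ny - 2))      # pooled SD
    return (xbar - ybar) / (sp * sqrt(1 / nx + 1 / ny))

def paired_t(before, after):
    """The paired t-test is a one-sample test on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)
```

Note how the paired case reduces to the one-sample case: differencing removes the between-subject variability, which is what gives the paired design its power.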
Non-parametric tests When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Table 5 Analogue of parametric and non-parametric tests. Tests to analyse the categorical data The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables.
It is calculated as the sum of the squared differences between the observed (O) and expected (E) data (the deviation, d) divided by the expected data: χ² = Σ (O − E)² / E. Such software gives an output of a complete report on the screen, which can be cut and pasted into another document. Financial support and sponsorship: Nil. Conflicts of interest: There are no conflicts of interest.