Descriptive Statistics | Definitions, Types, Examples, Importance


When you spend enough time working with datasets, you’ll eventually find yourself dealing with statistics. If you asked an average person what statistics entails, they’d likely mention terms like “numbers,” “figures,” and “research” in an attempt to explain it.

Statistics is the branch of mathematics that deals with the collection, classification, analysis, interpretation, and presentation of numerical data. It is particularly useful when working with populations that are too large and diverse to measure completely. Statistics is essential for drawing general conclusions about a population from a sample of data.

There are two branches of statistics: inferential statistics and descriptive statistics. We will discuss the concept, types, examples, importance, and distinctions between descriptive and inferential statistics as we examine descriptive statistics today.


What are Descriptive Statistics?

The branch of statistics known as descriptive statistics is concerned with summarizing, organizing, and presenting data clearly and efficiently. Without drawing any conclusions or generalizations to a wider population, it concentrates on summarizing and evaluating the key characteristics and features of a dataset.

The main objective of descriptive statistics is to give a clear and concise overview of the data so that analysts and researchers can recognize trends, patterns, and distributions in the dataset. This summary commonly includes measures of central tendency (mean, median, mode), dispersion (range, variance, standard deviation), and distribution shape (skewness, kurtosis).

Descriptive statistics also includes graphical representations of data such as charts, graphs, and tables, which may help in visualizing and interpreting information. Histograms, bar charts, pie charts, scatter plots, and box plots are examples of commonly used graphical methods.

Researchers can use descriptive statistics to effectively describe and express the essential characteristics of a dataset, allowing for an improved understanding of the data and establishing the foundation for additional statistical analysis or decision-making processes.

Types of Descriptive Statistics and Examples

Descriptive statistics can be classified into different types. Some authors argue that there are two distinct types. However, in this article, we will classify descriptive statistics into three types.

The frequency distribution describes how often each value occurs.
The measure of central tendency describes the average or typical value.
The measure of variability or dispersion describes how spread out the values are.

[Figure: types of descriptive statistics]

Let us take a look at each type separately.

Frequency distribution

Datasets include a range of scores or values. Statisticians use graphs and tables to summarize how often each possible value of a variable occurs, typically expressed as counts or percentages. For instance, if you conducted research to find football’s top goal scorers in 2023, you’d have one column with the names of the footballers and another column showing the number of goals each scored in 2023. The table below is an example.

[Table: top 10 football goal scorers in 2023]

The table above summarizes the top 10 goal scorers in 2023, and at a glance, we can tell who scored the most goals. This is what descriptive statistics help us do. A table like this is called an ungrouped frequency table. Let us now look at an example of a grouped frequency table.

Research example

We want to study how much customers like our company’s new cookies. We distributed a survey and asked 50 customers to rate the new cookies on a scale of 1 to 10.

Below are the results obtained from the survey.

1,5,6,3,10,6,5,7,9,6,1,5,6,3,7,9,6,6,5,3,4,2,10,4,6,5,3,4,2,4,5,7,8,7,5,7,8,7,6,5,7,4,2,1,10,7,4,2,1,10.

Looking at the raw data alone, it is hard to make sense of it. This is where descriptive statistics come into play.

[Table: grouped frequency distribution of the cookie ratings]

Our data set no longer looks overwhelming. As we can see, most of the customers rated the cookies between 4 and 7. This is a grouped frequency distribution.
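As a rough sketch of how such a table could be built, the Python snippet below counts the survey ratings listed earlier and then groups them. The bin boundaries (1-3, 4-7, 8-10) are an assumption made for illustration; the original table may group the ratings differently.

```python
from collections import Counter

# Survey ratings copied from the article (50 customers, scale of 1 to 10).
ratings = [1, 5, 6, 3, 10, 6, 5, 7, 9, 6, 1, 5, 6, 3, 7, 9, 6, 6, 5, 3,
           4, 2, 10, 4, 6, 5, 3, 4, 2, 4, 5, 7, 8, 7, 5, 7, 8, 7, 6, 5,
           7, 4, 2, 1, 10, 7, 4, 2, 1, 10]

# Ungrouped frequency table: how many customers gave each rating.
ungrouped = Counter(ratings)

# Grouped frequency table. These bins are hypothetical, chosen only
# to illustrate the idea of grouping.
bins = [(1, 3), (4, 7), (8, 10)]
grouped = {
    f"{low}-{high}": sum(count for value, count in ungrouped.items()
                         if low <= value <= high)
    for low, high in bins
}

# Print each group with its count and its share of all responses.
for label, count in grouped.items():
    share = 100 * count / len(ratings)
    print(f"{label:>5}  {count:>3}  {share:.0f}%")
```

With these bins, 30 of the 50 ratings (60%) fall in the 4-7 group, which matches the observation that most customers rated the cookies between 4 and 7.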

As mentioned earlier, graphs can also be used instead of tables. This is shown below.

[Figure: frequency distribution charts of the cookie ratings]

As you can see in both charts, the results cluster toward the center, forming a roughly bell-shaped curve. In statistics, an idealized distribution of this shape is called a normal distribution.

While normal distributions are common in quantitative data analysis, they are by no means the only distribution type. The data might at times lean toward the low or high end of the chart. The degree of this lean is quantified by a measure known as skewness. It is vital to consider skewness when analyzing your data, as it affects which inferential statistics can be used on your dataset.

An example of skewed data is given below.

[Figure: example of a skewed distribution]
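To make the idea of skewness concrete, here is a minimal Python sketch that computes one common definition of it, the moment-based coefficient (the third central moment divided by the second central moment raised to the power 1.5). Other definitions apply small-sample corrections, so statistical packages may report slightly different values. The three small datasets are invented for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5.

    Returns 0 for perfectly symmetric data, a positive value when the
    long tail is on the high side, and a negative value when it is on
    the low side. Not defined when all values are equal (m2 is zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical datasets, not taken from the article.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value pulls the tail right
left_skewed = [1, 8, 9, 9, 10, 10]   # one small value pulls the tail left

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:>12}: {skewness(data):+.2f}")
```

The sign is the part to read first: it tells you which direction the data leans, while the magnitude indicates how strong the lean is.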

Measures of Central Tendency

Measures of central tendency calculate a dataset’s average or center using three methods: mean, mode, and median.

The mean is the mathematical average of a group of numbers, which is calculated by dividing the sum of all numbers by the count of all numbers.

The median is the middle number in a group of numbers sorted in ascending or descending order. If there are two numbers in the middle (that is, when the count N is even), add them and divide the sum by 2.

The mode is the most frequently occurring number in a collection of numbers (in any order). Of course, a dataset could have more than one mode or none at all (i.e., no number appears more than once).

From our sample data set, below are the mean, mode, and median.

[Table: mean, median, and mode of the cookie ratings]

As you can see, the mean rating for all 50 customers is 5.32. The median is 5. In other words, if you ranked all the responses from smallest to largest, responses 25 and 26 would be in the center (both ratings are 5), so the median is (5 + 5) / 2 = 5. Lastly, 5, 6, and 7 are the most frequent ratings (appearing 8 times each), making all three of them modes.
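These figures can be reproduced with a few lines of Python. The sketch below uses the standard library's statistics module on the survey ratings listed earlier; multimode is used rather than mode because this dataset has more than one mode.

```python
import statistics

# Survey ratings copied from the article (50 customers, scale of 1 to 10).
ratings = [1, 5, 6, 3, 10, 6, 5, 7, 9, 6, 1, 5, 6, 3, 7, 9, 6, 6, 5, 3,
           4, 2, 10, 4, 6, 5, 3, 4, 2, 4, 5, 7, 8, 7, 5, 7, 8, 7, 6, 5,
           7, 4, 2, 1, 10, 7, 4, 2, 1, 10]

mean = statistics.mean(ratings)      # sum of all ratings / number of ratings
median = statistics.median(ratings)  # average of the 25th and 26th sorted values
modes = sorted(statistics.multimode(ratings))  # every value tied for most frequent

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"modes  = {modes}")
```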

These three descriptive statistics together provide a brief summary of what those customers think of our new cookies. Put another way, ratings cluster around the middle of the scale, which suggests customers are lukewarm about the cookies and there is definitely room for improvement.

Measures of variability

Variability measures indicate how widely spread the response values are. The range, standard deviation, and variance all represent various characteristics of dispersion.

Range

The range indicates how far apart the extreme response scores are. To calculate the range, subtract the lowest value from the highest value.

Standard Deviation

The standard deviation (SD) indicates the average level of variability in your dataset. Roughly speaking, it tells you how far each score lies from the mean on average. The higher the standard deviation, the more varied the dataset is.

There are six steps to calculate the standard deviation:

List each score and find the mean of all scores.
Subtract the mean from each score to get the deviation from the mean.
Square each of these deviations.
Add up all of the squared deviations.
Divide the sum of the squared deviations by N – 1.
Find the square root of the value you get.
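The six steps above can be sketched in Python as follows, applied to the survey ratings from earlier. The final line compares the hand-computed value with the standard library's statistics.stdev, which also divides by N – 1.

```python
import math
import statistics

# Survey ratings copied from the article (50 customers, scale of 1 to 10).
ratings = [1, 5, 6, 3, 10, 6, 5, 7, 9, 6, 1, 5, 6, 3, 7, 9, 6, 6, 5, 3,
           4, 2, 10, 4, 6, 5, 3, 4, 2, 4, 5, 7, 8, 7, 5, 7, 8, 7, 6, 5,
           7, 4, 2, 1, 10, 7, 4, 2, 1, 10]

# Step 1: list the scores and find the mean.
mean = sum(ratings) / len(ratings)

# Step 2: subtract the mean from each score.
deviations = [x - mean for x in ratings]

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 5: divide by N - 1 (this intermediate value is the sample variance).
sample_variance = sum_of_squares / (len(ratings) - 1)

# Step 6: take the square root.
sd = math.sqrt(sample_variance)

print(f"standard deviation = {sd:.2f}")
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(ratings)))
```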

Variance

The variance is the average of the squared deviations from the mean. It measures the degree of dispersion in a dataset: the more spread out the data, the larger the variance.

To calculate the variance, square the standard deviation. The symbol for variance is s^2.

From our sample data set, below are the range, standard deviation, and variance.

[Table: range, standard deviation, and variance of the cookie ratings]

As shown, the range of 9 is the difference between the maximum rating (10) and the minimum rating (1). The standard deviation of 2.50 tells us that, on average, ratings deviate from the mean (5.32) by about 2.50 points. The variance, the square of the standard deviation, is about 6.26. Together, these values tell us the ratings are spread out quite a bit.
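For readers who want to check these values, the following Python sketch computes all three measures with the standard library. It also shows the population versions (dividing by N instead of N – 1) for comparison; the article's figures correspond to the sample versions.

```python
import statistics

# Survey ratings copied from the article (50 customers, scale of 1 to 10).
ratings = [1, 5, 6, 3, 10, 6, 5, 7, 9, 6, 1, 5, 6, 3, 7, 9, 6, 6, 5, 3,
           4, 2, 10, 4, 6, 5, 3, 4, 2, 4, 5, 7, 8, 7, 5, 7, 8, 7, 6, 5,
           7, 4, 2, 1, 10, 7, 4, 2, 1, 10]

# Range: highest value minus lowest value.
value_range = max(ratings) - min(ratings)

# Sample statistics divide the sum of squared deviations by N - 1.
sample_variance = statistics.variance(ratings)
sample_sd = statistics.stdev(ratings)

# Population statistics divide by N, so they come out slightly smaller.
population_variance = statistics.pvariance(ratings)
population_sd = statistics.pstdev(ratings)

print(f"range               = {value_range}")
print(f"sample variance     = {sample_variance:.2f}")
print(f"sample SD           = {sample_sd:.2f}")
print(f"population variance = {population_variance:.2f}")
print(f"population SD       = {population_sd:.2f}")
```

The sample versions are the usual choice when the 50 customers are treated as a sample of a larger customer base, which is the situation described in the survey example.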

Univariate Descriptive Statistics

Univariate descriptive statistics focus on one variable at a time and do not compare variables. Instead, they allow the researcher to describe each variable on its own. The following can be used to describe the patterns found in this type of data:

Measures of central tendency (mean, mode, and median)
Measures of dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
Frequency distribution tables
Pie graphs
Frequency polygons
Histograms
Bar graphs

Bivariate Descriptive Statistics

Bivariate descriptive statistics involve the simultaneous analysis of two variables to explore the relationship between them. When the data are cross-tabulated, the independent variable is by convention placed in the columns and the dependent variable in the rows. The independent variable is considered the influencing factor, while the dependent variable is the outcome.

For instance, in a study examining the relationship between study hours and exam scores, study hours would be the independent variable represented by the columns, and exam scores would be the dependent variable represented by the rows.

This approach enables researchers to calculate measures of association, such as correlation coefficients, and to apply techniques such as scatter plots and regression analysis. These techniques help quantify the strength and nature of the relationship between the two variables, providing valuable information about patterns and trends in the data.
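As an illustration of one such measure, the Python sketch below computes the Pearson correlation coefficient for the study-hours example. The eight data points are hypothetical, invented purely for demonstration, and pearson_r is a helper written for this sketch rather than a library function.

```python
import math

# Hypothetical data, invented for illustration (not from the article).
study_hours = [1, 2, 3, 4, 5, 6, 7, 8]           # independent variable
exam_scores = [52, 55, 61, 60, 68, 72, 75, 80]   # dependent variable


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations (the covariance numerator).
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.2f}")
```

A value of r close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship. On its own, a correlation coefficient describes association and does not establish cause.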

Difference Between Univariate and Bivariate Statistics

Univariate	Bivariate
Involves only one variable	Involves two variables
Doesn’t deal with relationships or causes	Deals with causes or relationships
Main purpose is to describe. Dispersion: variance, range, standard deviation, quartiles, maximum, minimum; central tendency: mean, median, and mode; displays: bar graph, pie chart, histogram, box-and-whisker plot, line graph	Main purpose is to explain. Correlations: comparisons, explanations, causes, relationships; dependent and independent variables; tables where one variable depends on the values of the other; simultaneous analysis of two variables

Why is Descriptive Statistics Important?

Descriptive statistics, though mathematically straightforward, play a key role in research and demand careful consideration. It’s common for students to gloss over these foundational statistics in favor of what appear to be more engaging inferential statistics. However, this tendency can prove to be a costly oversight.

Descriptive statistics are not merely a preliminary step but serve as a tool for researchers. They facilitate a comprehensive understanding of key sample characteristics, preventing researchers from becoming overwhelmed by vast amounts of raw data. These statistics lay the groundwork for subsequent quantitative analyses, offering a quick and insightful glimpse into potential issues within the dataset, such as outliers or missing data.

Furthermore, descriptive statistics play a crucial role in the decision-making process when selecting inferential statistics. Different inferential tests have specific requirements regarding the distribution of data, and a clear understanding of the dataset’s characteristics is essential in making informed choices.

In essence, a thorough exploration of descriptive statistics is not just a preamble to more sophisticated analyses; it is a critical step in itself. Taking the time to delve into these statistics before venturing into more “advanced” techniques is essential. Depending on the research objectives, descriptive statistics may even be sufficient on their own, which is one more reason not to underestimate their role in the research process.

The Difference Between Descriptive and Inferential Statistics

Descriptive and inferential statistics are two branches of statistical analysis, each serving a distinct purpose. Descriptive statistics summarize and describe data in a meaningful way, providing a snapshot of the central tendency, variability, and other characteristics of a dataset. They focus on organizing and presenting data through measures such as mean, median, mode, range, and standard deviation. By exploring the data’s patterns and distributions, descriptive statistics help us gain a better understanding of the information at hand.

On the other hand, inferential statistics go beyond simply summarizing data and aim to make inferences or predictions about a larger population based on a smaller sample. They use sample data to draw conclusions about the entire population. Inferential statistics allow researchers to make important decisions based on the information available, such as determining the effectiveness of a treatment or identifying significant relationships between variables. The table below summarizes the difference between descriptive and inferential statistics.

Inferential Statistics	Descriptive Statistics
Inferential statistics employ analytical techniques on sample data to draw conclusions about the population.	Descriptive statistics help to summarize and describe data.
The analytical tools employed include regression analysis and hypothesis testing.	The two key tools employed are measures of dispersion and measures of central tendency.
It is employed to draw conclusions about an unknown population.	It is used to describe the characteristics of a particular sample or population.
