Chekc to Seee if the Following Functions and if They Are Discrete or Continuous

Discrete probability distributions are based on discrete variables, which have a finite or countable number of values. In this post, I show you how to perform goodness-of-fit tests to determine how well your data fit various discrete probability distributions.

How do you recognize discrete distributions? For discrete distributions, you can create a table that contains all possible values and a non-zero probability for each value. The sum of all probabilities must equal 1. In contrast, continuous distributions are based on continuous variables and have an infinite number of possible values.

The following are examples of different types of discrete distributions.

  • Binary: For each customer that enters a dealership, there are two possible outcomes—sale or no sale. Each outcome has a probability.
  • Poisson: The number of cars that a dealership sells in a day can follow the Poisson distribution. You can create a table with the counts (0, 1, 2, 3, etc.) along with the probability for each daily count.
  • Categorical: The color of the car is a categorical variable. You can list all possible colors along with their probabilities.

Each type of discrete probability distribution requires a different type of data and allows you to model different characteristics. Before you can use these distributions, you need to determine whether your data follows one of them.

Before proceeding, I need to clarify that there are two different approaches based on the type of discrete probability distribution you are testing:

  • For binary data, you need to check the assumptions.
  • For other types of discrete variables, you need to perform a goodness-of-fit test.

Related posts: Understanding Probability Distributions and Identify the Distribution of Your Continuous Data

Check the Assumptions for Discrete Distributions Based On Binary Data

If you want to use a discrete probability distribution based on a binary data to model a process, you only need to determine whether your data satisfy the assumptions. You don't need to perform a goodness-of-fit test. If you are confident that your binary data meet the assumptions, you're good to go!

I'll walk you through the assumptions for the binomial distribution. You use the binomial distribution to model the number of times an event occurs within a constant number of trials.

The binomial distribution has the following four assumptions:

  1. There are only two possible outcomes per trial. For example, yes or no, pass or fail, etc.
  2. Each trial is independent. The outcome of one trial does not influence the outcome of another trial. For example, when you flip a coin, the result of one flip doesn't affect the next flip.
  3. The probability remains constant over time. For some cases, this assumption is true based on the physical properties, such as flipping a coin. However, if there is a chance the probability can change over time, you can use the P chart (a control chart) to confirm this assumption. For example, it's possible that the probability that a process produces defective products can change over time.
  4. The number of trials is fixed. The binomial distribution models the frequency of events over a fixed number of trials. If you need to model a different characteristic, use a different distribution.

Typically, you must have good knowledge about the process, data collection methodology, and your goals to determine whether you should use the binomial distribution. If you can meet all four of these assumptions, you can use the binomial distribution. Statisticians refer to the 2nd and 3rd assumptions as independent and identically distributed (IID) data.

Other distributions that use binary data

There are several other discrete distributions that use binary data. I list several of them below along with how they differ from the binomial distribution. Each distribution has assumptions or goals that vary a bit from the binomial distribution.

Distribution Main differentiation from the binomial distribution
Negative binomial Models the number of trials to produce a fixed number of events.
Geometric Models the number of trials to produce the first event.
Hypergeometric Assumes that you are drawing samples from a small population with no replacements, which causes the probabilities to change.

If you are working with binary variables, the choice of binary distribution depends on the population, constancy of the probability, and your goals. When you confirm the assumptions, there typically is no need to perform a goodness-of-fit test.

Example Use of the Binomial Distribution

In a future post, I will show you ways you can use the various discrete probability distributions for binary data. For now, I'll include an example use of only the binomial distribution to give you an idea. The graph below shows us that if the probability of a defective product is 1.5% and you are modeling a sample size of 30, you'd expect just over 60% of the samples to have zero defective products. Additionally, the binomial distribution predicts that about 7.4% of the samples will have two or more defective products.

Example use of the binomial distribution, which is one of the discrete distributions.

Related post: Learn more about the various discrete probability distributions for binary data

Performing a Goodness-of-Fit Test for other Discrete Distributions

If you are working with discrete data that are not binary data, chances are you'll need to perform a Chi-square goodness-of-fit test to decide if your data fit a particular discrete probability distribution. These tests compare the theoretical frequencies to the frequencies of the observed values. If the difference is statistically significant, you can conclude that your data do not follow that specific discrete distribution.

Like any statistical hypothesis test, Chi-square goodness-of-fit tests have a null hypothesis and an alternative hypothesis.

  • H0: The sample data follow the hypothesized distribution.
  • H1: The sample data do not follow the hypothesized distribution.

For goodness-of-fit tests, small p-values indicate that you can reject the null hypothesis and conclude that your data were not drawn from a population with the specified distribution. Consequently, goodness-of-fit tests are a rare case where you look for high p-values to identify candidate distributions.

I'll show you how to test whether your discrete data follow the Poisson distribution and a distribution based on a categorical variable. You can download the CSV file that contains the data for both examples: DiscreteGOF.

Testing the Goodness-of-Fit for a Poisson Distribution

The Poisson distribution is a discrete probability distribution that models the count of events or characteristics over a constant observation space. Values must be integers that are greater than or equal to zero. For example, the number of sales per day in a store can follow the Poisson distribution. If these data follow the Poisson distribution, you can use this distribution to make predictions.

I'll use an accident count example to show you how to determine whether your data follow the Poisson distribution.

Suppose a safety inspector needs to monitor the number of car accidents per month at a specific intersection. The inspector enters the number of monthly accidents in a worksheet like this:

Example worksheet that contains the number of accidents per month in each cell.

Each value denotes the count of accidents in one month. The actual dataset has 50 values that cover 50 months.

To determine whether these data follow the Poisson distribution, we need to use the Chi-Squared Goodness-of-Fit Test for the Poisson distribution. The statistical output for this test is below.

The statisticals results for the goodness-of-fit test for the Poisson distribution.

This test compares the observed counts to the expected counts based on the Poisson distribution. The p-value is larger than the common significance level of 0.05. Consequently, the test result suggests that these data follow the Poisson distribution. You can use the Poisson distribution to make predictions about the probabilities associated with different counts. You can also use analyses that assume the data follow the Poisson distribution. These analyses include the 1- and 2-sample Poisson rate analyses, and the U Chart.

Related post: Using the Poisson Distribution

Categorical Variables and Discrete Distributions

A categorical variable also has a discrete probability distribution of values. Each level of the categorical variable is associated with a probability. To determine whether the distribution of categorical data follows the values that you expect, you can perform the Chi-Square Goodness-of-Fit test. This test is very similar to the Poisson version except that you must specify the test proportions.

I'll walk you through an example. It is fairly easy to perform this test.

Car color example of a discrete distribution

PPG Industries studied the paint color of new cars bought in 2012 for the entire world. We want to assess whether the distribution of car colors in our local area follows the global distribution. In this example, the PPG data are real, but I'm making up our local data. The car color is our categorical variable and the levels are the individual colors.

After gathering a random sample of the color of cars sold in our state, we enter the observed data and global proportions in a worksheet like this:

Worksheet that contains the data for discrete distribution of car colors.

The OurState column contains the tally for each color that we observed. The PPG Industries data are in the Global Proportions column. We'll perform the Chi-square goodness-of-fit test to determine whether our local distribution is different than the global distribution. We'll use the PPG proportions as the test proportions.

This table displays both a frequency distribution and relative frequency distribution. For more information, read my post, Relative Frequencies and Their Distributions.

The Chi-square goodness-of-fit test results

Chi-squared goodness of fit test results for the discrete distribution of car colors.

This goodness-of-fit test compares the observed proportions to the test proportions to see if the differences are statistically significant. The p-value is less than the significance level of 0.05. Therefore, we can conclude that the discrete probability distribution of car colors in our state is differs from the global proportions.

The Contribution to Chi-squared column tells us which paint colors contribute the most to the statistical significance. Gray and Red are the top two colors, but we don't know the nature of how they contribute to the difference.

Let's look at the observed and expected values chart to see how these values are different.

Chart of observed and expected values for the discrete distribution of car colors.

The chart indicates that the observed number of gray cars is higher than expected. On the other hand, the observed number of red cars is less than expected.

You can use this analysis to test Benford's law, which is a fascinating discrete probability distribution that describes how often numbers in datasets start with each digit from 1 to 9. Learn more about Benford's law and its distribution.

Read my more in-depth post about the chi-square goodness of fit test.

Related post: Learn how the Chi-squared test of independence establishes whether there is a statistically significant relationship between categorical variables.

Closing Thoughts

There are several different types of discrete variables than can produce different types of discrete probability distributions. The process by which you test your data to determine whether it follows a specific distribution depends on the type of discrete variable.

In summary:

  • Binary data: Check the assumptions.
  • Count data: Use the Poisson Goodness-of-Fit Test.
  • Categorical variable: Use the Chi-Square Goodness-of-Fit Test and designate the test proportions.

carusolithen.blogspot.com

Source: https://statisticsbyjim.com/hypothesis-testing/goodness-fit-tests-discrete-distributions/

0 Response to "Chekc to Seee if the Following Functions and if They Are Discrete or Continuous"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel