SPC Statistics

You have several choices for SPC statistical settings.

You can decide whether to include out-of-control data in your retrievals and statistical analyses. If you choose to exclude statistical outliers, the data points that violate SPC rules will display as outliers on control charts and on data tables, but will be excluded from statistical calculations.

You can also select the methods to use for calculating standard deviations for subgroup calculations and individual calculations.

Method for calculating control limits

Choose a standard deviation method for calculating control limits. You can choose either Factors (R-Bar / d2) or Sample standard deviation. The Factors method provides the best prediction if your data exhibits more variation within subgroups than between subgroups, while the Sample standard deviation method provides the best prediction if your data exhibits more variation between subgroups than within subgroups.

Method for calculating standard deviation

Choose a method for calculating standard deviation for other statistics. You can choose either Factors (R-Bar / d2) or Sample standard deviation. The Factors method provides the best prediction if your data exhibits more variation within subgroups than between subgroups, while the Sample standard deviation method provides the best prediction if your data exhibits more variation between subgroups than within subgroups.

The method you choose will be used to calculate the Mean +/- 3SD, Mean +/- 4SD, 6SD, Zl, Zu, and Cpm for data in a normal distribution.
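
The difference between the two methods can be illustrated with a minimal sketch. This is not GainSeeker's code: the function names and sample data are hypothetical, the subgroups are assumed to be equal-sized, and the sample standard deviation is assumed to be computed over all values pooled together. The d2 values are the standard published constants for subgroup sizes 2 through 5.

```python
from statistics import mean, stdev

# Standard d2 constants for subgroup sizes 2 through 5.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_factors(subgroups):
    """Estimate sigma as R-bar / d2 (assumes equal-sized subgroups)."""
    n = len(subgroups[0])
    r_bar = mean(max(s) - min(s) for s in subgroups)  # average subgroup range
    return r_bar / D2[n]

def sigma_sample(subgroups):
    """Estimate sigma as the sample standard deviation of all values pooled."""
    values = [x for s in subgroups for x in s]
    return stdev(values)

# Hypothetical data: three subgroups of three measurements each.
subgroups = [[10.0, 10.2, 9.9], [10.1, 10.3, 10.0], [9.8, 10.0, 10.1]]
print("Factors (R-Bar / d2):", round(sigma_factors(subgroups), 4))
print("Sample standard deviation:", round(sigma_sample(subgroups), 4))
```

The Factors estimate uses only the ranges within each subgroup, while the pooled sample standard deviation also reflects shifts between subgroups, which is why the two can disagree.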

Non-normal

To configure default settings for how SPC should work with data that is non-normally distributed, expand the Non-normal tree.

Calculate non-symmetrical control limits for subgroup size = 1

GainSeeker can calculate non-symmetrical control limits if the subgroup size is one and the distribution is non-normal but can be modeled. For more information on non-symmetrical control limits, see Non-symmetrical control limits.

Evaluate more data sets as non-normal

Select whether to use a 95% confidence level or a 99% confidence level when testing data for normality. At the 95% confidence level, the program treats more data sets as non-normal than it does at the default 99% confidence level.

Force distribution

To view or change this setting, expand the Force distribution tree. Then select an option for analyzing the normality of a data distribution:

Automatic - This allows GainSeeker to test the data set for normality. GainSeeker will apply the appropriate statistical analysis to the data set, based on whether its distribution is normal or non-normal.
Non-normal - This forces GainSeeker to analyze each data set as a non-normal distribution, even if the data is in fact normally distributed. This gives the most accurate calculation of the true shape of the curve.
Normal - This forces GainSeeker to analyze each data set as a normal distribution, even if the data is in fact non-normally distributed. This makes the statistical analysis easy to replicate by hand, but it does not always provide a true representation of your process because you ignore the true distribution of the data.

Limit number of slices for non-normal area calculation

You can leave the number of non-normal slices unlimited or limit it to 2048. The larger the number of slices, the longer the program takes to calculate the Non-normal area for unusual distributions.

Use the median method to calculate Z values instead of probability method

You can choose to have the program calculate Z values from the median (Median method) or from a table (Probability method).

Statistical outliers

To view or change this setting, expand the Statistical outliers tree.

Statistical outliers are observations that violate SPC rules.

If you choose to exclude statistical outliers, the out-of-control data points will be displayed as outliers on control charts and on data tables but will be excluded from the statistical calculations.

To include all data points in the statistical calculations, choose All data included.
To exclude from the statistical calculations all data points that fall outside the calculated control limits, choose Data outside Control limits excluded.
To exclude from the statistical calculations all data points that fall outside the calculated control limits as well as data that violates SPC rules for runs, trends or zones, choose Out of Control, runs, trends and zone violations excluded.

Note about excluding outliers and occasional oscillation cycle occurrences:

When you choose to exclude outliers, GainSeeker loops through the data at least twice to determine whether excluding the outliers will change the control limits and push other data points outside the new control limits, creating new outliers. Typically, the outlier loop is executed only a few times. However, there are times when the outliers endlessly oscillate in and out of the control limits. This can happen when the control limits change as data points are excluded or included on subsequent passes through the data set. For this reason, GainSeeker is limited to a maximum of ten passes when such an oscillation cycle occurs.
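
The looping behavior described in this note can be sketched as follows. This is only an illustration under stated assumptions, not GainSeeker's algorithm: it uses a simple mean +/- 3 standard deviations rule on individual values, and the function name and sample data are hypothetical. The ten-pass cap is the only detail taken from the note above.

```python
from statistics import mean, stdev

MAX_PASSES = 10  # the note above states a maximum of ten passes

def exclude_outliers(values, max_passes=MAX_PASSES):
    """Return (excluded_indexes, passes_used) for a mean +/- 3 sigma rule."""
    excluded = set()
    for passes in range(1, max_passes + 1):
        kept = [v for i, v in enumerate(values) if i not in excluded]
        if len(kept) < 2:
            return excluded, passes  # too little data left to compute limits
        center, sigma = mean(kept), stdev(kept)
        lcl, ucl = center - 3 * sigma, center + 3 * sigma
        # Re-test every point, so a point excluded earlier can come back in.
        new_excluded = {i for i, v in enumerate(values) if not lcl <= v <= ucl}
        if new_excluded == excluded:
            return excluded, passes  # limits and outliers stopped changing
        excluded = new_excluded
    return excluded, max_passes  # stopped at the cap: possible oscillation

# Hypothetical data: twenty in-control values followed by one extreme value.
data = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [20.0]
excluded, passes = exclude_outliers(data)
print("excluded indexes:", sorted(excluded), "passes:", passes)
```

With this sample data the loop settles on the second pass: the first pass flags the extreme value, and the second pass confirms that the tighter limits create no new outliers.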

Varying subgroup size

To configure the settings for incomplete subgroups and varying subgroup sizes, right-click the Varying subgroup size node on this tree. This displays the Incomplete Subgroups and Varying Subgroup Sizes screen.

Each time you retrieve data for a standard, GainSeeker will count the number of measurements stored for each subgroup in the retrieval. If you retrieved data with incomplete subgroups or multiple subgroup sizes, the number of measurements per subgroup will not be identical for all subgroups in the retrieval, and GainSeeker will use the settings on this screen to determine how statistics will be calculated on this data:

Exclude all subgroups that do not match the subgroup size of the latest record

With this setting, GainSeeker will count the number of measurements stored for the most recent subgroup in the retrieval. Any subgroups that contain a different number of measurements will be displayed on charts and data tables but temporarily excluded from statistical calculations.

If you select this check box, GainSeeker will ignore all other settings on this screen and make them unavailable.

Exclude subgroups with different subgroup sizes if more than n percent of the subgroups match the subgroup size of the latest record

With this setting, GainSeeker will count the number of measurements stored for the most recent subgroup in the retrieval. It will then analyze the percentage of subgroups in the retrieval that contain this number of measurements:

If more than n percent of the subgroups in this retrieval also contain this number of measurements, then any subgroups that do not contain this number of measurements will be displayed on charts and data tables but temporarily excluded from statistical calculations.
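If the percentage of subgroups in this retrieval that contain this number of measurements is less than or equal to the percentage you specify here, GainSeeker will not use this setting to exclude data from statistical calculations. Instead, it will proceed according to the options you have configured for the two settings that follow below (the degrees of freedom setting and the Standardize data or show actual values setting).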
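
The percentage rule above can be sketched as a small function. This is a hypothetical illustration rather than GainSeeker's implementation: it assumes the latest record is the last item in the list and that the latest subgroup itself counts toward the percentage.

```python
def exclude_by_percent(subgroup_sizes, n_percent):
    """Return indexes of subgroups to exclude, or None if the rule does not apply.

    subgroup_sizes: number of measurements in each subgroup, oldest first
                    (assumption: the last entry is the latest record).
    n_percent:      the threshold entered for this setting.
    """
    latest = subgroup_sizes[-1]
    matching = sum(1 for size in subgroup_sizes if size == latest)
    percent = 100.0 * matching / len(subgroup_sizes)
    if percent > n_percent:
        # Enough subgroups match: exclude the ones with a different size.
        return [i for i, size in enumerate(subgroup_sizes) if size != latest]
    return None  # fall through to the settings that follow

print(exclude_by_percent([5, 5, 3, 5, 5], 75))  # 80% match, rule applies
print(exclude_by_percent([5, 5, 3, 5, 5], 80))  # 80% is not more than 80%
```

Note the strict comparison: a retrieval in which exactly n percent of the subgroups match does not trigger the exclusion.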

If subgroups matching the latest subgroup size provide more than n degrees of freedom, exclude the subgroups with different subgroup sizes

The "Degrees of Freedom" statistic is based on the number of data values being analyzed and the number of subgroups being analyzed. It helps you determine whether you are including enough data values and subgroups to provide a valid statistical sample and is calculated as follows:

DFi = Degrees of Freedom for subgroup number i

For a subgroup with n data values (where n > 1): DFi = n - 1
For a subgroup with 1 data value: DFi = 1
For a subgroup with 0 data values: DFi = 0

DF = Degrees of Freedom for the subgroups being analyzed

k = the number of subgroups being analyzed

DF = DF1 + DF2 + ... + DFk
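
The calculation above translates directly into code. The following is a minimal sketch with hypothetical function names; it takes the number of data values in each subgroup and returns the total degrees of freedom.

```python
def subgroup_df(n):
    """Degrees of freedom for one subgroup with n data values."""
    if n > 1:
        return n - 1
    if n == 1:
        return 1
    return 0  # a subgroup with no data values contributes nothing

def total_df(subgroup_sizes):
    """DF = DF1 + DF2 + ... + DFk for the k subgroups being analyzed."""
    return sum(subgroup_df(n) for n in subgroup_sizes)

# Hypothetical retrieval: subgroups holding 5, 5, 1, 0 and 3 data values.
print(total_df([5, 5, 1, 0, 3]))  # 4 + 4 + 1 + 0 + 2 = 11
```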

With this setting, GainSeeker will count the number of measurements stored for the most recent subgroup in the retrieval. It will then calculate the degrees of freedom for subgroups in the retrieval that contain this number of measurements:

If the subgroups containing this number of measurements provide more than n degrees of freedom, then any subgroups that do not contain this number of measurements will be displayed on charts and data tables but temporarily excluded from statistical calculations.
If the subgroups containing this number of measurements do not provide more than n degrees of freedom, GainSeeker will not use this setting to exclude data from statistical calculations. Instead, it will proceed according to the options you have configured for the Standardize data or show actual values setting below.

Standardize data or show actual values

With these two options, no subgroups are excluded from statistical calculations:

Apply the Standardized coding method

If you choose this setting, GainSeeker will use the Standardized coding method to display modified data values on charts and use these modified values to calculate statistics such as control limits.

Display the actual values retrieved

If you choose this setting, GainSeeker will display the actual measurements and subgroup averages on charts.

Because you are analyzing data with different numbers of measurements in the subgroup without using a coding method to accommodate the difference in subgroup sizes, GainSeeker will not be able to calculate some statistics – like Cpk or control limits – for the entire retrieval. However, if you select the Add data grouping by subgroup size check box, then GainSeeker will begin a new data group each time the subgroup size changes. This will allow GainSeeker to calculate statistics such as the data mean and control limits separately for each data group.
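
The idea of beginning a new data group each time the subgroup size changes can be sketched as follows. This is an assumption-laden illustration, not GainSeeker's code: the function name and sample data are hypothetical, and only the data mean is computed for each group.

```python
from itertools import groupby
from statistics import mean

def group_by_subgroup_size(subgroups):
    """Start a new data group each time the subgroup size changes."""
    # groupby only merges consecutive items, so a size that reappears
    # later in the retrieval starts a separate group.
    return [list(run) for _, run in groupby(subgroups, key=len)]

# Hypothetical retrieval: the subgroup size goes 2, 2, 3, 3, 2.
subgroups = [[1.0, 1.2], [1.1, 1.3], [1.0, 1.1, 1.2], [1.2, 1.3, 1.4], [1.1, 1.2]]
groups = group_by_subgroup_size(subgroups)
for group in groups:
    values = [x for s in group for x in s]
    print("size", len(group[0]), "subgroups", len(group), "mean", round(mean(values), 3))
```

Because every subgroup inside a data group has the same size, statistics that depend on a single subgroup size can be computed separately for each group.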