Undergraduate students’ errors on interval estimation based on variance neglect

Interval estimation is an important topic, especially for drawing conclusions about an event. Mathematics education students must be able to formulate and use interval estimates, and errors in formulating them indicate a weak understanding of interval estimation. This study explores the errors mathematics education students make when interpreting the variance given in a problem in order to select the proper test statistic and formulate an accurate interval estimate of the mean. The respondents were 36 mathematics education students (9 males and 27 females). The research is qualitative, with a qualitative descriptive approach. Data were collected using a respondents’ ability test and interviews. The test instrument was tried out on 36 students and declared valid, with r-count > r-table for an r-table value of 0.3291, and reliable, with a Cronbach's Alpha value of 0.876 > 0.6. Through an exploratory approach, data were analyzed by categorizing, reducing, and interpreting in order to draw conclusions about students' abilities and thinking methods in formulating interval estimates of the mean based on the variance given in the questions. The results showed that mathematics education students neglected the variance, so they could not determine the test statistic correctly, which resulted in erroneous interval estimates. This study provides insight into how mathematics education students think about variance in interval estimation problems, in the hope of anticipating errors in formulating interval estimates.


Introduction
Statistics is applied in many fields, such as education, politics, industry, technology, and research (Ulpah, 2009). Estimation theory has a crucial role in statistics because estimation and hypothesis testing are the basis of inferential statistics (Budiarto, 2002). Interval estimation is applicable to most analyses and helps to avoid misinterpreting small-scale studies with non-significant results (Altman, 2005). It is also helpful for interpreting data, especially when an interval estimate is treated as a prediction interval that provides information about replications (Cumming & Fidler, 2009). Estimation theory can also support timely decision-making (Jarret, 2011). Many scholars propose interval estimation as a beneficial alternative for making decisions, and they also recommend following the APA manual in this respect (Hoekstra et al., 2014). Previous studies by Strehl and Littman (2008), Raupong et al. (2015), and Damanik and Simamora (2019) discuss the use of interval estimation in decision-making for specific studies.
These studies show that interval estimation matters in everyday applications, so it should be carried out without errors. There are three common errors: misconception, incorrect instruction, and information selection errors (Murtiyasa & Perwita, 2020). Some scholars, such as Fieller (1954), Kalinowski (2010), and Hoekstra et al. (2012), investigated misconceptions about interval estimation. However, they did not discuss errors in understanding, or the thinking patterns by which the variance given in a problem leads to the selection of the correct test statistic and a correct interval estimate.
Estimation falls into two groups: point estimation and interval estimation. Interval estimation is an extension of point estimation. Cahyono (2018) explains that an interval estimate is a range of values computed from a sample statistic that is expected, with a stated probability, to contain the population parameter between its lower and upper limits. The width of an interval estimate depends on three factors: sample size, confidence level, and population variability as measured by the standard deviation (Cahyono, 2018). Therefore, variance is an influential factor in interval estimation.
The literature describes several types of interval estimation, such as interval estimation of the mean, of a proportion, and of the variance (Budiyono, 2016). This research discusses only interval estimation of the mean, which is the primary material in the mathematical statistics course at Universitas Muhammadiyah Surakarta. Interval estimation of the mean has two groups: interval estimation of the mean for one population, and interval estimation of the mean for two different populations (applied to find the difference in means between two treatments).
Therefore, this research addresses two questions: How skilled are mathematics teacher candidates at solving mean interval estimation problems? What thinking patterns do the students follow when formulating an interval estimate based on the variance given in a problem? This research explores mathematics education students' errors in interpreting the variance in order to choose the correct test statistic and formulate an accurate interval estimate of the mean.
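For reference, the standard textbook forms of the mean intervals discussed here can be written as follows. This is a sketch assuming normally distributed populations; the notation (sample mean, population standard deviation σ, sample standard deviation s, pooled standard deviation s_p) is introduced for illustration and is not taken from the course material.

```latex
% One population, population variance known (Z statistic)
\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}

% One population, population variance unknown (t statistic, n - 1 degrees of freedom)
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}

% Two populations, unknown but equal variances (pooled t statistic, n_1 + n_2 - 2 degrees of freedom)
(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```

In each form the half-width grows with the confidence level and the variability and shrinks as the sample size increases, which matches the three factors named above.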

Methods
This research was qualitative, with a qualitative descriptive approach, because the researchers aimed to describe the facts and circumstances surrounding students' abilities and errors in interval estimation when the variance is neglected. The research was conducted with 36 students (9 males and 27 females) of class VI A of the Mathematics Education Study Program at Universitas Muhammadiyah Surakarta. The respondents were chosen because interval estimation theory is part of the statistical inference material designated for sixth-semester students. The research subjects were selected using a stratified random sampling technique (Kadilar & Cingi, 2005). The advantage of stratified random sampling is that it can represent each stratum, or layer, of the population (Acharya et al., 2013).
Data were gathered through a respondent's ability test and interviews. The respondent's ability test was used to determine the students' ability to formulate interval estimates. The test comprises two questions with the following indicators: formulating an interval estimate of a single population mean (number 1) and formulating an interval estimate of the difference between two population means (number 2). Validity and reliability tests of the respondent's ability test were conducted using IBM SPSS Statistics 24 software. The validity test used Pearson's product-moment correlation at a 5% significance level, for which the r-table value was 0.396. The r-count values obtained for the two items were 0.908 and 0.834, both greater than the r-table value. Thus, the items of the respondent's ability test were deemed valid (Taherdoost, 2016). Reliability was tested with Cronbach's Alpha, using 0.6 as the decision threshold: the instrument is declared reliable if the Cronbach's Alpha value is > 0.6 (Ahmad et al., 2016). The analysis gave a Cronbach's Alpha value of 0.876 > 0.6, meaning that the respondent's ability test is reliable.
Interviews focused on students' thinking methods (Hobri et al., 2020) in formulating interval estimates based on the variance given in the questions. The interviews followed interview guidelines (Taqiyuddin et al., 2016). Participants were asked to explain their strategies and ways of thinking in solving the given interval estimation problems. The purpose of the interviews was to examine their way of thinking about, and understanding of, interval estimation. To ensure the validity of the data, triangulation techniques were applied, including observation, in-depth interviews, and documentation. These three stages were carried out to obtain data from different sources with the same technique. Observation was used to examine and compare the students' test results through analysis, and was followed by in-depth interviews about the students' answers. Documentation consisted of the questions in the respondent's ability test and pictures of the answers, as evidence that the research was actually conducted.
Furthermore, the data were analyzed using an exploratory approach (Syamsuddin, 2020). Data were analyzed by categorizing, reducing, and interpreting in order to describe students' abilities and thinking methods in formulating interval estimates of the mean based on the variance. For brevity, the symbol R denotes the researchers and the symbols S1, S2, S3, S4, and S5 denote the interviewed subjects. Conclusions were drawn after the related data had been collected and processed in this way.
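The two instrument checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure run in the statistics package: the item scores are hypothetical, and correlating each item with the total score is an assumption about how the Pearson validity test was set up. The thresholds 0.396 and 0.6 are the ones reported in the text and are reused here only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical scores of six respondents on two items (NOT the study data).
item1 = [40, 35, 50, 20, 45, 30]
item2 = [35, 30, 45, 25, 50, 20]
totals = [a + b for a, b in zip(item1, item2)]

R_TABLE = 0.396      # critical value reported in the text
ALPHA_LIMIT = 0.6    # reliability threshold reported in the text

for name, item in (("item 1", item1), ("item 2", item2)):
    r_count = pearson_r(item, totals)  # assumed item-total correlation
    print(name, "valid" if r_count > R_TABLE else "not valid")

alpha = cronbach_alpha([item1, item2])
print("reliable" if alpha > ALPHA_LIMIT else "not reliable")
```

An item is treated as valid when its r-count exceeds the r-table value, and the instrument as reliable when alpha exceeds the threshold, mirroring the decision rules stated in the text.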

Results
This research was conducted by administering a respondent's ability test on interval estimation. The problems given to students in the respondent's ability test are shown in Table 1.
Table 1. Problems given in the respondent's ability test
1. A study showed that the height of mathematics education students is normally distributed with a standard deviation of 6.4 cm. If 15 students are sampled and their average height is 161 cm, construct a 98% confidence interval for the average height of all mathematics education students.
2. To support the implementation of the new curriculum, especially in junior high school mathematics, a study was carried out by applying a scientific approach with the Problem Based Learning (PBL) and Discovery Learning models. Two classes were taken: class VII A, which was taught using a scientific approach with the Problem Based Learning model, and class VII B, which was taught using a scientific approach with the Discovery Learning model. The final exam questions given to both classes were similar. Class A, consisting of 15 students, achieved an average mathematics score of 81 with a standard deviation of 6.5. Class B, consisting of 14 students, achieved an average score of 76 with a standard deviation of 7. Construct a 90% confidence interval for the difference between the average scores of class A and class B, assuming the two groups have the same variance.
The answers to the respondent's ability test were assessed based on the assessment guidelines (Rosid & Listiyani, 2014; Purnamasari & Setiawan, 2019) and processed using Excel 2010 software. Across the 36 students, the mean score on the respondent's ability test was 63.64, with a standard deviation of 12.12. These values were then used to group the scores into five categories based on Sudijono's theory (2014). The categories are described in Table 2.
Table 2. Stratification of student scores on the respondent's ability test
There were two types of student responses to the variance problems in the respondent's ability test. The responses of the 36 mathematics education students are presented in Table 3 and Table 4. Of the 36 participants, only 2 (5.6%) formulated the interval estimate with the correct test statistic on indicator number 1, while 34 participants (94.4%) did so on indicator number 2. These students used the known variances correctly in test statistics with T-scores and Z-scores. To make the errors of the mathematics education students visible, an analysis of the thinking stages was carried out. This analysis was performed on all responses, both correct and incorrect (Tamba et al., 2021). From the 36 students, the researchers selected five subjects for interview, one from each of the five score categories of the respondent's ability test. Based on Table 3 and Table 4, the responses of these five students are described in Table 5.
Realizing the data variance types in the questions and considerations
Table 3 implies that 38.9% of mathematics education students recognized the type of data variance in the first problem and considered it when selecting the test statistic for the interval estimate. Variance should be considered in interval estimation problems so that the interval estimate is formulated with a suitable test statistic. Since the known variance in respondent's ability test number 1 was population data, the Z-test statistic was appropriate for formulating the interval estimate. However, although these students used the correct test statistic, only 5.6% of the 36 students formulated the interval estimate correctly in the first problem. They used interpolation to determine the Z-score from the distribution table, as S1 did (see Figure 1).

Figure 1. Response of S1 in respondent's ability test number 1 with interpolation
The following is an excerpt of the interview between the researchers (R) and the subject (S1) about answer number 1.
R: How did you solve problem number 1?
S1: After finding that the data are normally distributed, the first step in making an interval estimate is to pay attention to the type of data variance in the problem. This is to determine whether the variance is population data or sample data. The type of known variance can then be used as the basis for selecting the distribution. Because the variance in the problem is population data, I used the Z-table to determine the critical point.
R: I saw that you used interpolation. What was your reason for doing so?
S1: Oh well... I think using interpolation to determine the point is more accurate.
R: Why didn't you use the difference-based approach?
S1: If you use the difference-based approach, there will be an error. Although the difference from interpolation is only in the decimal places, it will affect the interval estimation results.
The results of S1 in Figure 1 indicate that the subject understood the interval estimation material. The subject could explain coherently the reasons for choosing the test statistic and for considering the variance. Furthermore, S1 used interpolation to determine the point $z_{\alpha/2}$.
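The calculation that problem number 1 calls for can be sketched as follows. This is an illustrative check rather than S1's handwritten work: the function name `z_interval` is hypothetical, and the critical point is taken from the standard library's normal distribution instead of a printed Z-table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical point z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from problem number 1: sigma = 6.4 cm, n = 15, mean = 161 cm, 98% confidence.
lower, upper = z_interval(161, 6.4, 15, 0.98)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

With these inputs the interval runs from roughly 157.16 cm to 164.84 cm.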
Meanwhile, another 33.3% of mathematics education students did not interpolate on respondent's ability test number 1. They determined the Z-score by rounding the known probability value to the table entry with the smallest difference (as shown in Figure 2).

Figure 2. Response of S2 in respondent's ability test number 1 without interpolation
The following is an excerpt of the interview between the researchers (R) and the subject (S2) about answer number 1.
R: How did you solve problem number 1?
S2: Because the data are normally distributed, I immediately paid attention to the type of variance in the data to find the formula and determine the test statistic.
R: Why didn't you use interpolation to determine $z_{\alpha/2}$?
S2: I didn't understand how to apply interpolation in determining the Z-score, so I only rounded to the closer of the two points in the Z-table.
The results of S2 in Figure 2 indicate that the subject understood the interval estimation material. The subject could explain confidently the reasons for choosing the test statistic. However, S2 did not use interpolation to determine the point $z_{\alpha/2}$ and preferred rounding instead. This leads to a less accurate interval estimate, because a point $z_{\alpha/2}$ obtained by rounding is less precise and the imprecision carries over into the calculation of the interval. Figure 3 compares the final results of S1 and S2 on respondent's ability test number 1.
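The difference between the two ways of reading the Z-table can be sketched as below. The two table entries are assumed values from a typical four-decimal standard normal table, and the helper names are hypothetical; the students' actual table may differ slightly.

```python
from math import sqrt

# Two neighbouring entries of a four-decimal standard normal table,
# assumed here for illustration: (z, cumulative probability).
LOW = (2.32, 0.9898)
HIGH = (2.33, 0.9901)
TARGET = 0.99  # 1 - alpha/2 for a 98% confidence level


def interpolate(low, high, target):
    """Linear interpolation between two table entries (S1's approach)."""
    (z0, p0), (z1, p1) = low, high
    return z0 + (target - p0) / (p1 - p0) * (z1 - z0)


def nearest(low, high, target):
    """Pick the entry whose probability is closest to the target (S2's approach)."""
    return min((low, high), key=lambda entry: abs(entry[1] - target))[0]


def margin(z, sigma=6.4, n=15):
    """Half-width of the interval for problem number 1."""
    return z * sigma / sqrt(n)


z_interp = interpolate(LOW, HIGH, TARGET)
z_round = nearest(LOW, HIGH, TARGET)
print(round(z_interp, 4), z_round)
print(round(margin(z_interp), 4), round(margin(z_round), 4))
```

Under these assumed table values the two critical points differ only in the third decimal place, and the resulting half-widths differ by well under a tenth of a centimetre.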
Figure 3. Comparison of the final results of S1's response and S2's response in respondent's ability test number 1
In contrast to the first problem, Table 4 shows that 94.4% of mathematics education students answered the second problem correctly. As shown in Figure 4, they recognized the data variance and considered it when choosing a suitable test statistic. Because the variance in respondent's ability test number 2 is sample data, the T-test statistic is used. The following is an excerpt of the interview between the researchers (R) and the subject (S3) about answer number 2.
R: How did you solve problem number 2?
S3: First, I looked at what kind of data variance is in the problem. Because the problem assumes that the two groups have the same variance, the population variance in problem number 2 is unknown. So, in problem number 2, I decided to use the T-test statistic.
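The pooled-variance calculation that problem number 2 calls for can be sketched as follows. The function name is hypothetical, and the critical value 1.703 for 27 degrees of freedom at a 90% confidence level is read from a standard t-table as an assumption, because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Interval for mean1 - mean2, assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * std_error, diff + t_crit * std_error


T_CRIT = 1.703  # assumed t-table value for alpha/2 = 0.05 and 27 degrees of freedom

# Values from problem number 2: class A (n = 15, mean 81, s = 6.5),
# class B (n = 14, mean 76, s = 7).
lower, upper = pooled_t_interval(81, 6.5, 15, 76, 7, 14, T_CRIT)
print(f"{lower:.2f} < mu_A - mu_B < {upper:.2f}")
```

With these inputs the pooled variance is 45.5 and the interval runs from roughly 0.73 to 9.27.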
The results of S3 in Figure 4 show that the subject was able to formulate the interval estimate of the difference between two population means correctly. The subject analyzed the variance in the problem adequately and could explain the choice of test statistic.

Neglecting the variance types in the questions
Table 3 shows that 61.1% of mathematics education students completed the interval estimate incorrectly in problem number 1. They did not choose the correct test statistic, which led to an erroneous interval estimate. The T-test statistic is unsuitable for respondent's ability test number 1. Figure 5 shows the answer of respondent S5 to test number 1. The following is an excerpt from the interview between the researchers (R) and the subject (S5) about answer number 1.
R: How did you solve problem number 1?
S5: In my opinion, if the interval estimate is for a single population, then the test statistic used is the T-distribution. Meanwhile, if there are two populations or a difference between two means, then the test statistic used is the F-distribution.
The results of S5 in Figure 5 and the interview indicate that the subject could not formulate the interval estimate of the mean correctly. The subject could not explain the reasons for choosing the test statistic and neglected the variance in the problem.
Meanwhile, in the second problem, 5.6% of participants answered incorrectly. In this problem, these mathematics education students preferred to use Z-test statistics.
The results of S4 in Figure 6 and the interview show that the subject could not formulate the interval estimate of the mean correctly. The subject could not explain the reasons for choosing the test statistic and neglected the variance in the problem.
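The selection rule that the correct responses followed, where the choice depends on whether the population variance is known rather than on the number of populations, can be sketched as below. The function name and return format are hypothetical, and the degrees of freedom for two samples assume equal variances (the pooled case in problem number 2).

```python
def choose_statistic(sample_sizes, population_variance_known):
    """Return (statistic, degrees_of_freedom) for a mean interval.

    sample_sizes holds one size for a single population or two sizes for
    the difference between two means.
    """
    if len(sample_sizes) not in (1, 2):
        raise ValueError("expected one or two samples")
    if population_variance_known:
        return "Z", None  # the normal distribution has no degrees of freedom
    # n - 1 for one sample; n1 + n2 - 2 for two samples with pooled variance
    return "t", sum(sample_sizes) - len(sample_sizes)


print(choose_statistic([15], True))        # problem number 1
print(choose_statistic([15, 14], False))   # problem number 2
```

S5's rule (T for one population, F for two) and S4's use of Z in problem number 2 both depart from this rule because neither starts from the type of variance given in the problem.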

Discussion
Based on the respondents' test scores, only one student reached the high category. Then, four students reached the high category out of 36 students. The data showed that the students' task mastery was low: many students could not formulate interval estimates correctly. This finding indicates misinterpretations of interval estimation (Kalinowski, 2010). The students held misconceptions about interval estimation theory, which led them to choose the wrong statistical test when formulating interval estimates. Murtiyasa and Wulandari (2020) likewise found that transformational errors are caused by a failure to understand the underlying concept.
In this research, S5 answered both questions incorrectly. The subject's work showed incorrect steps in choosing the statistical test. The interview revealed that the subject did not understand the concept and did not analyze the problem. This goes against the basic concept of statistics, which includes collecting, displaying, analyzing, and interpreting various types of data (Williams, 2007). In other words, a problem requires data analysis before a correct conclusion can be drawn. Supena et al. (2021) explain that an inferential statistical problem requires data analysis to draw an accurate conclusion. For interval estimation problems, this implies starting the process from the context of the problem, such as the variance. If a student neglects the variance, the choice of statistical test will be inaccurate. The error then carries over into the interval estimate and into the conclusions drawn from it.
S3 and S4 each answered one of the interval estimation problems correctly. On the other problem, their work showed incorrect steps in choosing the statistical test. Neither subject grasped the logical reason for selecting a statistical test. Although the students applied the correct formula, they did not know the logical reason for applying it: they paid attention to the variance type without thinking about why it matters. This lack of understanding led the subjects to solve problems with different conditions incorrectly. Lusiana (2017) also found that an accurate plan is important for solving problems. The subjects need to understand the logical reasons in order to create an accurate plan and thus solve the problems.
However, solving a statistical problem does not depend only on observing the data. The process of analyzing the data also deserves attention. This was observable in S1 and S2 while solving the first question. Both paid attention to the data, but they processed it differently, especially when determining the value of $z_{\alpha/2}$. S1 determined the value of $z_{\alpha/2}$ by interpolation, whereas S2 determined it by rounding to the closest point found in the Z-distribution table.
This discussion shows that neglecting the variance is the cause of students' inability to formulate accurate interval estimates. The findings should be read with the following limitations in mind. First, there are many types of interval estimation, and the test instrument was limited to interval estimation of the mean. Thus, further studies with broader scopes are needed for each type of interval estimation. Second, the researchers did not analyze the internal and external factors behind students' errors. Further studies should therefore include these factors so that students and lecturers can prevent similar errors.

Conclusion
The skills of mathematics education undergraduate students in formulating interval estimates were in the low category. The results showed that students could not formulate interval estimates accurately. One cause of the inaccuracy was a lack of understanding of the problem. Moreover, the students neglected the variance and therefore chose the statistical test inaccurately. Students therefore need to understand and pay attention to the variance.
The limitation of the current research is that the test instrument covered only interval estimation of the mean. The researchers recommend further studies on this material with different difficulty levels, broader problem forms, and attention to the internal and external factors behind errors, especially regarding students' understanding and skills under each interval estimation condition.

Conflicts of Interest
All researchers contributed actively to the substance of this article and take full responsibility for its content. The researchers declare no conflicts of interest regarding the publication of this manuscript. The researchers will also take full responsibility should ethical problems occur, such as fabrication, fraud, plagiarism, copyright violations concerning the data and content, multiple submissions or publications, and redundancy.