Analysis of Linear Log Models on Covid-19 Data in Indonesia

Covid-19 is still a concern of the world, including Indonesia. The transmission of Covid-19 is very fast and has a wide impact on people around the world, especially in Indonesia. In everyday life, much of the data we encounter is grouped into categories. Categorical data can be analyzed using the log-linear model, which describes the relationship between categorical variables that form a contingency table of arbitrary dimension. The analysis in this study consists of producing descriptive statistics and a three-way contingency table, then fitting log-linear models with the help of SPSS 25.0 software, where the goodness of fit test is used to see which models are suitable. The purpose of this study is to obtain a log-linear model that is suitable for Covid-19 data based on gender, province, and age group. The conclusion of this study is that, of the 9 models used, the model $(XY, XZ, YZ)$ is the most suitable, with a $G^2$ value of 18.885. The equation of the log-linear model is $\log m_{ijk} = \mu + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$, which means that there is a two-factor relationship between gender and province ($XY$), gender and age group ($XZ$), and province and age group ($YZ$) in Covid-19 cases in Indonesia.


INTRODUCTION
Coronavirus Disease, or Covid-19, is still a concern for the world, including Indonesia. Covid-19 first appeared in Wuhan, China, in December 2019. It is caused by a new strain of coronavirus, initially called Novel Coronavirus 2019 (2019-nCoV) and officially named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (Bedford et al., 2020). Covid-19 is transmitted through droplets released when an infected person coughs, sneezes, or talks (Tian et al., 2020). The transmission of Covid-19 is very fast and has a broad impact on people around the world, especially in Indonesia.
In everyday life, much data is grouped into categories. Data that consists of several categories is called categorical data, for example type of work divided into civil servants and private employees (Lestyorini, 2010). Categorical data are sample observations from a population with similar conditions that are cross-classified by several categorical variables (Fienberg, 2007). Categorical data are summarized in a table that gives the frequency of observations at each combination of levels of the variables. Such tables are called contingency tables. The contingency table method can describe the relationship between two, three, or more research variables, but not a causal relationship (Agresti, 2002). Categorical data analysis can be performed using a log-linear model, which is used to analyze the relationship between the categorical variables that make up a contingency table of any dimension.
Various studies on the log-linear model have been carried out, for example by Sari et al. (2016) on the relationship between fuel type, vehicle type, engine compression ratio, and engine capacity; Maryana (2013) on the relationship between gender and education; and Sihotang & Zuhri (2020) on the relationship between profession, gender, and type of reading. Research on Covid-19 includes Ai et al. (2020), who identified Covid-19 by age group (<60 years and ≥60 years) and gender in China; Zhao et al. (2020), who investigated the relationship between CT scan findings and the clinical condition of Covid-19 pneumonia; Li et al. (2020), who showed a higher distribution of Covid-19 among males; Lippi & Henry (2020), on the relationship between smoking and the severity of Covid-19; Liu et al. (2020), who found that elderly Covid-19 patients are more likely to develop severe disease; Munayco et al. (2020), who classified Covid-19 cases and deaths by age and sex in Peru; and Altun (2021), on the relationship between sex, country, and age group. As the literature above shows, Covid-19 depends strongly on variables such as age, gender, presence of chronic diseases, and country of residence.
Based on the description above, the authors are interested in analyzing log-linear models on Covid-19 data in Indonesia based on gender, province, and age group, with the $G^2$ value used as the criterion for evaluating the models.

METHOD
The data used in this study are Covid-19 data for 2020 by gender, province, and age group in Indonesia, sourced from the 2020 Indonesian Health Profile Catalog Book published by the Ministry of Health of the Republic of Indonesia in 2021 (Indonesia, 2021). The gender variable consists of male and female; the province variable consists of Lampung, South Sumatra, and West Sumatra; and the age group variable consists of 0-15 years, 16-30 years, 31-45 years, 46-60 years, and >60 years. These data were then cross-classified three ways for further analysis using a log-linear model.
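The three-way cross-classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study: the function name `cross_classify` and the four sample records are hypothetical placeholders, and only the category labels come from the text.

```python
from collections import Counter

# Category labels taken from the text; the order fixes the table indices.
GENDERS = ["male", "female"]
PROVINCES = ["Lampung", "South Sumatra", "West Sumatra"]
AGE_GROUPS = ["0-15", "16-30", "31-45", "46-60", ">60"]


def cross_classify(records):
    """Return table[i][j][k]: count of records with gender i, province j, age group k."""
    counts = Counter(records)
    return [[[counts[(g, p, a)] for a in AGE_GROUPS]
             for p in PROVINCES]
            for g in GENDERS]


# Placeholder records for illustration only; these are NOT the study's data.
records = [
    ("male", "Lampung", "31-45"),
    ("male", "Lampung", "31-45"),
    ("female", "West Sumatra", "0-15"),
    ("female", "South Sumatra", ">60"),
]

table = cross_classify(records)
print(table[0][0][2])  # count for (male, Lampung, 31-45)
```

The result is a nested list with 2 × 3 × 5 = 30 cells, which is the layout assumed by the log-linear analysis in the following sections.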

The log-linear model is a statistical model that expresses the relationship between variables with qualitative data (nominal or ordinal scale) [17]. A contingency table, often called a cross tabulation or cross classification, is a table that contains the number or frequency of observations in each of several classifications (categories). The contingency table method can describe the relationship between two or more research variables, but not a causal relationship; the more variables are tabulated, the better the interpretation (Wulandari et al., 2009). To perform a log-linear analysis, this study uses a three-dimensional contingency table. A three-dimensional table has $(I \times J \times K)$ cells, consisting of $I$ rows, $J$ columns, and $K$ layers, and is referred to as an $I \times J \times K$ contingency table.

The independence log-linear model for the three variables is as follows [17]:

$$\log m_{ijk} = \mu + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z}$$

To determine whether a model is appropriate, the goodness of fit test is used. The goodness of fit test can use two test statistics, namely the Pearson Chi-Square statistic ($\chi^2$) or the likelihood ratio Chi-Square statistic ($G^2$). The Chi-Square statistic is also used to determine whether there is a significant relationship between the variables being measured. Both statistics test the hypothesis that the expected population frequencies satisfy a given model (Agresti, 1990).

The steps for the log-linear analysis used in this study are as follows:
1) Produce descriptive statistics with the help of SPSS 25.0 software to find out the characteristics of Covid-19 in Indonesia.
2) Create a three-dimensional contingency table with the help of SPSS 25.0 software to measure the relationship (association) between the three variables studied.
3) Perform log-linear analysis with the help of SPSS 25.0 software to form the best three-dimensional log-linear model.
   a) Describe the models that can be formed with three variables; model building starts from the simplest model and proceeds to the most complete model.
   b) Conduct goodness of fit tests (model significance tests) to see which models are suitable.
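For reference, the two goodness-of-fit statistics named above have the standard forms $G^2 = 2\sum_{ijk} n_{ijk}\log(n_{ijk}/\hat m_{ijk})$ and $\chi^2 = \sum_{ijk} (n_{ijk}-\hat m_{ijk})^2/\hat m_{ijk}$, where $n_{ijk}$ are the observed counts and $\hat m_{ijk}$ the fitted counts. The sketch below evaluates both for the independence model, whose fitted counts have the closed form $\hat m_{ijk} = n_{i++}\,n_{+j+}\,n_{++k}/n^2$. The function names and the small demo table are illustrative assumptions, not the study's data or SPSS output.

```python
import math


def independence_fit(table):
    """Fitted counts under mutual independence (X, Y, Z) for table[i][j][k]."""
    I, J, K = len(table), len(table[0]), len(table[0][0])
    n = sum(table[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    row = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    col = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    lay = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    return [[[row[i] * col[j] * lay[k] / n ** 2 for k in range(K)]
             for j in range(J)]
            for i in range(I)]


def fit_statistics(observed, fitted):
    """Return (G2, X2): likelihood ratio and Pearson chi-square statistics."""
    g2 = 0.0
    x2 = 0.0
    for obs_slab, fit_slab in zip(observed, fitted):
        for obs_row, fit_row in zip(obs_slab, fit_slab):
            for n_obs, m_fit in zip(obs_row, fit_row):
                if n_obs > 0:  # empty cells contribute 0 to G2
                    g2 += 2.0 * n_obs * math.log(n_obs / m_fit)
                x2 += (n_obs - m_fit) ** 2 / m_fit
    return g2, x2


# Made-up 2 x 2 x 2 table built as an outer product, so independence holds exactly.
demo = [[[2, 5], [6, 15]], [[4, 10], [12, 30]]]
g2, x2 = fit_statistics(demo, independence_fit(demo))
print(g2, x2)  # both statistics vanish for an exactly independent table
```

For an $I \times J \times K$ table the independence model leaves $IJK - I - J - K + 2$ degrees of freedom, against which either statistic is compared.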

RESULTS AND DISCUSSION
In this study, a log-linear analysis is carried out on Covid-19 data for 2020 by gender, province, and age group in Indonesia. The log-linear models are then evaluated using the $G^2$ value. The data used in this study are presented in Table 2. In addition, the Chi-Square test gives a p-value of .000. Because the p-value is smaller than the significance level $\alpha = 5\%$ (0.05), we reject H0. We therefore conclude that, at the significance level $\alpha = 5\%$, there is a relationship between gender, province, and age group.

Goodness of Fit Test
The goodness of fit test was carried out to determine whether each model is suitable. In this study 9 models are used, and the best model is then selected. The test statistic used is the likelihood ratio statistic ($G^2$). The results of the goodness of fit test at a significance level of $\alpha = 0.05$ are reported in Table 3.
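The text does not list the nine models. A natural reading, assumed here, is the nine hierarchical models for three factors that contain all main effects, from mutual independence $(X, Y, Z)$ to the saturated model $(XYZ)$, with $X$ = gender (2 levels), $Y$ = province (3 levels), and $Z$ = age group (5 levels). The sketch counts the residual degrees of freedom of each model as the number of cells minus the number of free parameters; names such as `MODELS` and `degrees_of_freedom` are illustrative.

```python
import math
from itertools import combinations

# Number of levels per factor, as described in the Method section.
LEVELS = {"X": 2, "Y": 3, "Z": 5}

# Assumed set of nine hierarchical models, each given by its generating terms.
MODELS = [
    ("X", "Y", "Z"),
    ("XY", "Z"),
    ("XZ", "Y"),
    ("YZ", "X"),
    ("XY", "XZ"),
    ("XY", "YZ"),
    ("XZ", "YZ"),
    ("XY", "XZ", "YZ"),
    ("XYZ",),
]


def implied_terms(model):
    """All lower-order terms implied by the generators (hierarchy principle)."""
    terms = set()
    for generator in model:
        for size in range(1, len(generator) + 1):
            terms.update(combinations(generator, size))
    return terms


def degrees_of_freedom(model, levels=LEVELS):
    """Residual df = number of cells minus number of free parameters."""
    n_cells = math.prod(levels.values())
    n_params = 1  # the intercept mu
    for term in implied_terms(model):
        n_params += math.prod(levels[v] - 1 for v in term)
    return n_cells - n_params


for model in MODELS:
    print("(" + ", ".join(model) + ")", degrees_of_freedom(model))
```

Under these assumptions the model $(XY, XZ, YZ)$ has 8 degrees of freedom, while the saturated model $(XYZ)$ has 0 and therefore reproduces the table exactly.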

Table 2. Covid-19 Data Based on Gender, Province, and Age Group

Based on the output obtained, out of 33,584 people infected with Covid-19, males in Lampung and West Sumatra Provinces have the same ordering of percentages across age groups. From largest to smallest, the age groups are 31-45 years, 46-60 years, >60 years, 16-30 years, and 0-15 years. In South Sumatra Province, however, the age group 0-15 years has a higher percentage than the age group 16-30 years. From this we can conclude that males aged 31 years and over in Lampung, South Sumatra, and West Sumatra Provinces are more at risk of being infected with the Covid-19 virus. For females, Lampung and South Sumatra Provinces have the same ordering of percentages across age groups, from largest to smallest: 31-45 years, 46-60 years, >60 years, 16-30 years, and 0-15 years. In West Sumatra Province, however, the age group >60 years has a smaller percentage than the age groups 0-15 years and 16-30 years. From this we can conclude that females aged 31-60 years in Lampung, South Sumatra, and West Sumatra Provinces are more at risk of being infected with the Covid-19 virus.

Table 3. Comparison of $G^2$, df, and p-values for Each Model

Based on Table 3, the $G^2$ and p-value for the model $(XY, XZ, YZ)$ are 18.885 and .015. Because the value of $G^2$ is relatively small and the p-value is greater than the threshold of 0.005, we do not reject H0. We therefore conclude that the model $(XY, XZ, YZ)$ fits the data, with the following model equation:

$$\log m_{ijk} = \mu + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$$

From this model, it can be interpreted that there are two-factor interactions between gender and province ($XY$), gender and age group ($XZ$), and province and age group ($YZ$) in Covid-19 cases in Indonesia.
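As a cross-check on the reported fit, the sketch below fits the model $(XY, XZ, YZ)$ by iterative proportional fitting, a standard algorithm for hierarchical log-linear models, and converts a $G^2$ value into a p-value. Several things here are assumptions rather than statements about the study: the text does not say which algorithm SPSS used, the demo table is made up, the degrees of freedom (8) are derived from the 2 × 3 × 5 layout rather than quoted from the source, and the closed-form tail probability applies only to even degrees of freedom.

```python
import math


def fit_homogeneous_association(table, cycles=200):
    """Fit (XY, XZ, YZ) by iterative proportional fitting.

    Assumes every two-way margin of `table` is positive.
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(cycles):
        for i in range(I):  # rescale to match the observed XY margin
            for j in range(J):
                scale = sum(table[i][j]) / sum(m[i][j])
                m[i][j] = [v * scale for v in m[i][j]]
        for i in range(I):  # rescale to match the observed XZ margin
            for k in range(K):
                scale = (sum(table[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= scale
        for j in range(J):  # rescale to match the observed YZ margin
            for k in range(K):
                scale = (sum(table[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= scale
    return m


def g_squared(observed, fitted):
    """Likelihood ratio statistic G2 = 2 * sum n * log(n / m)."""
    return 2.0 * sum(
        n * math.log(n / m)
        for obs_slab, fit_slab in zip(observed, fitted)
        for obs_row, fit_row in zip(obs_slab, fit_slab)
        for n, m in zip(obs_row, fit_row)
        if n > 0
    )


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Made-up 2 x 2 x 2 counts, for illustration only (not the study's data).
demo = [[[10, 5], [3, 8]], [[4, 9], [7, 6]]]
fitted = fit_homogeneous_association(demo)
print(round(g_squared(demo, fitted), 3))

# Reported G2 for (XY, XZ, YZ); df = 8 is an assumption derived from the layout.
p_value = chi2_sf_even_df(18.885, 8)
print(round(p_value, 3))  # about 0.015, in line with the reported .015
```

With 8 degrees of freedom, the reported $G^2$ of 18.885 gives a p-value of about 0.015, which agrees with the value in the text. That value lies above 0.005 and below 0.05, so the outcome of the test depends on which of the two thresholds quoted in the text is applied.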