This exercise examines methods of summarizing the relationship between two variables: a simple graphical analysis and the bivariate linear regression model. The application is to the relationship between infant mortality rates (IMRs) and total suspended particulates (TSPs) air pollution. The Environmental Protection Agency recently toughened the regulations that limit firms' ability to emit TSPs because of the presumed health effects of TSPs. Whether IMRs and TSPs are causally related is an issue of tremendous importance to public policy.

Feel free to work cooperatively, but each person is required to turn in their own problem set that provides the solutions in their own words.

For those of you who become interested in this topic, you might be interested in:

Chay, Kenneth Y. and Michael Greenstone. 2005. Does Air Quality Matter?: Evidence from the Housing Market. Journal of Political Economy, 113(2): 376-424.

Chay, Kenneth Y. and Michael Greenstone. 2003. Air Quality, Infant Mortality, and the Clean Air Act of 1970. MIT Department of Economics Working Paper No. 04-08.

Data Source: imrtsp71.dta and imrtsp72.dta

imrtsp71.dta is a data file from 1971. The unit of observation is the county and there are 715 observations of 21 variables.

This Stata-format data file contains county-level information on the number of infant deaths per 1000 births (IMR), the natural log of this number, TSPs concentrations, number of births, characteristics of new parents (e.g.
race of mother, years of education, marital status of mother, mother's age), whether the infant is considered to have a low birth weight (an indicator of poor infant health), the month of the pregnancy in which the mother initiated prenatal care, and mean per-capita income.

The relevant variables, with descriptions in quotations, are:

imr71 “# inf deaths per 1000 births 71”
lnimr71 “ln(# inf death per 1000 births 71)”
mtspar71 “county-level tsps concentration, measured in micrograms per cubic meter 71”
tsp_sq “the square of mtspar71”
birth71 “# births 71”
white71 “% births, white mom 71”
1othr71 “% births, nonwhite/nonblack mom 71”
female71 “% female births 71”
edudad71 “mean father years of ed 71”
edumom71 “mean mother years of ed 71”
maried71 “% mother married 71”
umard71 “% mother unmarried 71”
agemom71 “mean mother age 71”
lwght71 “% births with weight <2500 g in 71”
pcare171 “% mother began prenatal care in 1st or 2nd month 71”
pcare271 “% mother began prenatal care in 3rd month 71”
pcare371 “% mother began prenatal care in 4th-6th month 71”
pcare471 “% mother began prenatal care in 7th-9th month 71”
pcinc71 “county-level per cap income 71”
location “5-digit county fips code”
fstate “2 digit state fips code”

[Note: There may be a few extra variables in the data file, but you should ignore them.]

imrtsp72.dta is structured exactly the same way except that the observations are from 1972 and all the appropriate variable names end with “72” instead of “71”. Again, the unit of observation is the county and here there are 983 observations of 22 variables. DO NOT USE imrtsp72.dta in this problem set.

1. Summarize the relationship between the number of infant deaths per 1000 births and TSPs concentrations.

a. Create histograms of imr71 and lnimr71. Does either of these variables look normal? (Hint: experimenting with the number of bins and overlaying a normal curve will help with this.)

b. Graph scatter plots of imr71 and lnimr71 against mtspar71.
Does it look like there is an association between infant mortality and TSPs?

c. Examine the edudad71 variable. What are the deciles of the variable? What is the average number of years of education in the top decile? Graph a scatter plot of imr71 against edudad71. Do you think that counties with more educated fathers have lower levels of infant mortality?

d. Graph scatter plots of imr71 and lnimr71 against mtspar71, but this time weight the observations by the total number of births in the county. What is your prediction about the covariance of infant mortality rates and TSPs? Does this relationship appear linear for either form of the dependent variable?

2. Background Questions

a. Do the available data allow for a determination of the causal relationship between infant mortality and TSPs? Why not? Describe the data file that would allow for an examination of this issue.

b. Under what assumptions is the least squares estimator the best linear unbiased estimator (BLUE)?

c. What assumption is necessary for LS to produce an unbiased estimate of the IMR/TSPs relationship? Do you think this assumption is likely to hold? If you had any data file that you wanted, how would you test whether this assumption may be valid? Describe your ideal data file. With the current data file, present some evidence as to whether this assumption is likely to hold.

d. In the bivariate linear regression model, derive the estimating equations for the intercept and slope coefficients. Derive their standard errors.

3. The bivariate linear regression model of infant mortality rates and TSPs.

a. Run the regressions of imr71 on a constant and mtspar71 and of lnimr71 on a constant and mtspar71. In both cases, weight the regressions by birth71 so that larger counties have a greater influence. Interpret the parameter estimate (i.e., Beta Hat) in words; for instance, describe the effect of a 10 unit decline in TSPs on infant mortality.

b. Plot the residuals from both regressions and overlay a normal curve.
Does the normality assumption appear reasonable? Does homoscedasticity of residuals hold? (Hint: graph residuals against the fitted values.)

c. Use the total sum of squares (TSS), error sum of squares (ESS), and regression sum of squares (RSS) to derive the R² statistic. Determine the components of the corrected (adjusted) R² statistic and show that Stata accurately calculated that statistic.

d. Determine the values of TSPs that define the deciles of TSPs. Create 10 dummy variables where each one corresponds to a decile of TSPs. For instance, an observation that has a TSPs concentration in the smallest decile would have a value of 1 for the dummy variable that corresponds to the smallest decile and a value of 0 for the other 9 dummy variables. Regress imr71 on a constant and the 10 dummy variables. Why does Stata drop one of the dummy variables? Plot the parameter estimates from the dummy variables, where the y-axis is the parameter estimate of each dummy variable and the values on the x-axis are the midpoints of the ranges that define each of the dummy variables. Is the effect of TSPs on imr71 linear in TSPs?

e. Now regress imr71 on mtspar71 and the square of mtspar71. (Note: you will have to generate the square variable.) Plot the predicted values of this regression against mtspar71. Describe the shape of this function.
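
As a rough illustration of the mechanics behind question 3 (a weighted regression of an outcome on a constant and one regressor, and the sums-of-squares decomposition behind R²), the following minimal sketch may be useful. It is written in Python rather than Stata, runs on synthetic data rather than on imrtsp71.dta, and every name in it (`weighted_bivariate_fit`, `sums_of_squares`, `tsp`, `births`, `imr`) is hypothetical. It follows this handout's convention that ESS is the error sum of squares and RSS is the regression sum of squares, and it makes no claim to reproduce Stata's output digit for digit.

```python
import random


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_bivariate_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    x_bar = weighted_mean(x, w)
    y_bar = weighted_mean(y, w)
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    s_xx = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y, w, intercept, slope):
    """Weighted TSS, ESS (error), RSS (regression), R2 and adjusted R2."""
    y_bar = weighted_mean(y, w)
    fitted = [intercept + slope * xi for xi in x]
    tss = sum(wi * (yi - y_bar) ** 2 for yi, wi in zip(y, w))
    ess = sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, fitted, w))
    rss = sum(wi * (fi - y_bar) ** 2 for fi, wi in zip(fitted, w))
    n, k = len(y), 1  # k = number of regressors, excluding the constant
    r2 = 1 - ess / tss
    adj_r2 = 1 - (ess / (n - k - 1)) / (tss / (n - 1))
    return {"tss": tss, "ess": ess, "rss": rss, "r2": r2, "adj_r2": adj_r2}


# Synthetic stand-ins for mtspar71, birth71 and imr71 (NOT the real data).
random.seed(0)
n_counties = 200
tsp = [random.uniform(30, 150) for _ in range(n_counties)]
births = [random.randint(100, 5000) for _ in range(n_counties)]
imr = [15 + 0.05 * t + random.gauss(0, 3) for t in tsp]

intercept, slope = weighted_bivariate_fit(tsp, imr, births)
parts = sums_of_squares(tsp, imr, births, intercept, slope)

print("intercept:", intercept, "slope:", slope)
print("change in outcome for a 10 unit decline in x:", -10 * slope)
print("TSS:", parts["tss"], "ESS + RSS:", parts["ess"] + parts["rss"])
print("R2:", parts["r2"], "adjusted R2:", parts["adj_r2"])
```

Because the fit includes a constant, TSS equals ESS plus RSS when all three are computed around the weighted mean, so R² can be written either as RSS/TSS or as 1 - ESS/TSS. Rescaling the weights by any positive constant leaves the coefficients and both R² measures unchanged.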