A Data Scientist's blog: Tuesday, November 12, 2013

Interpreting Regression Output

Introduction

This guide assumes that you have at least a little familiarity with the concepts of multiple linear regression, and are capable of performing a regression in some software package such as Stata, SPSS or Excel. You may wish to read our companion page Introduction to Regression first. For assistance in performing regression in particular software packages, there are some resources at the UCLA Statistical Computing Portal.

Brief review of regression

Remember that regression analysis is used to produce an equation that will predict a dependent variable using one or more independent variables. This equation has the form

Y = b1X1 + b2X2 + ... + A

where Y is the dependent variable you are trying to predict, X1, X2 and so on are the independent variables you are using to predict it, b1, b2 and so on are the coefficients or multipliers that describe the size of the effect the independent variables are having on your dependent variable Y, and A is the value Y is predicted to have when all the independent variables are equal to zero.

In the Stata regression shown below, the prediction equation is price = -294.1955 (mpg) + 1767.292 (foreign) + 11905.42 - telling you that price is predicted to increase 1767.292 when the foreign variable goes up by one, decrease by 294.1955 when mpg goes up by one, and is predicted to be 11905.42 when both mpg and foreign are zero.
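As a small illustration, the prediction equation quoted above can be written as a function. This is only a sketch: the coefficients are copied from the text, while the function name and the example cars (20 mpg, domestic versus foreign) are made up for the example.

```python
# Coefficients copied from the prediction equation quoted in the text.
B_MPG = -294.1955
B_FOREIGN = 1767.292
CONSTANT = 11905.42


def predict_price(mpg: float, foreign: int) -> float:
    """Predicted price for a car with the given mpg and foreign flag (0 or 1)."""
    return B_MPG * mpg + B_FOREIGN * foreign + CONSTANT


# Hypothetical cars: same mpg, differing only in the foreign indicator.
domestic = predict_price(mpg=20, foreign=0)
imported = predict_price(mpg=20, foreign=1)

print(round(domestic, 2))             # predicted price, domestic car
print(round(imported - domestic, 3))  # the gap equals the foreign coefficient
```

Holding mpg fixed, the two predictions differ by exactly the coefficient on foreign, which is what "price is predicted to increase 1767.292 when the foreign variable goes up by one" means.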

Coming up with a prediction equation like this is only a useful exercise if the independent variables in your dataset have some correlation with your dependent variable. So in addition to the prediction components of your equation--the coefficients on your independent variables (betas) and the constant (alpha)--you need some measure to tell you how strongly each independent variable is associated with your dependent variable.

When running your regression, you are trying to discover whether the coefficients on your independent variables are really different from 0 (so the independent variables are having a genuine effect on your dependent variable) or if alternatively any apparent differences from 0 are just due to random chance. The null (default) hypothesis is always that each independent variable is having absolutely no effect (has a coefficient of 0) and you are looking for a reason to reject this theory.

P, t and standard error

The t statistic is the coefficient divided by its standard error. The standard error is an estimate of the standard deviation of the coefficient, the amount it would vary from sample to sample. It can be thought of as a measure of the precision with which the regression coefficient is measured. If a coefficient is large compared to its standard error, then it is probably different from 0.
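The division is simple enough to check by hand. The sketch below uses the Sale price row from the first camcorder regression table that appears later in this post (estimate -0.063380, standard error 0.02151685); the function name is just for illustration.

```python
def t_statistic(coefficient: float, standard_error: float) -> float:
    """t statistic: the coefficient divided by its standard error."""
    return coefficient / standard_error


# Sale price row of the first camcorder table later in this post.
t_sale = t_statistic(-0.063380, 0.02151685)
print(round(t_sale, 3))
```

The result agrees, to three decimals, with the -2.946 shown in that table's "T for H0: Parameter=0" column.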

How large is large? Your regression software compares the t statistic on your variable with values in the Student's t distribution to determine the P value, which is the number that you really need to be looking at. The Student's t distribution describes how the mean of a sample with a certain number of observations (your n) is expected to behave. For more information on the t distribution, look at this web page.

If 95% of the t distribution is closer to the mean than the t-value on the coefficient you are looking at, then you have a P value of 5%. This is also referred to as a significance level of 5%. The P value is the probability of seeing a result as extreme as the one you are getting (a t value as large as yours) in a collection of random data in which the variable had no effect. A P of 5% or less is the generally accepted point at which to reject the null hypothesis. With a P value of 5% (or .05) there is only a 5% chance that results as extreme as the ones you are seeing would have come up if the variable truly had no effect, so you have reasonable grounds to conclude that the variable is having some effect, assuming your model is specified correctly.
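To get a feel for how a t value maps to a P value, here is a rough sketch. It is an approximation, not what regression software actually does: it swaps the Student's t distribution for the standard normal, which is only reasonable when the number of observations is large. The function name is made up.

```python
import math


def two_sided_p_normal(t_value: float) -> float:
    """Approximate two-sided P value.

    Assumption: uses the standard normal instead of Student's t,
    so it is only a fair approximation for large samples.
    """
    return math.erfc(abs(t_value) / math.sqrt(2))


print(two_sided_p_normal(1.96))    # close to the familiar 5% cut-off
print(two_sided_p_normal(-2.946))  # a larger |t| gives a much smaller P
```

With few observations the t distribution has heavier tails than the normal, so the true P value would be somewhat larger than this approximation suggests.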

The 95% confidence interval for your coefficients shown by many regression packages gives you the same information. You can be 95% confident that the real, underlying value of the coefficient that you are estimating falls somewhere in that 95% confidence interval, so if the interval does not contain 0, your P value will be .05 or less.
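The link between the interval and the P value can be sketched with two rows from the first camcorder table later in this post. The 1.96 multiplier is an assumption (the large-sample normal critical value); software uses the exact t critical value for your degrees of freedom.

```python
def approx_ci95(coefficient: float, standard_error: float, critical: float = 1.96):
    """Approximate 95% confidence interval: coefficient +/- critical * SE.

    Assumption: critical = 1.96 is the large-sample (normal) value.
    """
    half_width = critical * standard_error
    return coefficient - half_width, coefficient + half_width


# Sale price row: t = -2.946 in the table, so the interval should exclude 0.
sale_low, sale_high = approx_ci95(-0.063380, 0.02151685)

# 8mm row: t = -0.814 in the table, so the interval should contain 0.
mm8_low, mm8_high = approx_ci95(-0.022241, 0.02731889)

print(sale_low, sale_high)
print(mm8_low, mm8_high)
```

An interval that straddles 0 and a t statistic smaller than about 2 in absolute value are two views of the same fact.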

Note that the size of the P value for a coefficient says nothing about the size of the effect that variable is having on your dependent variable - it is possible to have a highly significant result (very small P-value) for a minuscule effect.

Coefficients

In simple or multiple linear regression, the size of the coefficient for each independent variable gives you the size of the effect that variable is having on your dependent variable, and the sign on the coefficient (positive or negative) gives you the direction of the effect. In regression with a single independent variable, the coefficient tells you how much the dependent variable is expected to increase (if the coefficient is positive) or decrease (if the coefficient is negative) when that independent variable increases by one. In regression with multiple independent variables, the coefficient tells you how much the dependent variable is expected to increase when that independent variable increases by one, holding all the other independent variables constant. Remember to keep in mind the units which your variables are measured in.

Note: in forms of regression other than linear regression, such as logistic or probit, the coefficients do not have this straightforward interpretation. Explaining how to deal with these is beyond the scope of an introductory guide.

R-Squared and overall significance of the regression

The R-squared of the regression is the fraction of the variation in your dependent variable that is accounted for (or predicted by) your independent variables. (In regression with a single independent variable, it is the same as the square of the correlation between your dependent and independent variable.) The R-squared is generally of secondary importance, unless your main concern is using the regression equation to make accurate predictions. The P value tells you how confident you can be that each individual variable has some correlation with the dependent variable, which is the important thing.
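The claim in parentheses is easy to verify numerically. The sketch below fits a one-variable least-squares line to a small made-up data set and compares the R-squared with the squared correlation; the data and function names are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Fraction of the variation in y accounted for by the fitted line."""
    a, b = simple_ols(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up data, roughly linear with some scatter.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

print(r_squared(x, y), pearson_r(x, y) ** 2)  # the two numbers agree
```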

Another number to be aware of is the P value for the regression as a whole. Because your independent variables may be correlated, a condition known as multicollinearity, the coefficients on individual variables may be insignificant when the regression as a whole is significant. Intuitively, this is because highly correlated independent variables are explaining the same part of the variation in the dependent variable, so their explanatory power and the significance of their coefficients is "divided up" between them.

Correlation and Causation

What are correlation and causation and how are they different?

Two or more variables are considered to be related, in a statistical context, if their values change so that as the value of one variable increases or decreases so does the value of the other variable (although it may be in the opposite direction).

For example, for the two variables "hours worked" and "income earned" there is a relationship between the two if the increase in hours worked is associated with an increase in income earned. If we consider the two variables "price" and "purchasing power", as the price of goods increases a person's ability to buy these goods decreases (assuming a constant income).

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

Theoretically, the difference between the two types of relationships is easy to identify: an action or occurrence can cause another (e.g. smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.

Why are correlation and causation important?

The objective of much research or scientific analysis is to identify the extent to which one variable relates to another variable. For example:

Is there a relationship between a person's education level and their health?
Is pet ownership associated with living longer?
Did a company's marketing campaign increase their product sales?

These and other questions are exploring whether a correlation exists between the two variables, and if there is a correlation then this may guide further research into investigating whether one action causes the other. Understanding correlation and causality allows policies and programs that aim to bring about a desired outcome to be better targeted.

How is correlation measured?

For two variables, a statistical correlation is measured by the use of a Correlation Coefficient, represented by the symbol (r), which is a single number that describes the degree of relationship between two variables.

The coefficient's numerical value ranges from +1.0 to -1.0, which provides an indication of the strength and direction of the relationship.

If the correlation coefficient has a negative value (below 0) it indicates a negative relationship between the variables. This means that the variables move in opposite directions (i.e. when one increases the other decreases, or when one decreases the other increases).

If the correlation coefficient has a positive value (above 0) it indicates a positive relationship between the variables meaning that both variables move in tandem, i.e. as one variable decreases the other also decreases, or when one variable increases the other also increases.

Where the correlation coefficient is 0 this indicates there is no relationship between the variables (one variable can remain constant while the other increases or decreases).
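A minimal sketch of how r is computed for two variables, using the usual Pearson formula. The data are made up: a fixed hourly rate for the positive case, and a quantity that falls by a fixed amount per unit of price for the negative case.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: income at a flat 30 per hour.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
income = [30 * h for h in hours]

# Hypothetical data: units affordable fall steadily as price rises.
price = [1, 2, 3, 4, 5, 6, 7, 8]
units_affordable = [100 - 5 * p for p in price]

print(pearson_r(hours, income))            # perfect positive relationship
print(pearson_r(price, units_affordable))  # perfect negative relationship
```

Real data rarely sit exactly on a line, so observed values of r normally fall strictly between the two extremes.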

While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship. For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call out fee and an hourly fee which progressively decreases the longer the job goes for, the relationship between hours worked and income would be non-linear, where the correlation coefficient may be closer to 0.

Care is needed when interpreting the value of 'r'. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered. For example, sales of ice creams and the sales of sunscreen can increase and decrease across a year in a systematic manner, but it would be a relationship that would be due to the effects of the season (i.e. hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than due to any direct relationship between sales of sunscreen and ice cream.

The correlation coefficient should not be used to say anything about cause and effect relationship. By examining the value of 'r', we may conclude that two variables are related, but that 'r' value does not tell us if one variable was the cause of the change in the other.
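The first two limitations above can be illustrated with a small simulation. Everything numeric here is invented for the sketch: the call-out fee and shrinking hourly rate, the temperature curve, and the sales formulas with independent random noise.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(x, y):
    """What is left of y after removing its least-squares linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


# Limitation 1 (hypothetical fees): a call-out fee of 50 plus an hourly
# rate that starts at 40 and falls by 20% with each additional hour.
hours = list(range(1, 11))
income = [50 + sum(40 * 0.8 ** k for k in range(h)) for h in hours]
r_fee = pearson_r(hours, income)

# Limitation 2 (simulated): daily temperature drives both sales series;
# the noise added to each series is independent of the other.
rng = random.Random(0)
temperature = [17.5 + 12.5 * math.sin(2 * math.pi * d / 365) for d in range(365)]
ice_cream = [100 + 10 * t + rng.gauss(0, 5) for t in temperature]
sunscreen = [50 + 5 * t + rng.gauss(0, 5) for t in temperature]
r_raw = pearson_r(ice_cream, sunscreen)
r_adjusted = pearson_r(residuals(temperature, ice_cream),
                       residuals(temperature, sunscreen))

print(r_fee)       # below 1 even though income depends only on hours
print(r_raw)       # strong correlation between the two sales series
print(r_adjusted)  # near 0 once the shared temperature effect is removed
```

In the fee example r stays fairly high because income still rises with every hour; a relationship that bends more sharply, or reverses direction, would pull r much closer to 0.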

How can causation be established?

Causality is an area of statistics that is commonly misunderstood and misused by people in the mistaken belief that because the data shows a correlation there is necessarily an underlying causal relationship.

The use of a controlled study is the most effective way of establishing causality between variables. In a controlled study, the sample or population is split in two, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are assessed.

For example, in medical research, one group may receive a placebo while the other group is given a new type of medication. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes.

Due to ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to use two comparable groups and have one of them undergo a harmful activity while the other does not. To overcome this situation, observational studies are often used to investigate correlation and causation for the population of interest. The studies can look at the groups' behaviours and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between two variables.

Developing a Hedonic Regression Model For Camcorders In the U.S. CPI

Background

The Bureau of Labor Statistics (BLS) has been conducting research into extending the use of hedonic regression models for quality adjustment purposes to additional items within the Consumer Price Index (CPI). Hedonic models estimate values for individual characteristics bundled together to form a good or service(2). This allows the CPI to calculate the value of quality change between two items. The CPI commodity analyst will use the parameter estimates obtained from the hedonic model to adjust the price change used in index calculations in instances where the new item and old item differ in quality. Consumer electronics manufacturers are constantly improving the quality of consumer electronic items in an effort to remain competitive. Quality change often occurs at the time manufacturers introduce new models to replace previous year's models. Video cameras (or camcorders) are one of the items chosen for hedonics research due to recent advances in camcorder technology.

Camcorders are included in the Other Video Equipment CPI item stratum (RA03) along with video cassette recorders (VCRs), DVD players, satellite video products and other miscellaneous video products. Camcorders have an estimated 31 percent of the weight within Other Video Equipment. During the time period from December 1997 to December 1999 the Other Video Equipment index decreased 26.5 percent. The average monthly decline was 1.1 percent. (The Other Video Equipment index was redefined for the January 1998 CPI revision so a longer term comparison is not available. Prior to the revision, camcorders were included in the Video Products Other than Televisions index. This index decreased 38.7 percent from December 1988 to December 1997 — an average monthly decline of 0.4 percent.)
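The average monthly declines quoted above appear to be simple averages (total decline divided by the number of months). That reading is an inference from the arithmetic, not something the text states. The sketch below checks it and, for comparison, computes the compound monthly rate that would produce the same total decline.

```python
def average_monthly_decline(total_decline_pct: float, months: int) -> float:
    """Simple average: total percentage decline spread evenly over the months."""
    return total_decline_pct / months


def compound_monthly_decline(total_decline_pct: float, months: int) -> float:
    """Constant monthly percentage decline that compounds to the total decline."""
    remaining = 1 - total_decline_pct / 100
    return (1 - remaining ** (1 / months)) * 100


# December 1997 to December 1999 is 24 months; December 1988 to December 1997 is 108.
print(round(average_monthly_decline(26.5, 24), 1))
print(round(average_monthly_decline(38.7, 108), 1))
print(round(compound_monthly_decline(26.5, 24), 2))
```

The simple averages reproduce the 1.1 and 0.4 percent figures in the text; the compound rate for the first period comes out somewhat higher.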

Data and Regression Model

Part of the CPI hedonic initiative called for collecting additional data by CPI field economists. The CPI camcorder sample size was deemed insufficient for regression modeling purposes. Based on current CPI sampling procedures, CPI statisticians designed a supplemental sample for hedonic modeling purposes only. The new sample added 190 outlets with 2 observations assigned in each outlet. The final sample included 350 specially collected observations and 130 observations from the CPI sample. The field economists were unable to collect data for 8 percent of the supplemental sample.
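The sample counts in this paragraph fit together arithmetically. A quick check, assuming the 8 percent shortfall applies to the 380 assigned supplemental observations:

```python
outlets = 190
observations_per_outlet = 2
uncollected_share = 0.08   # share of the supplemental sample not collected
cpi_observations = 130     # observations taken from the regular CPI sample

assigned = outlets * observations_per_outlet
collected = round(assigned * (1 - uncollected_share))
total_sample = collected + cpi_observations

print(assigned, collected, total_sample)
```

The collected count matches the 350 specially collected observations reported, giving 480 observations before the deletions described later reduce the estimation data set to 453.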

Formulating a hedonic regression model for high tech goods such as camcorders is not a simple task. Determining which camcorder characteristics contribute to the price of a camcorder is difficult due to rapid technical improvements. Manufacturers and retailers of high tech goods further complicate the situation by using different names for the same feature, and retailers have limited information available at the camcorder displays. Consumers are left at the mercy of the retailers' sales personnel for technical information. For the purpose of formulating the camcorder hedonic regression model, data was taken from various sources. CPI field economists collected the primary data. The manufacturer model numbers obtained by the data collectors were matched with specifications provided by the manufacturer internet sites. (Those model numbers that did not match any manufacturer model numbers were dropped from the data set.) Further research helped to develop an a priori model — comparing retailer advertisements, determining which attributes are consistently reported by manufacturers, reviewing consumer magazines and websites, and reading the feedback provided by the data collectors.

Consumers researching camcorders for a purchase are typically advised to first select a format. There are currently five analog camcorder formats available: full-size VHS, 8 millimeter (8mm), Hi-8, VHS-C (compact VHS), and super VHS-C (S-VHS-C); and two digital formats available: mini digital (miniDV), and digital 8 millimeter (digital 8mm).

The digital formats are currently the most technologically advanced. Their picture and audio quality is superior to analog camcorders, but digital camcorders convert their signal to analog for playback on non-digital television sets. Digital computer editing and digital still capability are other technological advances popular with consumers. The miniDV is the smaller of the two formats. Its tapes (although relatively expensive) are the smallest and have immense storage capacity. The digital 8mm is Sony's latest format and enabled Sony to reach consumers shopping the low-end digital market(3). The digital 8mm tapes are compatible with the analog Hi-8 and 8mm formats.

Full-size VHS camcorders have been available the longest. This is the bulkiest camcorder format and usually has to be held on your shoulder to maintain a steady picture. The appeal to this camcorder is that it uses the standard VHS VCR tape for recording; it does not require any adapters for playback in your VCR.

VHS-C produces video comparable to the full-size VHS but is a much smaller camcorder. In order to make the camcorder smaller the manufacturers had to reduce the size of the tape. The tape is about the size of two packs of playing cards(4) and must be inserted into an adapter for playback in your VCR. These smaller tapes record less than half as long as the full-size VHS camcorder. Super VHS-C is similar to the VHS-C but produces a higher quality video.

The 8mm camcorder is about the same size as the VHS-C camcorders but can not be played back on a VHS VCR. Instead it is connected to the television set for playback. Its tape has a longer recording time than the VHS-C camcorder and is generally of higher quality than the VHS-C camcorders. The Hi-8 camcorder has a better picture and audio quality than the standard 8mm.

The data set used for estimating the regression model included 453 observations. Several observations in the data set were deleted due to incomplete data and lack of information from secondary sources. Secondary sources were used to verify the majority of the data. Other observations were deleted because the outlet eligibility for collection in the CPI was questionable. Several camcorders were collected at rental outlets. At these outlets a consumer pays a fee each month to rent an item. If they maintain these payments over a number of months or years the item is theirs to keep, although greatly overpriced. Since we could not be sure of the accuracy of the reported prices due to the nature of this type of outlet, these observations were deleted.

The natural log of the collected price was specified as the dependent variable. The "transaction" price was preferable to the regular price since it represented the price that a consumer would more likely pay. Sale prices were reported for 32 percent of the data set. The mean collected price for observations with a sale price was not much lower than the mean collected price for observations with a regular price (mean for sale priced observations was $674.33 compared to $676.48 for regular priced observations). However, comparing the mean sale price versus the mean regular price across the various formats found that the sale prices for almost all of the formats were lower than the regular prices (full-size VHS was the only exception). Since sale price is a price factor a dummy variable for sale price was included as an explanatory variable in the model.

Specifying the model with the variables mentioned thus far yielded the following results:

Variable Name	Parameter Estimate	Standard Error	T for H0: Parameter=0	Tolerance
Intercept	6.158586	0.01823837	337.672
Sale price	-0.063380	0.02151685	-2.946	0.96758929
Camcorder Format:
VHS	-0.076831	0.05650739	-1.360	0.93656894
8mm	-0.022241	0.02731889	-0.814	0.76197381
VHS-C
S-VHS-C	0.242112	0.07634319	3.171	0.94697938
HI-8	0.395480	0.03118752	12.681	0.80072111
Digital 8mm	0.692510	0.02949420	23.480	0.78800195
MiniDV	1.034472	0.03482691	29.703	0.83312266
R2 = 0.7645; Adjusted R2 = 0.7608; F statistic = 206.86; Number of observations = 453
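The summary line of the table can be roughly cross-checked from R2 alone using the standard formulas for adjusted R2 and the overall F statistic. The count of seven explanatory variables is an assumption based on the rows with estimates (VHS-C has none and looks like the omitted base format).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic implied by R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


R2 = 0.7645  # from the table
N = 453      # from the table
K = 7        # assumption: sale price plus six format dummies

print(round(adjusted_r2(R2, N, K), 4))
print(round(f_statistic(R2, N, K), 2))
```

The adjusted R2 reproduces the reported 0.7608. The F value lands within about half a point of the reported 206.86, a gap consistent with R2 being rounded to four decimals.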

Further examination of the data found that Sony and Canon camcorders had the highest mean prices. Sony comprised 45 percent of the data set and Canon less than 1 percent. Since Sony has a strong reputation among consumers as a high quality brand and accounts for nearly half of the data set, Sony was included in the model. Unfortunately this caused problems with the specification of the model. Sony is currently the only maker of digital 8mm camcorders; therefore, the Sony variable is highly correlated with the digital 8mm variable. Thus multicollinearity will be present in the model if both variables are included. Since both these variables are strong price determinants, omitting either variable would result in a specification bias; however, including both variables would result in unreliable (high-variance) parameter estimates. To remedy the situation, combination variables were created for Sony and camcorder type:

Sony digital 8mm
Non Sony miniDV
Sony miniDV
Non Sony 8mm
Sony 8mm
Non Sony Hi8
Sony Hi8

Sony does not make camcorders in the other three formats (full size VHS, S-VHS-C and VHS-C) so it was not necessary to combine these variables with Sony. The remaining brands represented in the data set (Canon, Hitachi, JVC, Panasonic, Proscan, Quasar, RCA, Samsung, and Sharp) were not included in the model. These brands are highly correlated with other variables in the data set and lack of a priori information regarding their usefulness in explaining price prevented their inclusion.
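One way to build such combination variables from raw brand and format fields is sketched below. The function and constant names are hypothetical, and the labels simply follow the list above; this is not the authors' actual procedure.

```python
# Formats Sony does not make, per the text; these keep a plain format label.
FORMATS_WITHOUT_SONY = {"VHS", "VHS-C", "S-VHS-C"}


def combination_variable(brand: str, camcorder_format: str) -> str:
    """Collapse brand and format into a single category label."""
    if camcorder_format in FORMATS_WITHOUT_SONY:
        return camcorder_format
    prefix = "Sony" if brand == "Sony" else "Non Sony"
    return f"{prefix} {camcorder_format}"


print(combination_variable("Sony", "Hi8"))
print(combination_variable("JVC", "miniDV"))
print(combination_variable("Panasonic", "VHS-C"))
```

Each observation then falls into exactly one category, so the brand effect and the format effect are no longer carried by two separate, highly correlated dummy variables.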

Based on a priori expectations several other variables were included in the model: monitor size in inches, color viewfinder, image stability, and weight without battery of one pound or less. Most camcorders currently come with both a viewfinder and a monitor. In this data set 81 percent had a monitor. The monitor is much larger than the viewfinder and also functions as a way to play back the video. Some monitors are built into the side of the camcorder and some swivel out for viewing. Monitor size is the only continuous variable included in the model. The monitor size is measured along the diagonal and the values in this data set range from zero (no monitor included) to 4 inches. Manufacturers and retail outlets heavily advertise image stability as a price factor. This feature compensates for movement in hand held camcorders so the video playback is not shaky. There are two different types of image stabilization, optical and electronic, but reliable data could not be found for the models in the data set to test if image stabilization type is a price factor. A weight without battery of one pound or less was thought to be price determining since manufacturers heavily advertise this fact. Several manufacturers advertised that their lightweight camcorder was the lightest model currently available to the consumer. The final non-control variable in the model is for Joint Photographic Experts Group (JPEG) capability. Images are saved in the universal JPEG format, which allows you to view your images on virtually any computer running Windows or Mac OS(5). This variable was included to account for an outlier observation. In this data set there was only one observation that had this feature.

Several variables were included in the model that control for type of business and area of the country where the data are collected. Most of these variables behaved as expected. Discount department stores and warehouse outlets usually have the lowest prices and thus a negative parameter estimate. The positive parameter estimate for furniture/appliance outlets is not unexpected since these are typically the smaller, more specialized and usually local outlets that are known for their customer service. Catalog outlets selling electronic items usually sell mostly high-end electronic items and also charge a steep shipping fee. This is accounted for in the large parameter estimate for catalog outlets.

Variable Name	Parameter Estimate	Standard Error	T for H0: Parameter=0	Tolerance
Intercept	5.850072	0.02210676	264.628	.
Sale price	-0.077808	0.01273242	-6.111	0.88223450
Camcorder Format:
Non-Sony 8MM	-0.166143	0.02551081	-6.513	0.51232970
VHS	-0.087593	0.03552068	-2.466	0.75304756
VHS-C
Non-Sony Hi 8MM	0.084923	0.04161441	2.041	0.90206421
Sony 8MM	0.088555	0.02412902	3.670	0.52296749
S-VHS-C	0.090691	0.04396110	2.063	0.90732597
Sony Hi 8 MM	0.379994	0.01894278	20.060	0.78249993
Sony Digital 8	0.586502	0.01753512	33.447	0.70856195
Non-Sony Digital	0.672284	0.02981434	22.549	0.65632066
Sony Digital	0.880286	0.03190287	27.593	0.62014672
Monitor and Viewfinder:
Monitor Size	0.121159	0.00539411	22.461	0.68787524
Black and White Viewfinder
Color Viewfinder	0.093201	0.01477815	6.307	0.60799750
Other Features:
Image Stability	0.071747	0.01882168	3.812	0.45864474
Weight w/o battery < = 1 lb	0.290191	0.04067331	7.135	0.71303600
JPEG file format capable	0.504677	0.12048018	4.189	0.95143846
Control Variables:
Discount Department Store	-0.081571	0.01776653	-4.591	0.80489622
Midwest Region	-0.041152	0.01368860	-3.006	0.89410748
Warehouse Store	-0.038919	0.02664068	-1.461	0.63459473
Appliance Store	0.050699	0.02090357	2.425	0.86476762
Furniture/Appliance Store	0.281176	0.03281548	8.568	0.88232259
Mail Order Catalog	0.369475	0.12367893	2.987	0.90286013
R2 = 0.9276; Adjusted R2 = 0.9241; F Statistic = 263.044; Number of Observations = 453

The parameter estimates in the model above conform with a priori expectations and the R-squared value indicates that almost 93 percent of the variation in the dependent variable is explained by the independent variables. This is quite high for a hedonic regression model calculated using CPI data.
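Because the dependent variable is the natural log of price, a dummy variable's coefficient b is commonly read as a percentage effect of exp(b) - 1 on price. That is the standard reading of a log-linear model rather than something the text spells out. A sketch using three estimates from the table above:

```python
import math


def percent_effect(coefficient: float) -> float:
    """Approximate percentage change in price implied by a dummy coefficient
    in a log-price regression: (exp(b) - 1) * 100."""
    return (math.exp(coefficient) - 1) * 100


# Estimates copied from the final model table.
estimates = {
    "Sale price": -0.077808,
    "Color Viewfinder": 0.093201,
    "Sony Digital": 0.880286,
}

for name, b in estimates.items():
    print(f"{name}: {percent_effect(b):+.1f}%")
```

For small coefficients the percentage is close to 100 times the coefficient; for large ones, such as the digital format dummies, the two diverge considerably.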

The final model was examined for multicollinearity using the tolerance statistic and pairwise correlations. Two potential problems were found: a correlation of -0.56 between image stability and Sony 8mm; and a correlation of 0.47 between Non-Sony 8mm and warehouse outlet. However, all these variables were still included in the final model. High correlations did preclude the inclusion of another price factor. Stereo audio is the highest quality audio available for camcorders but stereo audio is highly correlated (0.66) with Sony digital 8mm. Including stereo audio in the model caused the magnitude of the parameter estimate for Sony digital 8mm to fall and its standard error to increase.
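The tolerance statistic in the tables is, by definition, one minus the R-squared from regressing that variable on all the other explanatory variables, and its reciprocal is the variance inflation factor (VIF). A short sketch using the lowest tolerance in the final model (Image Stability):

```python
def auxiliary_r2(tolerance: float) -> float:
    """R-squared of the variable regressed on the other explanatory variables."""
    return 1 - tolerance


def variance_inflation_factor(tolerance: float) -> float:
    """VIF is the reciprocal of tolerance."""
    return 1 / tolerance


image_stability_tolerance = 0.45864474  # from the final model table

print(round(auxiliary_r2(image_stability_tolerance), 4))
print(round(variance_inflation_factor(image_stability_tolerance), 2))
```

A VIF a little above 2 means the variance of that estimate is roughly doubled relative to the no-collinearity case, which is modest by common rules of thumb.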

A much simpler experimental model was specified with number of pixels as the only explanatory variable. Number of pixels is a way to determine the picture resolution. This information was not specifically requested on the checklist but was sometimes reported by the field economists. Secondary source data was used as a supplement. The "pixel" model did yield good results. The parameter estimate for number of pixels was positive and the R-squared value was large. Number of pixels could not be included in the final camcorder regression model since it is highly correlated with several variables. The Circuit City web site warns against "pixel counting". They mention that the number of pixels reported by manufacturers is actually the potential number of pixels and that most camcorders have a similar picture resolution regardless of the number of pixels.

Variable Name	Parameter Estimate	Standard Error	T for H0: Parameter=0	Tolerance
Intercept	5.470421	0.03157592	173.247
Number of pixels	0.002621	0.00008169	32.088	1.0000000
R2 = 0.7001; Adjusted R2 = 0.6994; F statistic = 1029.614; Number of observations = 443
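With a single explanatory variable, the overall F statistic equals the square of that variable's t statistic, so the numbers in this table can be cross-checked against each other. The small gaps below come from the rounding of the printed estimate and standard error.

```python
# Values copied from the pixel model table.
coefficient = 0.002621
standard_error = 0.00008169
reported_t = 32.088
reported_f = 1029.614

t_value = coefficient / standard_error
f_from_t = t_value ** 2

print(round(t_value, 3), reported_t)
print(round(f_from_t, 1), reported_f)
```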

The final camcorder regression model proposed above may appear to be much simpler than one might expect for a complicated high tech good. The main price determinants are the format of the camcorder along with Sony versus other brands. Those features that were included in the final model appear to be the non-gimmick and non-technical features. Several of the features that manufacturers and retailers tout as the "must haves" were not found to be price determinants. For example, based on the results of preliminary regression models, a larger digital zoom ratio did not have any impact on price despite manufacturers' attempts to persuade consumers of its importance. Also, "technical" specifications were not found to be price contributors. For example, the number of pickup devices(6) and lux(7) did not have an impact on price. Most camcorders available to the consumer (as opposed to professional camcorders) have the same number of pickup devices. As with number of pixels mentioned above, Circuit City warns consumers against using manufacturers' lux rating as a guide for purchasing a camcorder. They state that "not all manufacturers use the same scale nor do they use the same testing methods" when determining the lux rating. Based on that statement it is easy to see why consumers are confused over the more technical features.

Index Results

In order to determine the impact of using the camcorder hedonic model in the CPI, an experimental Other Video Equipment index was calculated for the six month time period between June and November 1999. The parameter estimates obtained from the model were applied to camcorder substitute items (an item chosen by CPI data collectors to replace the previously collected item when it is no longer available) with quality changes. During the time period examined there were 58 camcorder substitutions. In the published index, 38 percent had the price of the substitute item directly compared with the price of previous item. The price change for the remainder of the substitutions was imputed via the class-mean imputation method(8). For the purpose of calculating the experimental index, the camcorder substitutions were reassessed. Sixty-seven percent of the substitutions were determined to have changes in quality that could be adjusted using the hedonic model. Out of the remaining substitutions, 28 percent of the prices were directly compared and 5 percent of the price changes were imputed via the class-mean imputation method. The substitution comparability ratio (the ratio of directly compared and quality adjusted substitute quotes to the total number of substitute quotes) improved from 38 percent to 95 percent. Most of the quality adjustments were to adjust for changes in monitor size or changes in type and brand. The table below summarizes the specification changes that occurred with the substitutions.

Specification Change	Number of Occurrences
Unknown
Model number change only
Same item (was actually not a substitution)
Monitor size*
Camcorder brand and/or type*
Color viewfinder*
Image stability*
Low weight*
* Note: more than one of these specifications could have changed for a substitution.
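
The substitution comparability ratio described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the percentage shares reported in the text; the function and variable names are illustrative and do not come from the original study, and the 62 percent imputed share for the published index is simply the remainder after the 38 percent that were directly compared.

```python
def comparability_ratio(directly_compared, quality_adjusted, imputed):
    """Share of substitute quotes that were directly compared or quality adjusted."""
    total = directly_compared + quality_adjusted + imputed
    return (directly_compared + quality_adjusted) / total


# Shares of the 58 camcorder substitutions, in percent, as reported in the text.
# Published index: 38 percent directly compared, the remainder imputed.
published = comparability_ratio(directly_compared=38, quality_adjusted=0, imputed=62)

# Experimental index: 67 percent quality adjusted, 28 percent directly compared,
# 5 percent imputed.
experimental = comparability_ratio(directly_compared=28, quality_adjusted=67, imputed=5)

print(f"published: {published:.0%}, experimental: {experimental:.0%}")
```

Because only the shares matter, the same function would give the same ratios if it were fed the underlying counts instead of percentages.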

From May 1999 to November 1999 the index using the direct hedonic quality adjustments decreased 6.9 percent, compared to a 6.7 percent decrease for the published index, a difference of 0.2 percentage points(9). Chart 1 compares the published index with the quality adjusted index, and Chart 2 compares the one-month changes of the two indexes.

The months with the biggest differences between the two indexes were August and November. In August the one-month change in the quality adjusted index was 0.4 percentage points lower than in the published index, and in November it was 0.5 percentage points higher. Comparing the (unweighted) mean price changes for the camcorder substitutions in the published index and the quality adjusted index shows that the quality adjustments lowered the mean price change in the quality adjusted index. The mean price change for substitutions where the price of the old item was directly compared to the price of the new item was higher in the experimental quality adjusted index than in the published index. These substitutions were mainly those where only the model number changed. The mean price change for substitutions with quality adjustments was a 9.2 percent decline, which pulled the overall substitution mean price change lower than in the published index. The table below compares the mean price changes of the camcorder substitutions.

Camcorder Substitutions from June 1999 to November 1999
	Published Index		Hedonics Index
	Number	Mean Price Change	Number	Mean Price Change
All substitutions		-3.60 %		-4.56 %
Directly compared substitutions		-1.07 %		3.35 %
Quality adjusted substitutions				-9.21 %
Class-mean imputed (noncomparable) substitutions		-5.14 %		13.66 %
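
The following sketch illustrates why quality adjusting a substitution can lower its measured price change. It assumes a semi-log hedonic model, ln(price) = a + b1X1 + b2X2 + ..., which this section does not confirm as the form actually used, and the coefficient, specification names and prices are hypothetical values chosen for illustration, not estimates from the camcorder model.

```python
import math


def quality_adjusted_change(old_price, new_price, coefficients, old_specs, new_specs):
    """Percent price change after adjusting the old price for spec differences.

    Assumes a semi-log hedonic model: ln(price) = a + sum(b_k * x_k).
    """
    # Value of the quality difference between the two items, in log-price terms.
    log_adjustment = sum(
        b * (new_specs[name] - old_specs[name]) for name, b in coefficients.items()
    )
    # Reprice the old item as if it had the new item's specifications.
    adjusted_old_price = old_price * math.exp(log_adjustment)
    return 100.0 * (new_price / adjusted_old_price - 1.0)


# Hypothetical coefficient, specifications and prices, for illustration only.
coefficients = {"monitor_size_inches": 0.10}
old_specs = {"monitor_size_inches": 2.5}
new_specs = {"monitor_size_inches": 3.5}

# Direct comparison: the shelf price did not change.
direct = 100.0 * (600.0 / 600.0 - 1.0)

# Quality adjusted comparison: same shelf price, but a larger monitor.
adjusted = quality_adjusted_change(600.0, 600.0, coefficients, old_specs, new_specs)

print(f"direct: {direct:.2f}%, quality adjusted: {adjusted:.2f}%")
```

In this made-up case the substitute sells for the same price as the old item but has a larger monitor, so the quality adjusted comparison registers a price decline where the direct comparison registers none.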

The impact of applying the quality adjustments to the Other Video Equipment index was minimal since camcorders are only a part of the index and camcorder substitutions an even smaller part. These results are similar to studies done with the Televisions index and Video Products Other than Televisions index. The table below summarizes previous CPI hedonics studies.

Item	Difference in 12-Month Index
Personal computers(10)	6.5% Lower
Televisions(11)	0.1% Lower
Video Products Other than Televisions (VCRs only)(12)	0.1% Higher
Audio Equipment(13)	1.4% Higher

Using hedonic models for camcorders, DVD players, and VCRs together (the three most heavily weighted items in the Other Video Equipment index) could potentially yield a larger impact. The application of the camcorder hedonic model does noticeably decrease the number of noncomparable (imputed) substitutions. This alone makes hedonics a worthwhile endeavor.

Notes

(1) The author wishes to thank Charles Fortuna, Paul Liegey, Bill Thompson, Lynn Reese, John Greenlees, and Mary Kokoski for helpful suggestions and Frank Joseph for his assistance in preparing the camcorder data.

(2) See Dennis Fixler, Charles Fortuna, John Greenlees, and Walter Lane, "The Use of Hedonic Regressions to Handle Quality Change: The Experience in the U.S. CPI," 1999, presented at the fifth meeting of the International Working Group on Price Indices.

(3) Consumers Digest, November/December 1998, page 84

(4) Circuit City website, "Learn About Camcorders", http://www.circuitcity.com, visited September 30, 1999

(5) Sony website, http://www.sel.sony.com/SEL/consumer/ss5/office/camcorder/digitalvideoproducts/dcr-trv900_specs.shtml, visited August 12, 1999

(6) The part of the camcorder that translates the optical image from the lens into an electrical signal that can be recorded or viewed. Circuit City website, "Learn About Camcorders", http://www.circuitcity.com, visited September 30, 1999

(7) A rating for the amount of light needed to produce a recognizable image. Circuit City website, "Learn About Camcorders", http://www.circuitcity.com, visited September 30, 1999

(8) See Marshall B. Reinsdorf, Paul Liegey, and Kenneth J. Stewart, "New Ways of Handling Quality Change in the U.S. Consumer Price Index," BLS working paper no. 276 (Bureau of Labor Statistics, 1996).

(9) In addition to quality adjusting the camcorder substitutions, some of the imputed price changes for the class-mean substitutions (the noncomparable substitutions) in the Other Video Equipment index were recalculated since the inclusion of the quality adjustments changed the information used in calculating the imputations.

(10) Kenneth J. Stewart and Stephen B. Reed, "CPI Research Series using current methods, 1978-98," Monthly Labor Review, June 1999, pp. 29-38.

(11) Brent R. Moulton, Timothy J. LaFleur, and Karin E. Moses, "Research on Improved Quality Adjustment in the CPI: The Case of Televisions," Proceedings of the fourth meeting of the International Working Group on Price Indices (U.S. Department of Labor, sponsored by the Bureau of Labor Statistics, January 1999), pp. 77-99.

(12) Paul Liegey and Nicole Shepler, "Adjusting VCR prices for quality change: a study using hedonic methods," Monthly Labor Review, September 1999, pp. 22-37.

(13) Mary Kokoski, Keith Waehrer, and Patricia Rozaklis, "Using Hedonic Methods for Quality Adjustment in the CPI: The Consumer Audio Products Component," BLS draft paper.
