Common statistical tests: Association

Assess the association between two variables
Research question
Assumptions about the data?
Choice of statistical test
Analysis
Interpretation
Exercises

Assess the association between two variables

Now we are going to assess correlation between two variables. We will continue to use the MAQ, and introduce a second scale that is used to assess an individual’s beliefs about their medicines.

Beliefs About Medicines Questionnaire (BMQ)

The BMQ-Specific comprises 10 items which assess how the participant perceives the necessity of their medication (5 items) and their concerns regarding their medication (5 items). Each item is assessed on a 5-point Likert scale, where 1 = “strongly disagree,” 2 = “disagree,” 3 = “uncertain,” 4 = “agree” and 5 = “strongly agree.”

Therefore scores for BMQ-Necessity and BMQ-Concerns range from 5–25, with higher scores meaning the participant more strongly perceives their medicine is necessary, or more strongly perceives concerns regarding their medicine.
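As a minimal sketch of how the sub-scale scores arise, the snippet below sums five item responses per sub-scale. The responses are made up for illustration and the variable names are hypothetical; they are not taken from the study data.

```r
# Hypothetical responses from one participant
# (1 = strongly disagree ... 5 = strongly agree)
necessity_items <- c(4, 5, 4, 3, 4)
concerns_items  <- c(2, 3, 2, 1, 3)

# Each sub-scale score is the sum of its five items
bmq_n <- sum(necessity_items)
bmq_c <- sum(concerns_items)

# Lowest and highest possible sub-scale scores: five items scored 1 to 5
score_range <- c(5 * 1, 5 * 5)

bmq_n
bmq_c
score_range
```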

Again, the R code and csv files for the dataset are provided so that you can run these analyses (or others) yourself. Download the files here.

Research question

This time the research question is whether an individual’s beliefs about their medicines (as measured by the BMQ-Specific) are associated with their self-reported adherence (as measured by the MAQ).

Assumptions about the data?

As we did for the first example, let’s start with the assumption that the MAQ data is discrete and that the mean MAQ is normally distributed.

We are going to make a similar assumption regarding the BMQ-Specific data. This could also be debated, but it is consistent with how these scales are used in the literature. Moreover, we tend to focus on the sum-scores of the two sub-scales: BMQ-necessity and BMQ-concerns. Given that we are adding the individual items within the sub-scale together, we are already assuming that these sub-scales provide interval data. We will also assume that mean scores for the two sub-scales are normally distributed.

Choice of statistical test

Given the assumptions we have made about the data we can assess the association between BMQ scores and MAQ scores using a Pearson correlation.
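For readers who want to see what a Pearson correlation computes, here is a small sketch using made-up values (`x` and `y` are hypothetical, not study data). It calculates the coefficient from its definition, the covariance of the two variables scaled by their spread, and compares it with R's built-in cor().

```r
# Hypothetical paired observations
x <- c(20, 19, 17, 23, 18, 23, 15, 21)
y <- c(1, 2, 1, 1, 3, 0, 2, 1)

# Deviations of each value from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: sum of cross-products divided by the square root of the
# product of the sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should give the same value
r_builtin <- cor(x, y, method = "pearson")

isTRUE(all.equal(r_manual, r_builtin))
```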

Analysis

example2.csv provides the data from the study. The dataset includes eight columns: MAQ_1, MAQ_2, MAQ_3 and MAQ_4 provide the answers to each MAQ item (“yes” is coded as 1); MAQ provides the MAQ score. BMQ_N, BMQ_C and BMQ_Diff provide the BMQ-necessity, BMQ-concerns and BMQ-differential (i.e. BMQ-necessity - BMQ-concerns) scores.

df <- read.csv(file = "csv/example2.csv", header = TRUE)  # reads the csv file into R for analysis

head(df)  # prints first lines of the dataframe

##   MAQ_1 MAQ_2 MAQ_3 MAQ_4 MAQ BMQ_N BMQ_C BMQ_Diff
## 1     0     0     0     1   1    20    13        7
## 2     1     1     0     0   2    19    12        7
## 3     0     0     1     0   1    17    13        4
## 4     1     0     0     0   1    23    12       11
## 5     0     0     0     1   1    18    16        2
## 6     0     0     0     0   0    23    15        8

summary(df)  # print a summary of the dataset

##      MAQ_1            MAQ_2            MAQ_3            MAQ_4       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.5833   Mean   :0.2667   Mean   :0.1833   Mean   :0.1833  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       MAQ            BMQ_N           BMQ_C          BMQ_Diff     
##  Min.   :0.000   Min.   : 5.00   Min.   : 5.00   Min.   :-5.000  
##  1st Qu.:1.000   1st Qu.:17.00   1st Qu.:10.00   1st Qu.: 3.000  
##  Median :1.000   Median :20.00   Median :12.00   Median : 7.000  
##  Mean   :1.217   Mean   :19.27   Mean   :12.68   Mean   : 6.592  
##  3rd Qu.:2.000   3rd Qu.:22.00   3rd Qu.:15.00   3rd Qu.:10.000  
##  Max.   :4.000   Max.   :25.00   Max.   :25.00   Max.   :19.000
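Before analysing derived columns it can be worth confirming how they were built. The sketch below re-enters the six rows printed by head(df) above (so it runs without the csv file) and checks that MAQ matches the sum of the four items and that BMQ_Diff matches BMQ_N - BMQ_C. The name `first_rows` is hypothetical, and the MAQ sum rule is an assumption that these six rows happen to satisfy.

```r
# The six rows shown by head(df), typed in by hand
first_rows <- data.frame(
  MAQ_1    = c(0, 1, 0, 1, 0, 0),
  MAQ_2    = c(0, 1, 0, 0, 0, 0),
  MAQ_3    = c(0, 0, 1, 0, 0, 0),
  MAQ_4    = c(1, 0, 0, 0, 1, 0),
  MAQ      = c(1, 2, 1, 1, 1, 0),
  BMQ_N    = c(20, 19, 17, 23, 18, 23),
  BMQ_C    = c(13, 12, 13, 12, 16, 15),
  BMQ_Diff = c(7, 7, 4, 11, 2, 8)
)

# Does the MAQ score match the sum of the four items?
maq_ok <- all(rowSums(first_rows[, c("MAQ_1", "MAQ_2", "MAQ_3", "MAQ_4")]) == first_rows$MAQ)

# Does the differential match necessity minus concerns?
diff_ok <- all(first_rows$BMQ_N - first_rows$BMQ_C == first_rows$BMQ_Diff)

c(maq_ok = maq_ok, diff_ok = diff_ok)
```

The same two checks can be run on the full data frame by replacing `first_rows` with `df`.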

Let’s plot the data. Scatterplots are good for observing possible associations between two variables.

# plot() draws directly to the graphics device and returns NULL, so the result is not assigned
plot(df$BMQ_N, df$MAQ, pch = 19, ylab = "MAQ", xlab = "BMQ-necessity")

plot(df$BMQ_C, df$MAQ, pch = 19, ylab = "MAQ", xlab = "BMQ-concerns")

plot(df$BMQ_Diff, df$MAQ, pch = 19, ylab = "MAQ", xlab = "BMQ-differential")

plot(df$BMQ_N, df$BMQ_C, pch = 19, ylab = "BMQ-concerns", xlab = "BMQ-necessity")

The first thing that is obvious in these plots is the effect of MAQ being discrete data with relatively few options. This makes it a little difficult to see any relationships.
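One common way to make overlapping points on a discrete axis easier to see is to add a small amount of random noise for plotting only. The sketch below uses simulated values (`bmq_n` and `maq` are hypothetical stand-ins, not the study data) and base R's jitter(); the jittered values are for display and should not be used in the analysis itself.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated stand-ins for a 5-25 scale score and a 0-4 discrete score
bmq_n <- sample(5:25, 120, replace = TRUE)
maq   <- sample(0:4, 120, replace = TRUE)

# Shift each MAQ value by a random amount of at most 0.15 in either direction
maq_jittered <- jitter(maq, amount = 0.15)

plot(bmq_n, maq_jittered, pch = 19,
     ylab = "MAQ (jittered for display)", xlab = "BMQ-necessity")
```

With the real data, the same idea would be `plot(df$BMQ_N, jitter(df$MAQ, amount = 0.15), ...)`.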

If you squint a little you might see some relationships. Perhaps the clearest relationship is one we are not assessing: the relationship between BMQ-necessity and BMQ-concerns. There is an absence of people with low necessity scores and high concerns scores. It is worth noting, however, that there were relatively few participants with low necessity scores.

The two methods for assessing correlations in R are cor() and cor.test(). The first gives you the correlations (but doesn’t perform any tests); the second gives you the correlation and performs a test against the null hypothesis of no correlation. Read the documentation for cor and cor.test.

df_maq <- df[, c("BMQ_N", "BMQ_C", "BMQ_Diff", "MAQ")]

cor(df_maq, method = "pearson")

##               BMQ_N      BMQ_C   BMQ_Diff        MAQ
## BMQ_N     1.0000000  0.1262681  0.6860675 -0.1237816
## BMQ_C     0.1262681  1.0000000 -0.6350863  0.3166274
## BMQ_Diff  0.6860675 -0.6350863  1.0000000 -0.3286026
## MAQ      -0.1237816  0.3166274 -0.3286026  1.0000000

cor.test(df$BMQ_N, df$MAQ, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  df$BMQ_N and df$MAQ
## t = -1.355, df = 118, p-value = 0.178
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.29644579  0.05671811
## sample estimates:
##        cor 
## -0.1237816

cor.test(df$BMQ_C, df$MAQ, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  df$BMQ_C and df$MAQ
## t = 3.626, df = 118, p-value = 0.0004262
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1456523 0.4692382
## sample estimates:
##       cor 
## 0.3166274

cor.test(df$BMQ_Diff, df$MAQ, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  df$BMQ_Diff and df$MAQ
## t = -3.7794, df = 118, p-value = 0.000248
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4795961 -0.1587091
## sample estimates:
##        cor 
## -0.3286026
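To see where the numbers in the cor.test() output come from, the sketch below reproduces the test statistic, p-value and confidence interval for BMQ-necessity and MAQ from the reported correlation alone. The sample size of 120 is inferred from the reported degrees of freedom (df = n - 2 = 118), and the variable names are hypothetical.

```r
r <- -0.1237816  # correlation between BMQ_N and MAQ reported above
n <- 120         # inferred from df = n - 2 = 118

# t statistic for the null hypothesis of zero correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The results should agree with the first cor.test() output above to the precision shown there.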

Interpretation

According to this test:

There is a statistically significant correlation between:
1. BMQ-concerns and MAQ score, Pearson’s correlation 0.317, p = 0.0004, and
2. BMQ-differential and MAQ score, Pearson’s correlation -0.329, p = 0.0002.
There is no statistically significant correlation between BMQ-necessity and MAQ score, Pearson’s correlation -0.124, p = 0.178.

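Greater concerns regarding medications seem to be associated with poorer adherence (a higher MAQ score indicates more reported nonadherence). The larger the BMQ-differential (BMQ-necessity - BMQ-concerns), the better an individual’s adherence as measured by the MAQ score (that is, the lower the score). Because these are correlations, they do not show that concerns cause nonadherence.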

While these correlations are statistically significant, they are not particularly strong. A correlation coefficient ranges from -1 to 1: -1 indicates a perfect inverse linear relationship (as \(x\) increases, \(y\) decreases), 1 indicates a perfect positive linear relationship (as \(x\) increases, \(y\) increases), and 0 indicates no linear relationship. A correlation of 0 does not on its own mean the variables are independent, because a nonlinear relationship can still be present.
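Two points about reading the size of a correlation can be checked numerically. The first part of the sketch squares the two significant correlations reported above, which is one common way to gauge strength (the share of variance in one variable linearly accounted for by the other). The second part uses made-up values (`x` and `y` are hypothetical) to show that a correlation of 0 only rules out a linear relationship.

```r
# Squared correlations for the two significant results reported above
r_concerns <- 0.3166274
r_diff     <- -0.3286026
r_squared  <- c(concerns = r_concerns^2, differential = r_diff^2)
round(r_squared, 3)

# Made-up example: y is completely determined by x, yet the Pearson
# correlation is zero because the relationship is not linear
x <- -3:3
y <- x^2
r_quadratic <- cor(x, y)
r_quadratic
```

Both squared correlations come out at roughly 0.1, which fits the description of these associations as significant but not strong.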

Exercises

1. Re-run the analysis, this time assuming the data is ordinal. What are the appropriate summary measures of the data? Which statistical test is appropriate? What result do you get if you run the test?
2. Develop your data manipulation skills in R. Divide the participants in example2.csv into those with low and high BMQ-necessity beliefs: (i) label participants with a BMQ_N score below the median “LN” (low necessity) and participants with a BMQ_N score greater than or equal to the median “HN” (high necessity); (ii) identify participants whom the MAQ classifies as intentionally nonadherent (i.e. participants who answered “yes” to MAQ_3 or MAQ_4); and (iii) present a 2 x 2 table of participants grouped by low/high BMQ_N score and yes/no “intentional nonadherence.” Suggested tools: as.factor(), ifelse(), table().
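If the suggested tools are unfamiliar, the sketch below shows how ifelse(), as.factor() and table() fit together on a small made-up example. All names and values here (`score`, `flag_a`, `flag_b`) are hypothetical and deliberately not the study variables, so the exercise itself is left for you to complete.

```r
# Made-up data: a numeric score and two yes/no flags coded 1/0
score  <- c(12, 18, 20, 22, 25, 15)
flag_a <- c(0, 1, 0, 0, 1, 0)
flag_b <- c(0, 0, 0, 1, 0, 0)

# ifelse() picks a label per element; as.factor() turns the labels into groups
group <- as.factor(ifelse(score < median(score), "low", "high"))

# Combine two conditions with | (element-wise "or")
any_flag <- as.factor(ifelse(flag_a == 1 | flag_b == 1, "yes", "no"))

# table() cross-tabulates the two factors into a 2 x 2 table
tab <- table(group, any_flag)
tab
```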

Last updated on Apr 30, 2021