Introduction to SSM Analysis
Reproduced from the introductory vignette for R Circumplex, by Girard J, Zimmermann J, Wright A (2023). circumplex: Analysis and Visualization of Circular Data. https://github.com/jmgirard/circumplex, http://circumplex.jmgirard.com/.
If you find this tutorial useful, please cite Girard, Zimmermann, & Wright (2023). I am reproducing it here to demonstrate the equivalence between the R and Python versions of the package. The original R version of this vignette can be found here.
import circumplex
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
degree_sign = u'\N{DEGREE SIGN}'
1. Background and Motivation
Circumplex models, scales, and data
Circumplex models are popular within many areas of psychology because they offer a parsimonious account of complex psychological domains, such as emotion and interpersonal functioning. This parsimony is achieved by understanding phenomena in a domain as being a "blend" of two primary dimensions. For instance, circumplex models of emotion typically represent affective phenomena as a blend of valence (pleasantness versus unpleasantness) and arousal (activity versus passivity), whereas circumplex models of interpersonal functioning typically represent interpersonal phenomena as a blend of communion (affiliation versus separation) and agency (dominance versus submissiveness). These models are often depicted as circles around the intersection of the two dimensions (see figure). Any given phenomenon can be located within this circular space through reference to the two underlying dimensions (e.g. anger is a blend of unpleasantness and activity).
Circumplex scales contain multiple subscales that attempt to measure different blends of the two primary dimensions (i.e., different parts of the circle). Although there have historically been circumplex scales with as many as sixteen subscales, it has become most common to use eight subscales: one for each "pole" of the two primary dimensions and one for each "quadrant" that combines the two dimensions. In order for a set of subscales to be considered circumplex, they must exhibit certain properties. Circumplex fit analyses can be used to quantify these properties.
Circumplex data is composed of scores on a set of circumplex scales for one or more participants (e.g., persons or organizations). Such data is usually collected via self-report, informant-report, or observational ratings in order to locate psychological phenomena within the circular space of the circumplex model. For example, a therapist might want to understand the interpersonal problems encountered by an individual patient, a social psychologist might want to understand the emotional experiences of a group of participants during an experiment, and a personality psychologist might want to understand what kind of interpersonal behaviors are associated with a trait (e.g., extraversion).
angles = (90, 135, 180, 225, 270, 315, 360, 45)
alabel = ("PA", "BC", "DE", "FG", "HI", "JK", "LM", "NO")
# Create plot ---------------------------------------------------------------
fig, ax = plt.subplots(figsize=(4, 4), subplot_kw=dict(polar=True))
ax.plot()
ax.set_xticks(np.radians(angles), labels=alabel, fontsize=14)
ax.set_yticks([])
ax.grid(True)
for i, angle in enumerate(angles):
    ax.text(
        np.radians(angle),
        0.6,
        f"{angle}{degree_sign}",
        ha="center",
        va="center",
        fontsize=12,
    )
plt.show()
The Structural Summary Method
The Structural Summary Method (SSM) is a technique for analyzing circumplex data that offers practical and interpretive benefits over alternative techniques. It consists of fitting a cosine curve to the data, which captures the pattern of correlations among scores associated with a circumplex scale (i.e., mean scores on circumplex scales or correlations between circumplex scales and an external measure). By plotting a set of example scores below, we can gain a visual intuition that a cosine curve makes sense in this case. First, we can examine the scores with a bar chart ignoring the circular relationship among them.
from circumplex.datasets import JZ2017
import matplotlib.pyplot as plt
jz_data = JZ2017
r = jz_data.ssm_analyse(measures = ["NARPD"])
plt.figure(figsize=(8, 5))
plt.bar(r.results[0].scores.index, r.results[0].scores.values, color='red')
plt.ylim(0, 0.5)
plt.ylabel("Score")
plt.xlabel("Scale")
plt.title("NARPD Scores")
plt.grid(True)
plt.show()
Next, we can leverage the fact that these subscales have specific angular displacements in the circumplex model (and that 0 and 360 degrees are the same) to create a path diagram.
fig, ax = circumplex.profile_plot(r.results[0].amplitude, r.results[0].displacement, r.results[0].elevation, r.results[0].r2, r.results[0].angles, r.results[0].scores, r.results[0].label, incl_amp=False, incl_disp=False, incl_pred=False, incl_fit=False, reorder_scales=True);
ax.grid(True)
plt.ylim(0, 0.5)
plt.xlabel("Angle")
plt.title("Scores by Angle")
plt.show()
This already looks like a cosine curve, and we can finally use the SSM to estimate the parameters of the curve that best fits the observed data. By plotting it alongside the data, we can get a sense of how well the model fits our example data.
fig, ax = r.results[0].profile_plot(reorder_scales=True, incl_amp=False, incl_disp=False, incl_pred=True, incl_fit=False);
ax.grid(True)
plt.ylim(0, 0.5)
plt.xlabel("Angle")
plt.title("Cosine curve estimated by SSM")
plt.show()
Understanding the SSM Parameters
The SSM fits a cosine curve to the data using the following equation:
$$ S_i = e + a \times \cos(\theta_i - d) $$
where $S_i$ and $\theta_i$ are the score and angle on scale $i$, respectively, and $e$, $a$, and $d$ are the elevation, amplitude, and displacement of the curve, respectively. Before we discuss these parameters, however, we can also estimate the fit of the SSM model. This is essentially how close the cosine curve is to the observed data points.
from matplotlib import collections as mc
fig, ax = r.results[0].profile_plot(reorder_scales=True, incl_amp=False, incl_disp=False, incl_pred=True, incl_fit=False, c_fit='black', c_scores='black');
thetas = np.linspace(0, 360, 1000)
fit = circumplex.circumplex.cosine_form(thetas, r.results[0].amplitude, r.results[0].displacement, r.results[0].elevation)
angles, scores = circumplex.circumplex.sort_angles(r.results[0].angles, r.results[0].scores)
lines = []
for i, angle in enumerate(angles):
    idx = np.where(np.isclose(thetas, angle, atol=0.2))[0][-1]
    lines.append([(angle, fit[idx]), (angle, scores[i])])
    if angle == 360:
        lines.append([(0, fit[0]), (0, scores[i])])
lc = mc.LineCollection(lines, colors='red', linewidths=10)
ax.add_collection(lc)
ax.grid(True)
plt.ylim(0, 0.5)
plt.xlabel("Angle")
plt.title(f"Fit = {round(r.results[0].r2, 2)}")
plt.show()
If fit is less than 0.70, it is considered "unacceptable" and only the elevation parameter should be interpreted. If fit is between 0.70 and 0.80, it is considered "adequate", and if it is greater than 0.80, it is considered "good". Sometimes SSM model fit is called prototypicality or denoted using $R^2$.
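As a rough illustration of these cut-offs, the sketch below computes fit as the usual coefficient of determination (one minus the ratio of residual to total sum of squares) between observed scores and the cosine curve, then labels it. The function names `ssm_fit` and `fit_label` are hypothetical, the scores are synthetic, and the formula is assumed rather than taken from the package's source.

```python
import numpy as np

def ssm_fit(scores, angles_deg, elevation, amplitude, displacement_deg):
    """R^2 between observed scores and the curve e + a * cos(theta - d)."""
    scores = np.asarray(scores, dtype=float)
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    predicted = elevation + amplitude * np.cos(theta - np.radians(displacement_deg))
    ss_res = np.sum((scores - predicted) ** 2)
    ss_tot = np.sum((scores - scores.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_label(r2):
    """Apply the interpretive cut-offs described in the text."""
    if r2 < 0.70:
        return "unacceptable"
    if r2 <= 0.80:
        return "adequate"
    return "good"

angles = np.array([90, 135, 180, 225, 270, 315, 360, 45])

# Synthetic scores lying exactly on a known curve (e=0.2, a=0.15, d=120).
curve = 0.2 + 0.15 * np.cos(np.radians(angles - 120.0))
r2_perfect = ssm_fit(curve, angles, 0.2, 0.15, 120.0)

# Alternating +/-0.06 offsets push the scores off the curve and lower the fit.
noisy = curve + 0.06 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
r2_noisy = ssm_fit(noisy, angles, 0.2, 0.15, 120.0)

print(round(r2_perfect, 3), fit_label(r2_perfect))
print(round(r2_noisy, 3), fit_label(r2_noisy))
```

Scores that sit exactly on the curve give a fit of 1, and the further the points drift from it, the lower the value falls.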
The first SSM parameter is elevation or $e$, which is calculated as the mean of all scores. It is the size of the general factor in the circumplex model and its interpretation varies from scale to scale. For measures of interpersonal problems, it is interpreted as generalized interpersonal distress. When using correlation-based SSM, $|e| \geq 0.15$ is considered "marked" and $|e| < 0.15$ is considered "modest".
The second SSM parameter is amplitude or $a$, which is calculated as the difference between the highest point of the curve and the curve's mean. It is interpreted as the distinctiveness or differentiation of a profile: how much it is peaked versus flat. Similar to elevation, when using correlation-based SSM, $a \geq 0.15$ is considered "marked" and $a < 0.15$ is considered "modest".
The final SSM parameter is displacement or $d$, which is calculated as the angle at which the curve reaches its highest point. It is interpreted as the style of the profile. For instance, if $d = 90$ and we are using a circumplex scale that defines 90 degrees as "domineering", then the profile's style is domineering.
By interpreting these three parameters, we can understand a profile much more parsimoniously than by trying to interpret all eight subscales individually. This approach also leverages the circumplex relationship (i.e. dependency) among subscales. It is also possible to transform the amplitude and displacement parameters into estimates of distance from the x-axis and y-axis, which will be shown in the output discussed below.
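The relationship between the three parameters and the x-axis and y-axis distances can be sketched with a closed-form estimate that holds when the scales are equally spaced around the circle. This is a minimal illustration on synthetic scores; the function name `ssm_parameters` is hypothetical and the code is not taken from the package's source.

```python
import numpy as np

def ssm_parameters(scores, angles_deg):
    """Closed-form SSM estimates, assuming equally spaced scale angles."""
    scores = np.asarray(scores, dtype=float)
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    n = scores.size
    elevation = scores.mean()                          # e: mean of all scores
    xval = 2.0 / n * np.sum(scores * np.cos(theta))    # distance along the x-axis
    yval = 2.0 / n * np.sum(scores * np.sin(theta))    # distance along the y-axis
    amplitude = np.hypot(xval, yval)                   # a: peak height above the mean
    displacement = np.degrees(np.arctan2(yval, xval)) % 360  # d: angle of the peak
    return elevation, xval, yval, amplitude, displacement

angles = np.array([90, 135, 180, 225, 270, 315, 360, 45])

# Synthetic scores generated from a known curve, so the estimates can be checked.
true_e, true_a, true_d = 0.2, 0.15, 120.0
scores = true_e + true_a * np.cos(np.radians(angles - true_d))

e, x, y, a, d = ssm_parameters(scores, angles)
print(f"elevation={e:.3f} amplitude={a:.3f} displacement={d:.1f}")
```

Because the synthetic scores lie exactly on the curve, the estimates recover the generating values; with real data they describe the best-fitting curve instead.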
Example Data: jz2017
To illustrate the SSM functions, we will use the example dataset JZ2017, which was provided by Zimmermann & Wright (2017). This dataset includes self-report data from 1166 undergraduate students. Students completed a circumplex measure of interpersonal problems with eight subscales (PA, BC, DE, FG, HI, JK, LM, and NO) and a measure of personality disorder symptoms with ten subscales (PARPD, SCZPD, SZTPD, ASPD, BORPD, HISPD, NARPD, AVPD, DPNPD, and OCPD). More information about these variables can be accessed by looking at the summary of the dataset with jz_data.summary():
from circumplex.datasets import _jz2017_path
jz2017 = circumplex.instrument.load_instrument('IIPSC')
jz2017.attach_data(pd.read_csv(_jz2017_path))
IIP-SC: Inventory of Interpersonal Problems Short Circumplex 32 Items, 8 Scales, 2 normative data sets Soldz, Budman, Demby, & Merry (1995) <https://doi.org/10.1177/1073191195002001006>
And we can view the accompanying dataset with:
jz2017.data.head()
| | Gender | PA | BC | DE | FG | HI | JK | LM | NO | PARPD | SCZPD | SZTPD | ASPD | BORPD | HISPD | NARPD | AVPD | DPNPD | OCPD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 1.50 | 1.50 | 1.25 | 1.00 | 2.00 | 2.50 | 2.25 | 2.50 | 4 | 3 | 7 | 7 | 8 | 4 | 6 | 3 | 4 | 6 |
| 1 | Female | 0.00 | 0.25 | 0.00 | 0.25 | 1.25 | 1.75 | 2.25 | 2.25 | 1 | 0 | 2 | 0 | 1 | 2 | 3 | 0 | 1 | 0 |
| 2 | Female | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 1 | 0 | 4 | 1 | 5 | 4 | 0 | 0 | 1 |
| 3 | Male | 2.00 | 1.75 | 1.75 | 2.50 | 2.00 | 1.75 | 2.00 | 2.50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Female | 0.25 | 0.50 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
The circumplex scales in JZ2017 come from the Inventory of Interpersonal Problems - Short Circumplex (IIP-SC). These scales can be arranged into the following circular model, which is organized around the two primary dimensions of agency (y-axis) and communion (x-axis). Note that the two-letter scale abbreviations and angular values are based on convention. A high score on PA indicates that one has interpersonal problems related to being "domineering" or too high on agency, whereas a high score on DE indicates problems related to being "cold" or too low on communion. Scales that are not directly on the y-axis or x-axis (i.e. BC, FG, JK, and NO) represent blends of agency and communion.
jz2017.demo_plot()
Mean-based SSM Analysis
Conducting SSM for a group's mean scores
To begin, let's say that we want to use the SSM to describe the interpersonal problems of the average individual in the entire dataset. Although it is possible to analyze the raw scores contained in jz2017, our results will be more interpretable if we standardize the scores first. We can do this using the standardize() function.
The first argument to this function is data, a dataframe containing the circumplex scales to be standardized. The second argument is scales and specifies where in data the circumplex scales are (either in terms of their variable names or their column numbers). The third argument is angles and specifies the angle of each of the circumplex scales included in scales. Note that scales should be a circumplex.Scales object and angles should be a np.array, and the two must have the same ordering and length. Finally, we need the normative data used to standardize the circumplex scales. In the call below, these are supplied through the instrument argument, with sample selecting which of the instrument's normative data sets to use. Here, we will use normative data for the IIP-SC by loading the IIPSC instrument.
df = circumplex.utils.standardize(
    data=jz2017.data,
    scales=jz2017.scales,
    angles=np.array((90, 135, 180, 225, 270, 315, 360, 45)),
    instrument=circumplex.instrument.load_instrument('IIPSC'),
    sample=1,
)
jz2017s = jz2017.attach_data(df)
jz2017s.data
| | Gender | PA | BC | DE | FG | HI | JK | LM | NO | PARPD | ... | DPNPD | OCPD | PA_z | BC_z | DE_z | FG_z | HI_z | JK_z | LM_z | NO_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 1.50 | 1.50 | 1.25 | 1.00 | 2.00 | 2.50 | 2.25 | 2.50 | 4 | ... | 4 | 6 | 1.121212 | 1.025362 | 0.409357 | -0.050132 | 0.633880 | 1.307918 | 0.951515 | 1.84375 |
| 1 | Female | 0.00 | 0.25 | 0.00 | 0.25 | 1.25 | 1.75 | 2.25 | 2.25 | 1 | ... | 1 | 0 | -1.151515 | -0.786232 | -1.052632 | -0.841689 | -0.185792 | 0.428152 | 0.951515 | 1.53125 |
| 2 | Female | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | ... | 0 | 1 | -1.151515 | -1.148551 | -1.052632 | -1.105541 | -1.551913 | -1.624633 | -1.775758 | -1.28125 |
| 3 | Male | 2.00 | 1.75 | 1.75 | 2.50 | 2.00 | 1.75 | 2.00 | 2.50 | 1 | ... | 0 | 0 | 1.878788 | 1.387681 | 0.994152 | 1.532982 | 0.633880 | 0.428152 | 0.648485 | 1.84375 |
| 4 | Female | 0.25 | 0.50 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | ... | 0 | 0 | -0.772727 | -0.423913 | -0.760234 | -1.105541 | -1.551913 | -1.624633 | -1.775758 | -1.28125 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1161 | Male | 0.00 | 1.00 | 1.00 | 2.50 | 2.50 | 2.50 | 1.75 | 1.00 | 3 | ... | 3 | 4 | -1.151515 | 0.300725 | 0.116959 | 1.532982 | 1.180328 | 1.307918 | 0.345455 | -0.03125 |
| 1162 | Female | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.25 | 0.00 | 1 | ... | 0 | 0 | -1.151515 | -1.148551 | -1.052632 | -1.105541 | -1.551913 | -1.624633 | 0.951515 | -1.28125 |
| 1163 | Male | 0.00 | 0.50 | 0.25 | 0.25 | 0.00 | 0.25 | 0.75 | 0.50 | 2 | ... | 0 | 1 | -1.151515 | -0.423913 | -0.760234 | -0.841689 | -1.551913 | -1.331378 | -0.866667 | -0.65625 |
| 1164 | Female | 0.50 | 0.25 | 0.00 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 3 | ... | 0 | 2 | -0.393939 | -0.786232 | -1.052632 | -0.841689 | -1.278689 | -1.331378 | -1.472727 | -0.65625 |
| 1165 | Female | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.75 | 4 | ... | 1 | 5 | 0.363636 | -1.148551 | -1.052632 | -1.105541 | -1.551913 | -1.624633 | -0.563636 | -0.34375 |
1166 rows × 27 columns
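The standardized columns in the output above are consistent with ordinary z-scoring against a normative mean and standard deviation, z = (x - M) / SD. The sketch below illustrates that arithmetic only. The helper `standardize_scores` is hypothetical, and the PA mean (0.76) and standard deviation (0.66) are back-calculated from the PA and PA_z columns displayed above rather than read from the instrument's norms.

```python
import pandas as pd

def standardize_scores(data, scales, means, sds, suffix="_z"):
    """Append z-scored copies of the given scale columns (hypothetical helper)."""
    out = data.copy()
    for scale in scales:
        out[scale + suffix] = (out[scale] - means[scale]) / sds[scale]
    return out

# Three raw PA scores taken from rows 0, 1, and 3 of the displayed data.
raw = pd.DataFrame({"PA": [1.50, 0.00, 2.00]})

# Assumed norms, inferred from the displayed output (not from the package).
z = standardize_scores(raw, ["PA"], means={"PA": 0.76}, sds={"PA": 0.66})
print(z)
```

Under these assumed norms, the PA_z values agree with rows 0, 1, and 3 of the displayed table to six decimal places.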
Now we can use the ssm_analyse() function to perform the SSM analysis. The first three arguments are the same as the first three arguments to standardize(). We pass the standardized data in jz2017s.data as data, the names of the standardized scale columns (those with the _z suffix) to scales, and the same angles as before to angles.
results = circumplex.ssm_analyse(
    data=jz2017s.data,
    scales=['PA_z', 'BC_z', 'DE_z', 'FG_z', 'HI_z', 'JK_z', 'LM_z', 'NO_z'],
    angles=(90, 135, 180, 225, 270, 315, 360, 45),
)
The output of the function has been saved in the results object, and we can extract a table of the results using the table property. This will output a table of the elevation, amplitude, displacement, and fit for the cosine curve estimated by the SSM. The table property is a pandas.DataFrame object.
results.table
| | label | group | measure | elevation | xval | yval | amplitude | displacement | r2 | PA_z | BC_z | DE_z | FG_z | HI_z | JK_z | LM_z | NO_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSM | SSM | None | None | -0.225058 | 0.130532 | -0.014815 | 0.13137 | 353.524846 | 0.709645 | 90 | 135 | 180 | 225 | 270 | 315 | 360 | 45 |
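As a quick check on the table above, the xval and yval columns can be recovered from amplitude and displacement by a polar-to-Cartesian conversion. The sketch below uses only the Python standard library and the rounded values printed in the table, so agreement is expected only up to rounding.

```python
import math

# Values copied from the results table (amplitude is rounded in the display).
amplitude = 0.13137
displacement = 353.524846  # degrees

# Polar-to-Cartesian conversion: x = a * cos(d), y = a * sin(d).
xval = amplitude * math.cos(math.radians(displacement))
yval = amplitude * math.sin(math.radians(displacement))

print(f"xval={xval:.4f}, yval={yval:.4f}")
```

Both values agree with the tabled xval and yval to about four decimal places.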
That was pretty easy! We can now write up these results. However, the circumplex package has some features that can make what we just did even easier.
results.plot()
(<Figure size 640x480 with 1 Axes>, <PolarAxes: >)
results.results[0].profile_plot()
(<Figure size 800x400 with 1 Axes>, <Axes: title={'center': 'SSM Profile'}, xlabel='Angle [deg]', ylabel='Score'>)