Toward Inclusive Research: The Effect of Response Options on Gender Categorization of Faces

Elli van Berlekom, Stefan Wiens, and Marie Gustafsson Sendén

Psychology Department, Stockholm University

Author Note

Elli van Berlekom, ORCID: http://orcid.org/0000-0002-1949-5600

Stefan Wiens, ORCID: http://orcid.org/0000-0003-4531-4313

Marie Gustafsson Sendén, ORCID: http://orcid.org/0000-0002-8393-5316

Correspondence concerning this article should be addressed to Elli van Berlekom, Psychology Department, Stockholm University, Email: elli.vanberlekom@psychology.su.se

Abstract

Gender is not a binary category, yet much research on gender categorization continues to treat it as one in its response options. This study comprises two experiments that challenge the binary gender norm by exploring alternative response options for measuring gender categorization. In Experiment 1 (N = 66), we compared one-dimensional and two-dimensional scales for gender categorization of a diverse set of morphed faces. Regardless of the response options used, participants treated gender categorically, consistently using the ends of the dimensional scales. In Experiment 2 (N = 105), we compared traditional binary response options with multiple categories and free-text answers. Multiple categories such as “non-binary” and “I don’t know” led about half of the participants to make categorizations beyond the binary framework, whereas free-text options did not elicit similar results. Despite the opportunity to categorize faces beyond the binary, the predominant categorizations remained ‘woman’ or ‘man’. We conclude that while inclusive response options can facilitate acknowledgment of gender diversity, they do not fundamentally alter the binary perception of gender.

Toward Inclusive Research: The Effect of Response Options on Gender Categorization of Faces

Many transgender and nonbinary people experience gender as flexible, fluid, diffuse, and not bounded by the typical binary of women and men (Hyde et al., 2019; Richards et al., 2016). Whereas cisgender people identify with the gender they were assigned at birth, transgender people identify with a gender different from the sex they were assigned at birth (Levitt & Ippolito, 2014). Moreover, many transgender people identify as nonbinary, which can be either an identity in and of itself or an umbrella term for a wide variety of gender identities other than woman or man, such as genderqueer, agender, and genderfluid (Monro, 2019).

In surveys and questionnaires that measure gender identity, however, gender has traditionally been constructed as a binary, with response options limited to the categories woman/female and man/male (Saperstein & Westbrook, 2021). Such limited response options ignore transgender and nonbinary (TNB) identities (Ansara & Hegarty, 2014). Recently, psychologists have been encouraged to include a wider range of response options beyond woman and man, such as “genderqueer” and “agender” (Saperstein & Westbrook, 2021), or to use free-text options (Lindqvist et al., 2020). As awareness of gender diversity grows, studies increasingly include gender options beyond woman and man (for recent examples, see Carleton et al., 2022; Cronin et al., 2022; D’Agostino et al., 2022). Research on how people categorize the gender of others, however, is still dominated by binary response options (e.g., Campanella et al., 2001; Habibi & Khurana, 2012; Jung et al., 2019).

Two Challenges to the Gender Binary

An early challenge to the norm of binary measurement of gender in psychology came from Sandra Bem in the 1970s (Bem, 1974). She devised a scale, the Bem Sex Role Inventory (BSRI), that measured gender as a psychological trait, treating femininity and masculinity as two separate constructs. This scale allowed for combinations of scores that challenged previous binary conceptions. Such combinations included androgynous, which meant scoring high on both femininity and masculinity, and agender, which meant scoring low on both. Characteristic of research of its time, however, Bem’s work still largely accepted the binary gender framework. In treating gender as a psychological trait rather than an identity, for example, the BSRI implicitly assumed that all respondents were women or men.

A later group of challenges to the gender binary in psychology emerged from the 2010s onward. These challenges, often drawing on feminist and queer scholarship (e.g., Butler, 1999), were explicit about the need for psychology to include trans and non-binary gender identities (Hyde et al., 2019; Morgenroth & Ryan, 2018; Richards et al., 2016). Saperstein and Westbrook (2021) suggested that surveys measuring gender include a range of response options, such as non-binary, other, trans man, and agender. Lindqvist et al. (2020) suggested an open text entry in which participants can describe their gender in an open-ended format. The free-text response has the advantage of being completely unconstrained, allowing participants to enter any category, including categories that may not have occurred to the researchers. Moreover, acceptable terms sometimes shift over time as more marginalized voices are heard. The term transsexual, for example, was once widely used and seen as acceptable but is now understood to be stigmatizing (APA manual). A free-text option avoids this issue because it does not require researchers to fix the terminology in advance.

Thus far, such recommendations in psychology have primarily concerned how to measure respondents’ own gender identity. This emphasis is understandable, as gender identity is a commonly reported demographic variable. But gender is also frequently measured in terms of participants’ categorizations of others. Because self-categorization and categorization of others are different processes, the best measurement of self-categorization may not be the best measurement of the categorization of others.

Measuring Gender Categorization of Others

Research on how people perceive and categorize the gender of others has used both dimensional scales and discrete categories. A fairly common approach is the one-dimensional scale, on which participants rate the gender of others along a single dimension from masculine to feminine. Much of this research examines evolutionary and other explanations for gendered facial features, correlating one-dimensional ratings of facial gender with other traits such as attractiveness (Little & Hancock, 2002) and distinctiveness (O’Toole et al., 1998).

Another common approach asks people to categorize faces using a set of response options decided by the researchers, almost invariably woman and man. Studies using this method have shown that people categorize gender rapidly and automatically (Habibi & Khurana, 2012; Jung et al., 2019). This, in turn, indicates that gender is a salient category that shapes how people evaluate others on traits such as agreeableness and dominance (Stolier & Freeman, 2017).

Moreover, people perceive facial gender categorically (Campanella et al., 2001). This phenomenon has been observed when participants categorize faces that have been morphed to vary from feminine to masculine. Although a 60% female morph contains only slightly more female than male features, most participants categorized this morph as female (Campanella et al., 2001). Such categorical effects for continuous stimuli in any domain suggest that people treat that domain as consisting of separate categories (Simanova et al., 2016). The observation of a categorical effect for gender therefore suggests that people treat gender as a binary consisting of women and men only.
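A minimal illustration of what a categorical effect looks like in numbers: the sketch compares a hypothetical linear response function, in which the rating simply equals the morph level, with a hypothetical logistic function centred on the 50% morph. The logistic form, the 0-100 rating scale (100 = woman), and the slope value are illustrative assumptions, not parameters estimated by Campanella et al. (2001) or in the present studies.

```python
import math


def linear_rating(female_pct: float) -> float:
    """Rating that tracks the morph level exactly (no categorical effect)."""
    return female_pct


def categorical_rating(female_pct: float, slope: float = 0.15) -> float:
    """Hypothetical S-shaped (logistic) rating centred on the 50% morph.

    `slope` is an illustrative assumption; larger values give a sharper,
    more category-like boundary. Output is on a 0-100 scale (100 = woman).
    """
    return 100.0 / (1.0 + math.exp(-slope * (female_pct - 50.0)))


for pct in (40, 50, 60):
    print(f"{pct}% morph: linear = {linear_rating(pct):.1f}, "
          f"logistic = {categorical_rating(pct):.1f}")
```

With these assumed values, the 60% morph receives a rating well above 60 and the 40% morph one well below 40, which is the signature of categorical responding: small physical differences near the midpoint produce large differences in judged gender.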

However, this research has rarely considered the risk that the structure of response options could communicate certain ideas about gender to participants. A one-dimensional scale implies that gender varies on a continuum. It also places masculinity and femininity at the endpoints of the scale, so that a higher rating of femininity is, by definition, a lower rating of masculinity. This implies that someone cannot embody femininity and masculinity at the same time; indeed, that the two concepts are opposites. Binary response options consisting only of woman/female and man/male suggest that those are the only two categories that exist. By contrast, two-dimensional scales and categories that include non-binary response options suggest that femininity and masculinity are not mutually exclusive and that a multiplicity of genders exists. In other words, whichever type of response options is used, ideas are being communicated to participants, potentially influencing their responses. Most methodological recommendations emphasize taking great care not to influence participants (Nichols & Maner, 2008), but the effects of gender response options are rarely considered.

Another aspect of categorizing the gender of others is that complete certainty is not possible, because gender identity cannot be read directly from appearance: many trans and non-binary individuals are not androgynous in their gender expression (Richards et al., 2016). Therefore, for a person who aims to be inclusive, abstaining from categorizing until more information is available is always the safest option. However, this aspect of gender categorization has received very little attention from researchers.

The purpose of Study 1 was to test the influence of one- and two-dimensional response options on categorical responding. Drawing inspiration from Bem (1974), we compared gender categorization measured with one-dimensional response options (ranging from woman to man) and with two-dimensional response options. A categorical effect suggests that participants treat gender as consisting of only two categories: women and men. Accordingly, a reduction in this effect would suggest that participants take a more expansive view of gender. We tested two research questions: “Do participants respond categorically to faces?” (Research Question 1) and “Does a one-dimensional rating scale elicit stronger categorical responses than a two-dimensional rating scale?” (Research Question 2).

The purpose of Study 2 was to investigate categorization using non-binary gender response options. We included multiple categories beyond women and men, as suggested by, for example, Saperstein and Westbrook (2021), and we also included a free-text option, as suggested by Lindqvist and colleagues (2020). Study 2 mainly examined how these two non-binary formats compared in eliciting responses other than woman and man (Research Question 3). Because non-binary options have been promoted by feminist and LGBTQ+ activists, their inclusion might also have more general effects on binary categorization. Therefore, Study 2 also investigated the categorization of faces as women and men (Research Question 4).

Study 1

Method

Participants

Swedish participants (N = 71) took part at the Stockholm University campus (M_age = 37.87, SD_age = 14.08, Range = 18–73). Participants included 33 women, 35 men, and 2 participants who did not indicate gender (self-identified gender was measured using an open-ended text box, following Lindqvist et al., 2020). Participants were randomly allocated to one of the two response option conditions (N_control = 33, N_experimental = 38). Participants were monetarily compensated for their time (100 SEK). In accordance with the Declaration of Helsinki, all participants were informed that participation was voluntary and gave written consent to participate in the study.

Stimuli

The experiment included Black, Asian, and White faces from the London Face Database (DeBruine & Jones, 2017) and the Chicago Face Database (Ma et al., 2015), morphed with WebMorph (DeBruine, 2018). We selected matched pairs of faces of women and men, ensuring that the women were rated as similarly feminine as the men were rated masculine. The morphs were made in seven steps, from completely feminine to completely masculine. We defined facial gender as the proportion of the woman’s face present in the morph. In other words, a 33% face was moderately masculine, a 50% face was an even mixture of the two faces, and a 100% face consisted only of the woman’s face. Because there were 18 face pairs morphed in seven steps, the total number of faces was 126.
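The arithmetic behind the stimulus set can be written out explicitly. The sketch assumes, as the 33%, 50%, and 67% examples suggest, that the seven morph steps are equally spaced between 0% and 100% of the woman’s face; the variable names are hypothetical.

```python
N_PAIRS = 18   # matched woman/man face pairs
N_STEPS = 7    # morph steps per pair, from 0% to 100% of the woman's face

# Facial gender of each morph step, assuming equal spacing (steps of 100/6 %).
morph_levels = [100 * i / (N_STEPS - 1) for i in range(N_STEPS)]
print([round(level, 1) for level in morph_levels])

total_faces = N_PAIRS * N_STEPS
print(total_faces)
```

Under this assumption, the 33% and 67% faces referred to in the text correspond to morph levels of 33.3% and 66.7%.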

Procedure

Participants completed the experiment on a computer in a quiet room. Each trial consisted of a face accompanied by the question “How would you gender categorize this person?” In the one-dimensional condition, participants rated gender on a single continuum with the anchors woman and man. In the two-dimensional condition, participants rated each face once on a woman continuum (anchors: not woman and woman) and once on a man continuum (anchors: not man and man). The two continua were presented on separate trials, and trial order was fully randomized in both conditions (see Figure 2).
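The two trial structures differ only in how many ratings each face receives. The following sketch builds a fully randomized trial list for one participant; the function name, condition labels, and scale labels are hypothetical and are not taken from the experiment software actually used.

```python
import random


def build_trials(face_ids, condition, seed=None):
    """Return a fully randomized list of (face, scale) trials for one participant.

    "1D": one trial per face on a single woman-man scale.
    "2D": two trials per face, one on a not woman-woman scale and one on a
          not man-man scale, intermixed in random order.
    """
    rng = random.Random(seed)
    if condition == "1D":
        trials = [(face, "woman-man") for face in face_ids]
    elif condition == "2D":
        trials = [(face, scale) for face in face_ids
                  for scale in ("not woman-woman", "not man-man")]
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    rng.shuffle(trials)  # complete randomization of trial order
    return trials


faces = list(range(126))
print(len(build_trials(faces, "1D", seed=1)), len(build_trials(faces, "2D", seed=1)))
```

Shuffling the woman and man trials together, rather than blocking them by scale, mirrors the description of the two continua being presented on separate but randomly ordered trials.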

Data Analysis

We used R (Version 4.2.2; R Core Team, 2022) and the R packages brms (Version 2.18.0; Bürkner, 2017, 2018, 2021), papaja (Version 0.1.1; Aust & Barth, 2022), and tidyverse (Version 1.3.2; Wickham et al., 2019). We fit Bayesian mixed-effects models to the data to test for response patterns consistent with categorical perception. In all models, facial gender (0 to 100 in seven steps) and response options (one-dimensional, two-dimensional) were included as fixed effects. Additionally, all models included varying intercepts for both participants and faces and varying slopes for facial gender. We modeled facial gender as an ordered factor with seven levels, one for each morphing step. This allowed us to test for the non-linear patterns expected under categorical perception, in which a change in facial gender should have a larger effect on rated gender near the midpoint than at the extremes.
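In R, an ordered factor is by default coded with orthogonal polynomial contrasts (`contr.poly`), so a seven-level predictor is represented by linear, quadratic, cubic, and higher-order terms. For an S-shaped pattern that is symmetric around the midpoint, departures from linearity load on the odd higher-order terms (cubic, quintic), whereas a purely linear pattern loads only on the linear term. Whether the models reported here used this default coding is an assumption; the pure-Python sketch below simply reproduces the standard construction by Gram-Schmidt orthogonalization so that the coding can be inspected.

```python
import math


def poly_contrasts(k: int) -> list[list[float]]:
    """Orthogonal polynomial contrasts for an ordered factor with k levels.

    Returns k - 1 columns (linear, quadratic, cubic, ...), each of length k.
    Every column sums to zero, has unit length, and is orthogonal to the others.
    """
    scores = [i - (k - 1) / 2 for i in range(k)]             # centred level scores
    powers = [[s ** p for s in scores] for p in range(k)]    # degrees 0..k-1
    ortho: list[list[float]] = []
    for column in powers:
        w = list(column)
        for u in ortho:                                      # modified Gram-Schmidt
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        ortho.append([a / norm for a in w])
    return ortho[1:]                                         # drop the intercept column


contrasts = poly_contrasts(7)                # one column per polynomial degree 1..6
print([round(v, 3) for v in contrasts[0]])   # linear term
```

Using orthogonal columns keeps the linear and non-linear components separately interpretable, which is what makes an ordered factor convenient for detecting categorical (S-shaped) responding.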

Results

First, we examined the relationship between woman and man ratings in the two-dimensional condition. These were highly negatively correlated (r = -0.86). Therefore, man ratings in the two-dimensional condition were reverse-coded for subsequent analyses. Second, we examined whether participants responded categorically to faces (Research Question 1). Individual-level (thin lines) and group mean (thick lines) responses are visualized in Figure 3. If participants’ ratings tracked morph level linearly, the lines would form a straight diagonal. Instead, Figure 3 shows that most participants displayed a non-linear S-shaped pattern, as did the group means. Note that in the two-dimensional condition, participants rated each face twice.
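Reverse-coding and the correlation check can be sketched as follows, assuming ratings on a 0-100 scale (the scale range is an assumption here) and using a plain Pearson correlation as a simplification; the published correlation may have been computed differently. All names and example values are hypothetical.

```python
import math


def reverse_code(ratings, scale_min=0.0, scale_max=100.0):
    """Mirror ratings on a bounded scale, e.g. 90 on a 0-100 scale becomes 10."""
    return [scale_min + scale_max - r for r in ratings]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings of the same seven faces on the two continua.
woman = [5, 10, 20, 55, 85, 92, 97]
man = [96, 90, 75, 50, 20, 8, 3]
print(round(pearson_r(woman, man), 2))                # strongly negative
print(round(pearson_r(woman, reverse_code(man)), 2))  # same size, now positive
```

Reverse-coding flips the sign of the correlation without changing its size, so after recoding both continua point in the same direction and can be analyzed on a common woman-to-man axis.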

To further test whether the faces were rated categorically, we calculated the difference between the mean ratings of faces with 33% and 67% facial gender. If participants responded linearly, this difference should be 34 (67 − 33). Instead, in both the one-dimensional condition (M_1D = 59.58, CI_1D = [53.65, 65.26]) and the two-dimensional condition (M_2D = 58.75, CI_2D = [52.53, 65.08]), this difference far exceeded 34, and the narrow credible intervals suggest that these differences were precisely estimated. We interpret this to mean that participants responded categorically.
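The categorical index used here reduces to a single subtraction. The sketch below computes it from mean ratings per morph level; the dictionary layout and the example means are hypothetical, and the study reports posterior estimates with credible intervals, which this sketch does not reproduce.

```python
def categorical_index(mean_rating_by_level, low=33, high=67):
    """Difference in mean rating between the high and low morph levels.

    If ratings tracked morph level linearly, the index would equal high - low
    (67 - 33 = 34); values well above that indicate categorical responding.
    """
    return mean_rating_by_level[high] - mean_rating_by_level[low]


def linear_benchmark(low=33, high=67):
    """Index expected under purely linear responding."""
    return high - low


# Hypothetical mean ratings (0-100, 100 = woman) for the 33% and 67% faces.
example_means = {33: 20.0, 67: 80.0}
index = categorical_index(example_means)
print(index, index > linear_benchmark())
```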

Finally, we tested whether categorical perception was reduced in the two-dimensional condition compared with the one-dimensional condition (Research Question 2). To do so, we compared the difference between mean ratings of 67% and 33% faces across the two conditions. The results suggested that categorical perception was not reduced by two-dimensional response options (Difference = -0.83, CI = [-5.57, 7.24], BF₀₁ = 30.47).
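Comparing the index between conditions amounts to a difference of differences. With posterior draws from a fitted model, one common way to summarize such a contrast is to compute it within each draw and take percentiles, as sketched below. The function names and draws are hypothetical, and the Bayes factor reported above requires additional computation that is not shown.

```python
def quantile(sorted_values, q):
    """Quantile of pre-sorted values with linear interpolation (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def contrast_summary(index_draws_1d, index_draws_2d, width=0.95):
    """Posterior mean and central interval of the 1D-minus-2D index difference.

    Draws are paired by position, as when both indices are computed from the
    same posterior draws of one fitted model.
    """
    diffs = sorted(a - b for a, b in zip(index_draws_1d, index_draws_2d))
    tail = (1 - width) / 2
    mean = sum(diffs) / len(diffs)
    return mean, (quantile(diffs, tail), quantile(diffs, 1 - tail))


# Hypothetical paired draws of the 67%-33% index in each condition.
draws_1d = [58.0, 60.5, 59.0, 61.0, 57.5]
draws_2d = [59.0, 58.0, 57.5, 60.0, 60.0]
print(contrast_summary(draws_1d, draws_2d))
```

Computing the contrast within each draw, rather than subtracting the two conditions’ summary intervals, preserves the posterior correlation between the two indices.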

Discussion

Participants responded categorically when rating faces in terms of gender, and two-dimensional response options did not reduce this effect. Indeed, a highly binary view of gender was evident: participants treated womanhood and manhood as opposites even though the two-dimensional scales would have allowed them to rate a face high or low on both. However, these scales only implicitly challenged the binary, as no gender categories beyond woman and man were offered.

Study 2

Study 2 tested a wider range of response options that explicitly challenge the gender binary. These were adapted from common ways of measuring participants’ self-categorization of gender (Lindqvist et al., 2020; Saperstein & Westbrook, 2021). We compared three types of response options in a gender categorization task: 1) binary categories (woman and man only), which served as the control condition; 2) multiple categories (woman, man, other, and I don’t know); and 3) an open text box in which participants typed their response.

Method

Participants

Swedish participants (N = 100) took part in the study at the Stockholm University campus (M_age = 36.89, SD_age = 13.69, Range = 18–69). Self-identified gender was measured using an open-ended text box, as recommended by Lindqvist et al. (2020). The final sample included 56 women, 47 men, and 2 participants who did not indicate gender. Participants were randomly allocated to one of the three response option conditions (N_binary = 32, N_multiple = 36, N_free_text = 32). All participants were informed that participation was voluntary, and all gave written consent to participate in the study. Participants were monetarily compensated for their time (100 SEK).

Stimuli

The stimuli were identical to those of Study 1.

Design and Procedure

The experiment used a between-participants design. There were three conditions with different response options: binary categories, free text, and multiple categories (see Figure 4). In the binary categories condition, the response options consisted of two categories: woman and man. In the free text condition, the response options consisted of an open text box. In the multiple categories condition, the response options consisted of four categories: woman, man, other, and I don’t know.

Participants completed the experiment on a computer in a quiet room. Each trial consisted of a face accompanied by the question, “How would you gender categorize this person?” After being allocated to one of the three conditions, participants categorized 126 faces according to the response options in their condition.

The outcome was participants’ responses in the categorization task. For analysis purposes, two new variables were created:

Other categorizations represented trials on which participants categorized a face as any category other than woman or man. Other was coded as 1 and all other responses as 0. In the free text condition, participants’ responses were manually coded so that variations of other and non-binary counted as other.

I don’t know responses represented trials on which participants did not assign any gender category. I don’t know was coded as 1 and all other responses as 0. In the free text condition, participants’ responses were manually coded so that variations of unsure and I don’t know counted as I don’t know.
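The study coded free-text responses manually. A keyword-based first pass such as the one below could flag candidate responses for that manual review; the keyword lists are hypothetical, English-only, and would need to be adapted to the language and spellings participants actually used.

```python
OTHER_TERMS = ("non-binary", "nonbinary", "non binary", "other")
DONT_KNOW_TERMS = ("don't know", "dont know", "unsure", "not sure")


def code_free_text(response: str) -> dict:
    """Code one free-text response into the two 0/1 analysis variables.

    Substring matching is deliberately simple (e.g. "another" would match
    "other"), so flagged responses should still be checked by hand. This
    sketch gives I don't know priority, so both indicators are never 1.
    """
    text = response.strip().lower().replace("\u2019", "'")  # curly -> straight apostrophe
    dont_know = int(any(term in text for term in DONT_KNOW_TERMS))
    other = int(not dont_know and any(term in text for term in OTHER_TERMS))
    return {"other": other, "dont_know": dont_know}


for answer in ("Woman", "Non-binary", "I don\u2019t know", "unsure"):
    print(answer, code_free_text(answer))
```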

Data Analysis

Bayesian generalized linear mixed-effects models were fit to the data in Study 2. These models were the same as in Study 1 apart from the outcome, which was binary (coded 0 or 1) and was therefore modeled with a binomial distribution; effects are reported as odds ratios (OR).
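Because the outcome is 0/1, effects are estimated on the log-odds scale and reported as odds ratios. Assuming the standard logit link (the link function is an assumption here), a coefficient β on the log-odds scale corresponds to an odds ratio of exp(β). The sketch below checks this numerically with hypothetical values.

```python
import math


def inv_logit(eta: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


def odds_ratio(p_condition: float, p_reference: float) -> float:
    """Odds of a 1-response in one condition divided by the odds in the reference."""
    return odds(p_condition) / odds(p_reference)


# Hypothetical model terms on the log-odds scale.
intercept = -3.0          # baseline log-odds of a beyond-binary response
beta = math.log(5.0)      # condition effect, i.e. an odds ratio of 5

p_ref = inv_logit(intercept)
p_cond = inv_logit(intercept + beta)
print(round(p_ref, 3), round(p_cond, 3), round(odds_ratio(p_cond, p_ref), 3))
```

An odds ratio is not a ratio of probabilities: in this example the probability of a beyond-binary response rises by a factor of about 4.2, not 5.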

Results

Figure 5 illustrates the proportion of faces (y-axis) assigned to each response category (different colors) at each level of facial gender (x-axis) in each of the three experimental conditions (separate plots). A simple visual inspection of Figure 5 suggests that most faces were categorized as women or men. Participants did categorize faces outside of this binary in the multiple categories condition, and most such categorizations were made in response to androgynous faces.

Figure 5, however, only illustrates the total number of categorizations across all participants. This obscures the fact that some participants made many categorizations beyond the binary and others made few or none. Figure 6 illustrates how many categorizations beyond the binary each participant made: each bar shows how many participants (y-axis) made a given number of such categorizations (x-axis), and the colors denote the different categories. Participants who only categorized faces as women or men are not represented in Figure 6. In the free text condition, only two participants made any categorization other than woman or man, whereas more than half did so in the multiple categories condition (see Figure 6). The Bayesian mixed-effects model suggested that participants made more categorizations beyond the binary in the multiple categories condition than in the free text condition (OR = 5.56, CI = [1.1, 27.97], BF₁₀ = 4.55).

An inspection of Figure 5 suggests that participants made fewer man categorizations in the multiple categories condition than in the other two conditions. We tested this by examining only woman and man responses. Figure 8 illustrates the proportions of woman and man responses. Each dot represents a single participant’s proportion, and the boxplots show the median and interquartile range across participants. Because categorizations beyond the binary were removed from this data set, any face not categorized as a woman was categorized as a man, so the proportion of man categorizations is the complement of the proportion of woman categorizations. Overall rates of binary categorizations were similar across the three conditions (see Figure 8).
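Computing each participant’s proportion of woman categorizations among binary responses, as plotted in Figure 8, can be sketched as follows; the data layout of (participant, response) pairs and the function name are hypothetical.

```python
from collections import defaultdict


def proportion_woman(trials):
    """Per-participant share of binary responses that were 'woman'.

    trials: iterable of (participant_id, response) pairs. Responses other than
    'woman' or 'man' are dropped first, so the share of 'man' is 1 minus this.
    Participants with no binary responses are omitted (no division by zero).
    """
    counts = defaultdict(lambda: [0, 0])  # participant -> [n_woman, n_binary]
    for participant, response in trials:
        if response not in ("woman", "man"):
            continue
        counts[participant][1] += 1
        if response == "woman":
            counts[participant][0] += 1
    return {pid: n_woman / n_binary for pid, (n_woman, n_binary) in counts.items()}


example = [("p1", "woman"), ("p1", "man"), ("p1", "other"),
           ("p2", "man"), ("p2", "I don't know")]
print(proportion_woman(example))
```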

We treated the binary categories condition as the control against which the other two conditions were compared. The proportion of faces categorized as women was similar in the multiple categories and binary categories conditions (OR = 0.68, CI = [0.4, 1.17], BF₀₁ = 5.98), and similar in the free text and binary categories conditions (OR = 1.03, CI = [0.6, 1.78], BF₀₁ = 15.27). In sum, neither the free text nor the multiple categories condition changed the pattern of categorization of women and men compared with the binary categories condition.

Discussion

In Experiment 2, we tested how free-text options and multiple categories affected participants’ responses beyond the binary. More than half of the participants in the multiple categories condition made at least one categorization beyond the binary, but virtually none did so in the free text condition. Furthermore, additional response options reduced the absolute number of faces categorized as women and men (as participants selected some of the other options) but did not systematically reduce categorizations of men more than of women, or vice versa.

General Discussion

Across two experiments, we tested how different response options influenced gender categorization. In Study 1, we compared two-dimensional scales with a one-dimensional control. Participants responded categorically in both the control condition and the two-dimensional condition. In Study 2, we compared free text and multiple categories. Only multiple categories elicited a substantial number of beyond-binary responses. Compared with the binary control, neither changed the pattern of categorizations of women and men.

The results from Study 1 are consistent with previous work on categorical perception of gender in faces (Campanella et al., 2001, 2003). Participants exhibited a categorical pattern of responses in which ratings of gender were more extreme than the facial gender of the morphs. This implies that participants conceived of gender as consisting of two distinct categories. Furthermore, the two-dimensional ratings did not reduce the strength of the categorical effect. This suggests that, at least in the present sample, two-dimensional response options were not enough to weaken binary conceptions of gender.

This differs from the results of Bem (1974), who found that measuring gender on two separate scales led participants to treat gender as less binary. Moreover, whereas she found that masculinity and femininity were largely unrelated, we found that ratings of woman and man were strongly negatively correlated. This difference probably reflects the different outcome measures in Bem (1974) and in our study. Bem (1974) measured gender as a psychological trait in the self, whereas we measured gender as a judgment of the gender identity of others. The latter outcome is determined not only by the response options but also by the physical features of the faces. In other words, judging the faces of others is a different task from judging one’s own characteristics, and one of the primary differences is the presence of external stimuli, namely the faces themselves.

The finding from Study 2 that participants used non-binary response options is consistent with the work of Saperstein and Westbrook (2021) and Lindqvist et al. (2020), which has shown that including flexible response options allows participants to better express themselves. A recommendation from that literature is that open text boxes afford participants the greatest flexibility in their responses. In our study, however, that flexibility was rarely used when the response option was a free-text box. This likely reflects the difference between transgender and gender-diverse participants categorizing their own gender and cisgender participants categorizing others.

A probable explanation for the difference between free text and multiple categories in Study 2 is that the multiple categories served as a visual reminder of non-binary identities. Researchers interested in the categorization of non-binary identities should be aware that such categories may not spring to mind unless participants are explicitly reminded of them.

Neither free text nor multiple categories influenced the categorizations of women and men. This suggests that such inclusive response options can be suitable for investigating the categorization of women and men without skewing the results or introducing noise. This is a positive finding for researchers who are primarily interested in such categorizations but do not want to contribute to the marginalization of trans and non-binary individuals.

Overall, we recommend that researchers include non-binary response options in gender categorization studies. Two-dimensional scales, free text, and multiple categories are all viable alternatives. If the primary research question concerns non-binary categorization, multiple categories are most suitable. However, if the goal is to measure the categorization of women and men, free text and multiple categories may be equally suitable.

Limitations and Future Directions

One limitation of this study is the sample size. The Ns in each condition are below many of the conventional recommendations in social psychology. However, these recommendations typically assume a single trial per participant. In contrast, each participant completed at least 126 trials in our experiments, which allows within-participant processes to be estimated precisely. In this respect, the present study resembles psychophysical experiments, which also feature few participants carrying out many trials. Power is often portrayed as a function of sample size alone, but although sample size matters, the number of trials also affects power (Judd et al., 2017). Indeed, the overall analyses included more than 8000 data points in each experiment, and the final estimates were measured with a high degree of precision. That said, the small number of participants somewhat limits the generalizability of the findings.
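The number of observations follows directly from the design. Using the allocation counts reported in the two Method sections, and assuming that no trials were excluded, the totals can be checked as follows; the function and constant names are hypothetical.

```python
FACES = 126  # faces per rating scale in both studies


def n_observations(n_participants: int, ratings_per_face: int = 1) -> int:
    """Total trials contributed by a group of participants."""
    return n_participants * FACES * ratings_per_face


# Study 1: one-dimensional control (33) rated each face once;
# two-dimensional condition (38) rated each face twice (woman and man scales).
study1 = n_observations(33) + n_observations(38, ratings_per_face=2)
# Study 2: three between-participant conditions (32 + 36 + 32), one rating per face.
study2 = n_observations(32 + 36 + 32)
print(study1, study2)
```

Both totals exceed 8000, which is consistent with the statement above.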

Another limitation of this study is that it does not account for markers of gender other than faces. Such markers include hair, clothing, and makeup, which transgender and gender-diverse people often use to signal their gender to others. Moreover, the faces used here did not realistically depict gender diversity as it is often displayed in the real world. It is therefore possible that we underestimated the rates at which people respond with options beyond the binary.

Conclusion

In two studies, we tested how different response alternatives affected gender categorizations. In Study 1, participants responded categorically to the faces, whether they rated gender on one-dimensional or two-dimensional scales. This suggests that participants generally had a binary conception of gender, which was not influenced by response options. In Study 2, participants were more likely to categorize faces beyond the binary when using multiple categories, including “non-binary” and “I don’t know,” than when using a free-text option. Whereas open-ended responses are seen as the most inclusive alternative for self-identification questions (Lindqvist et al., 2020), the categorization of others appears to benefit from response options that explicitly remind participants that not all people identify as women or men.

References

Ansara, Y. G., & Hegarty, P. (2014). Methodologies of misgendering: Recommendations for reducing cisgenderism in psychological research. Feminism & Psychology, 24(2), 259–270. https://doi.org/10.1177/0959353514526217

Aust, F., & Barth, M. (2022). papaja: Prepare reproducible APA journal articles with R Markdown. https://github.com/crsh/papaja

Bem, S. L. (1974). The measurement of psychological androgyny. Journal of Consulting and Clinical Psychology, 42(2), 155–162.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017

Bürkner, P.-C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. https://doi.org/10.18637/jss.v100.i05

Butler, J. (1999). Gender trouble: Feminism and the subversion of identity. Routledge.

Campanella, S., Chrysochoos, A., & Bruyer, R. (2001). Categorical perception of facial gender information: Behavioural evidence and the face-space metaphor. Visual Cognition, 8(2), 237–262. https://doi.org/10.1080/13506280042000072

Campanella, S., Hanoteau, C., Seron, X., Joassin, F., & Bruyer, R. (2003). Categorical perception of unfamiliar facial identities, the face-space metaphor, and the morphing technique. Visual Cognition, 10(2), 129–156. https://doi.org/10.1080/713756676

Carleton, R. N., McCarron, M., Krätzig, G. P., Sauer-Zavala, S., Neary, J. P., Lix, L. M., Fletcher, A. J., Camp, R. D., Shields, R. E., Jamshidi, L., Nisbet, J., Maguire, K. Q., MacPhee, R. S., Afifi, T. O., Jones, N. A., Martin, R. R., Sareen, J., Brunet, A., Beshai, S., … Asmundson, G. J. G. (2022). Assessing the impact of the Royal Canadian Mounted Police (RCMP) protocol and Emotional Resilience Skills Training (ERST) among diverse public safety personnel. BMC Psychology, 10(1), 295. https://doi.org/10.1186/s40359-022-00989-0

Cronin, K. A., Leahy, M., Ross, S. R., Wilder Schook, M., Ferrie, G. M., & Alba, A. C. (2022). Younger generations are more interested than older generations in having non-domesticated animals as pets. PLOS ONE, 17(1), e0262208. https://doi.org/10.1371/journal.pone.0262208

D’Agostino, M., Levine, H., Sabharwal, M., & Johnson-Manning, A. C. (2022). Organizational practices and second-generation gender bias: A qualitative inquiry into the career progression of U.S. State-level managers. The American Review of Public Administration, 52(5), 335–350. https://doi.org/10.1177/02750740221086605

DeBruine, L. (2018). WebMorph. https://webmorph.org/

DeBruine, L., & Jones, B. C. (2017). Face Research Lab London Set. Figshare. https://doi.org/10.6084/m9.figshare.5047666

Habibi, R., & Khurana, B. (2012). Spontaneous Gender Categorization in Masking and Priming Studies: Key for Distinguishing Jane from John Doe but Not Madonna from Sinatra. PLoS ONE, 7(2), e32377. https://doi.org/10.1371/journal.pone.0032377

Hyde, J. S., Bigler, R. S., Joel, D., Tate, C. C., & van Anders, S. M. (2019). The future of sex and gender in psychology: Five challenges to the gender binary. American Psychologist. https://doi.org/10.1037/amp0000307

Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with more than one random factor: Designs, analytic models, and statistical power. Annual Review of Psychology, 68(1), 601–625. https://doi.org/10.1146/annurev-psych-122414-033702

Jung, K. H., White, K. R. G., & Powanda, S. J. (2019). Automaticity of gender categorization: A test of the efficiency feature. Social Cognition, 37(2), 122–144. https://doi.org/10.1521/soco.2019.37.2.122

Levitt, H. M., & Ippolito, M. R. (2014). Being transgender: The experience of transgender identity development. Journal of Homosexuality, 61(12), 1727–1758. https://doi.org/10.1080/00918369.2014.951262

Lindqvist, A., Sendén, M. G., & Renström, E. A. (2020). What is gender, anyway: A review of the options for operationalising gender. Psychology & Sexuality, 1–13. https://doi.org/10.1080/19419899.2020.1729844

Little, A. C., & Hancock, P. J. B. (2002). The role of masculinity and distinctiveness in judgments of human male facial attractiveness. British Journal of Psychology, 93(4), 451–464. https://doi.org/10.1348/000712602761381349

Ma, D. S., Correll, J., & Wittenbrink, B. (2015). The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods, 47(4), 1122–1135. https://doi.org/10.3758/s13428-014-0532-5

Monro, S. (2019). Non-binary and genderqueer: An overview of the field. International Journal of Transgenderism, 20(2–3), 126–131. https://doi.org/10.1080/15532739.2018.1538841

Morgenroth, T., & Ryan, M. K. (2018). Gender trouble in social psychology: How can Butler’s work inform experimental social psychologists’ conceptualization of gender? Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.01320

Nichols, A. L., & Maner, J. K. (2008). The good-subject effect: Investigating participant demand characteristics. The Journal of General Psychology, 135(2), 151–166. https://doi.org/10.3200/GENP.135.2.151-166

O’Toole, A. J., Deffenbacher, K. A., Valentin, D., McKee, K., Huff, D., & Abdi, H. (1998). The perception of face gender: The role of stimulus structure in recognition and classification. Memory & Cognition, 26(1), 146–160. https://doi.org/10.3758/BF03211378

R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Richards, C., Bouman, W. P., Seal, L., Barker, M. J., Nieder, T. O., & T’Sjoen, G. (2016). Non-binary or genderqueer genders. International Review of Psychiatry, 28(1), 95–102.

Saperstein, A., & Westbrook, L. (2021). Categorical and gradational: Alternative survey measures of sex and gender. European Journal of Politics and Gender, 4(1), 11–30. https://doi.org/10.1332/251510820X15995647280686

Simanova, I., Francken, J. C., de Lange, F. P., & Bekkering, H. (2016). Linguistic priors shape categorical perception. Language, Cognition and Neuroscience, 31(1), 159–165. https://doi.org/10.1080/23273798.2015.1072638

Stolier, R. M., & Freeman, J. B. (2017). A neural mechanism of social categorization. The Journal of Neuroscience, 37(23), 5711–5721. https://doi.org/10.1523/JNEUROSCI.3334-16.2017

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Figure 1

Example of a seven-step morphing spectrum

Figure 2

Sample trial from each of the three conditions

Figure 3

Participant-level and mean ratings of faces in the one-dimensional and two-dimensional conditions

Figure 4

Sample trial from each of the three conditions

Figure 5

Gender Categorizations by Participants

Figure 6

Responses of “other” and “I don’t know” across the multiple-categories and free-text conditions

Figure 7

Alternative version of the previous figures

Figure 8

Participant Proportions for Categorizing Faces as Women Across Three Conditions