Teamology: Evaluative Diversity Promotes Success

“Teamology” is the name of a new branch of science somewhere between psychology and sociology. It studies teams and what makes them successful. This seems like an important new science, given that evolution has shaped the human genome to increase and optimize the success of competing teams rather than of individuals. However, experiments turn out to be logistically far more difficult to conduct in teamology than in psychology. All of the research mentioned in Douglass Wilde’s Teamology: The Construction and Organization of Effective Teams relied on the power of professors to make guinea pigs out of students (Carnegie-Mellon, Stanford, Loyola U of Los Angeles, Oregon State, Shanghai Jiao-Tong, Sungkyunkwan U., U.C. Berkeley, U.C. San Diego, U. of Florida, and U.T. Austin).

Teamological evidence is crucial to management of evaluative diversity because the reasons to protect evaluative diversity are:

  1. Love: For the sake of our children and grandchildren who are likely to be diverse
  2. Religious: For the sake of the One who created diversity
  3. Selfish: For our own sake, believing that we are part of teams which need evaluative diversity (as ecosystems need biodiversity)

The first two motives assume merely that we have diverse predispositions, a hypothesis which is well-confirmed by a wide range of experiments as detailed in Predisposed: Liberals, Conservatives, and the Biology of Political Differences. The third motive additionally assumes that diverse teams are more likely to succeed. Teamology is the only field in which experiments can confirm or reject that hypothesis (or tell us which kinds of diversity are beneficial when).

There is strong theoretical reason to expect team success to rely on evaluative diversity. GRIN-diversity reflects specialization in mitigating distinct factors which limit rate of adaptation:

  • Gadfly to increase the rate at which novel configurations are produced
  • Relational to increase network localization through subjective evaluation
  • Institutional to increase fidelity with which proven configurations are reproduced
  • Negotiator to increase selection pressure privileging better configurations

If it turns out that GRIN-diversity does not maximize rate of adaptation, then there must be something wrong with our theory of evolution. One test of the GRIN model involves comparing the success of computer systems with different GRIN-diversity. Computers with a dearth of any GRIN-type fail just like any machine missing one of its essential parts. Having found that confirmation in machines, it makes sense to compare the success of human teams with varying GRIN-diversity.

Evaluative Diversity in Human Teams

Douglass Wilde taught engineering at Stanford University. Each year, his students would work in teams to produce reports which would be entered into competition against each other and against teams from other universities. Teamology presents the method Wilde developed to form and organize winning teams. The method was tested over the course of decades, both at Stanford and at U.C. San Diego, U. of Florida, and Jiao Da (Shanghai).

Wilde’s method essentially involved:

  1. Measuring students’ evaluative types,
  2. Dividing into teams so as to maximize evaluative diversity, and
  3. Assigning roles within each team to match measured types.
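The third step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wilde’s published procedure: it assumes each student already has a numeric preference score for each role, and it simply gives every role to the team member who scores highest on it. The function name `assign_roles`, the student names, and the scores are all hypothetical.

```python
from typing import Dict


def assign_roles(team_scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each role to the team member with the highest score for it.

    team_scores maps member name -> {role name -> preference score}.
    One member may end up holding several roles, or none.
    """
    roles = sorted({role for scores in team_scores.values() for role in scores})
    assignment = {}
    for role in roles:
        # Highest score wins; ties go to the name that sorts first.
        best = min(
            team_scores,
            key=lambda m: (-team_scores[m].get(role, float("-inf")), m),
        )
        assignment[role] = best
    return assignment


# Hypothetical three-person team with made-up scores for three roles.
team = {
    "Ana": {"Tester": 12, "Visionary": -12, "Inspector": 3},
    "Ben": {"Tester": -4, "Visionary": 4, "Inspector": 9},
    "Chi": {"Tester": 2, "Visionary": -2, "Inspector": -5},
}
roles = assign_roles(team)
print(roles)
```

In this made-up team, one member ends up holding two roles, which is what has to happen whenever a team has fewer members than roles.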

Wilde’s research was conducted before there was any way to measure GRIN-types. His assignment algorithm used a survey of preferences along Jungian dimensions: Introversion (I) vs. Extroversion (E), Structure (J) vs. Flexibility (P), Facts (S) vs. Possibilities (N), and Objects (T) vs. People (F). Here are Wilde’s formulas to transform those measures into preferences for eight roles:

[Figure: Wilde formulas]

Wilde reports that about 25% of Stanford teams won awards when self-selected, but about 75% won awards when formed by this method. Replication studies found similar results, though they measured success differently and also found that diverse teams “took longer to coalesce” than randomly formed teams did.

How Many Types?

Jungian personality theory disagrees with the Big Five model on the question of traits vs. types. Size is an example of a trait, while sex is an example of a type. We often point out that types fall into discrete categories, while traits fall along continuous scales, but in the context of teamology it may be more important to note that stable types are interdependent, while traits are not. For example, a human society could thrive in certain environments without any especially large members, but could not thrive in any environment without any females (or males).

Interdependency impacts the ideal number of individuals per team. For example, since bees have three sexes, their “families” should be larger than in species with fewer sexes. In contrast, diversity in traits is valuable only to accommodate diversity of situations, so diversity in traits will afford a team no advantage over the best possible individual when the situation is stable or when the individual can adjust his/her traits to match the situation. If adjustment is not feasible, a team of just two polar-opposite members could have full diversity in traits. Thus, if traits were the only source of valuable diversity, then teamology wouldn’t be so important (at least beyond pairs).

Jung’s theory predicts at least eight types and no traits, but the statistical characteristics of measures of Jung dimensions look like traits rather than types. The Big Five model predicts all traits and no types. The truth is probably somewhere in the middle: some traits and some types. The GRIN-SQ produces the statistics to prove that at least four types exist. One might wonder whether teamology could be used to further increase the number of types proven to exist.

The studies Wilde cited involved teams of three to five members each, so they could not possibly have demonstrated the interdependence of more types. If they balanced types, those types might best be called S, N, T and F, since those variables are doubled in his formulas. In the data Wilde provided from his 2006 class, of the 13 students assigned to fill multiple roles, 92% were assigned to be both P and J and 69% were assigned to be both E and I, so any specialization would have been on other dimensions. As implied by the diagram above, the typical team had one member with primary specialization in each quadrant (though some students were also assigned to serve as back-up for other quadrants).

The experiments described in Teamology compared teams formed by Wilde’s method to teams formed randomly or through pure self-selection. It would be far more instructive to compare to teams with all but one type, so that one might identify specific types which make a difference (and perhaps characterize the difference each makes). Wilde initially doubled team performance merely by assigning the students with highest MBTI-Creativity Index (T+2E+2P+6N) to separate teams (leaving no black-hole of gadflydom), but tripled performance relative to self-selection by separating the highest scorers in all eight roles. The difference between these experiments does imply that creativity diversity isn’t the only kind that matters, but specifically what else matters remains to be measured.
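The first of those two experiments is simple enough to sketch. The Python below computes the creativity index exactly as quoted above (T+2E+2P+6N) and then deals students out to teams in rank order, so that the top scorers land on different teams. The round-robin dealing, the function names, and the sample scores are assumptions made for illustration; the text does not say how Wilde placed the remaining students.

```python
from typing import Dict, List


def creativity_index(prefs: Dict[str, float]) -> float:
    """MBTI-Creativity Index as quoted in the text: T + 2E + 2P + 6N."""
    return prefs["T"] + 2 * prefs["E"] + 2 * prefs["P"] + 6 * prefs["N"]


def separate_top_scorers(
    students: Dict[str, Dict[str, float]], n_teams: int
) -> List[List[str]]:
    """Deal students to teams in descending index order (assumed round-robin).

    The n_teams highest scorers are guaranteed to be on different teams.
    """
    ranked = sorted(
        students, key=lambda name: (-creativity_index(students[name]), name)
    )
    teams: List[List[str]] = [[] for _ in range(n_teams)]
    for position, name in enumerate(ranked):
        teams[position % n_teams].append(name)
    return teams


# Hypothetical preference scores; the keys follow the letters in the formula.
students = {
    "Ana": {"T": 1, "E": 2, "P": 3, "N": 4},
    "Ben": {"T": 0, "E": 1, "P": 1, "N": 5},
    "Chi": {"T": 5, "E": 0, "P": 0, "N": 0},
    "Dev": {"T": 2, "E": 2, "P": 2, "N": 1},
}
teams = separate_top_scorers(students, n_teams=2)
print(teams)
```

Extending this to the second experiment would mean repeating the separation for each of the eight role scores, which requires the full set of role formulas.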

Separation of Powers

The reason why the diagram above divides the roles on the left against the roles on the right is that Wilde’s scoring formulas mathematically make those on the left equal to the negative of those on the right. For example, even if a student’s two most preferred roles really were Tester/Prototyper (E+P+2S) and Visionary/Strategist (I+J+2N), the results of Wilde’s preference measure could not possibly reflect that reality. They sum to zero, so at least one is guaranteed to be zero or negative. That is a consequence of the assumption that diversity is structured around dimensions.
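The zero-sum property can be checked directly. The sketch below uses only the two formulas quoted above and assumes, as the dimensional model implies, that each pole is the negative of its opposite (I = -E, J = -P, N = -S). The function name and the sample values are hypothetical.

```python
from typing import Tuple


def paired_role_scores(e: float, p: float, s: float) -> Tuple[float, float]:
    """Return (Tester/Prototyper, Visionary/Strategist) scores.

    e, p, s are signed preferences toward Extroversion, Flexibility and Facts.
    Assumption: the opposite poles are their negatives (I = -E, J = -P, N = -S).
    """
    i, j, n = -e, -p, -s
    tester = e + p + 2 * s        # E + P + 2S
    visionary = i + j + 2 * n     # I + J + 2N
    return tester, visionary


tester, visionary = paired_role_scores(e=3, p=-1, s=4)
print(tester, visionary, tester + visionary)
```

Whatever values are supplied, the second score is the negative of the first, so no student can register a positive preference for both roles.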

It would not be surprising if the person responsible for devising visions wanted to be the person with the power to decide whether those visions are good, or if the person responsible for empathizing wanted to be the person with the power to interpret policies (and thus to show mercy). However, the danger in mixing such roles is rather obvious (we might call it “conflict of interest”), so we can appreciate the separation of powers forced by Wilde’s method. Wilde’s claim that Visionaries should not be the Testers sounds reasonable (and is supported by his research), but this might have nothing to do with preference.

Are Teams With Greater GRIN-Diversity More Successful?

In theory, the S, N, T and F roles sound like the four GRIN-types:

  1. The S roles include “Tester”, “Investigator” and “Inspector” which match the Negotiator specialization in selection
  2. The N roles include “Innovator”, “Entrepreneur” and “Visionary” which match the Gadfly specialization in generating novelty
  3. Wilde’s measure for T associates it with “logic”, “truthful”, “unaccommodating”, “intolerant” and “impartial” all of which match the Institutional specialization in fidelity.
  4. F would be Relational by process of elimination. Specialization in network localization is undermined in Wilde’s experiments because the structure of students’ social networks is designed and enforced by the experimenter. However, students would be accustomed to social processes developed for groups formed more naturally, so a team lacking relational evaluators would have the handicap of needing to engineer new social processes (e.g. radically new ways to resolve conflicts). Thus, a Relational member might be valuable even in engineered teams.

Empirical comparison of measures confirms that teams formed by Wilde’s method would have greater GRIN-diversity than teams formed at random. N correlates strongly with the Big Five dimension of “Openness” which is significantly related to Gadfly evaluation. F correlates moderately with the Big Five dimension of “Agreeableness” which is significantly related to Relational evaluation. Thus, teams formed by Wilde’s method are likely to include one natural gadfly, one naturally relational person, and two people of other GRIN type(s).

Yes, the more successful teams do have greater GRIN-diversity. Again the GRIN model is supported.

But what we really want to know is in which circumstances any of the four GRIN-types might not promote success. To measure that, we would need to compare teams with each type deficiency (and with none) in different circumstances, and it would be better to use direct measures of GRIN-type than to use Jung-types as a proxy. Also, instead of imposing team structure, it would be better to let people form (and re-form) their own teams, and teams-within-teams (unless people segregate so much that they offer no opportunity to observe naturally formed diverse teams). There is much research yet to conduct.