AI Insanity

There is a quote that is widely mis-attributed to Albert Einstein.

Insanity is doing the same thing over and over again expecting a different result.

While the origins of the quote are uncertain, the sentiment is clear: if you repeat the same actions, you’ll get the same results. Anyone with a scientific or empirical bent will surely agree with the premise, as will anyone who has played golf.

This comes to mind today on the heels of a story from Reuters that Amazon has scrapped a program intended to speed up the review of applications by applying Artificial Intelligence (AI) tools to the process. The program, it seems, was dutifully duplicating all the innate biases from the past.

I deeply sympathize with Amazon’s plight. When I was VP of Human Resources at Microsoft, the company’s applicant popularity was astounding. We were getting tens of thousands of résumés a month. We had two shifts of people working at scanners entering them into a database. We tried all manner of sorting, filters, and pattern recognition to separate the wheat from the chaff.

It didn’t work. We missed vast numbers of bright people (and wasted time on countless misses) because they chose not to use the current buzzwords. Or because their font choice was misread by the scanning software. Or we just weren’t looking for the right things. We still needed real humans to look them over, even if each review took mere seconds.

Amazon is like Microsoft at that time, among the hottest places to work, and I’m sure they are swamped with applications. I fully understand the desire to use the latest technology to try to assist in battling the onslaught. Had we had AI at the time, I would have tried it in a heartbeat.

But just like our filters and scanners, the AI tools have an inherent problem. They are “trained”, and the decisions about the training data set make all the difference in what choices the system makes. Like virtually every computer system, AI systems are subject to the old adage: garbage in, garbage out.

From what we can tell, Amazon used a vast array of résumés and decisions from the past to teach the system to find the needles in their haystack. Here are all the applicants for this job, here’s the person we chose. I sympathize with this approach; they have an enormous data set to work from. The problem is, of course, that it replicated their past behavior precisely.

The system showed inherent biases against women and other groups. It preferred male-dominated language, and expressed precisely what the company had presumably done in hiring over the years. And there’s the problem.
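To make the mechanism concrete, here is a minimal sketch of how a system trained on past decisions hands those decisions right back. Everything in it is an assumption for illustration: the six-row history is synthetic, the words "executed" and "collaborated" are hypothetical stand-ins for gendered language, and the word-frequency "model" is a toy, not whatever Amazon actually built.

```python
from collections import defaultdict

# Hypothetical, synthetic history: (words on the résumé, was the person hired?).
# The past decisions favored "executed" and mostly passed over "collaborated",
# regardless of the skill listed. None of this is real data.
HISTORY = [
    ({"python", "executed"}, True),
    ({"python", "executed"}, True),
    ({"java", "executed"}, True),
    ({"python", "collaborated"}, False),
    ({"java", "collaborated"}, False),
    ({"java", "collaborated"}, True),
]


def train(history):
    """Learn, for each word, the fraction of past résumés containing it that were hired."""
    seen = defaultdict(int)
    hired = defaultdict(int)
    for words, was_hired in history:
        for word in words:
            seen[word] += 1
            hired[word] += int(was_hired)
    return {word: hired[word] / seen[word] for word in seen}


def score(model, words):
    """Average the learned hire rates of the known words on a new résumé."""
    known = [model[w] for w in words if w in model]
    return sum(known) / len(known) if known else 0.0


model = train(HISTORY)

# Two new applicants with the same skill, differing only in word choice.
score_a = score(model, {"python", "executed"})
score_b = score(model, {"python", "collaborated"})

print(f"executed: {score_a:.2f}  collaborated: {score_b:.2f}")
```

The two new applicants list the same skill, yet the one who uses the historically favored word scores higher. Nothing in the code is "biased"; it simply learned the pattern it was shown.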

The better way to train the system would have been to use a pool of applications, and then manually code them for the desired outcome, controlling carefully against bias. But this would have taken forever, required great manual effort, and resulted in a much smaller training data set. Unfortunately, AI systems are at their best with very large training inputs. So they chose the more tractable path.

To their great credit, Amazon scrapped the system when they realized the issue. They deserve kudos for that decision; other firms might well have pressed on, hoping the results would improve with more data. Continuing to do the same thing over and over, and expecting a different result. Amazon didn’t, and that’s worth recognizing.

At the highest level, this points out the complexity of the diversity issue. Biases, either overt or more subtle, are deeply ingrained in a company’s culture. Correcting those takes explicit, conscious, and proactive behavior changes. Something no automated system is likely to be of much assistance with. I have many more ideas, thoughts, and comments on how to make these changes, but those will have to wait for another day.

In the meantime, let’s celebrate at least one company that is actively recognizing they have a hiring challenge. And is actively searching for ways to not repeat the past.

Testing By Any Other Name


I’ve written about testing of employees before (see here and here), and it seems like a topic that just won’t go away. The other day, I made a comment on Matt Mullenweg’s blog about hiring and age discrimination and such. This prompted an email from someone on the cutting edge of employee testing.

Seeing that I used to be the VP of HR at Microsoft, Bryan Kennedy of Pairwise sent me an email asking for comment on his company’s new testing tool (you can see it here). They have developed a personality test that relies on images rather than words, a high tech Rorschach test. They are trying to sell it to companies like Microsoft and Google for their recruiting efforts. As Bryan sees it working:

An employer, such as Microsoft, asks their current employees to take the 5-10 min test. The test is posted on their web site for candidates to play.

After an interested candidate takes the test, they would see comparative results with current MS employee generalizations. Ex:
“You’re more outgoing than the average engineer.” or “You have a similar work-play ethic.”

It might even suggest for which teams you’d be most interested in applying. At that point, you’d be encouraged to submit your resume. The hiring director would see a report about the findings from the test and could use them at their discretion.

  • It’s far less (if not nearly impossible) to game than other written tests
  • It empowers candidates to do their own evaluation – rather than being a hidden metric of assessment, it’s a tool for figuring out if you’d like it there
  • It’s a way for companies to attract hires in a new and interesting way

When I sent him off to read my opinion on testing (see here and here), he replied with: “I side with the argument that as long as they’re used properly and taken with a grain of salt, their value is probably (!) not null.”

I respectfully disagree. The more I look into testing, the closer the value asymptotically approaches zero. I responded to Bryan with a million questions. Here are some of my concerns:

  • How does one get the 50,000 employees of Microsoft to take the test? Why would an employee want to take the test? Some mandate from above? Wouldn’t it just reinforce the impression that HR is just a bunch of wishy-washy voodoo? If there’s nothing in it for me to take it, except potential bad outcomes, why should I take it? What kind of complete rate would one expect (10% – 20%)? Doesn’t that make the results useless?
  • The best engineers/employees are the ones least likely to complete the test. For tests like Myers-Briggs, most people who take them are a) having some difficulty and have turned to the test to help them out, b) trying to please management in some way, or c) just the “I love to examine my navel” kinds of people. That doesn’t seem like a good, healthy cross-section.
  • “The hiring director would see a report about the findings from the test and could use them at their discretion” is extremely dangerous. Doesn’t it risk them making homogeneous teams, intentionally or not, just as Matt’s post was commenting on? Doesn’t it even risk lawsuits from candidates challenging the use of the test in the hiring process?
  • They say it’s impossible to game, but I find that difficult to swallow. A picture of a puppy, a picture of a skateboarder… hmmm… let me guess what you want me to like, then I’ll answer that way. Or, “I’m feeling mischievous today, I’m going to answer A or B based on the drum solo in In-A-Gadda-Da-Vida”. I tried it with my photography background in mind: I picked the ones more in focus. The result was gibberish.
  • It raises the “what do I do with this info” question. Let’s say it comes up with “you are an unmotivated, introverted, a&$hole”, what do I do now? Take it again and choose all the other pictures? Try to change? Go see a shrink?
  • How do I know if I’d like it there from the results of this test? Because everyone is like me? And just as the post on photomatt was talking about, don’t I want to be different? Isn’t it a good thing to not be like everyone else? Or should I want to go there just to change the culture?
  • As for enticing people to apply, I find this assertion stunning… If I were asked for any kind of psychological test, I’d immediately withdraw my candidacy. It speaks volumes about the kind of company it is and the kind of culture it has (see this post). They don’t see people as individuals and potential valuable assets, but rather as test subjects, resources, automatons. I think this drives candidates away rather than attracting them.
  • Is the testing based on lots of psychological merit, or is this just some guys who have a great idea? Do they have PhDs on staff who are developing these tests? Have the tests received any kind of peer review in the psychology world?
  • Do they charge by the test? Or by the results? Does [Microsoft/Google/hiring company] get the results in some permanent way, or am I tied to their web site? Do they own the results or do I?
  • How do they control access to the tests? Who can see results? My manager? Only HR? The police/FBI (we caught a child-porn king at your company, we want this as evidence)? How do I tell them who controls that access? Doesn’t that mean I basically have to upload a complete company-wide global hierarchy? Do I host the test on my systems? If not, how do I know their systems are secure (aren’t being keylogged, etc.)?
  • Can I stop the test (my phone just rang) and pick it up tomorrow? Doesn’t that potentially change my answers? What’s preventing me from “team-taking” the test, with me and 6 of my drinking buddies sitting around and having fun taking it?
  • There are many stills from movies and television, as well as some familiar looking images. Are they all licensed properly?
  • Last but not least, how does one define “like better”, as in “which picture do you like better”?

That’s a lot of questions/concerns to be sure. But it doesn’t even hit my most important concern: bias. Do people from the US or India or Europe or China all respond to the same pictures in the same way? Do they even know what some of the images are in the pictures?

There’s the classic story about an original version of MS Mail that used a mailbox for a symbol (you know the long box with the round top and a flag on the side). Turns out that’s not at all what a mailbox looked like in 85% of the world, and most people thought it was a trash can. How do they control for that?

What about racial/ethnic bias? In some parts of the world black people are rare/unheard of and are symbols of evil, in other parts of the world they are what everyone looks like. And certainly, to some people a skater-boy with baggy pants is an icon, to others it’s a nuisance, to still others it’s a “rich white kid”.

And reviewing the site only makes me more concerned. As I noted there are dozens of culturally specific images. Stills from the Sound of Music, the Lord of the Rings, the roadrunner, the Three Stooges, South Park, Titanic, Gladiator – won’t results differ depending on whether I’ve seen these?

There are snowy scenes, what if I’ve never seen snow? An old dial phone, a marijuana plant, and the Eiffel tower – isn’t it possible I have no idea what these are? Food dishes with meat and shrimp, what if I’m a vegan? Things written only in English on the images (“die” in one example).

And it turns out I didn’t need to be concerned about pictures of black people. All of the people I saw on the test were white. Ouch.

This test looks like just a new take on the long history of testing nightmares. I don’t know how anyone could or would want to use the results to tell them anything they’d want to know. Cute game, but not much more.

Recruiting’s “Big Billers”

Who is your recruiter working for? As I often say, symbols are everything. And is there a bigger symbol than money? If you want a symbol for the big business of recruiting, look no further than XtremeRecruiting’s web site.

You are looking to hire an executive for your company, and your knee-jerk reaction is to call a big-time headhunter. Or you want to get a new job, and you decide to engage one of these superstars yourself. Some of the first questions you should ask yourself are: Who’s the customer here? Who is the recruiter working for? How important am I to this recruiter? This web site can help you understand.

The front page is a list of the “big billers”, people who have made a fortune in the last year as recruiters. These people made somewhere from $400k to over $1m last year. Not bad for a small company. But wait, these are individuals.

Now there is nothing wrong with success. But this web page doesn’t say how well they serviced clients. It doesn’t talk about their skills, what they bring to the table, how they will help you. No, it is a celebration of the “big billers” just for their acumen in bringing in big bucks. [Editor’s note: I find it wonderfully ironic that this site is a .org site, supposedly reserved for non-profits…]

Good for them, but is that what you want? Sure, you want someone who is experienced, someone who is good at what they do. But is that judged by their billings? Is that judged by where they rank among their peers in top dollars? No, of course not. People who appear on this web site should be ashamed of themselves for waving their income in the face of everyone. And you should look elsewhere for help.

I have a lot of good advice on recruiting elsewhere. Trust me, “seek out the big billers” is not on the list.

Should we use online job sites?

When people ask about recruiting, one of the first questions they ask is “where do I find good people?” One of the first straws they grasp at is an online job site such as Monster, CareerBuilder and other more local sites. I think this is a mistake.

I have a number of things to say about sourcing candidates elsewhere, but this question comes up so often it deserves special mention here. Online job sites (and their older relatives, the newspaper classified ad) are lousy places to look for job candidates. There are several reasons why.

First and foremost, they are far too broad-stroke, casting a net nation- or world-wide, and including every human looking for work in the world. While at first blush that may seem like a good thing, in reality, it’s a disaster. Start with the fact that 9 out of 10 responses you get for your ad will be worthless, and separating out the good ones is a painful chore. Most are simply unqualified, and they have skirted the site’s filters. If you are looking for things to do, and enjoy getting unintelligible emails from all over the planet, this may be fun. But for most of us, this is a royal pain.

This broad stroke also has another downside: you don’t effectively reach the people you want to reach. Think of it in terms of your normal product advertising. If you sell, let’s say, electrical connectors (like AMP, for example), the last thing you would do is advertise on the Super Bowl. Sure, it would be expensive, but the real problem is that only 1 out of 100,000 people you reach is your customer. So the vast majority of people you reach are saying “huh?” You are killing an ant with a sledgehammer. Most people would agree that, for AMP, an ad in a trade journal would be far more effective. The same is true for job candidates.

But, you are saying, the cost is negligible (the ads are really cheap), so what’s the harm in casting a wide net? The harm is that you waste time, energy, and goodwill, and you don’t effectively reach those you do want to reach. You look desperate, unfocused, random, and unprofessional. And for all your trouble, you don’t get the quality message across to the candidate you really want.

Also important to consider is who you do reach. By definition, people who post to Monster are looking for work. Why? Because they are either (a) out of work or (b) about to be out of work. In either case you don’t want them. Sure, some very small percentage of people (<1%) are out of work through no fault of their own: they were laid off, or they quit because their job situation was untenable. But I think you may not want even them. Again, consider why they got laid off vs. their peers. Surely the company doing the layoff kept the winners...

It has been my experience that the people trolling online job sites are the losers. They are the perennially out of work people, the difficult people who get fired or laid off first, the unmotivated who can’t keep a job and can’t understand why, and the totally unqualified grasping at jobs well beyond their reach. In short, the first people you would weed out of your candidate pile. So why add them to the pile in the first place?

I have lots of ideas for where to search for good candidates; I’ll post them here soon. In the meantime, don’t grasp at the straw of online job sites.

Should I test potential hires?

One of the hot button questions these days is what kind of testing should you do on potential hires. There are a number of aspects to this question, and there are moral, legal, and psychological implications that need to be considered.

First, it’s important to consider what type of testing we’re talking about. Some that people consider are: drug testing, skills testing, and psychological testing. Note, this is not a discussion on background checks, which are discussed in another FAQ, but rather the kind of testing you administer before you hire someone.

Drug Testing

Drug testing is an extremely difficult subject. For many organizations, it is simply not optional. They have people in dangerous, secret, or sensitive positions, and testing is required by either law or insurance. My advice to those groups is simply to have very strict, consistent, and professional processes, and get on with it.

For the rest of the world, I’m not a big fan of drug testing. The distrust it shows to your potential new hire is huge. Like copy protection schemes in software and/or music, or prenuptial agreements in relationships, it assumes that everyone is wrong, and you need to weed out the few honest people from the mass of criminals. The implied message to the new hire is corrosive. They have to be thinking, “if the first thing they want out of me is a cup of my urine, what else is coming?” For one company’s interesting take on drug testing, check out this post.

And, let’s be real, if they have a drug problem that will interfere with their work performance, you should be able to tell it in the interview or at least in the first few weeks of work. If it doesn’t show up until later, then that means you need to have a regular regime of testing. That starts sounding scary.

Finally, if it doesn’t interfere with their work performance, frankly, I don’t think it’s your problem. So why test for it?

Skill Testing

Skill testing is another matter altogether. I strongly believe that you want to check potential hires’ claims of ability against reality. The question is how to do it most effectively, and least insultingly.

At Microsoft, there is a heritage of doing the “interview from hell,” where the candidate is subjected to a battery of abuse by a parade of interviewers. Examples of this are shown in the book “How Would You Move Mount Fuji?” by William Poundstone (ISBN: 0-316-77849-4), which chronicles how Microsoft and other companies use puzzles to get at the meaningful question of “how smart is this person?” I discuss this and much more about recruiting and interviewing elsewhere in the “recruiting” category.

But more common is specific skills testing, such as having the candidate take a multiple choice test to examine their skills with some required computer programs, or to test their specific industry knowledge. These multiple choice tests don’t get to the question of how smart a person is, but they do check to see if they are lying on their résumé. If you have a lot of candidates and really need to separate the wheat from the chaff, then multiple choice tests like these are OK.

To find out how smart someone is (a very important question) you have to ask questions that draw out how they think. You have to watch them solve a problem and see if they can work effectively through it. That’s where “Mount Fuji” comes in, and that’s a discussion for another FAQ.

So, if you have very specific skills requirements, and a lot of candidates to screen, then I guess it is OK to do multiple choice tests. But here again, you have to think about the psychological effect it has on the candidate: their first interaction with your company is the firm questioning their integrity. Ugh.

Psychological Testing

Finally, there is the battery of psychological testing that some firms use. This is, plain and simple, a ridiculous waste of time and an insult to the candidate. If you’ve ever taken, given, or read the results from these tests (such as Myers-Briggs, and others) you know how silly they are. These tests are just this side of Nancy Reagan’s astrologer. You can no more find something useful or valuable about a person from this kind of testing than you can from reading the leaves in the bottom of their tea cup.

The real issue is that, even if these tests were valid (which I truly doubt), even if people didn’t game them (which I believe everyone does), and even if they could put you in some magic bucket, what do you do with it? Do you go into a recruiting situation knowing exactly what type of person you need for an opening? Have you ever been able to predict how a person would work in a group? Or how the group would react to a new person?

Teams are not predictable. The chemistry that makes up a team is delicate, volatile, and mysterious. Sometimes you have a laid-back team and need to add an aggressive sort, you do it, and the team rises to new heights. Other times, you do it and the team falls apart. There are just too many variables, too many subtleties, too many types of people. Trying to categorize people and predict precisely how they will affect a team is a hopeless case. Don’t try it.

These tests can’t tell you if the candidate will embezzle, or even if they are an axe murderer. And the downside is, again, the effect they have on the candidate. Again you leave the candidate thinking: the first thing they ask me is if I like the color red, and if I prefer dogs over cats. Give me a break.


So, it’s simple: drug test if you are required to, skill test if you have a specific set of requirements and a large number of candidates, and don’t bother with psychological testing. Instead of most of these, trust your instincts, ask tough questions, and listen, listen, listen.

Death of the Job Reference

It passed away sometime early this century, and only a few really noticed. I’m talking, of course, of the passing of the meaningful job reference.

Sometime in the last few years, managers and even friends stopped giving quality feedback. Now, even though people are expected to list references on their résumé, no one checks them. Ever.

I’ve been asked to be listed on probably 30 résumés in the last five years. I’ve received zero (0) calls. None. No one even bothers anymore. But, OK, I’ll be honest, that’s why I also never say “no” when someone asks to use me as a reference. It’s a no-cost, goodwill thing.

Of course we all know why: [insert standard answer here] the lawyers. People are so litigious that managers are afraid to give anything but perfect comments about anyone. As the story goes: whisper one bad word (“well they did show up late that one time”), and you’ll get sued so fast it will make your head spin.

But there’s something else to it. I think people blame the lawyers, but really it’s more the politically correct, no-conflict world we live in today. People don’t want to make waves, or say something negative that might get back to the person. After all, they may need a reference sometime in the future themselves. So as their mother said: if you can’t say anything nice, don’t say anything at all. And they blame the lawyers.

I mourn this passing. But do I give meaningful feedback when I am called? Yep, pretty much, I do. Not enough to get me sued, but I do give enough clues to make the picture emerge. And I have my lawyer on speed-dial.