AI Explorables: big ideas in machine learning, simply explained (withgoogle.com)
165 points by smitop on July 5, 2021 | 72 comments


Nobody is actually smarter or better than anyone at anything. It's a result of biases and privilege. And if all biases were taken into consideration, everyone would (and should) score exactly the same on everything, regardless of age, sex, race, gender, ethnicity, orientation, etc etc.

That's the underlying message about ML initiatives that touch sensitive topics. Yes, we should be very careful not to encode biases into our models, and these biases are definitely real. Absolutely 100%! But what happens if the models still show disparity? Are we prepared for that possibility? Does it mean there are more biases yet to uncover and we must root them out? Or should we hide anything that reveals disparity in the absence of biases? Or are all biases impossible to remove, so they must forever remain the explanation for all disparity? Is disparity a proof of bias? Because if that is true, then the only conclusion is that every person is exactly functionally equal and the only differences are human biases.

I've seen smart tech people unable to say that men are on average taller than women because (I believe) they were afraid to say that some "favorable" outcome (tall) was a result of some component that could be genetic. Their argument consisted of anecdotes about tall women, plus the claim that there is no such thing as an "average" man or "average" woman.


That is not my understanding of the ethical dilemma. People are different, and belonging to groups is highly predictive, and that is what ML algorithms depend upon to classify you (every feature is a group). The ethical dilemma is, should you be doomed to be assessed based on your belonging to an arbitrary set of groups? It is highly predictive, but also by definition, discriminatory.

Business entities benefit, because on average the predictions are right, but individuals are being discriminated against. And, by the way, besides the protected categories, other types of discrimination are also unfair.


You should be allowed to think and perceive and notice things, and use tools to help you do so. It is ridiculous to mandate algorithms should be modified to hide statistical facts about the world.

By all means, educate people that just because some population is perhaps on average more criminal, it doesn't imply that any individual belonging to that population is also more criminal.

To hide the statistics would also hurt efforts to find solutions. That cannot be a good thing.

Edit: if you don't like the criminal example, what about noticing some populations are on average poorer than others? Would that be a good thing, because government money could be diverted their way, or a bad thing, because banks would be less inclined to lend them money, and landlords would be less likely to rent them a home?


The point of contention is not whether ugly statistical data should be hidden away. That is a stupid conspiracy theory. Sociologists have no problem publishing all kinds of ugly statistical data. The problems start when for-profit entities or governments decide to discriminate against individuals because they belong to certain groups. Statistical data applied to individuals is always going to result in injustice. The thought of the economy at large, or the government, reducing an individual to a few datapoints and deciding their fate based on group statistics is just about the scariest communist dystopian nightmare, as far as I am concerned. It robs people of their own humanity.


You can't just call everything you don't like a conspiracy theory. I have seen plenty of people on HN arguing for censorship of unflattering statistics to achieve social justice goals. Why not just work on fixing the causes of the unflattering statistics in real life, rather than inventing a new framework to make ML models provide less accurate predictions? How do you know these manual override techniques aren't introducing inequity somewhere else in the model? Pareto optimality seems like it should be a very real concern here.


> I have seen plenty of people on HN arguing for censorship of unflattering statistics to achieve social justice goals.

I've never seen this. Can you link some examples?


I invite you to spend lots of time digging through search results, which I myself decline to do since these conversations are rarely productive. However, this frequently comes up whenever there's a thread discussing declining white demographics in the US. There is usually someone who says we should stop collecting these statistics altogether because it gives fuel to white nationalists. There are more examples beyond that, but those stick out in my mind because they were so blatant.

And while people will surely respond to this post with "Ha, got him! He can't link to a specific example!", I don't believe it's a reasonable expectation of me to catalogue the thousands of comments I read on this website.


I'm interested in those threads (I do a lot of work with demographics), but I've never noticed that.

Searching for "US demographics collection nationalists" finds one result (your comment above - and presumably soon this one): https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

I tried "US data collection nationalists" but that only found one page of results none of which supported your assertion: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

Do you have a better set of search terms?


HN's search feature leaves a lot to be desired when it comes to comments. Perhaps some variation of "replacement conspiracy theory".


I've tried "white replacement data" and "white replacement stats" and "white replacement statistics" (without quote marks) with no luck in the last 2 years.

The closest I've found is [1] which is a downvoted comment saying:

> Census data indicate White population has shrunk since 2010

> Why is this the first bullet point? Why a focus on White people? Why do you care HN?

> The only other place I see stuff like this is when it’s used as an alt-right talking point, or by white-supremacist mass-shooters who write about ‘The Great Replacement’ [1]

So that isn't even arguing against collection of stats. But it's the closest I've been able to find.

Perhaps you are misremembering?

[1] https://news.ycombinator.com/item?id=23735866


Another extremely unsatisfying possibility is that those threads could have been flagged as flame wars and de-indexed. I'm not entirely sure what happened, but I do distinctly remember reading those posts and I'm willing to stand by that.


Fwiw, I couldn't find anything resembling what you're saying in the last 2 years under "great replacement" and similar terms.


But surely you want to make sure every PoC is individually compensated for the systemic racism they had to endure? How is that different?

I am not opposed to scrutinizing decision making. But that should happen at the human level, not by scrubbing information from the data or output of algorithms.

Some bias is also necessary for survival. When I am walking home at night, I am more likely to change the side of the street when a group of men is approaching me, than when a group of women is approaching me. You can tell me that is discrimination all you want, but I value my life higher than your ideals of political correctness.


I'm surprised to see your comment down-voted. I think that you precisely summarized the problem. Applying population statistics to individuals is problematic.

However, to expand on that, I think it is primarily a problem with point statistics. If a Bayesian philosophy were followed, you would look at the plausibility of the entire space and recognize that an individual can fall anywhere with non-zero plausibility (or, in some cases, even into a region of zero plausibility, given that it is a model and not reality).
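
To illustrate (a minimal sketch; the group parameters are made-up, purely illustrative numbers, not real estimates):

    from scipy.stats import norm

    # Toy group-level height models (cm); parameters are assumptions for illustration.
    men = norm(loc=175, scale=7)
    women = norm(loc=162, scale=6)

    h = 180  # one individual's height

    # Both densities are non-zero: this individual is plausible under either model.
    print(f"p(h | man)   = {men.pdf(h):.4f}")
    print(f"p(h | woman) = {women.pdf(h):.4f}")

    # With a 50/50 prior, group membership is graded, not binary.
    p_man = men.pdf(h) / (men.pdf(h) + women.pdf(h))
    print(f"P(man | h=180) = {p_man:.2f}")

The model assigns plausibility everywhere, so no single observation pins an individual to a group.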


Applying statistics to individuals is not automatically bad. You may be interested if "your group" has inflated risks for certain cancers, for example.


It shouldn't be controversial to say that on average, men are taller than women. However it is a problem to say that "since you are a woman, I infer that you must be short". The difference is (not really) subtle but the first statement can be true while the latter statement is obviously fallacious.
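
To put a rough number on how badly the individual-level inference fails, here's a minimal sketch with assumed, illustrative distribution parameters:

    from scipy.stats import norm

    # Assumed illustrative distributions (cm): men ~ N(175, 7), women ~ N(162, 6).
    # The difference (random woman - random man) is then itself normal:
    diff = norm(loc=162 - 175, scale=(6**2 + 7**2) ** 0.5)

    # Probability a randomly chosen woman is taller than a randomly chosen man:
    print(f"{1 - diff.cdf(0):.2f}")  # roughly 0.08 - far from zero

Even with a large average gap, the "woman, therefore short" inference is wrong for a meaningful fraction of pairs.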

With an objective quantity like height it's easy to see the error, but harder with less well-defined quantities like intelligence, character, etc..

It's frustrating to see people talk past each other with ill-defined terms.


>It shouldn't be controversial to say that on average, men are taller than women. However it is a problem to say that "since you are a woman, I infer that you must be short". The difference is (not really) subtle but the first statement can be true while the latter statement is obviously fallacious.

I agree, but what the person I was speaking about couldn't say was specifically the former, not the latter. They were extremely resistant to the idea of measurable biological differences. This is an intelligent, science-believing person.

I think there are a few main groups that people fall into around these kinds of subjects:

1) People who suspect that there may be differences in groups of people, but know that acknowledging them will probably lead to arguments that support sexism, racism, homophobia, transphobia, etc (because they have historically), and so they avoid acknowledging any differences that can't be explained by bias.

2) People who want there to be differences in people so that they can justify their biases against groups of people.

3) People who don't want there to be differences in people, because it doesn't align with a fair worldview that they want to be real.

4) Everyone else, who just wants people to be honest about potentially uncomfortable truths.


The so-called "uncomfortable truths" might just be "uncomfortable" for you right now, but if you end up on the wrong statistical end of one of them, your life could be ruined.

There are very, very practical problems here. It's not about nebulously wanting to avoid enforcing stereotypes. It's about avoiding very real and very negative outcomes for unsuspecting and innocent individuals.

It's the same exact reason why "racial profiling" by the police is unequivocally and universally considered unjust and bad, except by racists who like to quote statistics as a kind of pragmatist veneer.

What's worse is that you could end up racially profiling people without even meaning to, either by omission (not enough people who look like X in the data) or by technical limitations (apparently facial recognition algorithms struggle with dark skin even on balanced datasets). There is a lot of very interesting and important work to be done in this area.


I'm going to steelman the opposing argument while keeping the presumption of a biological sex-binary (i.e. we'll ignore complications like chromosomal abnormalities).

This gets complicated when you start to differentiate between hormones and genes, as might happen when you involve trans people. My understanding (and a cursory search seems to back this up) is that pre-puberty, height isn't different based on chromosomes. It's only post-puberty that this difference occurs (and of course there are points in time where, in aggregate, girls are taller than boys, as they hit puberty earlier).

So your claim is that there are chromosomal height differences. But assuming that there are enough trans people in the population, and specifically trans people who had hormone therapy through their pubescence, self-identified gender might be a better predictor, because you'll have someone with XX chromosomes who identifies as a man and had a male puberty in terms of hormone levels.

This gets very subtle, because then "men are taller than women" remains true, while "measurable biological differences" may be false, or at least...less true. Yet you presented these, offhandedly, as equivalent statements. I don't think rejecting that premise (or at least, not taking it as a given) is fundamentally anti-science.


I understand what you're trying to say but, in this situation you are describing, the "good predictor" is not really your self-identified gender, but whether you were subjected or not to a "male-like" hormone profile during your puberty (either due to endogenous processes or through exogenous means).

An XX person that self-identifies as male and hasn't been subjected to hormone treatments is less likely to display male-like traits (including the hypothetical "greater height" we are discussing) than an XX person that identifies as female and has been subjected to a "male-like" hormone profile by exogenous means.

Finally, just because "measurable biological differences" may be mediated/caused by differences in hormone levels, rather than differences in genes, doesn't make them any "less true".


Overall, I agree with your first two paragraphs (I was letting "and specifically trans people who had hormone therapy through their pubescence" do a lot of work).

> Finally, just because "measurable biological differences" may be mediated/caused by differences in hormone levels, rather than differences in genes, doesn't make them any "less true".

Sure, but my point is that if exogenous hormonal puberty becomes common enough (like, 50% of the population opts in to them), you can end up in a situation where in aggregate there are no measurable differences. You have equal numbers of XX women and XX men, and XY women and XY men. And all the men and all the women have equivalent heights, irrespective of their genes. In such a situation genetics cease to be predictive of height, while gender remains predictive.

I acknowledge that such a situation is unlikely, but it should still cause one to second guess the "biological" nature of the differences.

I think there's another interpretation of what you're saying, which is that given two XX-chromosomal people, one who had a male-like puberty and one who had a female-like puberty, they present biological differences. I'm not sure if that stance would be widely accepted, but let's assume for a second it's true.

Then I can construct sort of the opposite of the prior situation: a society where a huge number of people transitioned post-puberty, and so now identify as the opposite of the gender they went through puberty as.

If we look at biology, there's a clear mapping from chromosomes to height. If we look at gender, there's no correlation (there are tall men and short men, and tall women and short women, in approximately equal numbers and so you end up with two bimodal distributions). In this situation, we don't really have measurable aggregate biological differences between the men and women.

This is really the point I'm getting at: talking about biological differences between men and women gets very complex very quickly, assuming you want to consider trans people. And "really complex" may be the wrong phrase; "completely not useful" is perhaps better, because (at least usually) biological categorization comes from genes.

We end up with 3 possible predictors for height: genes, hormonal-gender-at-puberty, and gender-today. It's not clear to me, when talking about biological differences, which of these three predictors we want to use, or even if we want to use biological differences, and we were seemingly assuming we'd use only one of them.


> Sure, but my point is that if exogenous hormonal puberty becomes common enough (like, 50% of the population opts in to them), you can end up in a situation where in aggregate there are no measurable differences. You have equal numbers of XX women and XX men, and XY women and XY men. And all the men and all the women have equivalent heights, irrespective of their genes. In such a situation genetics cease to be predictive of height, while gender remains predictive.

This is unlikely, as you say, but in such a hypothetical situation, where half of the male population is being subjected to feminizing hormone treatments while half of the female population is being subjected to masculinizing hormone treatments, sure... you would be (artificially) reducing the difference in heights between XY and XX people. But notice that, again, the deciding factor is not at the psychosociological layer ("self-identified gender"), but still at the biological layer: what matters is whether you are genetically/chromosomally male and/or subjected to male-like hormonal profiles, and not so much whether you identify as male.

> I acknowledge that such a situation is unlikely, but it should still cause one to second guess the "biological" nature of the differences.

If the differences are literally mediated by a combination of genetic and endocrine effects, then how could they be anything other than "biological"? Sure... if you identify as male (while not being an XY person), you're more likely to subject yourself to hormone treatments that lead to more male-like traits... the correlation is there... but the main factor, in the end, is whether or not you subject yourself to those treatments (and not so much how you identify).

> Then I can construct sort of the opposite of the prior situation: a society where a huge number of people transitioned post-puberty, and so now identify as the opposite of the gender they went through puberty as.

This is, again, quite hypothetical. But, even in such a case, I would assume that people's physical traits would still be generally more reflective of one's genes and hormones, rather than of how one identifies or the social role they choose to perform. Biological factors (including genetic and endocrine) would generally be more determining than psychological or sociological factors ("how I feel" or "how others perceive me").

> If we look at biology, there's a clear mapping from chromosomes to height.

There really isn't (at least not a "clear" one). You can have really good genes and still end up short due to nutrition (e.g., you're not getting enough protein and calcium in your diet) and/or epigenetic effects (e.g., your parents smoke and/or were subjected to malnourishment for a while at some point in their lives).

> If we look at gender, there's no correlation (there are tall men and short men, and tall women and short women, in approximately equal numbers and so you end up with two bimodal distributions).

This is, again, quite hypothetical and unlikely (that you end up with two bimodal distributions with no differences whatsoever in terms of mean or median-shift). Either way, it just reinforces what I've been saying: your gender (i.e. how you identify and/or how society sees you) is probably not a very important factor in determining height (once you account for confounders such as "hormonal treatments").

> In this situation, we don't really have measurable aggregate biological differences between the men and women.

Sure, in a hypothetical situation where you're giving feminizing hormones to half the men (to make them more woman-like) and giving masculinizing hormones to half the women (to make them more man-like), you could perhaps reduce differences between the two subpopulations (though not entirely... hormones don't control everything, after all). But notice that, again, you are using a purely "biological" process to achieve this and, even though you seem to be going out of your way to come up with a situation where there are no "measurable aggregate biological differences", I'm pretty sure that you would still find "aggregate biological differences", because there are non-endocrine genetic factors that affect height.

> This is really the point I'm getting at: talking about biological differences between men and women gets very complex very quickly, assuming you want to consider trans people.

I don't want to sound insensitive, but if someone says "men are generally taller than women", you can probably infer from the context that they are most likely using the terms "men" and "women" to refer to biological men and women (i.e. not just someone who identifies as "man" or "woman", but someone who has the genetic and/or endocrine profile of a "man" or "woman").

To clarify, I think a "trans woman" is a "woman" and a "trans man" is a "man". But, even with hormone treatments, there are clear measurable biological differences between an XX person and an XY person (if nothing else, XY people have a shorter chromosome).

Furthermore, this overall discussion glosses over intersex people... there are plenty of people out there that are neither exactly male nor female, biologically.

Imagine there is a miracle drug that cures cancer, but it has a 95% mortality risk for men and 0.1% mortality risk for women. Should I treat a certain person according to how they identify on a psychological and social level, or according to their biological properties?

> We end up with 3 possible predictors for height: genes, hormonal-gender-at-puberty, and gender-today. It's not clear to me, when talking about biological differences, which of these three predictors we want to use, or even if we want to use biological differences, and we were seemingly assuming we'd use only one of them.

We probably want to use the predictors that are more predictive. And, as I mentioned, a person's chromosomal sex and the profile of hormones they are and were subjected to (along with environmental, nutritional, epigenetic, etc. factors) is, in the end, much more determining (and, thus, a better predictor) than how a certain person identifies (or identified) themselves on a psychological or sociological level.

In the end, and as always, I think much of the disagreement has to do with the fact that people use terms that, without the proper context, could be misinterpreted (like "woman", which can be used to mean "XX person", "phenotypically female", "identifies as female", and I'm sure many other things).


Where are all the #4s?


>Where are all the #4s?

Everywhere! What is missing from the #4 sentence is "... and brave or reckless enough to actually discuss any of those things in public when 'wrong-think' can possibly lose you your job, social connections, and who knows what else going forward, since everything on the Internet is permanent now".


> However it is a problem to say that "since you are a woman, I infer that you must be short".

Sure, replacing "much more likely" with "definitely" is inaccurate.

And if you're assuming that whoever's saying this has more data available - like actually meeting whoever they're talking about - then, well, discarding information is also silly.


It would be nice to do a well-designed experiment to ask what the biological differences are between different categories of humans (e.g., old vs. young). Then use that ground truth to assess whether an ML model is generating biases that deviate from it. That way there is no a priori assumption about what is technical bias and what is real signal.

With the level of investment people are putting into ML technologies, these types of ground truth calibrations would cost only a fraction of the total budget.
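
As a minimal sketch of that calibration idea (all names and numbers here are hypothetical placeholders):

    import numpy as np

    def group_gap(values, groups, a, b):
        # Difference in group means for some trait.
        values, groups = np.asarray(values), np.asarray(groups)
        return values[groups == a].mean() - values[groups == b].mean()

    # Ground-truth measurements from the designed experiment vs. the ML
    # model's scores for the same individuals (toy data).
    groups       = np.array(["old", "old", "young", "young"])
    ground_truth = np.array([6.0, 6.0, 5.0, 5.0])  # real old-young gap: 1.0
    model_scores = np.array([8.0, 8.0, 5.0, 5.0])  # model's gap: 3.0

    technical_bias = (group_gap(model_scores, groups, "old", "young")
                      - group_gap(ground_truth, groups, "old", "young"))
    print(technical_bias)  # 2.0 -> the model exaggerates the real difference

A gap near zero would mean the model is reflecting real signal; a large gap points at bias introduced by the model or its training data.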


The political debate is endless and not very enlightening. However, this stuff may constitute useful tools for making better models in a non-political way. A model that is biased because of too-narrow test data will perform worse than a model trained on a suitable dataset. It can be hard to spot these issues, and having some checks and bounds in place to avoid common sources of bias is a good thing. Furthermore, an analysis might expose more info about the users than intended. This can have pretty bad consequences. For this purpose I thought the one about "collecting sensitive information" was a pretty good start.


> But what happens if the models still show disparity?

You have yet to qualify exactly what you mean by disparity here; it doesn't help that the example you used (i.e., the gender-height correlation) isn't an ML model.


I agree with your main point, but note that a gender-height correlation is effectively the basis of a linear regression, the most basic of all ML models.
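
For instance, a minimal sketch with synthetic data (the means and spread are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    is_male = rng.integers(0, 2, size=1000)               # 0 = female, 1 = male
    height = 162 + 13 * is_male + rng.normal(0, 7, 1000)  # assumed means/sd (cm)

    model = LinearRegression().fit(is_male.reshape(-1, 1), height)
    # Intercept ~ female mean; coefficient ~ male-female gap.
    print(model.intercept_, model.coef_[0])

Fitting a line through two group means is all the "model" a gender-height correlation needs.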


I think it's more likely that heritable attributes follow a normal distribution, rather than having a variance of 0. I wouldn't expect everybody to score the same; there seems to be (biologically speaking) a process which leads to variable levels of attainable intelligence. Anybody who actively denies that men and women have different height distributions is just being non-scientific to avoid pissing off somebody.


> Nobody is actually smarter or better than anyone at anything. It's a result of biases and privilege. And if all biases were taken into consideration, everyone would (and should) score exactly the same on everything, regardless of age, sex, race, gender, ethnicity, orientation, etc etc.

I wholeheartedly disagree. You cannot wipe away genetic predisposition and natural talent like that.


> I've seen smart tech people unable to say that men are on average taller than women because (I believe) they were afraid to say that some "favorable" outcome (tall) was a result of some component that could be genetic.

I've taken some pertinent college classes, like Intro to Psychology and Social Psychology. These are really hard things to debate in good faith because people look around them at the world, see the reality that men are, on average, taller than women, and assume genetics when we don't actually know the reason.

So if you have good-faith reasons to wonder out loud or speculate about such things, you are likely to get treated dismissively and not taken seriously.

There is a huge component of "them that has, gets" and it's really hard to sort out exactly why x person got y outcome. If you are interested in social justice -- or even just making your own life work when you are the "wrong" demographic -- then these are really frustrating discussions to try to have. It may be some hypothetical argument for someone whose life works well, but it's often a desperate attempt to get a fair shake of some sort by marginalized people.

So there's a lot to unpack there and it's hard to unpack it and there are legitimate reasons why some people feel strongly compelled to not go along with "the obvious answer" that "everyone" knows to be true, even though they may not know how to effectively argue against it.


In surveys of hundreds of human cultures, I don't believe anthropologists have found a single one in which women are, on average, even as tall as men. Though average height differs considerably between groups, there appears to be a consistent bias favoring men (see Fig 1. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1558-5646....).

We also know that although secular changes in diet can shift population averages over time, the heritability of height is pretty high: most estimates end up at about 0.8. If your parents were short, getting adopted into a tall family is not going to help you become taller.
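
(For readers unfamiliar with the term: heritability here is the usual variance ratio, i.e.

    h^2 = V_genetic / V_phenotypic ~ 0.8

meaning roughly 80% of the height variation within such a population is statistically attributable to genetic variation. It is a population-level statement, not a claim about any individual.)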

Height is a highly polygenic trait and no single gene controls more than a few percent of the variance. When people say that we don't know the "genes for height", that's what they mean, not that the jury is still out on whether height has any genetic basis at all.


I have a genetic disorder. So I have some familiarity with the subject of genetics.

Years ago, I read an article in a magazine while waiting to be seen by a doctor. It talked about a study in a culture that valued boy children over girls.

Boys, especially oldest boys, got treated consistently better than girls. They were more likely to be fed their favorite foods. If they didn't feel well, they saw a doctor now whereas there was more often a "wait and see attitude" with girls. This meant boys were more likely to see a doctor several hours sooner, such as in the evening rather than waiting until the following morning.

This was not a case of girls being treated abusively. It was just a case of boys being somewhat pampered.

The difference in outcomes could be measured in terms of mortality rates.

We don't know all the genes that contribute to height differences. We do know that underfeeding people will stunt them. Girls around the world seem to constantly be on a diet for fear of being too fat.

We don't know how much female height is being suppressed by cultural factors and we also don't know if cultural factors suppressing height might have intergenerational impact, even influencing genes by gender.

I don't think it's unreasonable to wonder about such things. I know for a fact it is enormously hard to have a meaningful discussion about such things on the internet, no matter how scientific the audience in question imagines they are. People are quick to dismiss such suggestions.

I've seen ridiculously biased statements about gender on HN upvoted to the top position in a discussion and heartily agreed with. It rapidly becomes a hostile environment where it doesn't feel safe or in any way productive to point out glaring flaws in such remarks.


You can compare women doing competitive sports with men doing the same. They will both have an optimized diet and so on.

Perhaps you could get some women to a similar level as men by giving them hormones from an early age (I don't know), but even then I think it would be a stretch to say "see, it is all cultural, we are denying those girls their hormones".


What prediction would this theory make for Sweden?


> Nobody is actually smarter or better than anyone at anything.. <snip> That's the underlying message about ML initiatives that touch sensitive topics.

No it's not, and if you think that you are misreading it.

The general point of ML ethics is that a machine learning system should not reinforce bias caused by historically biased training data. This is good statistical practice, but unfortunately often gets lost in the "magic AI" bubble.

Skipping that correction is like saying "a salary of $110K/year makes you a rich person". It's sort of based on something true (it does put you in the top quartile of US earners), but is completely useless without correcting at least for location and number of dependents.

That's what ML Ethics is about.

> I've seen smart tech people unable to say that men are on average taller than women because (I believe) they were afraid to say that some "favorable" outcome (tall) was a result of some component that could be genetic.

So you've taken that dumb argument and said all ML ethics arguments are the same?


> The general point of ML ethics is that a machine learning system should not reinforce bias caused by historically biased training data.

When I think of "biased training data", I think of data that systematically misrepresents reality. Like say taking the prevalence of news reports as a proxy for the prevalence of what's being reported on.

In contrast, the AI ethics people seem to use that term to mean data that doesn't reflect just (as opposed to unjust) causality.

I think there's a deeper issue, which is summed up in the saying "when a measure becomes a target, it ceases to be a good measure": simple correlation-finders - which is what current ML/AI models are - are not fit for the purpose of building control loops. And the "ethics" people are trying to paper this over by manually fixing up all the correlations that people notice aren't (or shouldn't be) causal.


The post linked to for this story gives the example of searching for "CEO" only showing white men.

The reason for that is almost entirely the same as your example of biased training data ("taking the prevalence of news reports as a proxy for the prevalence of what's being reported on").

CEOs from large US companies get reported on the most in English language press. Therefore searching in English for "CEO" finds those people.


« The post linked to for this story gives the example of searching for "CEO" only showing white men.

The reason for that is almost entirely the same as your example of biased training data ("taking the prevalence of news reports as a proxy for the prevalence of what's being reported on"). »

But the CEO essay / example isn't about making search results reflect whatever the real distribution is. It's about making them match arbitrarily chosen statistics explicitly in an attempt to influence how people respond to it.


From the essay: "However, these datasets reflect the biases of the society in which they were created and the systems risk re-entrenching those biases.". That's not talking about correcting a dataset for systematic error when it doesn't reflect what CEOs actually look like; it's talking about CEOs actually not looking like what they should look like, and about manipulating results in support of the social goal of changing what is to be closer to what they feel it should be.

Which isn't even really about AI. It's about the proposed feedback loop (CEO demographics) -> (what people think of when they hear "CEO") -> (what kids want to be when they grow up) -> (CEO demographics) and deliberately manipulating that first step in pursuit of a specific social goal. The use of AI is incidental.


Arguably, in your example, showing the most prominent CEOs is the correct response. It is ideology to claim "women are only not CEOs because they have no role models, therefore we should make our AI lie about female CEOs to normalize the idea, so that more women will become CEOs".

Actually, I think there are lots of articles about female CEOs, as female CEOs will be more likely to be written about than male CEOs ("the top female CEOs of 2021" and so on; there are bound to be countless such articles).


>Nobody is actually smarter or better than anyone at anything. It's a result of biases and privilege.

That statement is just incompatible with the basics of natural selection.


Yes, it’s a common “argumentation“ strategy on Twitter. “Do you know that someone was once born with a horse’s head? Clearly the concept of ‘human’ does not exist!”


I usually mention that nature experiments more with men than with women.

That's an explanation for why the top performers are usually men, but the worst performers are usually men too.

(Plus, I think it was the explanation given in a study.)


No need to spread FUD here. What you're saying isn't really happening and is not a problem. Mountains are being made out of molehills to spread FUD and derail actual useful discussion for the purposes of making a political point. Show us some proof and then we can talk.


> Nobody is actually smarter or better than anyone at anything. It's a result of biases and privilege.

Where is the linked article claiming this? Your post seems like complete derail of the article discussion for the purposes of making broader political points. [insert hn caveat about being more nuanced in potentially more flamebait topic]


> And if all biases were taken into consideration, everyone would (and should) score exactly the same on everything, regardless of age, sex, race, gender, ethnicity, orientation, etc etc.

"Should" is the operational word here. It is a normative decision that these (protected in many jurisdictions) classes should not receive any different treatment due to membership in the class. I agree with this goal, it seems you do not? Or are you objecting to a normative decision being disguised as a descriptive one? In that case I think it is largely a straw man, it's not how most people actually think.


These are now the "big ideas" in machine learning? I guess they have given up on AGI then?

In my book, this AI diversity nonsense is not even AI research. The SJWs pretend they discovered the concept of bias, when it has always been a core part of Machine Learning from the beginning.

Even more concerning, it is an attempt to permanently encode their distorted sense of reality in AI models that affect billions of people. This needs to be fought, not celebrated.


What are you mad at, exactly? That people want to minimize the number of mistakes ML systems make due to skewed training data? Or that people don't want known problems to be baked into decision making systems that will impact real people?

Who are the SJWs here? Other people with PhDs that have different opinions from you? Did they ruin your research? Did they make OpenAI stop what they're doing?

Are you sure it's not your sense of reality that's distorted?


As I said, these SJWs and "AI ethics" researchers have not invented the idea of minimizing the number of mistakes ML systems make.

Yes, you can point out that perhaps a data set is missing some black faces, but that should be a minor bug entry in the bug tracker, not a million dollar industry getting people PhDs.

And "those people" are not stopping at providing better data sets, they want to introduce their own political bias into the system, literally creating a distorted view of the world. As I said, this is serious stuff, if you assume most people will perceive the world via Google searches.

Maybe the issue here is "known problems" - this is not about "known problems", it is about a specific ideology that makes them believe they "know" what causes certain problems. As a simplified example, they want to make the AI "not see color" and think that will make the problems go away. Forcing the AI (and in turn the people who use it) to not see color is totalitarian control and wrong.

Decision making is an issue, but that needs to be addressed at the human level, not in the tools that are being used to perceive the world.

If your AI discovers that women historically have been worse hires for your company (or the people doing the hiring thought they were worse), the correct response is not to introduce some artificial bias in the ML to ignore gender or inflate the scores for women. It is to try to figure out what has been going on, and to make up your mind about what you actually want.


> As a simplified example, they want to make the AI "not see color" and think that will make the problems go away.

I promise you that is the exact opposite of what they want.


True, in a way, as they want to artificially inflate those scores. As I said, it was a simplified example. The bottom line is they want to hardcode their politics into the algorithm.


This nonsense doesn't even have to do anything with AI, it's just PR and a response to the current cultural environment.


This distorted sense of reality is already visible in Google's algorithms. Search for "men can" or "women can" on the Google home page, only to find "men can have periods", "men can get pregnant". Rather than have ML models trained on data that represents the reality of something, the data is changed or the model is changed to spit out this distorted reality that doesn't exist in the model or data...


This is a weird claim!

I searched for "men can". Eventually, on the fourth page of results I found an article from healthline[1] about periodic hormonal changes in men that does appear to have similar effects to the female period.

This doesn't appear to be any form of bias or correction of bias or anything other than a highly SEOed and indexed page that is shown in response to a very vague query that matches the title of this page.

[1] https://www.healthline.com/health/do-men-have-periods


I am not talking about search results, rather Google Instant results [0] ....

[0] https://m.imgur.com/a/zMI6pTl


I clicked expected to see big ML ideas explained, and all I found was Google scrubbing its AI ethics reputation after the debacle last year. I hope they devote their energies to actually explaining some ideas about ML rather than nattering on about fairness.


God knows how useful nice explanations of ideas in ML would be.

But this is ML enforcement of made-up "social values", like diversity and fairness.


God forbid I care about other people!


/director hat engaged

I'm really surprised that most of the top level comments are so negative.

This is a huge discussion in using ML for policy. In five to ten years' time you'll have a hard time using ML in business without engaging with the concepts these items address.

Will your ML model result in incidental bias against people with criminal records? The disabled? The elderly and other protected categories?

Does your insurance company require certain fairness metrics be deployed to maintain your product or service policy coverage?

Will your company pursue tax incentives that focus on certain definitions of unbiasedness?

Developers, data scientists, and statisticians have had a good run implementing loads of statistical models without considering the negative implications/externalities of automated decision rules and paths to KPIs. We will likely see this required in the future.
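
To make "fairness metrics" concrete, here is a minimal sketch of one common metric (demographic parity difference); the column names and data are hypothetical:

    import pandas as pd

    def demographic_parity_difference(df, group_col, decision_col):
        # Max gap in positive-decision rates across groups (0 = parity).
        rates = df.groupby(group_col)[decision_col].mean()
        return rates.max() - rates.min()

    # Toy example: loan approvals by age bracket.
    df = pd.DataFrame({
        "age_bracket": ["18-40", "18-40", "65+", "65+"],
        "approved":    [1, 1, 0, 1],
    })
    print(demographic_parity_difference(df, "age_bracket", "approved"))  # 0.5

Whether a regulator or insurer would mandate this particular metric is an open question; the point is that such checks are cheap to compute and audit.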

It will be hard. But it will also be hard to automate away for the foreseeable future (GitHub Copilot made a serious POC impact for standard dev work!).

/throws director hat back onto the hat rack, back to data science for me


If an elderly person will cost your insurance more than a younger person, the ML should tell you so. Period.

If by policy you should be forced to still give them the same conditions, that should be decided on a different level, not at the level of the tool that analyzes your data.


> I'm really surprised that most of the top level comments are so negative.

Lots of throwaway accounts here, and the comments are clearly written by people who aren't professionals in this field. Just downvote them and move on.

> Does your insurance company require certain fairness metrics be deployed to maintain your product or service policy coverage? Will your company pursue tax incentives that focus on certain definitions of unbiasedness?

These are scary thoughts. If this kind of policy does come to pass, I fear it would backfire significantly, basically making any machine learning (let alone "AI") a serious legal liability. Imagine a world in which companies no longer want to hire data scientists because of the regulatory risk of even having one on staff!

There's an argument to be made that a bit of ML liability is a good thing, that important decisions shouldn't be made by machine. I'm tempted to agree in most cases. But encoding it into regulation will be extremely difficult, even if you have truly pure and noble intentions, without making a giant mess. And that's before you realize that, obviously, major "AI powers" like Google, Facebook, etc. could have been working on this for years, and as soon as it's ready for production they will start lobbying aggressively for it, in order to further monopolize the ML/AI field.

Also, do the machines actually perform worse (i.e. "more biased") than underpaid humans who are prone to fatigue, mistakes, and laziness?


How is anything you just described different, in any way, from the existing situation in risk assessment and fairness in insurance? I.e., everything we say about ML, the insurance business has already been doing for a century using statistics. Their models exist for two things: to increase the likelihood of profit (by properly determining rates based on risk) and to stay compliant (where there are laws that ensure some level of fairness).

Frankly I don't see any real difference from long-established principles in the insurance space.


Cool idea but terrible topics


AI Explorables: Big ideas in PR to make Google look like a good-willed company.

This only exists because someone needed to get promoted for working on "diversity efforts"


In short: garbage in, garbage out


That’s actually not a good summary at all. You could have quality in, but apply the model to a different group than it was trained on, and get garbage out (e.g., face recognition trained on one set of races or genders, then applied to different races or genders). Or you train on data that’s high quality but is the result of inequity (e.g., police profiling, or creditworthiness), and then you get inequity out of the model.
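
A minimal sketch of that first failure mode, with synthetic data: each group's labels are noise-free, but the feature-label relationship is group-specific, so a model trained on one group degrades on the other.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(mean, n=500):
        # Noise-free labels, but a group-specific feature distribution and threshold.
        X = rng.normal(mean, 1.0, size=(n, 1))
        y = (X[:, 0] > mean).astype(int)
        return X, y

    X_a, y_a = make_group(0.0)  # the group the model is trained on
    X_b, y_b = make_group(3.0)  # the group it is later applied to

    clf = LogisticRegression().fit(X_a, y_a)
    print("accuracy, training group:", clf.score(X_a, y_a))  # near 1.0
    print("accuracy, unseen group:  ", clf.score(X_b, y_b))  # near 0.5

Neither dataset is noisy; the garbage appears only when the model meets a population it never saw.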


I think it's fair to say that one component of AI ethics is identifying these subtle forms of "garbage out" and the subtle forms of "garbage in" that cause them.


Garbage in means you have too much noise relative to signal coming in. You might be able to call current society garbage if you wish, but from an ML perspective it's not accurate to call input garbage if it is indeed high-quality data that reflects society as it is, or if you only have data on a certain set of classes but what you have is noiseless.


If you are studying criminality and using conviction-record data, I think it's fair to say that the bias introduced by biased policing, biased laws, etc. is a form of "garbage-ness" in the data, in that it might not reflect the reality you were trying to model.


Biased data is garbage if your aim is to have unbiased results...


It's a trap



