More on Dota 2

lvoudour · on Aug 17, 2017

I know it has been mentioned a lot the past few days, but since the articles keep flowing about it I'll mention it again:

It's a great feat and kudos to the openai team, but it is VERY unfair for the human players who rely on a sensory interface vs a direct API connection. That's unlike chess or go where the interface isn't important. The really impressive feat will be an AI that uses the same sensory information to make decisions (and I really hope that's where the openai will head next)

BaronSamedi · on Aug 17, 2017

I completely disagree. An API is software's natural input mechanism just like the senses are a human's natural input mechanism. Having the AI use human senses unfairly handicaps it. More importantly, however, is that this is not the key problem.

The key problem is teaching the AI strategy and tactics. What heroes to pick? Where to lane them? When to rotate? What items to buy? What spells to level up? What enemies to target with which spells and in which order? These are the hard problems and they are very hard indeed. A 5v5 AI will have to become expert at risk calculation, Pareto optimization, basic military principles, and many more things. Compared to these problems, the choice of input mechanism is trivial.

_dps · on Aug 17, 2017

I'm almost entirely ignorant of DOTA 2, but the coverage suggests to me that it is winning mainly through what I would call "micro" excellence (firing at just the right range, careful maneuvering, and so on). It's clear that this kind of precision control is extremely advantaged by having direct access to the state of the game, rather than having to estimate it in real-time from vision data.

So, to me, based purely on the news coverage, it's not clear that it has learned anything like "superhuman" levels of strategy. We already know that computers have superhuman reaction times and precision calculation abilities, so it seems to me the interesting question is whether an advantage would remain after factoring those out.

skgoa · on Aug 17, 2017

As someone who plays dota, this bot really isn't that impressive. The hero they chose is seen as a very difficult one to master for humans, precisely because judging distances, current life, current mana, damage etc. is so difficult to do on the fly and even if you can keep track of all of that in your head, you need extremely precise inputs to outplay your opponent. Yet by going through the API, they handed all of that to their bot on a silver platter. They pretty much let the bot sidestep the core challenge of this hero, while it was kept in place for the human players.

On top of that they also reduced the complexity of the game quite significantly by limiting items etc., which further reduced what humans could do against the bot. Even then, the bot utterly failed once humans were allowed to use a tiny bit of creativity.

And that's not even taking into account that this was not even close to the complexity of a real dota match. The big challenge in dota is in the decision making with incomplete information and in coordinating 5 people with only voice and the ability to ping the mini map, in a giant "search space" created by hundreds of different heroes, items and game mechanic interactions.

Bartweiss · on Aug 17, 2017

Interesting, thanks.

Several years ago, League of Legends released an upgraded suite of bot characters. One of them, Cassiopeia, had to be turned down enormously from her best play to make her viable. She could beat many of the game's devs (mid-tier hobbyists) and was a non-zero threat to professional players. This was, to my knowledge, achieved with little or no machine learning at all.

The defining traits were similar to what we see here. She had area of effect spells with casting delays, meaning that the ability to precisely evaluate how other players could move was crucial. And she had a spell which refreshed based on the effect of those AOEs, meaning that millisecond precision was a major source of her ability to deal damage. And her ultimate was an exceedingly touchy and unpredictable disable based on the angle opponents were facing (in a game with instantaneous turning). Even top-tier pro players regularly lost its effect because of latency or judgement issues.

The results, by all accounts, were terrifying. She was barely competent strategically, but as long as she could afford items (and admittedly, the LoL bots don't need to farm) she could win all of her tactical fights simply by inhuman precision.

The OpenAI project is more admirable than that. It uses real farm, makes item purchasing decisions, and apparently has a rate-limited API. (That last seems especially important.) But I still wonder how much of the bot capabilities are derived simply from inhuman accuracy.

mabramo · on Aug 17, 2017

In terms of Dota, I don't think it's that impressive, but it is still cool. I'll be impressed when I see a 5v5 with bots that adapt to the opponents' strategy. I do think that it currently could be an excellent tool for mid-laners and cores to practice. OpenAI is also blowing the door open to Dota AI development and we will soon see bot tournaments. Engineers will develop AI and put them against one another in standard 5v5 matches with a pick/ban phase and all.

In terms of AI, I don't /think/ there's anything groundbreaking here. Correct me if I'm wrong, as I don't follow AI research, but this technology is nothing we haven't already seen. I believe the development of AI for Dota is a publicity move to get people excited about what AI could be for humanity. This might be the way to introduce AI to non-technologists and get people excited about it.

quiteawhile · on Aug 17, 2017

> This might be the way to introduce AI to non-technologists and get people excited about it.

Yeah, I'm almost positive that this exhibition is intended mostly to raise awareness and create this hype. Go and Chess, for most people, are simple games compared to Dota2, so if Elon is worried about AI and want people to be more aware of the threat he perceives it certainly helps to make this big show and get all those impressions with a game that is considered by the majority of people (especially younger) to be more complex/harder than what has been done before.

bhargav · on Aug 17, 2017

Could not have agreed more with my friend here.

The bot would be more impressive if it used vision or buffer to analyze everything. The bot has all of the information which a human does not have. If the human had the same information I think it would be a better competition.

To me it seems that this bot is good only by the fact that is has information and input advantage.

I also want to clarify that this is not an unflavourful feat and that I think it is cool. But I noticed that a lot of members in the Dota community did not know of the bot API which is likely the case here.

naikrovek · on Aug 18, 2017

The bot cannot observe the game more quickly than a human, and can only take action as fast as a human.

Read the whole thing.

hobofan · on Aug 17, 2017

I agree. I don't think there was anything too novel in there, but it is a necessary foundation they need before advancing on to the strategy part, or they would just be out-microed there.

I think there a little bit of strategy already in there since it learned about positioning, and not exposing itself too much (which is of course easier without fog of war). The only thing I was excited to see was the creep blocking, but the chance that this would turn out to be hand-trained was pretty high.

doikor · on Aug 17, 2017

> it is winning mainly through what I would call "micro" excellence

This is partly it. From the interviews with the pros it mostly close to perfect micro, calculating results very well (it knows when it will win the 3 raze spam + 2 autoattacks with 2hp left and thus wins the match etc. for a human that would be hard to calculate) and due to those two it punishes every small mistake the human player makes very hard.

From the actual thinking parts it had to learn creep control (pushing too far just gives free exp/last hits to the other player etc.) and itemization (what counters what, when to get ward etc.)

grimskin · on Aug 17, 2017

> it's not clear that it has learned anything like "superhuman" levels of strategy

But still, it's learned "human" levels of strategy, and that actually amazing. And let's not forget, that this is only beginning, more stuff to come. IIRC - some third-party StarCraft AI actually invented a trick that was later used by human players on pro-level tournaments.

GenKali · on Aug 17, 2017

Indeed - I believe it was a highly successful build order that strayed from the 'norm' at the time. I was quite into SC2 at the time and this was something of a big deal.

graphitezepp · on Aug 17, 2017

It was the 7 roach rush. Used extractor trick which was generally considered a bad idea by humans due to being economically bad as well as a double overlord which is also against general build principles, but it turned out that it smoothed into an insanely efficient early attack.

jasonwatkinspdx · on Aug 17, 2017

> winning mainly through what I would call "micro" excellence

This certainly is an advantage for the bot. But it's also clear the bot understands some sophisticated parts of dota. It understands how to control the creep equilibrium via the aggro mechanics. Skill at this is one of the things that separates pros from casual dota players. In the video footage you can see sumail and rtz are surprised by this aspect of the bot. It also understands how to cancel salves as well as bait with them. That's all pretty impressive for a bot trained up from self play.

lvoudour · on Aug 17, 2017

>Compared to these problems, the choice of input mechanism is trivial

I don't know why you think the visual/aural problem is trivial. The biggest achievement of AIs so far in this field is classification of static images

bingojess · on Aug 17, 2017

Trivial may not be the best word but what they were trying to get the AI to do was learn dota strategy and tactics. It's like teaching alphaGo to play go with a camera.

mirceal · on Aug 17, 2017

I disagree. The representation of state is hard for machines. Go/Chess are very simple state wise when compared to Dota. By giving it API access you have eliminated guesswork/approximations from the overall picture

BaronSamedi · on Aug 17, 2017

I apologize if my statement was misleading. I do not think the visual problem itself is trivial. I think the choice of input mechanism is tangential to the hard problems actually faced by the AI designers: strategy and tactics.

velobro · on Aug 17, 2017

The Dota 2 bot is no different than Aimbots in FPS games. The bot was able to access the api to perfectly target the exact location and instantly cast a spell, much like an Aimbot can instantly get headshots.

graphitezepp · on Aug 17, 2017

It's a good comparison honestly. They stripped away all the complexity of dota to this one single scenario were the things bot's are naturally going to be good at (last hitting, precise distance evaluation) matter over strategy.

marricks · on Aug 17, 2017

I use to think that before I read about embodied cognition [1]. I think a lot of the harder and real challenges come from acting as an agent in an environment with often fuzy information.

Not only do challenges come in the form of input mechanism but potentially many rewards as well in terms of how humans process and off-load information.

While I don't think DotA would be the most important example, I think having a more realistic interface is a good step.

[1] https://en.m.wikipedia.org/wiki/Embodied_cognition artificial intelligence section probably most relevant

gerardnll · on Aug 17, 2017

Real life doesn't have APIs that's why AI applied to robots and cars and whatever have to be able to interpret inputs like humans.

rasz · on Aug 17, 2017

>What items to buy?

hardcoded by a pro dota 2 player hired as a consultant

ManlyBread · on Aug 18, 2017

The items are often bought depending on multitude of factors, there's no one "cookie cutter" build. On top of that the AI should be able to find more optimized buying strategies on it's own.

Houshalter · on Aug 17, 2017

The goal of this project is to make progress in reinforcement learning. Having an AI plan and learn strategies to play a game. This is a separate problem from machine vision, which is being worked on by tons of other people. Adding a machine vision requirement wouldn't help them with their goal and would take 100x more computing resources.

OpenAI could probably afford to do it, but it no smaller researchers or hobbyists would be able to compete.

motoboi · on Aug 17, 2017

You forgot to mention the people working on physical robots, waiting for software to get better.

fridek · on Aug 17, 2017

Exactly my thoughts after Elon Musk's tweet: https://twitter.com/elonmusk/status/896163163581825025

Unless AI is constrained to pro player max pointer move delta, click rate, and vision latency, I don't really see much difference between AI and a team of kids running with aimbot shouting "cyka cyka".

alkonaut · on Aug 17, 2017

With the correct rates and latencies it should be possible to be fair though?

I.e., if an event occurs in the game, it's placed in a queue and the AI "sees" events pop out the other end of that queue, at a minimum latency and at a maximum rate (i.e. if too many things happen at once the AI is overloaded).

After that the AI makes decisions and puts the command in a command queue. The command queue works the same way: commands pop out the other side (to the game) after a minimum latency and at a maximum rate, to simulate the minimum roundtrip from input to action, and the maximum action rate.

Bartweiss · on Aug 17, 2017

Latency ought to be easily enforced, yeah. The other vagaries of human play, like pixel-precision on mouseclicks and mathematical precision calculating damage and health, are harder to model. But I don't think they'll be impossible, and I think they soon won't be a disaster for the AI.

hobofan · on Aug 17, 2017

> Actions accessible by the bot API, chosen at a frequency comparable to humans

I would guess that this already includes all the delays you are asking for.

gpm · on Aug 17, 2017

That's limiting the input bandwidth to the same as what's available to a human, but not limiting the input latency.

Just like with network traffic, they are two different numbers.

hobofan · on Aug 17, 2017

Very much depends on how you employ the frequency limits. If you say "You can only do 300 actions per minute", the bot could make all 300 actions in the first second if it sees that as optimal, breaking the latency limits. If you however say "You can only do 1 action after waiting 200ms since the last action", the bot will have to choose one optimal action every 200ms, effectively limiting the input latency.

gpm · on Aug 17, 2017

No. Because the bot can choose the next action at the end of the 200ms, with all the information available then, not at the start.

Bandwidth limits are typically of the "You can only do 1 action (send 1 packet) 200ms since you sent the last packet", not "You can only send 300 packets per minute".

hobofan · on Aug 17, 2017

How about "You have to choose an action at the end of 200ms, with the information at the start"?

At the end of the day we can be nitpicky and speculate all day long, but we won't know for sure what they achieved/didn't achieve unless they publish something more concrete than a blog post that is intentionally written in accessible language, which has the side effect eroding some of the more specific measures that were taken to ensure a level playing field.

gpm · on Aug 17, 2017

If they did than then they are simulating latency, but I see nothing to suggest they did that.

thewhitetulip · on Aug 17, 2017

No, there is a difference. The human player has to see, react and then shot. The computer has a direct API.

gun.shoot()

as opposed to

image of a gun falls on the retina process the info sent by the eye synapses fired etc

:)

hobofan · on Aug 17, 2017

I considered that, and I don't think there is. The frequency of the actions is already a result of the "actual time it takes to executr the action" + "all human delays". If you limit the bot to that frequency, everything is accounted for.

----

EDIT: On second thought, there might be a difference, in that this leaves the bot with more time to think, unless you limit the time of that. Not sure if that would dramatically influence the performance though.

thewhitetulip · on Aug 18, 2017

The thing is, every human has a different response time, but the bot doesn't have one. The day a bot will play the game like humans, a physical robot holding a console remote etc then maybe it'd be cancelling out the API advantage that the bot has.

but this doesn't undermine the achievement that they have made. It is phenomenal that an AI can play a complex game like DOTA.

hobofan · on Aug 17, 2017

And the response as I've seen it on other threads: It probably doesn't make a big difference, and will outperform humans there too, and it would be a huge waste of computing power to train it that way.

I think OpenAI should show that the AI can derive (a close aproximation of) the API data from videos, but I don't think that building a closed training loop would add much value here.

lvoudour · on Aug 17, 2017

>And the response as I've seen it on other threads: It probably doesn't make a big difference, and will outperform humans there too, and it would be a huge waste of computing power to train it that way.

Well they may be right about the "outperform" part but they are dead wrong about the waste of time/effort/energy part. I mean if (at least human-like) real-time video/audio recognition and decision making is not an impressive AI feat, I don't know what is. I'm no expert in the field but claiming that plugging into an API and crunching numbers is more important than sensory-based decision making, just doesn't sound right

hobofan · on Aug 17, 2017

> sensory-based decision making

You can pretty cleanly split that up into two different problems: "sensory-based data extraction" and "data-based decision making"

Though I haven't worked in the field of self-driving cars, I am fairly confident that they employ a similar split: One part that takes in all the (pre-processed) data from LIDAR, cameras, etc. and maps that to a simplified model of the surroundings, and another part that makes the driving decisions based on the simplified model.

Sensory->data mapping doesn't raise a lot of eyebrows anymore if you can generate as much sensory information as you want to explore all possible states, as it is possible with Dota.

lvoudour · on Aug 17, 2017

To me though as an outside observer it seems that the dota problem is harder than the car problem. There's too much action going on, too many visual/aural cues to keep track of, team cooperation/coordination, fog of war, etc.

Car AI is important because there are real life-or-death consequences, but the problem (again to my limited experience) seems more tractable: path choices are limited, action is rare, there's no team element and there's no competition. Even for human drivers driving a car in a city or motorway is a tedious, mostly repetitive task. Now, competitive driving raises the stakes and we haven't seen any self-driving car tackling that problem yet (which will definitely raise a lot of eyebrows)

hobofan · on Aug 17, 2017

> there's no team element and there's no competition

I don't think that this is true, considering all the other humans on/near the road, that can influence the system by making actions of their own.

> There's too much action going on, too many visual/aural cues to keep track of, team cooperation/coordination, fog of war, etc.

I dont see how this is different from a car at all.

- too many visual/aural cues to keep track of -> everything you can see, car horns, etc.

- team cooperation/coordination -> as I said everyone else on/near the road

- fog of war -> blind spots & people hidden behind objects

From some experience in the AI world, I can say that I've seen systems that are good enough that they should be able to solve most of those subproblems in the limited system that is Dota. Yes, fog of war might also be interesting given that you have an agressive opponent. The real novel things lie at the strategy level, like picking a hero, buying items, etc., because, like AlphaGo, they demonstrate reasoning in systems with a large action space and delayed payoff.

radarsat1 · on Aug 17, 2017

It's their long-term goal: https://blog.openai.com/universe/

haeffin · on Aug 17, 2017

You can't get the things you get from the API from videos alone. If a player wants to see the amount of mana an enemy has, he has to click on the enemy, you don't see it usually. Same for items. There is an active component in information gathering in Dota (and those actions take time/clicks away from other things you could do at the same time).

hacker_9 · on Aug 17, 2017

Depends what the aim is, to create general AI? Or just a good DOTA bot? If the goal is the former, then vision will play a huge part.

Fnoord · on Aug 17, 2017

The comparison goes indeed moot, for a myriad of reasons.

Interface is one of the many.

If we look at a game such as WoW we can observe the fights in group content (dungeons and raids) have become more difficult over time. The WoW development team cites a few reasons for that: 1) players have become better so the difficulty has gone up, and 2) boss mods (software) have become better. This software aids the user in observing/notifying (sensing actions) , and executing (processing information and deciding upon it) mechanics on a fight and are available thanks to the LUA engine and API. Ironically, even without such software, that game has improved majorly in communicating mechanics to the player over the years (WoW is from 2004).

Hence aspects like the UI and API are going to affect the quality of gameplay of humans as well. For AI it would only be API.

Furthermore, there is network lag, interface lag, input lag, and cognitive lag. Only the latter seems fair game to me.

dumbmatter · on Aug 17, 2017

Yeah, it's like Watson on Jeopardy. It could buzz in faster than a human, so it got to answer every question that it thought it knew, which was most of them. Problem is, the human competitors surely also knew most of them (maybe even more than Watson) but they couldn't buzz in fast enough.

madez · on Aug 17, 2017

> The really impressive feat will be an AI that uses the same sensory information to make decisions (and I really hope that's where the openai will head next)

Impressive? Yes. Interesting? Not as much. We consider AI important not because it can play Dota. We consider stronger AI important for making our lives easier by solving problems. For AI to solve our problems we will not unnecessarily restrict it to our sensory information.

hossbeast · on Aug 17, 2017

Until they ditch the API, I don't want to read about them claiming to have "beaten" a human opponent. It's nonsense. I'm sure your AI is impressive, but you're not even playing the same game.

autokad · on Aug 17, 2017

one thing AI bots can be good for is testing to see if games are balanced. For instance, I am a zerg player and I bought star craft 2, but it was immediately clear to me that the game was heavily unbalanced. I quit playing.

whether you agree protos was over-powered or not, I think we can all agree that game developers can benefit from AI with human limitations. it can help them design better games

grogenaut · on Aug 17, 2017

I agree that it's a bit of a over hype tactic in a toy situation which might be marking to maybe make people think the technique and bot are more capable than it is. But I think people are overly down on it too. It's a demonstration of a technique in a way the general public (and gamers) will understand. OpenAI isn't going to make money off of building game bots... people wouldn't watch. The human drama is a major ingredient in e-sports.

But we shouldn't be down on this while we were going gaga over a lego sorter done the same way a month or so back.

It looks like for some things we can almost have a plug and play ai solution. EG, like we are seeing with image classifiers, this doesn't take years of phd doctoral research and game theory to build up a world class bot. Which is what everyone used to do. This is moving some of these techniques into the "get data set, get hardware, download library, train" plug and play type solution which we're seeing more and more with in other areas like machine classification. Eg stuff anyone with a few years of experience can do, maybe not amazingly, but better than they could hand coding the solution. The problem becomes one of gathering good training sets or building an accurate simulation to train in.

This means, I think, that you'll see way more of these types of ai solutions where people would have balked at a hand coded solution before. This in turn looks a lot like mobile's change to computing where things that were annoying to do on your home pc became different just because you had a camera + gps + computer + radio in your pocket.

I know my company has started using classifiers a lot more for things that are kinda sliding bad user actions instead of coding up huge rules engines. We may not be as effective as a several area deep engineers writing rules and doing data analysis, but instead we have 1 engineer per problem space being about 70% as effective which is still a huge win over not solving the problems at all.

The funny thing is that this bot actually pulled off the stereotypical hollywood training montage with just a few weeks of hard work it beat the best in the world. Just get some sweet rock in there and you've got it all.

bdz · on Aug 17, 2017

Props for the $12k donation to OpenDota. That's really awesome! Tho I personally always preferred Dotabuff.

literallycancer · on Aug 17, 2017

The features are slightly different so people usually use both.

rtpg · on Aug 17, 2017

Is there a good self-contained example of how people set up learning in ways where "the AI doesn't initially know the rules"?

I've heard this many times and conceptually I get the principle, but I have a hard time understanding how you create a legitimate starting position or measurement mechanism beyond "losing/winning".

sanxiyn · on Aug 17, 2017

The linked article says "The bot received incentives for winning and basic metrics like health and last hits". So apart from losing/winning, losing health is bad, last hitting is good. You could add more, but apparently that's all OpenAI used.

mannykannot · on Aug 17, 2017

It also says "We also separately trained the initial creep block using traditional RL techniques." I have no idea how significant that is, but it seems to be getting a fair amount of attention.

gcp · on Aug 17, 2017

It's a highly specific procedure that happens before there is interaction with the opponent, so without "handing over" the understanding that having creeps on your high-ground is good, it's very hard for the learning to see through the noise and discover this.

damnfine · on Aug 17, 2017

Exactly why this is not impressive to me. The point is to be able to learn the rules, but all I see is some of not only the rules, but the actions already prespecified in many cases. Yes its hard, and thats why humans still rule the roost.

sp332 · on Aug 17, 2017

Humans learn from each other and don't puzzle everything out from first principles. So I wouldn't put that restriction on a computer intelligence either.

mannykannot · on Aug 17, 2017

That is fine, so long as we are not expected to hold both that it is not a big deal when it is not achieved, and is a big deal when it is achieved. Personally, I incline towards the latter.

Twirrim · on Aug 17, 2017

Surely all this RL process did was speed up what the computer would have learned, by a large stretch? The "cost" factors they chose would have hit them ultimately, regardless of if the bot stood still in the base, or wandered off elsewhere.

mannykannot · on Aug 18, 2017

My guess is that in general, many complex strategies are effectively unreachable without something like analysis, on account of intermediate states being disfavored, leading to algorithms being trapped by local minima in the cost function (I don't know whether that would be an issue for this game, specifically.)

aoeuasdf1 · on Aug 17, 2017

Not necessarily - maybe a mixed strategy of

(1) Not creepblocking at all, and letting your opponent have your creeps under their tower

(2) Modest creepblocking to punish (1)

(3) Severe creepblocking to punish (2)

Would be better than some hand-trained RL creepblocking which is divorced from game outcomes.

kyberias · on Aug 17, 2017

The classic reinforcement learning -based AI (from 1992) that beats humans (maybe not top players though) in Backgammon: https://en.wikipedia.org/wiki/TD-Gammon

omarforgotpwd · on Aug 17, 2017

I’m by no means an expert, but I’m fascinated by the idea that a neural net playing against itself can substantially outperform a supervised learning approach with a large training data set. I mean, gathering training data and making sure it’s labeled correctly and all that is a huge hassle so if you could eliminate that step or even reduce the amount or quality of training data required that should be a big win for AI, right? Especially if doing this not only makes things easier but also improves the performance of the model.

tictacttoe · on Aug 17, 2017

It's very cool, but I think it also requires a very specific "adversarial" problem with a well defined notion of success (winning the game). If your machine learning task is something more nuanced and harder to define, e.g. identify word synonyms, I don't see how you can get around having a training data set.

kpil · on Aug 17, 2017

I played with evolving algorithms playing some kind of robot war (in java) almost 20 years ago, and beside the fact that you had to be veeery patient since evolving a generation took 10 minutes at best, I realized that it's very hard to create a survival criteria that matches your intention.

I e, most robots fled and hid in the corners. I added additional critiera for hitting enemies, and the robots fled while randomly shooting bullets and hid in the corners...

No epic fights.

Hmm. Maybe I have the code somewhere...

NamTaf · on Aug 17, 2017

Reinforcement learning isn't a new idea - I did a Berkeley-based edX course on it a few years ago now and it was not state-of-the-art to my knowledge. That had no deep aspect to it, we just generated a reinfrocement algorithm that utilised a good measure of performance (specifically, it was pacman and the score value is pretty good at that) and changed a few algorithm weighting variables at each iteration.

My understanding, from talking to a ML friend this morning, is that the latest progress is taking reinforcement learning and applying deep learning approaches (nets, etc.) to it. The key becomes finding the right scoring algorithms to tweak the neural net correctly towards the desired outcome.

The self-play really is the reinforcement side of things at work. How you take that 'score' and use it to correctly modify the input weightings - be them in a neural net, traditional algorithm, etc. - is the key.

mannykannot · on Aug 17, 2017

> The key becomes finding the right scoring algorithms to tweak the neural net correctly towards the desired outcome.

Does this not become something similar to supervised learning if you are scoring internal states of the game? (i.e. scoring on more than just the outcome and things that violate the rules?)

NamTaf · on Aug 17, 2017

I don't know what they're scoring on; they've not elaborated on that. However, when I first read into OpenAI it looked like many of their game links were to simply view the screen. To that end, it wouldn't surprise me if they're identifying things you'd otherwise visually see in the game and simply working off them. To use internal states not immediately visible to the player would be sort of disingenuous IMO.

noway421 · on Aug 17, 2017

Why wouldn't algorithm reach a local maxima when playing with itself, or even degrade over time by opening up to unknown attacks?

gcp · on Aug 17, 2017

One typically keeps pools of trained networks to combat this.

Epokhe · on Aug 17, 2017

it does. the bot can be defeated easily with tactics it didn't encounter while playing with itself.

pycal · on Aug 17, 2017

Just a guess, but wouldn't this approach used in isolation result in a high potential for getting stuck at local optima? It seems to me that the success of this experiment is the result of interaction with players in different MMR brackets.

Bartweiss · on Aug 17, 2017

Yes, it can. It's fairly common to keep a pool of learners which aren't identical for exactly this reason - either playing against prior snapshots, forking training, or lightly randomizing some components of the learner.

debacle · on Aug 17, 2017

I find the mechanism for learning + the timeline far more impressive than what they accomplished. A series of 5 bots that can consistently compete with 5 humans at the 4.5k+ level would be a very impressive display of AI training.

distances · on Aug 17, 2017

What I'd like to see is them implementing AI for a strategy game that benefits from an overall vision, such as Civilization.

And then selling the AI to Firaxis.

Yes, Civ 6 AI still isn't anywhere close to what it should be.

Bartweiss · on Aug 17, 2017

Agreed. There's been a lot of criticism of their final outcome, which is understandable - it's not at all "beats pro players at DOTA". But seeing that timeline absolutely blew me away.

In particular, I'm stunned that the pro players accurately assessed "Sumail will win" on the 9th, but the improvements of one day of training invalidated the assessment.

jsnell · on Aug 17, 2017

Dupe, more discussion at https://news.ycombinator.com/item?id=15031470

aaron695 · on Aug 17, 2017

Unfortunately the HN title there which still isn't correct at time of writing this, distroyed a proper conversation.

(Assuming the original article didn't fix their title)

hfsktr · on Aug 17, 2017

I loved the different ideas to throw the bot off like pulling creeps in a way that would not work for a human. Slacks' courier strategy was entertaining as well.

I don't know enough (no more than a layperson) about AI to have any meaningful comment there. Do they need to train the bot on every hero the same way or does it only need to relearn the hero specifics (and not items/strategies)?

hfsktr · on Aug 17, 2017

Another thing I noticed. Others talked about it using API etc etc so that means that you can't visually trick it by stopping an attack mid animation like you can with human players right?

tahw · on Aug 17, 2017

The version they used for TI had a variety of rules that completely changed the metagame of 1v1 (no bottle, for example). Even ignoring the obvious API advantage, the match was unfair because the Pros had never trained under the constraints that the AI team brought.

jdoliner · on Aug 17, 2017

They played under standard 1v1 tournament rules: http://wiki.teamliquid.net/dota2/Dota_2_Asia_Championships/2...

taormina · on Aug 17, 2017

That's a good hero to lead off with. Soulstealer (is the HoN name, blanking on the original Dota name) is a hero with very basic mechanics. Or rather, there's a ranged, AoE ability, an ultimate that boils down to "stand in the middle and hit the button", but the rest of the mechanics boil down to "last hit lane creeps well" which is a huge Dota 2 game mechanic. And this hero does better as they succeed at last hitting.

yurrzz · on Aug 17, 2017

Nevermore, the Shadow Fiend is the original Dota name.

Aron · on Aug 17, 2017

I believe biological explanations might account partially for the openAI bot outperforming human players.

CoffeeBob · on Aug 17, 2017

Does anyone else have a problem with the line, "the graph is surprisingly linear, meaning the team improved the bot exponentially over time"?

pycal · on Aug 17, 2017

I think what they're pointing to is that the ELO system "true skill" that dota uses is a log normal distribution. To your point I'm not sure that means that player skill improves along the distribution, but I think it does mean their probability of winning increases exponentially.

Havoc · on Aug 17, 2017

Really cool writeup. Enjoyed that thoroughly

misticdeveloper · on Aug 17, 2017

So it appears their bot isn't cheating with vision, and has its action speed capped to human levels. Interesting!

Sad they had whitelisted item builds. I thought the whole point of a machine learnign bot was it was supposed to learn these themselves.

5v5 full game is way more complex than Starcraft. Hope OpenAI are ready.

juskrey · on Aug 17, 2017

AI players that use internal game calls could beat humans from the beginning of the history.

kensai · on Aug 17, 2017

It is amazing what Peter Thiel's OpenAI does! Congrats to his genius.

kensai · on Aug 17, 2017

I sincerely don't understand the downvoting. It's Peter Thiel's as much as it is Elon Musk's OpenAI.