Greetings. This is the second segment of the three-part discussion on formats that leave behind a strong legacy and how Worlds 2013 is an embodiment of that legacy. In this article, I provide a simple model of skill curves and how to interpret them. This model helps to tease apart skillful formats from the more subjective values that influence what people call a “good” or “bad” format, and sets the stage for the Worlds analysis in part 3.
Format Skill Curves in Graphic Form
There is a thought experiment that describes the kind of formats competitive players honor: If two players of very different skill levels faced each other with identical decks and played a thousand times, the win rate between the two would be an even 50%-50% in a truly random format devoid of all skill, and 100%-0% in favor of the more experienced player in a format where skill alone determined the outcome of matches. An example of a 50% game is a pure dice roll, and an example of a 100% game (or one quite close to it) is Chess.
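The thousand-game experiment can be mimicked with a few lines of code. The sketch below is only an illustration under simplifying assumptions: every game is treated as an independent weighted coin flip, and the function name `simulate_series` and the 0.75 figure are made up for the example, not measured from any real format.

```python
import random

def simulate_series(p_better_wins, games=1000, seed=0):
    """Play `games` independent games and return the share won by the
    more experienced player, who wins each game with probability
    `p_better_wins` (an assumed number, not real data)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(games) if rng.random() < p_better_wins)
    return wins / games

# The two extremes of the thought experiment, plus an invented middle case.
for label, p in [("pure luck", 0.5), ("invented middle case", 0.75), ("pure skill", 1.0)]:
    print(label, simulate_series(p))
```

Over a thousand games the observed win rate lands close to the underlying probability, which is why the thought experiment asks for a long series rather than a single match.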
Since even the most degenerate forms of Yugioh involve decision making, this experiment would never result in the 50-50 outcome, not even in a chaotic traditional format. On the other hand, since even the most skill-based forms of Yugioh involve luck of the draw, this experiment would never result in the 100-0 outcome, not even in the grind of Goat Control format. The worst Yugioh formats, affectionately termed “dice-roll” formats, push toward the hypothetical 50-50 (maybe something like 60-40), and the legacy formats push toward the hypothetical 100-0 (perhaps around 90-10). “Different skill level” in the thought experiment can be distilled down to the difference in time each player has invested into practice, though in reality it would include other factors (fig 1).
Fig 1. The exact hour and % increments in this model representation will deviate from real-world data (if such data could ever be mined). A real-world plot would show something like zig-zags rather than neat curves. However, the general characteristics of this model will hold true regardless of real-world deviations in individual data points. Every format’s curve is parabolic in shape and flattens into an asymptote at some number of hours, beyond which win % can no longer increase. Consider three important patterns:
1) These asymptotes begin at varying x-values.
2) These asymptotes have varying y-values. They will never be at or below 50, and they will never be at or above 100.
3) Win rate will not increase linearly except at low levels of experience, and will always even out as a parabolic curve over time.
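The three patterns can be reproduced with a toy curve. The sketch below is one possible stand-in for the model, not the curves in fig 1: it assumes an exponential saturation shape, and the names `win_rate`, `plateau_start`, `peak`, and `tau`, as well as the two sample formats, are invented for illustration. Since a true asymptote is approached but never touched, "where the asymptote begins" is taken here to mean the point where the curve comes within one percentage point of its ceiling.

```python
import math

def win_rate(hours, peak, tau):
    """Modeled win % against a far less practiced opponent.

    Starts at 50 with no practice and flattens toward `peak`.
    `tau` (in hours) controls how quickly the curve flattens.
    """
    return 50.0 + (peak - 50.0) * (1.0 - math.exp(-hours / tau))

def plateau_start(peak, tau, tolerance=1.0):
    """Hours after which the curve sits within `tolerance` points of `peak`."""
    return tau * math.log((peak - 50.0) / tolerance)

# Hypothetical formats: (peak win %, tau in hours). Invented numbers.
formats = {
    "late, high asymptote": (90.0, 300.0),
    "early, low asymptote": (60.0, 30.0),
}

for name, (peak, tau) in formats.items():
    samples = [round(win_rate(h, peak, tau), 1) for h in (0, 50, 200, 1000)]
    print(name, samples, "plateau near", round(plateau_start(peak, tau)), "hours")
```

In this sketch `tau` moves the plateau along the x-axis (pattern 1), `peak` sets its height between 50 and 100 (pattern 2), and the flattening shape gives the diminishing returns of pattern 3.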
Let’s interpret these three characteristics and examine their implications.
1. Varying x-values
While all formats will have an x-value cap where additional experience cannot affect longitudinal results, these caps vary. Once again, extreme examples illustrate this best. A game like tic-tac-toe would have an asymptote that begins at close to 0 hours. If it were plotted on the above graph, it would look like a flat line. A game like Chess does have an asymptote, but it would begin far off the x-axis shown here and wouldn’t fit in the graph. Yugioh falls in between those extremes and is highly variable. The implication is that calling a deck “auto-pilot” is incredibly misleading and vague. No deck is auto-pilot, in the sense that even the simplest deck will still have a curve that increases for a bit before reaching its asymptote. Yet, all decks are auto-pilot, in the sense that all decks have asymptotes eventually. This means it’s possible to practice to a point where you can play anything in your sleep, even Goat Control. This is why Jae Kim has said Tele-DAD is auto-pilot for him (which was received with controversy). All decks have a non-auto-pilot point, and all decks have an auto-pilot point. Thus, calling a deck auto-pilot in itself doesn’t say much about the deck. Players often mean that the deck reaches the auto-pilot point at lower levels of experience when they say a deck is auto-pilot.
2. Varying y-values
The peak y-value for a format or deck determines how much experience, and thus decision making, can actually influence the outcome of your matches in the long term. Whereas x-values show the skill demand for a game, y-values show luck dependence. They sound like the same thing, but they actually give you two different pieces of information. For instance, tic-tac-toe’s asymptote would begin near an x-value of 0 (low skill demand), yet its flat-lined asymptote would run straight through the y-value of 100. This is because the small amount of skill involved in tic-tac-toe guarantees you will win against someone without that skill every time. In contrast, rolling electronic dice would have an asymptote that both begins at x=0 (no skill demand) and runs through a y-value of 50 (all luck).
The important implication here is that naming individual cards that are “sacky” does not determine whether a format was skillful. The rule that no format’s asymptote reaches 100% on the y-axis already accounts for the fact that there will always be a luck element. The question is how MUCH there is. The comment “Rejuve takes all the skill out of E-Dragons” embodies what many people have said all summer long. However, this is too absolute a statement and is not true. It is correct to say that Rejuvenation takes some skill out of the deck, but that single citation alone is not sufficient information to determine how the format stacks up against other formats. Every format has degenerate combos, even the complex grind-game formats, so to state that deck X is not skillful because it can pull off unfair combo Y is redundant and too absolute (if it were really true, the skill curve would be a flat line at y = 50, the equivalent of rolling dice). The takeaway message is that naming an individual degenerate card or scenario cannot debunk a format as a whole as unskillful, since the same can be done with ANY format. Every deck will have some unfair play that pulls its asymptote below 100, but the discussion of legacy is a question of how far below 100 it is.
3. Parabolic skill curve
Parabolic growth (as opposed to linear growth) simply means that players experience diminishing returns with practice. It becomes increasingly difficult to see tangible results for your effort as you reach high levels of play, which is why few players are willing to travel that road. This is because the closer you get to your format’s peak, the fewer of your losses are actually within your control. For instance, if you are playing a format that’s pretty healthy and that has you lose only 10% of games to the “sack” factor, then it is easier to grow from a 50% win rate to a 60% win rate in that format than it is to grow from an 80% win rate to the peak 90% win rate. This phenomenon is not just true of Yugioh, but of nearly all skills in general. Across the board, research has shown that you improve most in the first 20 hours of practicing an activity, and then your rate of improvement tapers off. (For those who may be curious, it takes roughly 10,000 hours to become a master at most sufficiently deep activities, such as a sport, coding, and certain games.)
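The point about losses drifting out of your control can be put into numbers. The sketch below assumes, for simplicity, that the share of games lost to the “sack” factor stays fixed at 100 minus the format’s peak no matter how well you play; the function name `controllable_loss_share` is hypothetical.

```python
def controllable_loss_share(win_rate, peak):
    """Fraction of your current losses that better play could still remove.

    Assumes losses beyond `peak` (100 - peak percent of games) are pure
    luck and stay constant at every skill level.
    """
    if not 50.0 <= win_rate <= peak < 100.0:
        raise ValueError("expected 50 <= win_rate <= peak < 100")
    total_losses = 100.0 - win_rate    # % of games currently lost
    fixable_losses = peak - win_rate   # the part practice can still win back
    return fixable_losses / total_losses

# The healthy format from the example above: peak win rate of 90%.
for wr in (50, 60, 80, 90):
    print(wr, round(controllable_loss_share(wr, peak=90.0), 2))
```

Under that assumption, a player at a 50% win rate can still fix 80% of their losses, a player at 80% can fix only half of them, and a player at the 90% peak can fix none, which is one way to see why the last ten points are so much harder to earn than the first ten.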
In diverse formats, the asymptotes for many of the decks come earlier rather than later. This means that players can experience the results they want in times they find reasonable. The important implication is that it (in part) explains why the most competitive players favor best-deck formats and the vast majority of players want diversity from their banlists. If this discord between the competitive vs. the popular opinion ever seemed strange to you, you now have a somewhat mathematical explanation.
Best-deck (non-diverse) formats produce late-asymptote, high-asymptote curves more often than diverse formats, though I concede that there are a number of exceptions, some of which I named in part 1. Patrick also wrote on this a while back in an article of his own: http://articles.alterealitygames.com/the-diversity-argument/
Using the above attributes, it becomes apparent what competitive players call a “good” format. Asymptotes start at later x-values (there are more layers of depth to a particular deck), skill curves flatten at higher y-values (the role of decision making in match results is large), and thus the curve as a whole sits high on the graph. Since there is no official curve that separates the bad formats from the good, it becomes subjective which formats were skillful and which were not, with the subjectivity being where you choose to define the cutoff. Do we draw the line at 80%? 70%? It’s hard to say. However, there is usually a significant gap between the ones revered in the long term, the legacies, and the ones which just stink. Even though there is no official line of separation in the graph in figure 1, can you tell which two decks most would regard as legacy and which as “sacky?”
It is important to note what the competitive model of skill curves does NOT cover. For one thing, it does not relate to the economy of the game. This comment from a casual player puts it well: “My dream is a format where you can put together a deck for $30 and it can compete against a deck someone spent thousands on and have a fair chance.” This popular opinion takes us back to the whole issue of why players argue over what a “good” format is. Competitive players say “good” to mean one where the skill curves are more like the Goat or Plant curves in the above graph and less like the DSF or Scientist curves. The remainder of the playerbase can mean other things by “good,” which can be influenced by their personal subjective values such as finances, time, or their unique perspective on fun.
Thus, when naysayers contend that Dragon Ruler is too expensive or dominant, they argue from subjective values I cannot disagree with. After all, how can I say what you value in the game is not valuable to you? It’s what YOU value! However, when these preferences bleed into more objective territory such as the skill demand of the deck, that is when there is a concrete correct and incorrect. A large price tag on Dragon Ruler doesn’t make the deck any less skillful than a $1,000 crystal, diamond-studded Chess set makes Chess any less skillful. How difficult the pieces are to obtain does not affect whether the game played with those pieces is skillful or not. Additionally, dominance does not affect the involvement of skill. One of the most common complaints voiced about Dragon Ruler is that it decisively wins against most other decks, but this alone has no bearing on the involvement of skill at peak play. The edge you get from using a dominant strategy is completely erased if everyone else uses either the same dominant strategy or the counter to the dominant strategy.
Distaste for a dominant strategy leads to circular reasoning. If a player with that preference got what he wanted, and the best deck were completely obliterated, then the next best deck would simply take its seat in the place of dominance. From a competitive standpoint, this reality explains why it’s more important that degenerate strategies are hit rather than dominant decks. After all, hitting a dominant deck simply gives you a new dominant deck, whereas hitting a degenerate card can actually raise the skill curve or change the asymptote position of the format. You’ll find that competitive players are more concerned with singleton cards being banned for how they lower the curve of a format (Reborn, Avarice, Card Destruction, Gateway of the Six, Shock Master), and the remainder of the playerbase is usually more vocal about the best decks being hit (Dragon Ruler, Spellbook). Fortunately, Konami did both on the September 2013 TCG list.
How the game will shape up in light of this list is worthy of discussion in future articles. For now I’ll just say that I am pleasantly shocked that the TCG has a separate list from the OCG. I see this as the second best list of all time in terms of how much skillful play is favored by the cards that were hit (can you guess which is the first?).
Stay tuned for the concluding segment in this series, where I provide match analysis covering Worlds 2013!
Until next time,
Play Hard or Go Home.