Metric comparison
Common flaws and terminology
Flaws common to all listed skill metrics, unless stated otherwise:
- No distinction between skill and premium consumable use. Premium ammo won't turn an average player into a top-1% player, but it is a significant advantage. Winrate-based metrics gain slightly less from premium ammo, but not by much.
- No distinction between skill and crew skill. Crew skill can be loosely approximated by comparing the number of tanks played with the number of battles.
- No detection or adjustment for stock battles played. For tanks/stats-based metrics, stat sites could approximate this by monitoring accounts very frequently and ignoring the early battles in each tank, but doing so is very costly.
- No handling of tier 1-3 newbie matchmaking (MM). Currently, players with <2500 battles get far better results at low tiers, relative to their skill, than players with >2500 battles.
- No adjustment for time of day or server played. Difficulty almost certainly varies.
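The frequent-monitoring approach to stock battles could look roughly like the sketch below: a stat site stores periodic snapshots of each tank's cumulative stats and only counts the differences between consecutive snapshots that start after an assumed stock period. The `Snapshot` type, its fields and the 30-battle stock window are hypothetical. A snapshot difference cannot be split into individual battles, so any difference that begins inside the stock window is discarded whole; the more often accounts are polled, the fewer post-stock battles are lost, which is where the cost comes from.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical cumulative stats for one tank at one polling time."""
    battles: int
    damage: int


def post_stock_totals(snapshots, stock_battles=30):
    """Sum stats from snapshot diffs that start after the stock window.

    `snapshots` must be time-ordered and belong to a single tank.
    `stock_battles` (default 30) is an assumed stock-grind length.
    Battles before the first snapshot are never counted.
    """
    battles = damage = 0
    for prev, cur in zip(snapshots, snapshots[1:]):
        if prev.battles < stock_battles:
            continue  # this diff starts inside the assumed stock period
        battles += cur.battles - prev.battles
        damage += cur.damage - prev.damage
    return battles, damage


snaps = [Snapshot(0, 0), Snapshot(20, 20000), Snapshot(35, 36000), Snapshot(50, 54000)]
# The 20 -> 35 diff straddles the window, so its post-stock battles are lost too.
print(post_stock_totals(snaps))  # (15, 18000)
```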
Terminology:
- Playstyle bias: Metric error caused by rewarding playstyles that are not better for winning.
- Tank-selection bias: Metric error caused by not adjusting correctly for tanks played.
- Tier bias: Metric error caused by not adjusting correctly for tiers played. Subset of tank-selection bias.
- Platoon bias: Metric error caused by platooning with players of non-average performance.
WN9
Flaws:
- Moderate playstyle bias, due to the lack of assisted damage. Better than WN8 due to the increased influence of the frags*spots term (especially for scout tanks), but not as good as WG-PR.
- The increased influence of spots may make it somewhat paddable.
- Expected values are based on elite tanks, so comparing players with different approaches to grinding is not likely to be "fair".
- Account WN9 method requires tanks/stats data and so is expensive to calculate.
Strengths:
- Very low tank-selection bias, especially for recent battles.
- No direct platoon bias. No evidence of indirect platoon bias.
- Contribution-linear. A player with 200 WN9 contributes twice as much as a player with 100 WN9.
- Lowest error by a distance for most accounts, especially for recent data. WG-PR is slightly better for very bad players.
- Account WN9 method reduces reroll advantage.
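To make "contribution-linear" concrete, the sketch below compares a metric that is proportional to contribution with a hypothetical metric that has a constant offset. The offset, scale and function name are illustrative assumptions and are not taken from the WN9 or WN8 formulas. Only with a linear, zero-offset scale do score ratios (and sums, such as a team's total rating) track the ratio of actual contributions.

```python
def implied_contribution(score, scale=1.0, offset=0.0):
    """Invert a hypothetical metric of the form score = scale * contribution + offset."""
    return (score - offset) / scale


# Contribution-linear metric (offset 0): 200 means twice the contribution of 100.
ratio_linear = implied_contribution(200) / implied_contribution(100)

# Hypothetical metric with a -100 offset: 200 vs 100 is only a 1.5x difference
# in contribution, so comparing scores directly overstates the gap.
ratio_affine = (implied_contribution(200, offset=-100)
                / implied_contribution(100, offset=-100))

print(ratio_linear, ratio_affine)  # 2.0 1.5
```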
WN8
Flaws:
- Moderate tank-selection bias for mid-level players, as the expected values method is far from perfect.
- High tank-selection bias at high skill levels. Tanks scale very differently with skill, and WN8 does not account for that.
- More playstyle bias than WN9, especially for tanks that depend heavily on assisted damage.
- Unspecified handling of missing tanks means that implementations often differ wildly even when using the same expected values.
- Not contribution-linear, mostly due to the inclusion of heavily-platooned players in the derivation. Scales very rapidly at high skill levels.
- Small platoon bias from winrate component. Only benefits limited platoon-padders.
Strengths:
- Overall and recent values can be maintained cheaply with account/tanks data.
- Tier scaling is pretty good.
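The expected-values method behind WN8's cheap account-level calculation can be sketched as follows: each tank's expected per-battle values are multiplied by the battles played in it, summed, and compared with the account's actual totals, so only per-tank battle counts and account-wide totals are needed. Recent values follow from applying the same function to the difference between two snapshots of that data. The constants are those of the widely published WN8 formula; the data layout, field names, the assumption that expected winrate is stored in percent, and the choice to skip tanks without expected values are assumptions for this sketch (the last one being exactly the kind of unspecified choice on which implementations differ).

```python
from dataclasses import dataclass


@dataclass
class Expected:
    """Per-tank expected values per battle, as in a WN8 expected-value table."""
    damage: float
    frags: float
    spots: float
    defense: float
    winrate: float  # assumed to be in percent, e.g. 49.5


def account_wn8(totals, battles_by_tank, expected_by_tank):
    """Account WN8 from account-wide totals plus per-tank battle counts.

    totals: dict with "damage", "frags", "spots", "defense" and "wins" summed
        over all battles (key names are assumptions for this sketch).
    battles_by_tank: {tank_id: battles}.
    expected_by_tank: {tank_id: Expected}.
    """
    exp = {"damage": 0.0, "frags": 0.0, "spots": 0.0, "defense": 0.0, "wins": 0.0}
    for tank_id, n in battles_by_tank.items():
        e = expected_by_tank.get(tank_id)
        if e is None:
            # Missing expected values: skipped here. This inflates the ratios,
            # because `totals` still includes this tank's battles.
            continue
        exp["damage"] += n * e.damage
        exp["frags"] += n * e.frags
        exp["spots"] += n * e.spots
        exp["defense"] += n * e.defense
        exp["wins"] += n * e.winrate / 100

    # Ratios of actual to expected totals.
    r_dmg = totals["damage"] / exp["damage"]
    r_frag = totals["frags"] / exp["frags"]
    r_spot = totals["spots"] / exp["spots"]
    r_def = totals["defense"] / exp["defense"]
    r_win = totals["wins"] / exp["wins"]

    # Normalised ratios; frags, spots and defence are capped relative to damage.
    dmg_c = max(0.0, (r_dmg - 0.22) / (1 - 0.22))
    frag_c = max(0.0, min(dmg_c + 0.2, (r_frag - 0.12) / (1 - 0.12)))
    spot_c = max(0.0, min(dmg_c + 0.1, (r_spot - 0.38) / (1 - 0.38)))
    def_c = max(0.0, min(dmg_c + 0.1, (r_def - 0.10) / (1 - 0.10)))
    win_c = max(0.0, (r_win - 0.71) / (1 - 0.71))

    return (980 * dmg_c + 210 * dmg_c * frag_c + 155 * frag_c * spot_c
            + 75 * def_c * frag_c + 145 * min(1.8, win_c))
```

A player who exactly matches every expected value gets all ratios equal to 1 and a WN8 of 980 + 210 + 155 + 75 + 145 = 1565.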
xTE
Flaws:
- Worse playstyle bias than any other metric, as it only uses damage and frags.
- Very bad tank-selection bias for mediocre players, because the lower expected value is essentially a dilution of WN8 expected values.
- Requires tanks/stats data, so no cheap way to maintain recent values.
- Tank-selection bias is bad at low tiers even for high skill levels.
Strengths:
- No direct platoon bias.
- Less tank-selection bias than WN8 at high skill levels due to the use of a top-X% expected value.
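The xTE formula itself is not given here; the sketch below only illustrates the general idea of normalising a stat between a lower reference point and a top-X% reference point. Using the sample mean as the lower reference, the 1% default and the function names are assumptions for illustration; the actual xTE lower value is, as noted above, essentially a dilution of WN8 expected values. Because the top reference is measured per tank, a tank in which good players pull far ahead gets a wider range, which is how a top-X% anchor can reduce tank-selection bias at high skill levels.

```python
import statistics


def reference_values(player_averages, top_fraction=0.01):
    """Hypothetical lower and top-X% reference points for one tank and one stat.

    player_averages: per-player averages (e.g. damage per battle) in this tank.
    Returns (lower, top): the sample mean, and the lowest value among the
    best `top_fraction` of players (at least one player).
    """
    ordered = sorted(player_averages)
    k = max(1, round(len(ordered) * top_fraction))
    return statistics.fmean(ordered), ordered[-k]


def two_point_score(actual, lower, top):
    """Map `actual` linearly so that `lower` -> 0 and `top` -> 1."""
    if top == lower:
        raise ValueError("reference points must differ")
    return (actual - lower) / (top - lower)


lower, top = reference_values(range(1, 101))
print(lower, top, two_point_score(75.25, lower, top))  # 50.5 100 0.5
```

Scoring only damage and frags this way ignores spotting and assisted damage entirely, which is the source of the playstyle bias listed above.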
WG-PR
Flaws:
- No recent or per-tank method.
- Moderate platoon bias due to use of winrate, survival rate and base XP.
- High tank-selection bias, although not necessarily as bad as WN8 at high skill levels. Strong incentive to only play the strongest tanks.
- Moderate tier bias (in favour of high tiers) for players who get decent winrates.
- Some players are heavily overrated, apparently due to an early bug in base XP recording.
- Survival rate? Really?
Strengths:
- Penalises low battle-count rerolls.
- Low playstyle bias due to the use of assisted damage, although survival rate doesn't help.
Winrate
Flaws:
- Very high platoon bias (favours platooning with good players, punishes platooning with bad players).
- Very high tier bias (favours low tiers).
- High tank-selection bias (favours strong tanks).
- Bad signal-to-noise ratio. Battle counts below ~1k are mostly useless, and even 1k isn't great.
Strengths:
- Low playstyle bias. Arguably over-rewards performance in top-tier battles, but otherwise winning is winning.
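The signal-to-noise problem follows from binomial sampling error. The sketch below computes a normal-approximation 95% interval for a winrate, assuming independent battles with a fixed win probability and ignoring draws; it measures sampling noise only, not platoon, tier or tank effects. At 1000 battles and a 50% winrate the half-width is about 3.1 percentage points, and at 100 battles it is about 9.8 points, which is large compared with the gap between many players.

```python
import math


def winrate_interval(wins, battles, z=1.96):
    """Approximate 95% normal-approximation interval for a winrate (draws ignored)."""
    p = wins / battles
    half_width = z * math.sqrt(p * (1 - p) / battles)
    return p - half_width, p + half_width


print(winrate_interval(500, 1000))  # roughly (0.469, 0.531)
```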
Tank-adjusted winrate
Flaws:
- Very high platoon bias.
- Bad signal-to-noise ratio.
Strengths:
- Low playstyle bias.
- Low tank-selection bias.
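One possible formulation of tank-adjusted winrate, sketched below, compares a player's actual wins with the wins an average player would expect from the same mix of tanks. The function name, the input dictionaries and the use of a simple per-tank population winrate are assumptions, not a documented definition. Playing strong tanks raises the expected wins as well as the actual wins, which is why tank-selection bias is low; wins still depend on platoon mates, so platoon bias remains.

```python
def tank_adjusted_winrate(wins_by_tank, battles_by_tank, avg_winrate_by_tank):
    """Actual winrate minus the winrate expected from the same tank mix.

    avg_winrate_by_tank: hypothetical population winrate per tank, as a
    fraction (0.55 = 55%); every tank played must be present.
    """
    battles = sum(battles_by_tank.values())
    wins = sum(wins_by_tank.values())
    expected_wins = sum(n * avg_winrate_by_tank[t] for t, n in battles_by_tank.items())
    return (wins - expected_wins) / battles


adj = tank_adjusted_winrate({"a": 60, "b": 50}, {"a": 100, "b": 100},
                            {"a": 0.55, "b": 0.45})
print(adj)  # roughly 0.05, i.e. 5 points above expectation
```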