Metric comparison

Common flaws and terminology

Flaws common to all listed skill metrics, unless stated otherwise:

No distinction between skill and premium consumable use. Premium ammo won't turn an average player into a top-1% player, but it is a significant advantage. Winrate-based metrics gain slightly less from premium ammo, but the difference is small.
No distinction between skill and crew skill. This can be loosely approximated by considering the number of tanks played vs the number of battles.
No detection of, or adjustment for, stock battles played. For metrics based on tanks/stats data, stat sites could approximate this by monitoring accounts very frequently and ignoring the early battles in each tank, but doing so is very costly.
No handling of tier 1-3 newbie MM. Currently, players with <2500 battles get far better results for their skill at low tiers than players with >2500 battles.
No adjustment for time of day or server played. Difficulty almost certainly varies.
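
The stock-battle workaround mentioned above (frequent monitoring, then ignoring early battles in each tank) could be sketched roughly as follows. This is only an illustration, not how any stat site is known to do it: the `Snapshot` record, its fields, and the `stats_after_grind` helper are hypothetical names, and damage is used as a stand-in for whatever per-tank totals a metric needs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Cumulative totals for one tank at one polling moment (hypothetical shape)."""
    battles: int
    damage: int


def stats_after_grind(snapshots: List[Snapshot], skip_battles: int) -> Optional[float]:
    """Average damage per battle, ignoring roughly the first `skip_battles` battles.

    `snapshots` holds cumulative records for a single tank, oldest first.
    The baseline is the earliest snapshot with at least `skip_battles` battles,
    so the cut-off is only as precise as the polling frequency allows.
    Returns None if there is no usable baseline or no battles after it.
    """
    baseline = next((s for s in snapshots if s.battles >= skip_battles), None)
    if baseline is None:
        return None
    latest = snapshots[-1]
    battles = latest.battles - baseline.battles
    if battles <= 0:
        return None
    # Subtracting cumulative totals leaves only the battles after the baseline.
    return (latest.damage - baseline.damage) / battles


history = [Snapshot(10, 5000), Snapshot(55, 40000), Snapshot(155, 160000)]
print(stats_after_grind(history, skip_battles=50))
```

The sketch also shows where the cost comes from: a baseline close to the chosen cut-off only exists if the account was polled while the tank was still being ground.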

Terminology:

Playstyle bias: Metric error caused by rewarding playstyles that are not better for winning.
Tank-selection bias: Metric error caused by not adjusting correctly for tanks played.
Tier bias: Metric error caused by not adjusting correctly for tiers played. Subset of tank-selection bias.
Platoon bias: Metric error caused by platooning with players of non-average performance.

Flaws:

Moderate playstyle bias, due to lack of assisted damage. Better than WN8 due to increased influence of the frags*spots term (especially for scout tanks), but not as good as WG-PR.
Increased influence of spots may be somewhat paddable.
Expected values are based on elite tanks, so comparing players with different approaches to grinding is not likely to be "fair".
Account WN9 method requires tanks/stats data and so is expensive to calculate.

Strengths:

Very low tank-selection bias, especially for recent battles.
No direct platoon bias. No evidence of indirect platoon bias.
Contribution-linear. A player with 200 WN9 contributes twice as much as a player with 100 WN9.
Lowest error by a distance for most accounts, especially for recent data. WG-PR is slightly better for very bad players.
Account WN9 method reduces reroll advantage.
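
To illustrate what contribution-linear means in practice: if ratings scale linearly with contribution, they can be summed and compared as plain proportions. The helper below is a hypothetical illustration of that property only; it is not part of any WN9 formula.

```python
from typing import List


def contribution_shares(ratings: List[float]) -> List[float]:
    """Each player's share of the combined rating.

    Only meaningful for a contribution-linear metric, where a rating of 200
    represents twice the contribution of a rating of 100.
    """
    total = sum(ratings)
    if total <= 0:
        raise ValueError("ratings must sum to a positive value")
    return [r / total for r in ratings]


# One 200-rated player alongside two 100-rated players.
print(contribution_shares([200, 100, 100]))
```

For a metric that is not contribution-linear, the same arithmetic would overstate or understate the stronger player's share.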

Flaws:

Moderate tank-selection bias for mid-level players, as the expected values method is far from perfect.
High tank-selection bias at high skill levels. Tanks scale very differently with skill, and WN8 does not account for that.
More playstyle bias than WN9, especially for tanks that depend heavily on assisted damage.
Unspecified handling of missing tanks means that implementations often differ wildly even when using the same expected values.
Not contribution-linear, mostly due to the inclusion of heavily-platooned players in the derivation. Scales very rapidly at high skill levels.
Small platoon bias from winrate component. Only benefits limited platoon-padders.

Strengths:

Flaws:

Worse playstyle bias than any other metric, as it only uses damage and frags.
Very bad tank-selection bias for mediocre players, because the lower expected value is essentially a dilution of WN8 expected values.
Requires tanks/stats data, so no cheap way to maintain recent values.
Tank-selection bias is bad at low tiers even for high skill levels.

Strengths:

No direct platoon bias.
Less tank-selection bias than WN8 at high skill levels due to the use of a top-X% expected value.

Flaws:

No recent or per-tank method.
Moderate platoon bias due to use of winrate, survival rate and base XP.
High tank-selection bias, although not necessarily as bad as WN8 at high skill levels. Strong incentive to only play the strongest tanks.
Moderate tier bias (in favour of high tiers) for players who get decent winrates.
Some players are heavily overrated, apparently due to an early bug in base XP recording.
Survival rate? Really?

Strengths:

Penalises low battle-count rerolls.
Low playstyle bias due to the use of assisted damage, although survival rate doesn't help.

Flaws:

Very high platoon bias (favours platooning with good players, punishes platooning with bad players).
Very high tier bias (favours low tiers).
High tank-selection bias (favours strong tanks).
Bad signal-to-noise ratio. Battle counts below ~1k are mostly useless, and even 1k isn't great.
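
The noise problem can be made concrete with a simple binomial approximation. The sketch below assumes each battle is an independent win-or-loss trial with a fixed win probability, which ignores draws, platooning and changes in tanks played, so treat the numbers as a rough lower bound on the real uncertainty. The function names are illustrative.

```python
import math
from typing import Tuple


def winrate_standard_error(winrate: float, battles: int) -> float:
    """Binomial standard error of an observed winrate (as a fraction, 0 to 1)."""
    return math.sqrt(winrate * (1.0 - winrate) / battles)


def winrate_interval(winrate: float, battles: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval (normal approximation); z=1.96 is about 95%."""
    margin = z * winrate_standard_error(winrate, battles)
    return (winrate - margin, winrate + margin)


for n in (100, 1000, 10000):
    low, high = winrate_interval(0.50, n)
    print(f"{n:>6} battles: {low:.1%} to {high:.1%}")
```

Under these assumptions, an observed 50% winrate over 1k battles is consistent with a true winrate anywhere from roughly 47% to 53%, a range wide enough to cover a large part of the player population.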

Strengths:

Low playstyle bias. Arguably over-rewards performance in top-tier battles, but otherwise winning is winning.

Flaws:

Strengths: