Gamalytic

How to accurately estimate Steam game sales

5/13/2023

In this article, we'll explore several methods of estimating game sales on Steam. Mainly, we will look at the following estimation methods:

  1. Review multiples (Boxleiter method)
  2. Using Steam's top seller rank to estimate revenue and sales
  3. Polling public Steam profiles to estimate game ownership, similar to how SteamSpy worked before the Steam profile privacy change
  4. Using the number of concurrent players to estimate a game's sales

At the end, we will arrive at an algorithm that estimates sales and revenue much more accurately than the review-based approach.

To test the estimates, I collected a sample of around 120 games with public sales data and wrote a script that compares reported game sales to the estimates on a given date. This is an unbiased sample that contains games of various sizes, genres and years of release. Due to high error margins, we will ignore games that sold fewer than 1,000 copies in our tests.
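The article does not define how the accuracy figures in the result tables are computed, so the following Python sketch is only one plausible reading. It assumes that the signed difference is measured relative to the larger of the two values, which is consistent with rows in the detailed results table (for example, 20k reported against 53.1k estimated is listed as 62%). It also assumes that aggregate accuracy is total estimated units divided by total reported units, and that average accuracy is one minus the mean absolute difference. The names `Sample`, `signed_difference` and `evaluate` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    reported: float  # publicly reported units sold
    estimate: float  # units sold according to the estimator under test


def signed_difference(s: Sample) -> float:
    """Signed error relative to the larger value (assumed definition)."""
    return (s.estimate - s.reported) / max(s.estimate, s.reported)


def evaluate(samples, min_units=1000):
    """Summarize estimator accuracy over all games with at least min_units sold."""
    # Skip very small games, as the article does.
    kept = [s for s in samples if s.reported >= min_units]
    if not kept:
        raise ValueError("no samples left after filtering")
    errors = [abs(signed_difference(s)) for s in kept]
    return {
        # Assumed: total estimated units over total reported units.
        "aggregate_accuracy": sum(s.estimate for s in kept) / sum(s.reported for s in kept),
        # Assumed: one minus the mean absolute difference.
        "average_accuracy": 1 - sum(errors) / len(errors),
        # Share of games whose absolute difference is within each margin.
        "within": {m: sum(e <= m for e in errors) / len(errors) for m in (0.1, 0.3, 0.5, 0.7)},
    }
```

Under these assumptions, a game that is overestimated by a factor of two and a game that is underestimated by a factor of two both score a 50% difference, which keeps the error bounded between -100% and 100%.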

Review multiples (Boxleiter method)

Let's start with the simplest method: taking the number of reviews and multiplying it by a constant. The median sales/review ratio is around 35. Here are the results after running a test:

Aggregate accuracy: 100.61%
Average accuracy: 63.74%
% of games within 10% error margin: 12.82%
% of games within 30% error margin: 42.74%
% of games within 50% error margin: 70.94%
% of games within 70% error margin: 88.89%

That's relatively inaccurate. Fewer than half of the games fall within a 30% error margin, and about 29% of games are off by more than 50%.
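For reference, the constant-multiple estimate tested above fits in a few lines of Python. This is a minimal sketch: the function names are illustrative and not taken from the original test script, and the default of 35 is simply the median sales/review ratio quoted earlier.

```python
from statistics import median


def median_multiplier(known_games):
    """Median sales-per-review ratio over (units_sold, review_count) pairs."""
    ratios = [units / reviews for units, reviews in known_games if reviews > 0]
    return median(ratios)


def estimate_sales_from_reviews(review_count, multiplier=35.0):
    """Estimate units sold as the review count times a constant multiple."""
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return review_count * multiplier
```

With the default multiple, a game with 1,000 reviews would be estimated at 35,000 copies sold, regardless of its price, age or genre, which is part of why the error margins are so wide.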

Now, let's try to improve our method by using different multiples depending on release date, price, review score, and similar factors, along the lines of previously published refinements of the Boxleiter method.

Aggregate accuracy: 103.58%
Average accuracy: 67.34%
% of games within 10% error margin: 12.82%
% of games within 30% error margin: 50.43%
% of games within 50% error margin: 80.34%
% of games within 70% error margin: 94.87%

This looks a little better, but is still pretty imprecise.

Estimating game sales using concurrent player count and average playtime estimates

Now let's try another method. We can add up the number of concurrent players for every hour and divide the total by the average playtime estimate to estimate the game's player base. Since the number of concurrent players is reported as an exact number via the Steam API, the accuracy of this method depends entirely on the average playtime estimate. We can estimate average playtime from public indicators such as public profiles and game reviews. The more data points we can gather, the more accurate the estimate will be. Here are the results after running a test:

Aggregate accuracy: 100.31%
Average accuracy: 76.02%
% of games within 10% error margin: 21.11%
% of games within 30% error margin: 64.44%
% of games within 50% error margin: 93.33%
% of games within 70% error margin: 100%

This is better than the review-based approach, but it is still not accurate enough. Also, for some games we do not have accurate historical concurrent player data, so this method cannot be applied to all games.
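The arithmetic behind the concurrent-player method can be sketched as follows. The sketch assumes that each hourly sample approximates the average number of players online during that hour, so that the sum of the samples approximates total player-hours. The function name and parameters are illustrative.

```python
def estimate_players(hourly_concurrent, avg_playtime_hours):
    """Estimate the total player base from hourly concurrent player counts.

    hourly_concurrent: one concurrent-player reading per hour since release.
    avg_playtime_hours: estimated average lifetime playtime per player.
    """
    if avg_playtime_hours <= 0:
        raise ValueError("avg_playtime_hours must be positive")
    # Each reading stands for roughly one hour of play per concurrent player,
    # so the sum approximates the total number of player-hours.
    total_player_hours = sum(hourly_concurrent)
    return total_player_hours / avg_playtime_hours
```

For example, 600 accumulated player-hours with an average playtime of 2 hours would imply about 300 players. Any error in the playtime estimate carries over proportionally: overestimating average playtime by 20% lowers the player estimate by roughly 17%.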

Using Steam's top seller rank to estimate sales

Let's try one more thing: using the Steam top seller rank to estimate game revenue and sales.

Top seller rank is a good indicator of how a game is doing. Steam's top seller lists are generated automatically based on all revenue sources for a game, including DLCs and in-game transactions.

This is a double-edged sword: it allows us to estimate revenue for free-to-play games, but it can distort unit sales estimates for paid games with in-app purchases, so we have to be careful with that.

Here are the results of the test.

Aggregate accuracy: 108%
Average accuracy: 71.74%
% of games within 10% error margin: 21.05%
% of games within 30% error margin: 61.4%
% of games within 50% error margin: 85.96%
% of games within 70% error margin: 96.49%

Polling public profiles

Finally, let's try polling public Steam profiles to estimate game ownership. This is the method used by SteamSpy prior to Steam's privacy policy change. Since the number of public profiles has dropped significantly since that change, instead of the 3-day rollback used by SteamSpy, we will use a much larger rollback of 30 days to collect a sufficient sample. We will then use the number of reviews and the top seller rank to fine-tune the estimate.

Aggregate accuracy: 97.57%
Average accuracy: 80.82%
% of games within 10% error margin: 31.4%
% of games within 30% error margin: 77.91%
% of games within 50% error margin: 97.67%
% of games within 70% error margin: 100%

This method seems to be the most accurate so far. However, it doesn't really work for smaller games, where the margin of error is simply too large. So for anything under 20,000 players, we'll have to rely on other estimation methods.
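Basic sampling statistics show why the margin of error grows for small games. The sketch below treats the polled profiles as a simple random sample and uses the normal approximation for a proportion; both are simplifying assumptions, and the sample size and population in the usage note are made-up round numbers rather than real Steam figures.

```python
import math


def estimate_owners(owners_in_sample, sample_size, population, z=1.96):
    """Scale a sample ownership share up to the population.

    Returns (estimated_owners, margin_of_error), where the margin uses the
    normal approximation at roughly 95% confidence for the default z.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    share = owners_in_sample / sample_size
    standard_error = math.sqrt(share * (1 - share) / sample_size)
    return share * population, z * standard_error * population


def relative_margin(owners_in_sample, sample_size, population):
    """Margin of error as a fraction of the estimate itself."""
    estimate, margin = estimate_owners(owners_in_sample, sample_size, population)
    return margin / estimate
```

With a hypothetical sample of 1,000,000 profiles drawn from 100,000,000 accounts, a game owned by 10,000 sampled profiles gets a relative margin of about 2%, while a game owned by only 10 sampled profiles gets a relative margin of more than 60%. The fewer owners that show up in the sample, the noisier the estimate.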

Aggregating estimates

Now let's aggregate all of the above methods and see what we get.

We will use different weights for different estimation methods depending on the game (e.g., for smaller games, profile polls are given less weight).
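The article does not publish its weights, so the following is only a sketch of what a size-dependent weighted average could look like. All weight values are hypothetical placeholders; the 20,000 threshold is borrowed from the profile-polling section above, and the method names are illustrative.

```python
def weights_for(profile_poll_estimate, small_game_threshold=20_000):
    """Return hypothetical per-method weights for one game."""
    weights = {"reviews": 1.0, "concurrent": 1.0, "top_seller": 1.0, "profiles": 2.0}
    # Profile polls are unreliable for small games, so trust them less there.
    if profile_poll_estimate is None or profile_poll_estimate < small_game_threshold:
        weights["profiles"] = 0.25
    return weights


def aggregate_estimates(estimates, weights):
    """Weighted average of the available per-method estimates.

    Methods whose estimate is None (e.g. no concurrent player history)
    are skipped, and the remaining weights are renormalized.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for method, value in estimates.items():
        if value is None:
            continue
        weight = weights.get(method, 0.0)
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no usable estimates")
    return weighted_sum / total_weight
```

Skipping missing methods rather than treating them as zero matters here, because not every method can be applied to every game.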

Additionally, I will use the 2018 leaked dataset to further adjust the estimates for older games.

Here are the results:

Aggregate accuracy: 99.73%
Average accuracy: 80.46%
% of games within 10% error margin: 30.77%
% of games within 30% error margin: 76.92%
% of games within 50% error margin: 99.15%
% of games within 70% error margin: 100%

This looks much better than the review-based approach we started with!

Here is a detailed overview of the test results:

Game | Date | Reported units sold | Estimate | % difference
Tile Cities | Sun Jul 10 2022 | 14k | 13.9k | 0%
Dinkum | Wed Aug 17 2022 | 350k | 348.3k | 0%
Rust | Tue Dec 07 2021 | 12.4m | 12.5m | 1%
Sea of Thieves 2023 Edition | Wed Dec 22 2021 | 5m | 4.9m | -1%
Stardew Valley | Sun May 15 2022 | 13m | 12.8m | -1%
Garry's Mod | Mon Sep 20 2021 | 20m | 20.3m | 2%
Aragami | Fri Oct 04 2019 | 320k | 312.4k | -2%
Handshakes | Tue Feb 21 2023 | 60k | 58.5k | -2%
Inspector Waffles | Wed Mar 23 2022 | 3.5k | 3.3k | -4%
Yerba Mate Tycoon | Wed Jun 15 2022 | 2k | 2.1k | 4%
Placid Plastic Duck Simulator | Wed Dec 07 2022 | 76.3k | 73.3k | -4%
Buddy Simulator 1984 | Mon Oct 17 2022 | 75k | 78.1k | 4%
The Planet Crafter | Fri Mar 24 2023 | 500k | 479.3k | -4%
Hydroneer | Wed Jul 27 2022 | 500k | 478.7k | -4%
The Wandering Village | Sat Mar 25 2023 | 224k | 214k | -4%
Sons Of The Forest | Fri Feb 24 2023 | 2m | 1.9m | -4%
The Witcher® 3: Wild Hunt | Wed Apr 08 2020 | 12m | 11.3m | -6%
Production Line : Car factory simulation | Sat Aug 17 2019 | 100k | 106k | 6%
Dwarf Fortress | Wed Jan 04 2023 | 500k | 470.8k | -6%
Wartales | Thu Apr 27 2023 | 600k | 562.5k | -6%
Stacklands | Sun Jul 10 2022 | 450k | 480.1k | 6%
Darkest Dungeon® | Thu Nov 03 2016 | 1m | 1m | 7%
Bonding Ambivalence | Sat Apr 01 2023 | 3.4k | 3.1k | -7%
Persona 4 Golden | Wed Jun 30 2021 | 1m | 928.7k | -7%
Train Fever | Wed Nov 25 2015 | 81k | 87.4k | 7%
Watch Your Plastic Duck | Mon Jan 30 2023 | 1.4k | 1.5k | 7%
Dungeons of Edera | Thu Jun 16 2022 | 38.5k | 41.6k | 7%
My Jigsaw Adventures - Roads of Life | Tue Jan 25 2022 | 1k | 1k | 8%
Core Keeper | Fri Jun 10 2022 | 1m | 913.9k | -9%
Shipped | Fri Jan 20 2023 | 9.5k | 10.4k | 9%
Avorion | Mon Jun 13 2022 | 450k | 409k | -9%
GROSS | Thu Mar 09 2023 | 3k | 2.7k | -9%
Freedom Planet 2 | Wed Nov 09 2022 | 16k | 14.4k | -10%
Punch A Bunch | Fri Feb 24 2023 | 10k | 11k | 10%
EVERSPACE™ 2 | Thu Apr 20 2023 | 276k | 248.5k | -10%
Stationeers | Fri Jan 20 2023 | 173k | 192.1k | 10%
Valheim | Mon Apr 25 2022 | 10m | 8.9m | -10%
Lost Potato | Fri Oct 01 2021 | 1.3k | 1.2k | -10%
Roll | Wed Apr 20 2022 | 22.4k | 20k | -10%
Eggcelerate! | Sat Nov 20 2021 | 1k | 893 | -11%
The Dungeon Beneath | Tue Mar 30 2021 | 3.6k | 4k | 11%
Dread Hunger | Wed Apr 13 2022 | 1m | 891.3k | -11%
Elong Plug | Thu Mar 02 2023 | 5k | 4.4k | -11%
Big Ambitions | Sat Mar 25 2023 | 150k | 133.5k | -11%
Out of Ammo | Fri Jan 20 2023 | 57k | 50.7k | -11%
Deep Rock Galactic | Mon Dec 31 2018 | 500k | 444.4k | -11%
The Pale Beyond | Sat Feb 25 2023 | 6k | 5.4k | -11%
Octodad: Dadliest Catch | Wed Jan 30 2019 | 660k | 585.5k | -11%
Mortal Glory | Wed Feb 09 2022 | 23.9k | 21.2k | -11%
Please Fix The Road | Tue Jun 28 2022 | 10k | 8.8k | -12%
Barotrauma | Fri Jun 04 2021 | 800k | 704.8k | -12%
Noobs Want to Live | Tue Feb 21 2023 | 100k | 114.7k | 13%
SpeedRunners | Tue Apr 05 2016 | 1m | 866k | -13%
Sands of Salzaar | Wed Jul 27 2022 | 1m | 860.8k | -14%
Escape Simulator | Wed May 04 2022 | 1m | 860k | -14%
Golfing Over It with Alva Majo | Fri Jan 20 2023 | 100k | 85.9k | -14%
Cygnus Enterprises | Sat Apr 15 2023 | 5k | 5.9k | 15%
Warsim: The Realm of Aslona | Sun Dec 12 2021 | 30k | 35.6k | 16%
Contraband Police | Tue Apr 04 2023 | 250k | 209.9k | -16%
Yi Xian: The Cultivation Card Game | Sat Jan 21 2023 | 100k | 83.7k | -16%
Pawnbarian | Wed Dec 08 2021 | 8.4k | 6.9k | -17%
Inscryption | Wed Jan 05 2022 | 1m | 1.2m | 17%
Supraland | Sun Jun 28 2020 | 250k | 207.3k | -17%
Loop Hero | Thu Dec 09 2021 | 1m | 1.2m | 18%
The Riftbreaker | Thu Oct 13 2022 | 500k | 410.5k | -18%
Battle Royale Tycoon | Mon Nov 04 2019 | 15k | 18.2k | 18%
Townscaper | Wed May 19 2021 | 380k | 467.9k | 19%
ICARUS | Fri Jan 20 2023 | 1m | 818.8k | -20%
pureya | Fri Jan 20 2023 | 17k | 21.2k | 20%
Among the Sleep - Enhanced Edition | Tue Feb 14 2017 | 186.1k | 147.9k | -21%
West Hunt | Fri Feb 03 2023 | 110k | 87.3k | -21%
Crusader Kings III | Thu Mar 17 2022 | 2m | 1.5m | -21%
PostCollapse | Tue Apr 07 2020 | 4k | 5k | 21%
Ravenous Devils | Sun May 15 2022 | 100k | 78.7k | -21%
Mixolumia | Tue Jun 22 2021 | 1.4k | 1.8k | 22%
EVERSPACE™ | Thu Apr 20 2023 | 879k | 684k | -22%
Majotori | Fri Jan 20 2023 | 35k | 45.4k | 23%
Cthulhu Saves the World | Thu Mar 17 2022 | 671k | 514.5k | -23%
Osiris: New Dawn | Tue Jan 17 2023 | 600k | 455.9k | -24%
Salome's Kiss | Sun Oct 02 2022 | 1k | 1.3k | 25%
Protolife | Thu Jun 25 2020 | 17k | 12.7k | -25%
Furi | Sun Oct 11 2020 | 280k | 375.1k | 25%
City Climber | Mon May 17 2021 | 20k | 14.8k | -26%
Highway Blossoms | Thu Jun 17 2021 | 50k | 67.7k | 26%
Son of a Witch | Mon May 09 2022 | 23k | 16.9k | -26%
Sifu | Fri Mar 31 2023 | 50k | 68k | 27%
Hot Heat Reset: Chapter 1 | Thu Mar 09 2023 | 2.7k | 1.9k | -28%
LOST EMBER | Thu Jan 13 2022 | 134k | 96.3k | -28%
A Little Golf Journey | Fri Feb 24 2023 | 1.5k | 1k | -29%
Nebula | Fri Feb 24 2023 | 2.4k | 3.4k | 29%
Project Heartbeat | Wed Jan 05 2022 | 3k | 4.2k | 30%
Guns of Icarus Online | Thu Jun 26 2014 | 450k | 645.9k | 30%
Peglin | Tue Apr 03 2018 | 80.2k | 55.3k | -31%
Induction | Tue Mar 20 2018 | 1.2k | 1.8k | 32%
Cultist Simulator | Mon Feb 18 2019 | 85k | 127.1k | 33%
Winter Falling: Battle Tactics | Wed Nov 30 2022 | 3.1k | 2.1k | -33%
Gibbous - A Cthulhu Adventure | Sat Jan 16 2021 | 40k | 26.4k | -34%
Out of Ammo: Death Drive | Fri Jan 20 2023 | 11.2k | 7.2k | -35%
Missing Hiker | Tue Mar 14 2023 | 100k | 64k | -36%
Eastshade | Thu Oct 01 2020 | 127k | 81.1k | -36%
Cauldrons of War - Barbarossa | Sat Apr 08 2023 | 8k | 4.9k | -38%
Cosmic Star Heroine | Thu Mar 17 2022 | 58k | 35.8k | -38%
Primordia | Sun May 01 2022 | 200k | 120.9k | -40%
Mortal Online 2 | Wed Feb 02 2022 | 110k | 65.8k | -40%
Knock-knock | Tue Feb 14 2017 | 94k | 55.2k | -41%
Tinyfolks | Thu Jun 09 2022 | 10k | 17.2k | 42%
Slime Rancher | Thu Jan 13 2022 | 5m | 2.8m | -42%
Tilecraft | Wed Nov 30 2022 | 1.2k | 2.1k | 44%
Bloody Rally Show | Tue Feb 23 2021 | 2.5k | 4.4k | 44%
The Wreck | Mon Apr 17 2023 | 1k | 564 | -44%
Hats and Hand Grenades | Wed Dec 07 2022 | 24k | 13.2k | -45%
Larcin Lazer | Thu Feb 23 2023 | 1.2k | 2.1k | 45%
Missing Hiker | Thu Feb 16 2023 | 10k | 18.3k | 46%
The Companion | Mon Nov 21 2022 | 3k | 1.6k | -46%
Cthulhu Saves Christmas | Thu Mar 17 2022 | 13k | 6.9k | -47%
Will You Snail? | Fri Mar 18 2022 | 7.5k | 14.1k | 47%
Cat Herder | Thu Feb 02 2023 | 1.5k | 750 | -50%
Nightmare Reaper | Sat Mar 26 2022 | 20k | 53.1k | 62%

Estimating revenue

However, this does not tell us the whole story. We also want to know the revenue of the game, and sales and revenue are not necessarily linearly correlated. Luckily, I made a small algorithm that factors in discounts when calculating revenue, based on the game's price history. This should give us the right ballpark for the vast majority of games.
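The author's revenue algorithm is not shown, but the idea of factoring discounts into revenue can be illustrated with a minimal sketch. It assumes that unit sales have already been split across price periods, which is the hard part in practice, and it ignores regional pricing, refunds and platform fees. The names `PricePeriod` and `estimate_gross_revenue` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PricePeriod:
    units_sold: float  # estimated units sold during this period
    base_price: float  # list price during this period
    discount: float    # fractional discount, e.g. 0.5 for a 50% sale


def estimate_gross_revenue(periods):
    """Sum units times the discounted price over all price periods."""
    return sum(p.units_sold * p.base_price * (1 - p.discount) for p in periods)
```

For example, 1,000 copies sold at a full price of 20 plus 2,000 copies sold during a 50% sale add up to 40,000 in gross revenue, whereas multiplying all 3,000 copies by the full price would give 60,000. This is why revenue does not scale linearly with units sold.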

Unfortunately, for some games, the situation is not that simple. It can be hard to differentiate copies sold from copies given away for free, and even harder to differentiate copies sold regularly from copies sold in bundles. Furthermore, it's pretty much impossible to know how the game has sold on third-party sites.

We can use the ratio of reviews marked as purchased on Steam to those activated with a key, look for discrepancies in review ratios and playtime data, look for patterns in public Steam libraries that may indicate a game was purchased in a bundle, and use top seller rankings to help us deduce how many copies were sold directly through Steam. However, there is currently no way to know exactly how much the game has made on third-party sites or in bundles. We can only estimate how many copies have been sold directly on Steam.

After all, nothing can be as accurate as the numbers provided by the developers themselves. You should always sanity check all estimates before making any decisions.

Some things to watch out for:

  • Smaller games have less accurate estimates than larger ones due to the smaller sample size
  • Free-to-play games generally have less accurate estimates
  • Revenue estimates do not take into account revenue from any external sources and may not properly estimate revenue from Steam bundles
  • There may be bugs in our scraping or estimation algorithm, causing the estimates to be wildly off, so always sanity check any estimates

Conclusion

Overall, using all of these methods together is far more accurate than using review-based estimates alone. It is important to note, however, that when conducting market research, these estimates should only be used as supplementary information and should not be followed blindly. There is still room for improvement here, and I will continue to work on the algorithm to improve it further.

If you found this article helpful, check out our tool at gamalytic.com


© 2023 Gamalytic.com
