How to accurately estimate Steam game sales
5/13/2023
In this article, we'll explore several methods of estimating game sales on Steam. Mainly, we will look at the following estimation methods:
- Review multiples (Boxleiter method)
- Using Steam's top seller rank to estimate revenue and sales
- Polling public Steam profiles to estimate game ownership, similar to how SteamSpy worked before the Steam profile privacy change
- Using the number of concurrent players to estimate a game's sales
By the end, we will arrive at an algorithm that estimates sales and revenue much more accurately than the review-based approach.
To test the estimates, I collected this sample of around 120 games with public sales data and made a script that compares reported game sales to estimates on a given date. This is an unbiased sample that contains games of various sizes, genres and years of release. Due to high error margins, we will ignore games that sold fewer than 1,000 copies in our tests.
Review multiples (Boxleiter method)
Let's start with the simplest method: taking the number of reviews and multiplying it by a constant. The median sales/review ratio is around 35. Here are the results after running a test:
Aggregate accuracy | 100.61% |
Average accuracy | 63.74% |
% of games within 10% error margin | 12.82% |
% of games within 30% error margin | 42.74% |
% of games within 50% error margin | 70.94% |
% of games within 70% error margin | 88.89% |
That's relatively inaccurate. Fewer than half of the games are within a 30% error margin, and about 30% of games are off by more than 50%.
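As a minimal sketch, the constant-multiple method fits in a few lines. The multiple of 35 is the median ratio quoted above; the function name and the sample review count are made up for illustration.

```python
# Median sales-per-review ratio quoted in the text above.
REVIEW_MULTIPLE = 35


def estimate_sales_from_reviews(review_count: int, multiple: float = REVIEW_MULTIPLE) -> int:
    """Estimate units sold as the review count times a constant multiple."""
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return round(review_count * multiple)


# Hypothetical game with 2,000 reviews.
print(estimate_sales_from_reviews(2_000))  # 70000
```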
Now, let's try to improve our method by using different multiples depending on release date, price, review score, and the like. This is similar to what is described here and here.
Aggregate accuracy | 103.58% |
Average accuracy | 67.34% |
% of games within 10% error margin | 12.82% |
% of games within 30% error margin | 50.43% |
% of games within 50% error margin | 80.34% |
% of games within 70% error margin | 94.87% |
This looks a little better, but is still pretty imprecise.
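To show the shape of such an adjustment, here is a rough sketch that scales the base multiple by per-attribute factors. The exact factors are not given in this article, so every number below other than the base multiple of 35 is a placeholder, and the function names are hypothetical. Real factors would have to be fitted against games with known sales.

```python
BASE_MULTIPLE = 35  # median sales/review ratio quoted earlier


def adjusted_multiple(release_year: int, price_usd: float, review_score: float) -> float:
    """Scale the base multiple by per-attribute factors.

    All factor values and thresholds are PLACEHOLDERS for illustration only,
    not fitted numbers.
    """
    year_factor = 1.5 if release_year < 2017 else 1.0    # placeholder
    price_factor = 1.2 if price_usd < 10 else 1.0        # placeholder
    score_factor = 0.9 if review_score >= 0.95 else 1.0  # placeholder
    return BASE_MULTIPLE * year_factor * price_factor * score_factor


def estimate_sales(review_count: int, release_year: int, price_usd: float, review_score: float) -> int:
    """Estimate units sold using a per-game multiple instead of a constant."""
    return round(review_count * adjusted_multiple(release_year, price_usd, review_score))


# Hypothetical game: 1,000 reviews, released 2020, $20, 80% positive.
print(estimate_sales(1_000, 2020, 20.0, 0.80))
```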
Estimating game sales using concurrent player count and average playtime estimates
Now let's try another method. We can add up the number of concurrent players for every hour and divide that sum by the average playtime estimate to estimate the game's player base. Since the number of concurrent players is reported as an exact number via the Steam API, the accuracy of this method depends entirely on the average playtime estimate. We can estimate average playtime from all public indicators, such as public profiles and game reviews. The more data points we can gather, the more accurate the estimate will be. Here are the results after running a test:
Aggregate accuracy | 100.31% |
Average accuracy | 76.02% |
% of games within 10% error margin | 21.11% |
% of games within 30% error margin | 64.44% |
% of games within 50% error margin | 93.33% |
% of games within 70% error margin | 100% |
This is better than the review-based approach, but is still not accurate enough. Also, for some games we do not have accurate historical concurrent player data, so this method cannot be applied to all games.
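The arithmetic behind this method can be sketched as follows: with one concurrent-player sample per hour, the sum of the samples is the total number of player-hours, and dividing by the average hours played per owner gives a player count. The function name and the sample values are hypothetical.

```python
def estimate_players(hourly_concurrent: list[int], avg_playtime_hours: float) -> float:
    """Estimate the player base as total player-hours / average playtime."""
    if avg_playtime_hours <= 0:
        raise ValueError("avg_playtime_hours must be positive")
    # One sample per hour, so the sum of samples equals total player-hours.
    total_player_hours = sum(hourly_concurrent)
    return total_player_hours / avg_playtime_hours


# Hypothetical data: four hourly samples and a 12-hour average playtime.
samples = [120, 150, 90, 240]
print(estimate_players(samples, 12.0))  # 50.0
```

Any error in the average playtime estimate carries straight through: if the true average is half the assumed value, the real player base is twice the estimate.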
Using Steam's top seller rank to estimate sales
Let's try one more thing: using the Steam top seller rank to estimate game revenue and sales.
Top seller rank is a good indicator of how a game is doing. Steam's top seller lists are generated automatically based on all revenue sources for a game, including DLCs and in-game transactions.
This is a double-edged sword: it allows us to estimate revenue for free-to-play games, but it can distort unit sales estimates for paid games with in-app purchases, so we have to be careful with that.
Here are the results of the test.
Aggregate accuracy | 108% |
Average accuracy | 71.74% |
% of games within 10% error margin | 21.05% |
% of games within 30% error margin | 61.4% |
% of games within 50% error margin | 85.96% |
% of games within 70% error margin | 96.49% |
Polling public profiles
Finally, let's try polling public Steam profiles to estimate game ownership. This is the method used by SteamSpy prior to Steam's privacy policy change. Since the number of public profiles has dropped significantly since that change, instead of the 3-day rollback used by SteamSpy, we will use a much larger rollback of 30 days to collect a sufficient sample. We will then use the number of reviews and the top seller rank to fine-tune the estimate.
Aggregate accuracy | 97.57% |
Average accuracy | 80.82% |
% of games within 10% error margin | 31.4% |
% of games within 30% error margin | 77.91% |
% of games within 50% error margin | 97.67% |
% of games within 70% error margin | 100% |
This method seems to be the most accurate so far. However, it doesn't really work for smaller games, where the margin of error is simply too large. So for anything under 20,000 players, we'll have to rely on other estimation methods.
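To see why small games are a problem, here is a sketch of the sampling arithmetic. It scales the share of sampled profiles that own the game up to the whole user base and attaches a 95% margin using the normal approximation to the binomial. The sample size and population below are made-up round numbers, not real Steam figures, and the function name is hypothetical.

```python
import math


def estimate_owners(sample_size: int, owners_in_sample: int, population: int) -> tuple[float, float]:
    """Return (estimated owners, relative 95% margin of error).

    Uses the normal approximation to the binomial distribution.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = owners_in_sample / sample_size  # share of sampled profiles owning the game
    estimate = p * population
    std_err = math.sqrt(p * (1 - p) / sample_size)
    relative_margin = 1.96 * std_err / p if p > 0 else float("inf")
    return estimate, relative_margin


# Hypothetical numbers: 1M polled profiles out of a 100M user base.
large = estimate_owners(1_000_000, 10_000, 100_000_000)  # ~1M owners, margin of roughly 2%
small = estimate_owners(1_000_000, 100, 100_000_000)     # ~10k owners, margin of roughly 20%
print(large, small)
```

With the same poll, the game with a hundred times fewer owners ends up with a relative margin about ten times larger, before accounting for any bias in which profiles are public.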
Aggregating estimates
Now let's aggregate all of the above methods and see what we get.
We will use different weights for different estimation methods depending on the game (e.g., for smaller games, profile polls are given less weight).
Additionally, I will use this 2018 leaked dataset to further adjust the estimates for older games.
Here are the results:
Aggregate accuracy | 99.73% |
Average accuracy | 80.46% |
% of games within 10% error margin | 30.77% |
% of games within 30% error margin | 76.92% |
% of games within 50% error margin | 99.15% |
% of games within 70% error margin | 100% |
Now, this looks much better (than the review-based approach we started with)!
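A weighted average is one simple way to combine the estimates. The sketch below is an illustration of the idea, not the actual algorithm: the weight values are placeholders, the method names are hypothetical labels, and only the 20,000 threshold comes from the profile-polling section above.

```python
def weights_for(rough_size: float) -> dict[str, float]:
    """Pick per-method weights from a rough size guess (placeholder values)."""
    if rough_size < 20_000:
        # Profile polls are too noisy for small games, so they get no weight here.
        return {"reviews": 0.4, "concurrent": 0.3, "top_seller": 0.3, "profiles": 0.0}
    return {"reviews": 0.2, "concurrent": 0.2, "top_seller": 0.2, "profiles": 0.4}


def aggregate_estimates(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over the methods that produced an estimate."""
    total_weight = sum(weights[m] for m in estimates)
    if total_weight <= 0:
        raise ValueError("weights for the available methods must sum to a positive value")
    return sum(estimates[m] * weights[m] for m in estimates) / total_weight


# Hypothetical per-method estimates for a small game.
estimates = {"reviews": 10_000, "concurrent": 12_000, "top_seller": 8_000, "profiles": 30_000}
print(aggregate_estimates(estimates, weights_for(10_000)))
```

Normalizing by the sum of the weights that are actually used means a method can simply be left out of `estimates` when its input data is missing, for example when there is no historical concurrent player data.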
Here is a detailed overview of the test results:
Game | Date | Reported units sold | Estimate | % difference |
---|---|---|---|---|
Tile Cities | Sun Jul 10 2022 | 14k | 13.9k | 0% |
Dinkum | Wed Aug 17 2022 | 350k | 348.3k | 0% |
Rust | Tue Dec 07 2021 | 12.4m | 12.5m | 1% |
Sea of Thieves 2023 Edition | Wed Dec 22 2021 | 5m | 4.9m | -1% |
Stardew Valley | Sun May 15 2022 | 13m | 12.8m | -1% |
Garry's Mod | Mon Sep 20 2021 | 20m | 20.3m | 2% |
Aragami | Fri Oct 04 2019 | 320k | 312.4k | -2% |
Handshakes | Tue Feb 21 2023 | 60k | 58.5k | -2% |
Inspector Waffles | Wed Mar 23 2022 | 3.5k | 3.3k | -4% |
Yerba Mate Tycoon | Wed Jun 15 2022 | 2k | 2.1k | 4% |
Placid Plastic Duck Simulator | Wed Dec 07 2022 | 76.3k | 73.3k | -4% |
Buddy Simulator 1984 | Mon Oct 17 2022 | 75k | 78.1k | 4% |
The Planet Crafter | Fri Mar 24 2023 | 500k | 479.3k | -4% |
Hydroneer | Wed Jul 27 2022 | 500k | 478.7k | -4% |
The Wandering Village | Sat Mar 25 2023 | 224k | 214k | -4% |
Sons Of The Forest | Fri Feb 24 2023 | 2m | 1.9m | -4% |
The Witcher® 3: Wild Hunt | Wed Apr 08 2020 | 12m | 11.3m | -6% |
Production Line : Car factory simulation | Sat Aug 17 2019 | 100k | 106k | 6% |
Dwarf Fortress | Wed Jan 04 2023 | 500k | 470.8k | -6% |
Wartales | Thu Apr 27 2023 | 600k | 562.5k | -6% |
Stacklands | Sun Jul 10 2022 | 450k | 480.1k | 6% |
Darkest Dungeon® | Thu Nov 03 2016 | 1m | 1m | 7% |
Bonding Ambivalence | Sat Apr 01 2023 | 3.4k | 3.1k | -7% |
Persona 4 Golden | Wed Jun 30 2021 | 1m | 928.7k | -7% |
Train Fever | Wed Nov 25 2015 | 81k | 87.4k | 7% |
Watch Your Plastic Duck | Mon Jan 30 2023 | 1.4k | 1.5k | 7% |
Dungeons of Edera | Thu Jun 16 2022 | 38.5k | 41.6k | 7% |
My Jigsaw Adventures - Roads of Life | Tue Jan 25 2022 | 1k | 1k | 8% |
Core Keeper | Fri Jun 10 2022 | 1m | 913.9k | -9% |
Shipped | Fri Jan 20 2023 | 9.5k | 10.4k | 9% |
Avorion | Mon Jun 13 2022 | 450k | 409k | -9% |
GROSS | Thu Mar 09 2023 | 3k | 2.7k | -9% |
Freedom Planet 2 | Wed Nov 09 2022 | 16k | 14.4k | -10% |
Punch A Bunch | Fri Feb 24 2023 | 10k | 11k | 10% |
EVERSPACE™ 2 | Thu Apr 20 2023 | 276k | 248.5k | -10% |
Stationeers | Fri Jan 20 2023 | 173k | 192.1k | 10% |
Valheim | Mon Apr 25 2022 | 10m | 8.9m | -10% |
Lost Potato | Fri Oct 01 2021 | 1.3k | 1.2k | -10% |
Roll | Wed Apr 20 2022 | 22.4k | 20k | -10% |
Eggcelerate! | Sat Nov 20 2021 | 1k | 893 | -11% |
The Dungeon Beneath | Tue Mar 30 2021 | 3.6k | 4k | 11% |
Dread Hunger | Wed Apr 13 2022 | 1m | 891.3k | -11% |
Elong Plug | Thu Mar 02 2023 | 5k | 4.4k | -11% |
Big Ambitions | Sat Mar 25 2023 | 150k | 133.5k | -11% |
Out of Ammo | Fri Jan 20 2023 | 57k | 50.7k | -11% |
Deep Rock Galactic | Mon Dec 31 2018 | 500k | 444.4k | -11% |
The Pale Beyond | Sat Feb 25 2023 | 6k | 5.4k | -11% |
Octodad: Dadliest Catch | Wed Jan 30 2019 | 660k | 585.5k | -11% |
Mortal Glory | Wed Feb 09 2022 | 23.9k | 21.2k | -11% |
Please Fix The Road | Tue Jun 28 2022 | 10k | 8.8k | -12% |
Barotrauma | Fri Jun 04 2021 | 800k | 704.8k | -12% |
Noobs Want to Live | Tue Feb 21 2023 | 100k | 114.7k | 13% |
SpeedRunners | Tue Apr 05 2016 | 1m | 866k | -13% |
Sands of Salzaar | Wed Jul 27 2022 | 1m | 860.8k | -14% |
Escape Simulator | Wed May 04 2022 | 1m | 860k | -14% |
Golfing Over It with Alva Majo | Fri Jan 20 2023 | 100k | 85.9k | -14% |
Cygnus Enterprises | Sat Apr 15 2023 | 5k | 5.9k | 15% |
Warsim: The Realm of Aslona | Sun Dec 12 2021 | 30k | 35.6k | 16% |
Contraband Police | Tue Apr 04 2023 | 250k | 209.9k | -16% |
Yi Xian: The Cultivation Card Game | Sat Jan 21 2023 | 100k | 83.7k | -16% |
Pawnbarian | Wed Dec 08 2021 | 8.4k | 6.9k | -17% |
Inscryption | Wed Jan 05 2022 | 1m | 1.2m | 17% |
Supraland | Sun Jun 28 2020 | 250k | 207.3k | -17% |
Loop Hero | Thu Dec 09 2021 | 1m | 1.2m | 18% |
The Riftbreaker | Thu Oct 13 2022 | 500k | 410.5k | -18% |
Battle Royale Tycoon | Mon Nov 04 2019 | 15k | 18.2k | 18% |
Townscaper | Wed May 19 2021 | 380k | 467.9k | 19% |
ICARUS | Fri Jan 20 2023 | 1m | 818.8k | -20% |
pureya | Fri Jan 20 2023 | 17k | 21.2k | 20% |
Among the Sleep - Enhanced Edition | Tue Feb 14 2017 | 186.1k | 147.9k | -21% |
West Hunt | Fri Feb 03 2023 | 110k | 87.3k | -21% |
Crusader Kings III | Thu Mar 17 2022 | 2m | 1.5m | -21% |
PostCollapse | Tue Apr 07 2020 | 4k | 5k | 21% |
Ravenous Devils | Sun May 15 2022 | 100k | 78.7k | -21% |
Mixolumia | Tue Jun 22 2021 | 1.4k | 1.8k | 22% |
EVERSPACE™ | Thu Apr 20 2023 | 879k | 684k | -22% |
Majotori | Fri Jan 20 2023 | 35k | 45.4k | 23% |
Cthulhu Saves the World | Thu Mar 17 2022 | 671k | 514.5k | -23% |
Osiris: New Dawn | Tue Jan 17 2023 | 600k | 455.9k | -24% |
Salome's Kiss | Sun Oct 02 2022 | 1k | 1.3k | 25% |
Protolife | Thu Jun 25 2020 | 17k | 12.7k | -25% |
Furi | Sun Oct 11 2020 | 280k | 375.1k | 25% |
City Climber | Mon May 17 2021 | 20k | 14.8k | -26% |
Highway Blossoms | Thu Jun 17 2021 | 50k | 67.7k | 26% |
Son of a Witch | Mon May 09 2022 | 23k | 16.9k | -26% |
Sifu | Fri Mar 31 2023 | 50k | 68k | 27% |
Hot Heat Reset: Chapter 1 | Thu Mar 09 2023 | 2.7k | 1.9k | -28% |
LOST EMBER | Thu Jan 13 2022 | 134k | 96.3k | -28% |
A Little Golf Journey | Fri Feb 24 2023 | 1.5k | 1k | -29% |
Nebula | Fri Feb 24 2023 | 2.4k | 3.4k | 29% |
Project Heartbeat | Wed Jan 05 2022 | 3k | 4.2k | 30% |
Guns of Icarus Online | Thu Jun 26 2014 | 450k | 645.9k | 30% |
Peglin | Tue Apr 03 2018 | 80.2k | 55.3k | -31% |
Induction | Tue Mar 20 2018 | 1.2k | 1.8k | 32% |
Cultist Simulator | Mon Feb 18 2019 | 85k | 127.1k | 33% |
Winter Falling: Battle Tactics | Wed Nov 30 2022 | 3.1k | 2.1k | -33% |
Gibbous - A Cthulhu Adventure | Sat Jan 16 2021 | 40k | 26.4k | -34% |
Out of Ammo: Death Drive | Fri Jan 20 2023 | 11.2k | 7.2k | -35% |
Missing Hiker | Tue Mar 14 2023 | 100k | 64k | -36% |
Eastshade | Thu Oct 01 2020 | 127k | 81.1k | -36% |
Cauldrons of War - Barbarossa | Sat Apr 08 2023 | 8k | 4.9k | -38% |
Cosmic Star Heroine | Thu Mar 17 2022 | 58k | 35.8k | -38% |
Primordia | Sun May 01 2022 | 200k | 120.9k | -40% |
Mortal Online 2 | Wed Feb 02 2022 | 110k | 65.8k | -40% |
Knock-knock | Tue Feb 14 2017 | 94k | 55.2k | -41% |
Tinyfolks | Thu Jun 09 2022 | 10k | 17.2k | 42% |
Slime Rancher | Thu Jan 13 2022 | 5m | 2.8m | -42% |
Tilecraft | Wed Nov 30 2022 | 1.2k | 2.1k | 44% |
Bloody Rally Show | Tue Feb 23 2021 | 2.5k | 4.4k | 44% |
The Wreck | Mon Apr 17 2023 | 1k | 564 | -44% |
Hats and Hand Grenades | Wed Dec 07 2022 | 24k | 13.2k | -45% |
Larcin Lazer | Thu Feb 23 2023 | 1.2k | 2.1k | 45% |
Missing Hiker | Thu Feb 16 2023 | 10k | 18.3k | 46% |
The Companion | Mon Nov 21 2022 | 3k | 1.6k | -46% |
Cthulhu Saves Christmas | Thu Mar 17 2022 | 13k | 6.9k | -47% |
Will You Snail? | Fri Mar 18 2022 | 7.5k | 14.1k | 47% |
Cat Herder | Thu Feb 02 2023 | 1.5k | 750 | -50% |
Nightmare Reaper | Sat Mar 26 2022 | 20k | 53.1k | 62% |
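The signed differences in the table appear consistent with dividing by the larger of the two values rather than by the reported figure (for example, Nightmare Reaper: (53.1k - 20k) / 53.1k ≈ 62%). That formula is inferred from the rows and is not stated anywhere in this article, so treat the sketch below as an assumption. The function names are hypothetical.

```python
def pct_difference(reported: float, estimate: float) -> float:
    """Signed difference relative to the larger of the two values (inferred formula)."""
    return (estimate - reported) / max(reported, estimate)


def share_within(pairs: list[tuple[float, float]], margin: float) -> float:
    """Fraction of (reported, estimate) pairs whose absolute difference is within margin."""
    hits = sum(1 for reported, estimate in pairs if abs(pct_difference(reported, estimate)) <= margin)
    return hits / len(pairs)


# Three rows from the table: Nightmare Reaper, Townscaper, Cat Herder.
rows = [(20_000, 53_100), (380_000, 467_900), (1_500, 750)]
for reported, estimate in rows:
    print(f"{pct_difference(reported, estimate):+.0%}")  # +62%, +19%, -50%
```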
Estimating revenue
However, this does not tell us the whole story. We also want to know the revenue of the game, and sales and revenue are not necessarily linearly related. Luckily, I made a small algorithm that factors in discounts when calculating revenue, based on the game's price history. This should give us the right ballpark for the vast majority of games.
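As a minimal sketch of why discounts matter, the example below multiplies the units sold in each price period by the price in effect during that period. How units are split across periods is an assumption here, and the names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PricePeriod:
    price: float       # price in effect during the period, after any discount
    units_sold: float  # estimated units sold during the period (assumed known)


def estimate_gross_revenue(periods: list[PricePeriod]) -> float:
    """Sum price x units over every period of the price history."""
    return sum(p.price * p.units_sold for p in periods)


# Hypothetical $20 game: 1,000 units at full price, 3,000 units during a 50% sale.
history = [PricePeriod(20.0, 1_000), PricePeriod(10.0, 3_000)]
print(estimate_gross_revenue(history))  # 50000.0
```

Multiplying all 4,000 units by the $20 list price would give $80,000 instead, which is why sales and revenue do not scale linearly.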
Unfortunately, for some games, the situation is not that simple. It can be hard to differentiate copies sold from copies given away for free, and even harder to differentiate copies sold regularly from copies sold in bundles. Furthermore, it's pretty much impossible to know how the game has sold on third-party sites.
We can use the ratio of reviews marked as purchased on Steam to those activated with a key, look for discrepancies in review ratios and playtime data, look for patterns in public Steam libraries that may indicate a game was purchased in a bundle, and use top seller rankings to help us deduce how many copies were sold directly through Steam. However, there is currently no way to know exactly how much a game has made on third-party sites or in bundles. We can only confirm how many copies have been sold directly on Steam.
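The review-ratio idea can be sketched as below. It assumes that owners who bought on Steam and owners who activated a key leave reviews at the same rate, which may not hold in practice, so this is a rough first cut at best. The function names and numbers are hypothetical.

```python
def steam_direct_share(steam_purchase_reviews: int, key_activation_reviews: int) -> float:
    """Share of reviews that come from copies purchased directly on Steam."""
    total = steam_purchase_reviews + key_activation_reviews
    if total == 0:
        raise ValueError("no reviews to compute a ratio from")
    return steam_purchase_reviews / total


def estimate_direct_units(total_owners: float, steam_purchase_reviews: int, key_activation_reviews: int) -> float:
    """Scale total owners by the direct-purchase review share.

    ASSUMPTION: both groups of owners review at the same rate.
    """
    return total_owners * steam_direct_share(steam_purchase_reviews, key_activation_reviews)


# Hypothetical game: 100,000 owners, 800 Steam-purchase reviews, 200 key-activation reviews.
print(estimate_direct_units(100_000, 800, 200))  # 80000.0
```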
After all, nothing can be as accurate as the numbers provided by the developers themselves. You should always sanity check all estimates before making any decisions.
Some things to watch out for:
- Smaller games have less accurate estimates than the larger ones due to smaller sample size
- Free-to-play games generally have less accurate estimations
- Revenue estimates do not take into account revenue from any external sources and may not properly estimate revenue from Steam bundles
- There may be bugs in our scraping or estimation algorithm, causing the estimates to be wildly off. Always sanity check any estimates
Conclusion
Overall, using all of these methods together is far more accurate than using review-based estimates alone. It is important to note, however, that when conducting market research, these estimates should only be used as supplementary information and should not be followed blindly. Obviously, there is still room for improvement here, and I will continue to work on the algorithm to improve it even more.
If you found this article helpful, check out our tool at gamalytic.com
We are not affiliated with Valve or Steam in any way