Big data mining of TF2 logs

Created 19th February 2019 @ 11:55



I’m working on a master’s in data science, and I have a course project on distributed data mining. We have the full history of logs to use for our analysis. What questions does the TF2 community have that big data mining could answer?

Here are some ideas we have so far:
-Finding cheaters with outlier analysis
-Heatmap of deaths on maps
-Metastatistics (how likely you are to win the round if you win the midfight, how likely you are to hold last at an uber disadvantage, etc.)
-Tracking player improvement over time
-Future match prediction

These are just some starting ideas, and we would likely focus on only one or two. We’re very interested in other ideas, so let us know what else you think of!
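To make the metastatistics idea a bit more concrete, here is a minimal sketch of estimating how often a team wins the round after winning the midfight. The record layout (`won_mid`, `won_round`) is hypothetical and would have to be derived from the logs first; the Wilson score interval is just one reasonable way to show how uncertain the estimate is when the sample is small.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def win_rate_given_mid(rounds):
    """Estimate P(win round | won midfight) from per-team round records.

    `rounds` is a list of dicts with hypothetical boolean keys
    'won_mid' and 'won_round' (assumed to be extracted beforehand).
    """
    mid_wins = [r for r in rounds if r["won_mid"]]
    n = len(mid_wins)
    k = sum(1 for r in mid_wins if r["won_round"])
    rate = k / n if n else float("nan")
    return rate, wilson_interval(k, n), n

# Toy data, not real results.
sample = [
    {"won_mid": True, "won_round": True},
    {"won_mid": True, "won_round": True},
    {"won_mid": True, "won_round": False},
    {"won_mid": True, "won_round": True},
    {"won_mid": False, "won_round": True},
]
rate, (low, high), n = win_rate_given_mid(sample)
```

The same counting works for other conditions (for example, holding last at an uber disadvantage) by swapping the filter, and it parallelizes easily since each log can be counted independently and the counts summed.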

Here’s some more technical stuff about the assignment if you’re interested:
Essentially we’re looking to take big data and gain insights from it using really powerful machines (a TB of RAM, 128 threads, 20 GPUs, etc). We’re not allowed to use neural networks (we’re taking that class at the same time), but all other forms of machine learning and data mining are allowed. If you have any ideas for machine learning or data mining methods that you think would be good for this data, hit us with those too!

And of course, a huge shoutout to Zoob for his incredible work on that site.



The heatmaps idea is nice, but I’m pretty sure this is a thing already to some extent. I remember watching several casts of 5cp maps with map overlays that showed where most of the frags/deaths happened during the analysis after the match. Of course, that’s one match, so with more data you could make a more general representation.
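For what it's worth, aggregating deaths over many matches could be as simple as binning positions into a grid and summing the counts. This is only a sketch: it assumes death coordinates for a given map have already been extracted as `(x, y)` pairs, and the `cell_size` value is an arbitrary placeholder.

```python
import math
from collections import Counter

def bin_deaths(positions, cell_size=128.0):
    """Count deaths per square grid cell.

    `positions` is an iterable of (x, y) map coordinates (assumed input);
    `cell_size` is the side length of a cell in map units (arbitrary default).
    """
    counts = Counter()
    for x, y in positions:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

# Counters from separate matches can be added together,
# which makes the aggregation easy to distribute.
match_a = bin_deaths([(10, 10), (100, 20), (-5, 300)])
match_b = bin_deaths([(50, 60)])
combined = match_a + match_b
```

Splitting the counts by gamemode and skill level before combining them would keep, say, HL and 6s deaths on the same map from blurring into one picture.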

Most of these ideas really depend on the context of the games and how accurately Logs.TF processes info, though. Spy in 6s and HL are completely different, the way people play on TF2Center will be different from how teams approach maps at a Prem level, some maps are played in different gamemodes (Product in HL vs Product in 6s), etc. I’m not sure how meaningful your results will be knowing that, or how you would determine a player’s improvement. Would it be based on class, frags, wins…? How are you going to determine whether a player is just good or a cheater based on sheer numbers? How are you going to account for outliers that Logs.TF processed incorrectly? How are you going to tell apart logs that are messed up because they counted pregame from logs that are totally fine? How are you going to account for offclasses in 6s?
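One possible way to handle part of this is to compare players only within the same format and class, drop very short logs, and use a robust score so that a few broken logs don't skew the baseline. The sketch below uses a modified z-score based on the median absolute deviation (MAD). The field names (`format`, `class`, `dpm`, `minutes`, `player`) and both thresholds are assumptions for illustration, and a high score only marks a log for manual review; it says nothing by itself about cheating.

```python
from collections import defaultdict
from statistics import median

def flag_outliers(rows, threshold=3.5, min_minutes=10):
    """Flag rows whose 'dpm' is extreme within their (format, class) group.

    All field names and both thresholds are hypothetical choices.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["minutes"] < min_minutes:
            continue  # skip very short logs, e.g. pregame-only or aborted matches
        groups[(row["format"], row["class"])].append(row)

    flagged = []
    for key, members in groups.items():
        values = [m["dpm"] for m in members]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread in this group, so the score is undefined
        for m in members:
            # 0.6745 scales the MAD so the score is comparable to a z-score
            score = 0.6745 * (m["dpm"] - med) / mad
            if abs(score) > threshold:
                flagged.append((m["player"], key, round(score, 2)))
    return flagged

# Toy data, not real logs.
rows = [
    {"player": "a", "format": "6s", "class": "scout", "dpm": 250, "minutes": 30},
    {"player": "b", "format": "6s", "class": "scout", "dpm": 260, "minutes": 28},
    {"player": "c", "format": "6s", "class": "scout", "dpm": 270, "minutes": 31},
    {"player": "d", "format": "6s", "class": "scout", "dpm": 255, "minutes": 29},
    {"player": "e", "format": "6s", "class": "scout", "dpm": 265, "minutes": 27},
    {"player": "suspect", "format": "6s", "class": "scout", "dpm": 900, "minutes": 30},
    {"player": "f", "format": "hl", "class": "spy", "dpm": 90, "minutes": 30},
    {"player": "g", "format": "hl", "class": "spy", "dpm": 110, "minutes": 30},
    {"player": "h", "format": "6s", "class": "scout", "dpm": 2000, "minutes": 2},
]
flagged = flag_outliers(rows)
```

Grouping by skill level or by map would follow the same pattern by extending the group key, though groups that are too small give unreliable medians.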
