In [2]:
from PIL import Image

from io import BytesIO

from IPython.core import display
from PIL import Image

def display_pil_image(im):
   """Displayhook function for PIL Images, rendered as PNG."""

   b = BytesIO()
   im.save(b, format='png')
   data = b.getvalue()

   ip_img = display.Image(data=data, format='png', embed=True)
   return ip_img._repr_png_()


# register display func with PNG formatter:
png_formatter = get_ipython().display_formatter.formatters['image/png']
dpi = png_formatter.for_type(Image.Image, display_pil_image)

DOTA2 Analysis: Using Python to provide insight into professional DOTA2 matches


echo "data analysis" | sed "s/a/o/1"

Laurie Clark-Michalek

UCL - MEng Mathematical Computation

skadistats - Cofounder/Software engineer/Dogsbody

Skadistats

What is DOTA?

  • 5 "heroes" on each team, controlled by players
  • Teams collect resources spread around the map, while attempting to prevent the other team from doing the same
  • Larger objectives require teamwork and cooperation to take

Cooperation is not always present

minimap

busy minimap

  • Complex system
  • Should be ripe for analysis
  • Large market of players looking to improve

However...

dotabuff

Vanity metrics

The purpose of computing is insight, not numbers.

Richard Hamming

2 Data sources:

  • Exposed API
    • Clean, easy, json/xml data
    • Very limited
  • Raw match transmissions and replays
    • Horrible, binary, state everywhere, undocumented, liable to change without warning
    • All possible data

Skadi

https://github.com/skadistats/skadi

Written in pure Python, using the Google Python protocol buffer libraries

Slow

In []:
from tarrasque import *

replay = StreamBinding.from_file("./demo/PL.dem")

hero, = [player.hero for player in replay.players if
         player.name == "Savlon"]

gold_data = []
tick_data = []

for tick in replay.iter_ticks(start="pregame", step=300):
  if replay.info.game_state == "postgame":
    break

  gold_data.append(hero.total_gold)
  tick_data.append(tick)

import matplotlib.pyplot as plt

plt.plot(tick_data, gold_data)
plt.show()

# TODO: Find graph to show

Invoker

  • 3 different auras
  • Combination can be invoked
  • Provide different abilities and stats

invoker

fst

heatmap

Python

In []:
class Bitstream(object):
  # ... constructors etc omitted ...
  
  def read_varint(self):
    run, value = 0, 0

    while True:
      bits = self.read(8)
      value |= (bits & 0x7f) << run
      run += 7

      if not (bits >> 7) or run == 35:
        break

    return value

Cython

In []:
cdef class Bitstream:
  # ... constructors etc omitted ...

  cdef uint64_t _read_varint(Bitstream self):
    cdef uint64_t run, value
    run = value = 0

    cdef uint64_t bits
    while True:
      bits = self.read(8)
      value |= (bits & 0x7f) << run
      run += 7

      if not (bits >> 7) or run == 35:
        break

    return value

Idealised Cython Output

In []:
uint64_t Bitsteam__read_varint(Bitstream self) {
    uint64_t run = 0, value = 0;
    uint64_t bits;
    while (1) {
        bits = Bitstream_read(self, 8);
        value |= (bits & 0x7f) << run;
        run += 7;

        if (!(bits >> 7) || (run == 35)) {
           break;
        }
    }
    return value;
}

Actual Cython Output

In []:
 /* "test.pyx":12
 *     while True:
 *       bits = self.read(8)             # <<<<<<<<<<<<<<
 *       value |= (bits & 0x7f) << run
 */
    __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_read); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_1);
    __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_tuple_, NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __pyx_t_3 = __Pyx_PyInt_As_uint64_t(__pyx_t_2); if (unlikely((__pyx_t_3 == (uint64_t)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __pyx_v_bits = __pyx_t_3;
    

 /* "test.pyx":13
 *       bits = self.read(8)
 *       value |= (bits & 0x7f) << run             # <<<<<<<<<<<<<<
 *       run += 7
 */
    __pyx_v_value = (__pyx_v_value | ((__pyx_v_bits & 0x7f) << __pyx_v_run));

Actual Cython Output mk2

In []:
/* "test.pyx":12
 *     while True:
 *       bits = self.read(8)             # <<<<<<<<<<<<<<
 *       value |= (bits & 0x7f) << run
 */
    __pyx_v_bits = ((struct __pyx_vtabstruct_4test_Bitstream *)__pyx_v_self->__pyx_vtab)->read(__pyx_v_self, 8);

    /* "test.pyx":13
 *       bits = self.read(8)
 *       value |= (bits & 0x7f) << run             # <<<<<<<<<<<<<<
 *       run += 7
 */
    __pyx_v_value = (__pyx_v_value | ((__pyx_v_bits & 0x7f) << __pyx_v_run));

Cython is the anti-glue language

  • Calling into random things is to be avoided
  • Everything else is fast
  • Hardware loves bits
  • C loves bytes
  • Python loves being slow

Where to use Cython?

profile

Gprof2Dot

In []:
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
  • Things at the bottom of the call stack
  • Things that are called a lot
  • Things that have clean abstractions
  • Things that are self contained

Fuzzing

In []:
from skadi import CythonBitstream, PythonBitstream

class TestCythonVarIntRead(unittest.TestCase):
    ITER_N = 2 ** 16
    
    def generate_random_streams(self):
        bytes_n = random.randint(8, 8*8 - 1)
        input_bytes = bytearray()
        for _ in range(bytes_n):
            input_bytes.append(random.randint(0, 2 ** 8 - 1))
        return CythonBitstream(input_bytes), PythonBitstream(input_bytes)
    
    def test_read_varint(self):
        for _ in range(ITER_N):
            cython, python = self.generate_random_streams()
            assert cython.read_varint() == python.read_varint()

dreamhack

When do teams group up?

group

trilane

"Obvious" method:

  1. Map over match
    1. Take each hero's position
    2. Apply k-means, get degree/wcss
  2. Reduce resulting time series to some point of maximal "group up"ness
  3. Create composite graph of these point
  4. Voila

Problems:

  1. No experience outside of numpy/scipy
  2. Each iteration must be fast; matches contain a ton of "ticks"
  3. Would like to be able to explain to players precisely how data was created
    • Saying "k-means" doesn't mean much to them

split

split

A more literal approach

  1. Map over match
    1. Take each hero's position
    2. For any combination of 4 positions
      1. Find radius of minimum covering circle
    3. Store minimum of calculated radiuses
  2. Find local minima of stored minima
  3. Create composite graph of these points
  4. Voila

Finding the minimum covering circle

  • Naive method quadratic
  • Linear time possible

(Due to constant number of arguments, really constant time)

min(map(min_covering_radius, itertools.combinatons(points, 4)))

graph

Confessions

We use:

  • Java
  • Scala
  • Haskell
  • NodeJS
  • Go
  • Ruby
  • Python

one

two

three

Links

skadistats

http://skadistats.com/

Skadi

https://github.com/skadistats/skadi

Other DOTA2 replay parsers

https://github.com/skadistats

General replay parsing IRC

irc.quakenet.org #dota2replay

Gprof2Dot

https://code.google.com/p/jrfonseca/wiki/Gprof2Dot

Me

[email protected] [email protected]