Frozen Four Model Predictions Discussion

Overview

Since I had made the March Madness bracket model the previous week, I wanted to make a new one for the hockey tournament, which started the following week. Because I know a lot more about hockey stats, this came with a different set of challenges: I know which stats are good and useful, but most are inaccessible, at least without paying a company like InStat or something. I did know about College Hockey News and that they had a solid dataset of team statistics.

Hockey also poses a different modelling challenge: the teams are so close to each other, and one-game samples can be skewed so massively by one incredibly inconsistent position (goaltending), that even the best possible models will get games wrong occasionally. The fact that only the top 16 teams make the tournament means that there are basically no "free" games. Every game is a ranked matchup where on any given day one team can outperform the other, in a way you would almost never see in a 16-1 matchup in basketball. And that's before you add in the fact that one 18-24 year old goalie can completely steal any given game. Even if the kid wasn't his team's starter going into the season, he could still have the game of his life one Saturday and carry a 4 seed over a 1.

With that being said, I still set out to make a model to predict how the tournament would play out. I used a similar method to my basketball model, where I gave every team "base" odds equal to the proportion of teams at their seed that had advanced past that round in the past. Then I looked at various stats: goals for and goals against per game, CF and GF percentages (both overall and at even strength), their starting goalie's sv%, and their top 5 point scorers.

I also looked at, but didn't include, a few other stats. The first was shooting%, because I know teams that rely on higher shooting percentages usually have higher volatility. The second was a "tempo" stat I created, similar to the basketball stat (which I think is just the number of possessions in a game), except mine was total shot attempts per game. My logic for looking at it was similar: high tempo teams have more volatility and could get upset, while slower paced teams only need to get lucky once early and can then control play for the rest of the game, which could lead to upsets. The last stat I looked at but didn't use was "PP Dependency", basically just the share of a team's goals that were scored on the PP. People love saying that "refs swallow their whistles" in the playoffs, but that isn't backed up by the numbers in the NHL, and knowing what I know about the selection process for NCAA/USHL refs vs. that for NHL refs, I can't imagine NCAA refs would swallow their whistles in the playoffs either.
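
For anyone who wants to see the pieces written out, here is a rough Python sketch of the seed-based base odds and the two homemade stats. It is only an illustration under a few assumptions and is not the original model code: the function names, the (seed, games won) input format, the toy history, and the choice to count both teams' shot attempts in "tempo" are all made up for the example.

```python
from collections import defaultdict


def base_odds_by_seed(history, n_rounds=4):
    """Share of past teams at each seed that advanced past each round.

    history: iterable of (seed, games_won) pairs, one per past tournament team.
    Returns {seed: [odds past round 1, odds past round 2, ...]}.
    """
    counts = defaultdict(int)
    advanced = defaultdict(lambda: [0] * n_rounds)
    for seed, games_won in history:
        counts[seed] += 1
        # A team with k wins advanced past rounds 1..k.
        for r in range(min(games_won, n_rounds)):
            advanced[seed][r] += 1
    return {seed: [a / counts[seed] for a in advanced[seed]] for seed in counts}


def tempo(shot_attempts_for, shot_attempts_against, games):
    """Total shot attempts per game (assumption: both teams' attempts count)."""
    return (shot_attempts_for + shot_attempts_against) / games


def pp_dependency(pp_goals, total_goals):
    """Share of a team's goals that were scored on the power play."""
    return pp_goals / total_goals if total_goals else 0.0


# Toy history, NOT real tournament results: (seed, games won in the tournament).
toy_history = [(1, 4), (1, 1), (1, 0), (1, 2), (4, 0), (4, 1)]
odds = base_odds_by_seed(toy_history)
print(odds[1])  # odds of a 1 seed getting past rounds 1 through 4
print(odds[4])  # same for a 4 seed
```

The other stats would then nudge each team's odds up or down from that baseline. How that adjustment is weighted is not spelled out here, so the sketch stops at the baseline.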

I did also separately look at the top point scorer on each team, because I have an untested theory that the lower the level you go in hockey, the more of a strong-link sport it is. In the NHL, a bad 3C or goaltending performance can often sink a team on any given night, because generally all of the lines are rather evenly matched and no one line can just outscore all of the others. But the lower you go, the more room there is for a gap between the top players and everyone else. I know when I played HS hockey, we always got torched by the team that had one future NCAA player, even though the rest of our lines were somewhat even, just because nobody could stop him. Similarly, in the NCAA, if a team has a star Hobey candidate (a Jack Eichel, Macklin Celebrini, or Johnny Gaudreau type) and the other team doesn't, that one player could be enough to make up the difference, even if the rest of his team's lines are only even with, or slightly worse than, the other team's.

Results

The odds of each team advancing to a given round/winning it all are in the chart below:

Model Predictions for each team by round

If my basketball model results were mixed, the results from this one were just downright bad. The RMSE in the first round was over 0.5, which is worse than the 0.5 you would get by giving every team a flat 50% chance, so the model likely would have been better flipped. The overall RMSE was 0.41, which is not particularly good. Neither of my model's two heavy favorites for the championship made the Frozen Four, and Michigan State got upset by Cornell in Round 1. I did have the eventual champions, WMU, and the team they knocked out in the Frozen Four, Denver, with the 4th and 5th highest odds of winning the championship. However, the other two Frozen Four teams were Penn State (last) and BU (5th from bottom), who both had a less than 1% chance of winning in my model, so not ideal.
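
To make the 0.5 benchmark concrete, here is a small Python sketch of RMSE on win probabilities. The probabilities and outcomes are invented for illustration (they are not the model's actual predictions), and the `rmse` helper is just a hypothetical name for the standard formula.

```python
import math


def rmse(probs, outcomes):
    """Root mean squared error between predicted win odds and 0/1 results."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    return math.sqrt(
        sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
    )


# Made-up example: 1 = the team won, 0 = the team lost.
outcomes = [1, 0, 0, 1]
probs = [0.2, 0.9, 0.7, 0.4]       # a model that is confidently wrong
flipped = [1 - p for p in probs]   # the same model with every pick reversed
coin_flip = [0.5] * len(outcomes)  # no information at all

print(rmse(coin_flip, outcomes))  # always exactly 0.5 for 0/1 outcomes
print(rmse(probs, outcomes))      # above 0.5 in this example
print(rmse(flipped, outcomes))    # well below 0.5 in this example
```

Flipping is not guaranteed to help every time the RMSE is over 0.5, but a score above the coin-flip line does mean the odds were, on balance, pointing the wrong way.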

While my model was certainly not successful in predicting the outcome of the tournament, I can say that predicting the hockey bracket is incredibly challenging. With all of the randomness and how tight most of the matchups are, I'm not incredibly disappointed in my model's lack of success.

Data

As I mentioned earlier, college hockey data is hard to come by, so I used College Hockey News for my team stats and Elite Prospects for individual player stats, for both skaters and goalies.