Welcome to Engineered, where I take an inconsequential problem and do my very best to overthink the hell out of it using some combination of math, Excel, and coding. Today's topic: golf, or more specifically, is your favorite club (the driver) hurting you more than helping you? I don't play a lot of golf these days because it's a time-consuming sport, and it didn't leave me enough time to follow my true passion: statistical modeling. But I did play on my high school golf team so I kinda know what I'm talking about here...
A philosophy professor I had in college once told a story about how he took a Russian girl on a date one time and they went to play golf. She remarked that it was like a game they had in Russia where you hit a ball through the woods (I tried and couldn't find what game this was...any Russian readers please advise). He remarked that this is a shining example of why capitalism succeeded and communism failed, because the Western world took the same concept and turned it into an $84 billion industry. And perhaps no one item is more responsible for that amount than the driver.
Relative to other clubs, the driver is the most expensive. You can pay upwards of $500 for a good driver, whereas a good iron set (which contains 8 clubs) can cost about $1,600, or $200 per club. Putters and fairway woods can go for almost $400, but rarely would they eclipse what you would pay for a driver. If you break that down further into cost per use, the driver would definitely be the most expensive (although it's tough to gauge how often you use a particular iron). The driver will be used, at most, 14 times per round, which is perhaps more than a fairway wood, but shorter clubs will undoubtedly get a lot more action in terms of number of strokes.
So even though the driver isn't used very often, people are still willing to pay big money for it. The most obvious reason is that it's the most fun club to hit, and really striping a drive can send all sorts of endorphins through your brain...causing you to forget about all the times when you hit an egregious slice that gives you the chance to reconnect with nature to find your ball.
Which led me to the question: is the driver really worth it? Sure, a nice drive can leave you in good position to make a good score on a hole, but a bad drive can really derail a hole...and a round. Sure, PGA Tour players will frequently use the driver (although in some cases they will tee off with other clubs as the hole dictates)…but you probably aren't a PGA Tour player, and if there is one thing I've learned in life, it's that sometimes it isn't wise to mimic the pros (if you aren't one).
I had been meaning to learn Python for a while, and I thought this would make a fun little programming project, since I knew that the scope of this would be far beyond the capabilities of Excel (though I was certainly tempted to try). My goal was to make a golfing model that would input certain parameters, like the accuracy for each club, the hole distance, and any hazards, and predict what the score for a round would be.
How the Model Works
The model, at its most basic level, tracks the position of the ball as it goes through the hole, and asks questions along the way, like...
- Is the ball in the rough?
- Did it go out of bounds?
- How far away from the hole is it, and which club should I use?
- Is it close enough to the green that I don't need to take a full swing of a club?
- Is it on the green?
- Did we get it in the hole?
Based on the answers to these questions, the model will choose a club to use, which has a certain range associated with it. It will then aim towards the hole, and then move the ball the distance that the club should be capable of...or close to it. It will also apply a random error factor to both the range and the angle of the shot, based on which club is used. This process is repeated until the ball is "in the hole", meaning that the geometric position of the ball in the 2D coordinate space is within the hole radius. It will repeat the process for the entire course (18 holes, consisting of the usual 4 par threes, 10 par fours, and 4 par fives) and play the course a total of 1,000 times. This is known as a 1,000-element Monte Carlo analysis, which is used frequently in modeling whenever random processes are present as a way to tell, on average, what the result will be.
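The loop described above can be sketched in a few lines of Python. This is not the actual model code, just a minimal illustration under stated assumptions: the club ranges are made-up values, the club is picked by closest nominal range (the selection rule is an assumption), and rough, out of bounds, and penalty strokes are left out entirely. The names `play_hole` and `simulate_rounds` are hypothetical.

```python
import math
import random

# Illustrative club ranges in yards; these values are assumptions, not the model's inputs.
CLUBS = {"driver": 260, "3-wood": 235, "5-wood": 215, "4-iron": 195,
         "6-iron": 175, "8-iron": 150, "wedge": 120}
RANGE_SD_FRAC = 0.035    # range error SD as a fraction of the intended distance
ANGLE_SD_DEG = 3.3       # angle error SD in degrees
PUTT_RANGE_SD = 0.08     # putter range error SD in yards
PUTT_ANGLE_SD = 0.08     # putter angle error SD in degrees
GREEN_RADIUS = 12.0      # yards
HOLE_RADIUS = 0.06       # yards, roughly half of a 4.25-inch cup


def play_hole(length, rng, use_driver=True, max_strokes=20):
    """Play one straight hole and return the number of strokes taken."""
    clubs = [r for name, r in CLUBS.items() if use_driver or name != "driver"]
    x, y = 0.0, 0.0                      # tee at the origin, hole at (0, length)
    strokes = 0
    while strokes < max_strokes:         # cap so the sketch always terminates
        dx, dy = -x, length - y
        dist = math.hypot(dx, dy)
        if dist <= HOLE_RADIUS:          # ball is in the hole
            break
        if dist <= GREEN_RADIUS:         # on the green: putt the remaining distance
            target, range_sd, angle_sd = dist, PUTT_RANGE_SD, PUTT_ANGLE_SD
        else:
            if dist < min(clubs):        # too close for a full swing
                target = dist
            else:                        # full swing with the closest-matching club
                target = min(clubs, key=lambda r: abs(r - dist))
            range_sd, angle_sd = RANGE_SD_FRAC * target, ANGLE_SD_DEG
        # Aim at the hole, then apply Normal errors to the range and the angle.
        carry = rng.gauss(target, range_sd)
        aim = math.atan2(dx, dy) + math.radians(rng.gauss(0.0, angle_sd))
        x += carry * math.sin(aim)
        y += carry * math.cos(aim)
        strokes += 1
    return strokes


def simulate_rounds(hole_lengths, trials=1000, seed=0, use_driver=True):
    """Monte Carlo: play the whole course `trials` times, return each round's score."""
    rng = random.Random(seed)
    return [sum(play_hole(length, rng, use_driver) for length in hole_lengths)
            for _ in range(trials)]
```

Calling `simulate_rounds` once with `use_driver=True` and once with `use_driver=False` on the same list of hole lengths, then comparing the mean and standard deviation of the two score lists, mirrors the with-driver and without-driver comparison.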
For my purposes, I didn't even attempt to model things like launch angle, spin, or wind. It all just gets lumped into the angle and range error of the club, since all I really care about is where the ball ends up, not how it gets there.
The random error factors mentioned above, which apply to both the range and the horizontal angle of the shot, are assumed to follow a Normal Distribution, or "bell curve". Whether or not actual golf shots follow a Normal Distribution is unknown to me, but it seemed like a reasonable assumption. If anyone wants to go to the range and hit about 30 balls with each of their clubs, and then carefully note the final position of every shot (not just the good ones), then that would be pretty valuable data to have.
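If someone did collect that range data, turning it into the two error parameters is straightforward. The sketch below assumes each shot is recorded as a (downrange, lateral) position in yards relative to the target line; the function name `shot_stats` and the recording format are assumptions made for illustration.

```python
import math
import statistics


def shot_stats(shots):
    """Summarize recorded shots for one club.

    shots: list of (downrange_yards, lateral_yards) final positions,
    all aimed along the same target line from the same spot.
    """
    distances = [math.hypot(down, lat) for down, lat in shots]
    angles = [math.degrees(math.atan2(lat, down)) for down, lat in shots]
    return {
        "mean_range": statistics.mean(distances),
        "range_sd": statistics.stdev(distances),   # sample standard deviation
        "mean_angle": statistics.mean(angles),     # a nonzero mean hints at a habitual pull or push
        "angle_sd": statistics.stdev(angles),
    }
```

With around 30 shots per club, a histogram of the distances and angles would also give a rough visual check on whether the bell-curve assumption holds.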
Modeling the Course
The course, as previously mentioned, consists of the typical par 72 format, with 4 par threes, 10 par fours, and 4 par fives. For the purposes of my model, each golf hole is identical except for the length, which is determined based on what par it is. The holes themselves are a very idealized version of a golf hole, with a fairway, 2 cuts of rough, woods, and then out of bounds. The holes are perfectly straight, and the greens are perfectly circular, with the hole location right in the middle of the green (there's a reason I'm not a golf course architect).
Part of the reason for not including any hazards, dog legs, or weirdly shaped greens with different pin locations is that a) it would have been harder, and b) it would have introduced more strategy into the game, which would have made the model much more complicated. My goal for this study was to see how shot inaccuracies affected score, so with holes arranged like this, the only strategy is to simply aim the shot at the hole and try to get as close as possible.
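Because every hole is straight and symmetric, figuring out where the ball lies only requires its sideways distance from the centerline. Here is a minimal sketch of that lookup. The boundary values are placeholders chosen for illustration, `classify_lie` is a hypothetical name, and the green is ignored for simplicity.

```python
# Each entry is (lie name, outer edge in yards from the centerline).
# All of these distances are illustrative assumptions.
BOUNDARIES = [
    ("fairway", 16.0),
    ("first cut", 20.0),
    ("second cut", 30.0),
    ("woods", 45.0),
]


def classify_lie(x, boundaries=BOUNDARIES):
    """Return the lie for a ball at lateral position x (yards, 0 = centerline)."""
    offset = abs(x)                      # the hole is symmetric left to right
    for name, outer_edge in boundaries:
        if offset <= outer_edge:
            return name
    return "out of bounds"               # beyond the last boundary
```

For example, `classify_lie(-18.0)` lands in the first cut with these placeholder numbers, and anything more than 45 yards off line is out of bounds.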
Parameters of the Model
The model has many inputs that affect what the final score will be. For the golf course, the inputs include:
- Course length
- Green radius
- Rough boundaries (locations in the x-coordinate that determine where the rough is)
- Rough penalty parameters (for each different cut of rough, a certain penalty is applied to the shot range and angle accuracy. The cut of rough also determines the longest club you can use; I figured you probably aren't going to use a 3-wood if you're in the woods)
Once the course length is chosen, it is divided among the holes based on some nominal ranges for the different pars. A random offset is then applied to the front 9 holes so that they vary a little bit each time the model is run, and the opposite of that offset is applied to the back 9 so that the course is exactly the length you want it to be.
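One way to sketch that length bookkeeping in Python is shown below. The nominal hole lengths, the order of the pars, the size of the offset, and the choice to draw a separate offset for each front-nine hole are all assumptions for illustration; `build_course` is a hypothetical name.

```python
import random

# Assumed nominal lengths in yards for each par; only the ratios matter here.
NOMINAL = {3: 180.0, 4: 400.0, 5: 540.0}
# Assumed ordering for each nine: 2 par threes, 5 par fours, 2 par fives.
NINE = [4, 4, 3, 5, 4, 4, 3, 5, 4]


def build_course(total_length, max_offset=20.0, rng=None):
    """Return 18 (par, length) pairs whose lengths sum to total_length."""
    rng = rng or random.Random()
    pars = NINE + NINE
    # Scale the nominal lengths so the course adds up to the requested total.
    scale = total_length / sum(NOMINAL[p] for p in pars)
    base = [NOMINAL[p] * scale for p in pars]
    # Random offsets lengthen or shorten the front nine...
    offsets = [rng.uniform(-max_offset, max_offset) for _ in range(9)]
    front = [b + o for b, o in zip(base[:9], offsets)]
    # ...and the opposite offsets on the back nine cancel them out.
    back = [b - o for b, o in zip(base[9:], offsets)]
    return list(zip(pars, front + back))
```

Since every front-nine offset is cancelled by its opposite on the back nine, the total stays at the requested length (up to floating-point rounding) no matter what the random draws are.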
For the golfer profile, the parameters are:
- Nominal range for each club
- Range accuracy standard deviation for each club
- Angle accuracy standard deviation for each club
What Were the Expectations?
Going in, my hypothesis was that for a pretty accurate player, hitting the driver would more than likely be a benefit, since that player would be less likely to hit drives into the woods or OB, which garner high stroke penalties. I did figure, though, that the standard deviation of the score would be reduced if the driver was not used, which would reduce the chance of high scores but also reduce the chance of low scores. For a less-accurate player (such as an average amateur), I thought there might be some advantage to ditching the driver, since golf is a game where you can only do so much better than par, but you can sure do a lot worse than par, and for amateur players I figured avoiding big numbers on holes would be more beneficial than trying to get low numbers on holes.
Calibrating the Model
Once I was pretty sure the code was working, it just became a matter of messing with the various input parameters in order to get results that were reasonable and realistic. I decided to do this study for 2 basic cases: the average professional and the average amateur.
For the average professional, the PGA Tour website has a lot of great stats that I used, based mostly on results from the 2018 season (I wanted this to be pre-Covid). Here are the numbers I used to calibrate the model (for a professional):
- Average Score: ~71
- Average Number of Putts per Round: ~29
- Average Rate of Greens in Regulation: ~66%
- Average Rate of Fairways Hit: ~66%
- Average Driving Distance: ~290 yards
- Range of each club: based on this article for a "long" hitter, with each range multiplied by 1.109 to get the driving distance equal to the average of 290 above
Now, these numbers are very good, but they are made even more impressive by the fact that the pros are playing on the hardest courses in the world. Here are the parameters I used for the "pro" course:
- Course distance: 7200 yards
- Green radius: 12 yards (based on this number for the area of the greens at Pebble Beach, then back-calculating the radius of an equivalent circle, then adding a bit since Pebble Beach has the smallest greens on tour)
- Fairway width: 32 yards
- Rough: while I was able to find "typical" numbers about the fairway width, I think courses vary too much to have any sort of average on how wide the rough is. For the purposes of the model, it's equally important how much the rough penalizes a player, which is also info that isn't readily available (and would probably vary based on the course). Here are the numbers I used:
- 1st Cut: Club range at 95% of original, range and angle error increase 2%. Longest club that can be used: 5 Wood
- 2nd Cut: Club range at 85% of original, range and angle error increase 10%. Longest club that can be used: 3 Iron
- Woods: Club range at 60% of original, range and angle error increase 40%. Longest club that can be used: 5 Iron
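The rough penalties listed above map naturally onto a small lookup table. The sketch below is one possible way to apply them, not the model's actual code. It assumes that "error increase 2%" means the standard deviation is multiplied by 1.02, and the contents and ordering of the bag are made up for illustration. `allowed_clubs` and `adjust_for_lie` are hypothetical names.

```python
# Penalties per lie: range multiplier, error multiplier, and longest usable club.
ROUGH_PENALTIES = {
    "fairway":    {"range_factor": 1.00, "error_factor": 1.00, "longest_club": None},
    "first cut":  {"range_factor": 0.95, "error_factor": 1.02, "longest_club": "5-wood"},
    "second cut": {"range_factor": 0.85, "error_factor": 1.10, "longest_club": "3-iron"},
    "woods":      {"range_factor": 0.60, "error_factor": 1.40, "longest_club": "5-iron"},
}

# Assumed bag, ordered from longest club to shortest.
BAG = ["driver", "3-wood", "5-wood", "3-iron", "4-iron", "5-iron",
       "6-iron", "7-iron", "8-iron", "9-iron", "wedge"]


def allowed_clubs(lie):
    """Clubs that may be used from this lie, longest first."""
    longest = ROUGH_PENALTIES[lie]["longest_club"]
    start = 0 if longest is None else BAG.index(longest)
    return BAG[start:]


def adjust_for_lie(nominal_range, range_sd, angle_sd, lie):
    """Scale a club's range and both error SDs for the given lie."""
    penalty = ROUGH_PENALTIES[lie]
    return (nominal_range * penalty["range_factor"],
            range_sd * penalty["error_factor"],
            angle_sd * penalty["error_factor"])
```

With these numbers, a club that normally goes 200 yards would only go about 170 yards out of the second cut, and from the woods nothing longer than a 5 iron is available.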
Finally, for the club accuracies, I made the bold assumption that the angle error for each club is the same, and the range accuracy is the same percentage of the club range. At first, I thought for sure that the longer clubs would have more angle error than the shorter ones, but after reading this article it seems that may not be the case (although I'm still skeptical). The parameters I used are as follows:
- Club range error standard deviation: 3.5% of the club range
- Club angle error standard deviation: 3.3 degrees
- Putter range error standard deviation: 0.08 yards
- Putter angle error standard deviation: 0.08 degrees
The above numbers seemed almost ridiculously precise, but believe it or not that was what it took to get pro-level stats.
The Results Case 1: The Professional
Using the above numbers as inputs to the model, we get the following distribution when using the driver as one normally would:
The graph on the top is the histogram of scores after 1,000 trials. I was able to get the numbers pretty close with the aforementioned parameters, as the average score is right at 71, the number of putts is pretty close to 29, and the greens and fairways are almost 66%. This is also predicting a best score of 60, and as it turns out, Brandt Snedeker shot a 59 in 2018, so that was even pretty close to the model. I'm not sure about the worst scores on tour, I guess they don't want to kick those guys while they're down...but it seems reasonable that somebody had a really bad day and shot an 86.
The graph on the bottom shows the distance to the hole after the tee shot and before the ball lands on the green. Interestingly, it forms 3-4 distinct distributions, depending on how you look at it. There is an obvious one from 100-200 yards, then 200-250 and 250-300, although the latter are less common. Of course, there is a large spike close to the green, likely due to approach shots that miss the green by a touch. What's odd is the lack of shots in the 50-100 yard range, which works out well considering that's sort of a "no man's land" for club selection.
And now let's see what happens if an average pro stopped using the driver...
Perhaps as expected, the average score gets almost 2 strokes worse. The fairway hit rate does increase several percentage points, but the greens in regulation rate drops by about the same amount. A little surprisingly, the standard deviation stays about the same. The shot distance to hole average increased almost 14 yards, which shows a possible issue with the "no driver" strategy, which is that you'll likely end up with longer approach shots...and those will be less accurate on average.
So, if you're a pro, it's probably best to keep using the driver (for straight holes at least)…but what if you're not a pro?
The Results Case 2: The Long-Hitting Amateur
Stats for an "average" amateur were much harder to come by, and the stats I ended up going with are probably not the "average" amateur but more than likely the average amateur that plays a fair amount and takes their game fairly seriously. I'm sure if we had accurate stats for every Joe Sixpack that hit the links the numbers would be...horrific.
For the case of an amateur that can hit the ball almost as long as the pros, I used the following parameters for the golfer profile:
- Average Score: ~90 (USGA stats suggest the average handicap for men is 14.2, but keep in mind that the types of players willing to track their handicap are likely better than average. This article suggested the average amateur shoots a 90, which seemed more reasonable)
- Average Number of Putts per Round: ~36 according to this article
- Average Rate of Greens in Regulation: unknown, but worse than 66% (the pro rate)
- Average Rate of Fairways Hit: unknown, but probably worse than 66% (the pro rate)
- Average Driving Distance: ~260 yards
- Range of each club: based on this article for a "long" hitter
I then messed with the angle and range errors for the clubs until I got the number of putts per round and the score to be in line with the numbers above:
- Club range error standard deviation: 6.65% of the club range
- Club angle error standard deviation: 6.27 degrees
- Putter range error standard deviation: 0.16 yards
- Putter angle error standard deviation: 0.16 degrees
The golf course parameters also needed to change to be in line with an amateur course:
- Course distance: 6600 yards
- Green radius: 16 yards
- Fairway width: 40 yards
- Rough: (I kept the same penalties, I just changed where the rough boundaries were)
- 1st Cut: Club range at 95% of original, range and angle error increase 2%. Longest club that can be used: 5 Wood
- 2nd Cut: Club range at 85% of original, range and angle error increase 10%. Longest club that can be used: 3 Iron
- Woods: Club range at 60% of original, range and angle error increase 40%. Longest club that can be used: 5 Iron
With all these parameters, here's what we get:
It's a much wider spread than the pro player, which is expected. For the iron distance to the hole, instead of a few distinct distributions there is more just one blob of shots in the 50-250 yard range, so the long-hitting amateur will get his money's worth out of all his clubs (and will certainly hit a lot of wedge shots).
Let's see what happens if we take away the driver...
What's interesting to me about this is how much it didn't really change. The score, for all practical purposes, is the same (running the model several times with the same parameters will get slight variances within a stroke). The standard deviation did reduce a little, but only by 0.3 strokes, so still within the margin of error. The worst score stayed about the same and the best score got a tad worse. In any case, these differences wouldn't likely be noticed. The fairway rate got a little better, but the green in regulation rate got a little worse (but not by much).
The shot distance to the hole changed slightly, as there is a more prominent distribution centered at 200 yards, as opposed to being centered around 160 yards with the driver. This is perhaps the issue with not using the driver, as it will more than likely leave you with either a long iron or a fairway wood into the green on some holes, which is not ideal.
I did the same model for "medium" hitting amateurs (with a 230 yard nominal drive) and the results actually favored the driver slightly. This is perhaps expected, since a shorter-hitting player will, in theory, not have as much capability to hit extremely wayward tee-shots, and that would make the driver more of a benefit than a burden.
So, Should You Use the Driver?
Based on this model, if you're an amateur then it probably doesn't matter. If you like hitting the driver and don't find yourself in the woods too often, then go for it. If anything, this model proves the adage "drive for show, putt for dough", since improving your short game accuracy would pay much bigger dividends than improving driving distance. If you're a pro then you probably need to be able to hit the driver well since pro courses are longer (due in part to a certain feline-named player who, in a sense, broke the game of golf in the early 2000s with his long-hitting).
Just an aside for that last statement, it's a well-known fact that courses got their feelings hurt when Tiger Woods and other long-hitters started leveraging that ability by cutting corners on dog-legs to drive the greens on par fours. So they started trying to "Tiger-proof" courses by making them longer...which in my opinion was the absolute worst way to do it. If you have a player who can hit it longer than the others, you are just playing into his hand by making the course longer. If you want to bring them down, then you should do some combination of making the course shorter and more treacherous. If you make the course shorter, then other players can now drive the greens too, which levels the playing field. And if you make it more treacherous (have more hazards, smaller greens, etc.) then you really amplify the risk-reward proposition. Sure, they could drive the green, but in trying they may be risking a lot. So there you go, I just solved professional golf...you're welcome. Of course, what do the tournaments care if Tiger or some other long-hitter wins every tournament?
How to "Engineer" Your Golf Game
As a final note, I'd like to talk a bit about using some of the concepts mentioned in this article to make you a better golfer. I talked a lot about range and angle standard deviation, which were important inputs to my model to determine the final score. I would bet that 99.99% of active golfers don't know what these numbers are for their clubs, and knowing them could indeed help your strategy on the course.
If, for example, you knew that your nominal drive range was 250 yards, with a range standard deviation of 10 yards and an angle standard deviation of 5 degrees, then about 68% of the time your drives would be within 22 yards of the centerline, and 95% of the time your drives would be within 44 yards of the centerline. If the course you were playing didn't have any big hazards within 44 yards of the center, then you'd do pretty well (except for that pesky 5% of the time). These probabilities are all based on figures from a Normal Distribution, shown below with a mean of zero and a standard deviation of 1:
Now, if you knew a certain hole had big hazards just outside the fairway, and the fairway is 44 yards wide, then you'd have a decision to make. If you pulled the driver, then about 68% of the time you'd land in the fairway, which means that about 32% of the time you wouldn't...so you'd be flirting with a 1 in 3 chance of being in the hazard. If that doesn't sound good to you, then maybe use a shorter club that could make it less of a chance that you'd be in trouble.
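The arithmetic behind those percentages can be checked with a few lines of Python. This sketch treats the sideways miss as Normal with a standard deviation of (drive range) x tan(angle standard deviation), which is an approximation that ignores the range error; `lateral_sd` and `prob_within` are hypothetical helper names.

```python
import math


def lateral_sd(drive_range, angle_sd_deg):
    """Approximate sideways standard deviation in yards at the landing distance."""
    return drive_range * math.tan(math.radians(angle_sd_deg))


def prob_within(half_width, drive_range, angle_sd_deg):
    """Approximate chance of finishing within half_width yards of the centerline."""
    sigma = lateral_sd(drive_range, angle_sd_deg)
    # For a zero-mean Normal variable, P(|X| <= a) = erf(a / (sigma * sqrt(2))).
    return math.erf(half_width / (sigma * math.sqrt(2)))
```

For a 250-yard drive with a 5 degree angle standard deviation, `lateral_sd(250, 5)` comes out just under 22 yards, so `prob_within(22, 250, 5)` is roughly 68% and `prob_within(44, 250, 5)` is roughly 95%. Running the same function with a shorter club's range and angle error shows how much safer (or not) the alternative actually is.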
Decisions like the one above are something most golfers do intuitively, in the moment, but it wouldn't be a bad idea to plan out your round the night before. If you have a yardage book for the course you're playing, then you can sit down and calculate all the probabilities of being in trouble if you used certain clubs in certain situations (mainly tee-shots), and you could, to some degree, predict your average score before you even set foot on the course. If you want to get really smart with it, then you can calculate risk-reward probabilities based on if you hit it in the fairway, in the rough, in a hazard, etc. That might be a good idea for another model I could make in the future...
So What Did We Learn from All of This?
Well, since my model shows that you're probably no worse off using the driver, I guess we didn't learn a whole lot. But then again, models are imperfect and my input parameters were likely not realistic, so it's still possible that, for some golfers and in some situations, the driver may not be the best club to grab on a tee-shot. I'd say probably the biggest takeaway from all of this is that you should start keeping statistics for all of your shots, so that you can compile them and use them to better strategize on the course.
At the end of the day, though, most people play golf for fun, and even if they could shave a few strokes off their game by being more strategic with tee-shots...that might just rob them of one of the simple joys of ripping the ol' thunderstick. Sure, being slightly closer to the hole doesn't matter if you are awful at hitting approach shots, but it sure does feel nice to tell your friend "hey, did you hear about that new Wal-Mart they are building between my ball and yours?"