Have you seen the 2011 movie Moneyball? Or read Michael Lewis’s 2003 book on which the movie is based? If so, I’d hazard a guess that you’re either a baseball fan or a Brad Pitt fan. When most casual observers think of baseball, they might picture long afternoons in sunny stands, eating Cracker Jack, and stretches of inactivity followed by bursts of excitement when someone hits a home run. We think of those power hitters and ace pitchers, the ones who draw a crowd and inspire awe with their strength and athleticism. What we don’t usually think of is a high-level data analyst on the team.
Data analysts have played a key role in baseball since 2002, when a revolutionary approach by the Oakland A’s proved how powerful data analysis could be. As described in Moneyball, the story of the Oakland A’s goes somewhat like this: In the late 1990s, the A’s had a terrible record. New general manager Billy Beane wanted to turn that around, particularly after he lost three top players in 2001. He turned to a colleague named Paul DePodesta, a scout with a degree in economics from Harvard. DePodesta and Beane began combing through data to try to figure out what actually helped them win games. Was it the big hitters? The ace closer? Or was it something else that people might be overlooking?
Beane and DePodesta used a form of data analysis called Sabermetrics, created by Bill James in 1980. James defined it as “the search for objective knowledge about baseball.” This approach had been around for decades, but few people in baseball were using it; instead, scouts and managers tended to rely on what fans notice: the grand-slam hitter, the pitcher who strikes out six batters in a row, the ones who grab your attention. Sabermetrics is the practice of taking a deeper look at the statistics and running predictive models to see what and who can help you actually win a game.
In their analyses, Beane and DePodesta found that oft-overlooked statistics, including on-base percentage (the percentage of time a specific batter makes it onto the bases, whether that is from a hit or a walk), were much more predictive of a team’s success than the statistics most scouts and managers cared about, which included RBIs, batting averages, and home runs. The A’s had seven first-round draft picks for the 2002 season. Beane surprised other managers and scouts by picking college athlete Nick Swisher in the first round. Indeed, all of his first-round picks went against what traditional baseball scouts advised: they weren’t the sought-after players with the highest averages or number of RBIs. They weren’t players that teams with higher budgets, like the New York Yankees, were trying to sign. The A’s worked with a small budget of just $40 million (compared to the Yankees’ $120 million) to put together a team that data, not necessarily instinct, told them would do well.
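To make the difference between the two statistics concrete, here is a minimal sketch in Python. It uses the standard on-base percentage formula, which counts hit-by-pitches alongside hits and walks. The two players and their season totals are invented for illustration; they are not real stat lines and not the A’s actual data.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season totals for one hitter (all numbers below are made up)."""
    name: str
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int


def batting_average(b: BattingLine) -> float:
    # Hits per official at-bat; walks are ignored entirely.
    return b.hits / b.at_bats


def on_base_percentage(b: BattingLine) -> float:
    # Standard formula: (H + BB + HBP) / (AB + BB + HBP + SF).
    reached = b.hits + b.walks + b.hit_by_pitch
    chances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return reached / chances


# Hypothetical players: a high-average hitter and a patient hitter who walks a lot.
slugger = BattingLine("High-average hitter", 500, 150, 30, 2, 4)
patient = BattingLine("Patient hitter", 480, 130, 90, 5, 5)

for player in (slugger, patient):
    print(f"{player.name}: AVG {batting_average(player):.3f}, "
          f"OBP {on_base_percentage(player):.3f}")
```

In this made-up comparison, the patient hitter has the lower batting average but reaches base more often, which is the kind of player a scout focused on averages might overlook.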
By the end of the season, they had one of the best records in the Major Leagues and had broken the American League’s record nineteen-game winning streak, held by the 1947 Yankees. The A’s performance that season revolutionized baseball. Teams now use Sabermetrics to guide nearly every aspect of the game: what players to sign, what kind of swing a player should practice to improve their on-base percentage, and when to bring in which pitcher based on what batters are due up. Teams now employ high-level data analysts to help managers and coaches make these kinds of decisions.
Big Data and Its Many Uses
If big data (the collection and analysis of a constant stream of data points) can revolutionize the game of baseball, what else can it do?
Data is ubiquitous. What you may think of as everyday life (a few clicks on the internet, a grocery shopping trip, a Sunday night football game) is informed by and generative of data. How you navigate the internet is informed by data, what items your local grocery store carries is informed by data, and how that football team performs (not to mention what time the game airs and what network you watch it on) is based on data. Anything we do generates information that someone with the desire and skill set can analyze. These analyses are used to make decisions that shape the world around us.
As you can imagine, the amount of data out there exploded with the growth of the internet. Every click, photo, tweet, or post is a data point. According to the nonprofit University of Massachusetts Global, “‘Big data’ essentially refers to a collection of datasets so large that it cannot be analyzed with normal statistical methods. The Bureau of Labor Statistics explains that such data can include videos, pictures, maps, words, phrases, and numbers. Customer reviews posted on a website, comments and photos logged on a social media platform, electronic medical records and bank records are all examples.”
What the University of Massachusetts Global calls “normal statistical methods” refers back to Ronald Fisher and the cup of tea. It involves designing an experiment, defining the sample, collecting data, controlling for variables and checking for reliability, and analyzing that data. Big data goes way beyond that. It involves working with data that is not the result of a well-designed, limited study, but rather the result of our everyday actions. As analysis of big data has gotten more sophisticated, so have our abilities to see patterns and make inferences based on the data that has been collected.
Data analysis is a key part of nearly every industry today. Have you ever traveled abroad and gotten a fraud alert from your bank for suspicious activity? Your bank thinks your transactions are suspicious because of big data. Your bank has thousands of data points showing a geographical pattern for your usual transactions, so it recognizes when you break the pattern. Even more than that, your bank has analyzed billions of transactions and knows how likely it is that a foreign transaction in your account is fraud. Someone who travels for work wouldn’t be getting those notifications, because their bank recognizes that foreign transactions are typical for them.
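As a rough illustration of that idea, the sketch below flags a transaction when it comes from a country that rarely or never appears in a customer’s history. Everything here is an assumption made for the example: the function name, the five percent threshold, and the invented transaction histories. Real fraud detection systems weigh far more signals than location alone.

```python
from collections import Counter


def is_unusual(history: list[str], country: str, min_share: float = 0.05) -> bool:
    """Return True if `country` makes up less than `min_share` of past transactions.

    `history` is a list of country codes, one per past transaction.
    The 5% default threshold is an arbitrary choice for this sketch.
    """
    if not history:
        # With no history there is no pattern to compare against.
        return True
    share = Counter(history)[country] / len(history)
    return share < min_share


# Invented histories: one customer who always shops at home, one who travels for work.
homebody = ["US"] * 200
traveler = ["US"] * 120 + ["FR"] * 40 + ["JP"] * 40

print(is_unusual(homebody, "FR"))  # a purchase in France breaks this customer's pattern
print(is_unusual(traveler, "FR"))  # the same purchase fits the traveler's pattern
```

The same purchase is suspicious for one customer and routine for the other, because the comparison is always against that customer’s own pattern.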
On a much smaller scale, you are a data analyst in your everyday life. If you are a parent, you know how your child normally behaves. Think of every single interaction you have, every observation you have ever made about your child, as a data point. You know what to expect from them because you have (possibly unconsciously) noticed a pattern in their behavior and made inferences. When your child starts acting differently one day, you suspect they may be sick or have something bothering them, because this new data, their current behavior, doesn’t fit the pattern. Maybe the new data leads you to check for a fever or take them to the doctor that day. Your unconscious data analysis has prompted you to change your behavior, to address something differently. This is exactly how big data analysis works in industries, but with a more deliberate approach.
Here’s another way big data is used today: chatbots on company websites. Remember last week when you decided to return that pair of pants you had ordered online? You visited the company’s website to start a return, and a little box popped up asking if it could help you. You may have thought, sure, but it’s 2:00 a.m. on a Tuesday; I’m not sure anybody is working who can help me right now. That little box was generated by artificial intelligence and data, which you probably quickly figured out if you interacted with it. The company knows, from thousands or even millions of customer interactions, what kinds of questions you might have for them. They can then use AI for a chatbot that answers all those typical questions. This saves the company time and money since they don’t need to pay someone to answer their customers’ most basic and common questions.
The reason those chatbots can exist (that the company has so much data that it can predict and easily answer customers’ questions) is the same reason the chatbots can be so incredibly frustrating. Maybe the company predicts, for example, that you are going to ask how to start a return. You might ask if returns are free, how long you have to return an item, or how to generate a shipping label. These are all questions that the company has data on; customers have asked them so many times that the chatbot expects them and can be programmed to answer them. But let’s say you have a more specific question, like can you apply the forty-percent-off coupon that you used in your initial purchase to the newly listed green jacket while getting your money back for the pants you want to return. Because it’s AI and because this isn’t as common a question, the chatbot probably can’t answer this. It either tries to answer a different question, leaving you frustrated and maybe slamming your computer closed, or it directs you to call customer service during business hours.
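A toy version of this behavior can be sketched in a few lines of Python. This is a deliberately simple keyword matcher, not how any particular company’s chatbot works; modern chatbots typically rely on much more sophisticated language models. The questions, canned replies, and the two-keyword threshold are all hypothetical.

```python
import re

# Hypothetical FAQ table: each entry pairs trigger keywords with a canned reply.
FAQ = [
    ({"start", "return"}, "You can start a return from your order history page."),
    ({"returns", "free"}, "Yes, returns are free."),
    ({"shipping", "label"}, "We will email you a prepaid shipping label."),
]
FALLBACK = "Please call customer service during business hours."


def reply(question: str) -> str:
    """Pick the canned reply whose keywords best match the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    best_score, best_answer = 0, FALLBACK
    for keywords, answer in FAQ:
        score = len(keywords & words)  # how many trigger words appear
        if score > best_score:
            best_score, best_answer = score, answer
    # Only answer when both trigger words matched; otherwise give up politely.
    return best_answer if best_score >= 2 else FALLBACK


print(reply("Are returns free?"))
print(reply("Can I apply my coupon to the green jacket and get my money "
            "back for the pants I want to return?"))
```

The common question gets its canned answer, while the coupon question matches nothing well enough and falls through to the customer service message. Lower the threshold to one keyword and the bot would instead answer the coupon question with instructions for starting a return, which is the other frustrating outcome described above.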
While the results of big data may annoy consumers, most industries think of it as helpful. Data helps them know their customers better so they can target their advertising or products more accurately. In other words, it helps them increase their profit. For sports organizations, big data functions in a similar capacity: it helps them increase efficiency, which leads (hopefully) to better playing and more wins.
Big Data and the Consumer
But what about you, the consumer? Does big data help you?
The answer to this question depends on your viewpoint. You may find it helpful that ads are targeted to your interests when you log into a social media site. Maybe you have found brands you wouldn’t otherwise have known about that helped you in some way. Maybe it was easier to find the stylish pair of glasses you wanted because that ad popped up just as you were beginning to think about shopping for glasses. But some people think this incessant data collection is invasive and would rather not have companies keeping tabs on them.
There are small steps we can take to mitigate our data being collected. Most websites now ask you to “accept cookies” when you visit them. These cookies are small pieces of data about you that are stored in your computer’s browser: what items you clicked on, for example, or what you put in your shopping cart. You can usually choose to “reject cookies” on a website so that your personal preferences won’t be stored. Similarly, you can block location sharing on your phone, since many apps gather information about where you are when you use them. And you can check your privacy settings on social media sites like Facebook, which allows you to customize what information is stored and what can be seen by others.
Even with these precautions, though, you generate data with every move, and there is little you can do to escape being a source besides living completely off the grid. Remember the big Facebook and Cambridge Analytica scandal of the late 2010s? Facebook shared data without its users’ permission with a big data company called Cambridge Analytica. Cambridge Analytica used the data mainly for targeted political advertisements, particularly toward those on the conservative end of the political spectrum. Eventually, Cambridge Analytica was found to have provided data on voters to the Trump campaign and was suspected to have given information to Russian political operatives who were trying to influence foreign elections. Mark Zuckerberg, founder and CEO of Facebook, had to answer to Congress about this massive data breach that involved some eighty-seven million users.
So as much as you try to avoid having your data harvested, you can only do so much if you are living in the modern world. Everything you do, every click of the mouse, every purchase with your debit card, creates data. This data then gets analyzed and turned around to influence your next move. As The New York Times quoted one Facebook user, “You are the product on the internet.”
Again, knowledge is power. If you’re aware that the ads that pop up on social media are specifically targeted to you, you can more easily resist them. Know that you are constantly generating data and become aware of how that data is used to try to influence you. Then use this knowledge to try to recognize and resist the influences.