As I move from engineering into data science, there are a few aspects of the field that I’m especially excited about:
- Using my math/statistics training more.
- Continuing to be creative, but in a different way.
- Working in a field where technology is rapidly changing and expanding the limits of what we can do.
In my engineering work, much of the problem-solving centered on figuring out why buildings leak. A building is similar to a 3D puzzle, and it typically leaks because a puzzle piece is missing or because pieces were not put together correctly (usually several of them). So the work focused on figuring out which pieces were missing or incorrectly assembled. This can be a hard problem (and we didn’t always solve it), but it is mostly spatial in nature. While I enjoy spatial problems, I missed working on more quantitative ones.
Learning how math underpins data science has jogged my memory, and I’m recalling pretty much every math course I’ve ever taken: calculus, differential equations, linear algebra, statistics, probability, numerical analysis, you name it. (I’m even remembering sitting in my high-school classroom learning specific aspects of linear algebra, details I didn’t know I still remembered.) I’ve really enjoyed seeing how each type of math is relevant in data science: the analysis is driven by statistics and probability, optimization relies on calculus, and computers work with data as vectors and matrices, which is where linear algebra comes in. It has reminded me how much I’ve loved math over the years and how important (and fun) it is.
In building work, my creative side came out when I had to figure out how to repair something that was wrong. For instance, if a puzzle piece was missing, I would figure out how to add a new piece to the existing puzzle to take its place. However, in an existing building this isn’t as easy as just swapping in the missing piece, because the configuration of the other components usually limits how much we can modify the building. So we had to get creative and find a solution that addressed the leak while also accommodating the constraints of the existing building.
In data science, we’re often given data to analyze with a goal or question in mind; however, that doesn’t have to limit the analysis. As I analyze the data, I can keep asking questions and see what’s interesting. For example, on one project I analyzed posts from the Republican and Democrat subreddits (which I’ll go into further in a future blog post). My initial goal was to predict which subreddit a particular post came from based on its language. However, while analyzing the data, I found that many posts referenced URLs. I ended up spending a good amount of time and effort looking into which URLs were unique to each subreddit and which were common to both, because it was interesting and I found some unexpected results (get excited for the future blog post). While this analysis took time, the ability to explore any aspect of the data that seems interesting, or at least do an initial analysis to see whether it’s worth exploring further, is very freeing and one of my favorite parts of the data science work I’ve done so far.
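The "unique versus common URLs" comparison can be sketched with Python sets. This is a minimal illustration, not the code from the project: the sample posts, the example domains, the subreddit labels, and the helper `domains_in` are all hypothetical, and it assumes URLs start with http:// or https:// and are compared at the domain level.

```python
import re
from urllib.parse import urlparse

# Hypothetical sample posts; the labels and domains are placeholders, not real project data.
posts = {
    "republican": [
        "Check this out https://www.news.example.com/story1",
        "Source: https://blog.example.org/post and https://shared.example.net/a",
    ],
    "democrat": [
        "Read https://other.example.com/article",
        "Also (https://shared.example.net/b)",
    ],
}

# Simple pattern for http(s) URLs; a match stops at whitespace or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s)]+")


def domains_in(texts):
    """Return the set of domains (lowercased, without a leading 'www.') found in the texts."""
    found = set()
    for text in texts:
        for url in URL_PATTERN.findall(text):
            domain = urlparse(url).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]
            found.add(domain)
    return found


rep = domains_in(posts["republican"])
dem = domains_in(posts["democrat"])

common = rep & dem    # domains referenced in both subreddits
only_rep = rep - dem  # domains referenced only in the first subreddit
only_dem = dem - rep  # domains referenced only in the second subreddit

print("common:", sorted(common))
print("only republican:", sorted(only_rep))
print("only democrat:", sorted(only_dem))
```

Set intersection gives the domains both subreddits share, and set difference gives the ones unique to each; counting how often each domain appears would be a natural next step for seeing which sources dominate.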
Part of the reason I loved studying for my master's degree so much is that I got to explore a field that is constantly changing through research and technology. Sustainability has a ton of resources being poured into it, and the energy models we can build now are far more complex than even 5 years ago (let alone 20 years ago). In addition, the amount of data in this field has grown substantially in the past few years. For instance, as manufacturers become more willing to share what materials and processes go into making their products (through environmental product declarations), we can better understand the carbon footprint of a building (through life-cycle analysis).
Data science is changing in a similar way: far more data is being generated by the internet and other sources than 10 or 20 years ago, and the computing power available to analyze it is increasing at breakneck speed. I love that this requires constant learning to stay aware of new ways to analyze data, including new algorithms and methods such as GPT-3 for natural language processing (https://www.nytimes.com/2020/11/24/science/artificial-intelligence-ai-gpt3.html) or GANs (generative adversarial networks) to generate images (https://www.nytimes.com/interactive/2020/11/21/science/artificial-intelligence-fake-people-faces.html).
So, overall, as I make the switch, I’m really excited to get back to my roots in math, use my creativity in a different way, and work with a constantly changing toolset!