Data Analytics - U.S. Wind Turbines
Snowflake - SQL - Tableau
Introduction
In this case study I analyze a dataset on US wind turbines in order to identify how the US government can expand its future wind turbine projects. The tools that I used for analyzing this data are Snowflake, SQL, and Tableau. The main highlights are summarized through the following links:
Tableau: Case study dashboard summarizing results and findings.
Github: SQL code for data preparation that I wrote in Snowflake.
A more detailed description of the full case study is presented below.
Scenario
Through curiosity I stumbled upon a dataset on US Wind Turbines from data.world. This dataset includes historical information on locations of wind turbines in the United States, turbine manufacturer, project information, and turbine technical specifications.
I thought it was interesting to see how the US government (assuming full owner of turbine projects) can expand new projects in the future based on historical data.
For this case study I used Snowflake and SQL for the data preparation and analysis. After I link Tableau to my Snowflake database, to perform further analysis and build visualizations.
Business question
To build the case study I had to think about an interesting business question that could be answered with the help of data analytics therefore I made some assumptions to get things started. I assumed that the US government is mainly concerned with building high quality wind turbines against low cost that produce high power. Also I thought that the US government wants to consider environmental factors when deciding to launch and new project in a state, such as population density and the number of turbines already installed. Given these assumptions I formulated the following business question:
Where and with whom should the US government expand its future wind turbine projects?
Data preparation and cleaning - SQL
It should be noted that the dataset includes a snapshot of wind turbine data from January 2023, it includes around 68,000 installed wind turbines between 1980 and 2022. There is no information on installation cost or energy prices.
I used another dataset from dataworld.info to pull in US state level information about population density and size of states in order to enrich my dataset.
Joining data sources,
I started to prepare my data in Snowflake using SQL by joining the US Wind Turbine dataset with the US state dataset. For this I used a right join to include all state codes from the US Wind Turbine dataset.
Dealing with duplicate and null values
Then I checked if there were any duplicate wind turbine IDs and deleted the duplicates. I also checked the number of null values per column which I wanted to use in the final dataset.
Renaming fields and changing data types
Then, I created a new table and started to rename fields and change data types to make it more clear for me to work with the data when analyzing it.
Data analysis - SQL
After I prepared the data I started my analysis and came back to my business question: Where and with whom should the US government expand its future wind turbine projects?
Split the analysis in two parts
When looking at this question I decided to split it into two parts. First, I looked at with whom the US government can work to realize the next wind turbine projects. What does make a good partner and who may be best suited for the project. I came up with the following sub-questions:
· Which projects were most successful in the past?
o Which projects yielded the highest MW per turbine?
o Was there a difference between old and new projects?
o Was there a difference between smaller and larger projects?
· Which manufacturers were involved in these projects?
Second, I thought about where the US government can expand future wind turbine projects. What are important factors to consider when starting to build project with 200+ turbines.
· What is the number of turbines per square kilometre in a state?
· What is the number of people per square kilometre in a state?
· Do we have to think about stricter regulation in states were there are no wind turbine projects yet?
Part 1 - project and manufacturer level analysis
Then I started the first part of my analysis to look for a good partner. I started to look at the number of wind turbines per project per manufacturer and the average MW capacity that the wind turbines produced.
Difference between project age and size
Based on this I wanted to see the difference in average MW capacity for turbines between old and new projects.
Also I looked into the difference in average MW capacity for turbines between large projects (100 > turbines) and small projects (< 100).
Then I looked into which manufacturers were involved in the new and old projects.
Part 2 - state level analysis
For the second part I analyzed the number of turbines per square kilometer in a state and people per square kilometer in a state.
Then I assigned a ranking to states based on these two variables and filtered the data to only include states with over 1000 turbines.
Finally, for states with the highest ranking, I looked into which manufacturers were active in those states.
Discussion about the results
From the data analysis I began see patterns in the data. From part 1 I saw that Vestas, GE Wind, and Siemens have most experience in creating wind farms. Vestas managed to produce the most powerful wind turbines, with on average 2.47 MWs from 10608 turbines, which is 8.8% higher than GE wind which managed to produce on average 2.27 MWs from 16833 turbines. Additionally, I also considered wind turbine manufacturers that produced less turbines but managed to achieve an even higher average output per turbine, Siemens GRE with an average output of 3.16 MWs per turbine and Nordex with 3.53 MWs per turbine.
From the second part, I selected Wyoming, South Dakota, and New Mexico as most suitable states. These states in comparison to let's say New York have a much lower population density < 10 vs 139 people per sq km, and also do not have too many ongoing wind turbines projects if looking at the ranking of turbines per sq km. In addition, I also wouldn't expect problems with regulation as each of these states already have > 1000 installed wind turbines, suggesting there is some sort of regulation already in place.
Summary and visualizations - Tableau
After I analyzed the data I made visualizations in Tableau to summarize the results and patterns I saw. Despite the limitations in data such as not having cost information I was able to find some interesting trends in the data. From the below graph I see that there is stable growth in the number of newly installed wind turbines, suggesting that the US wants to continue investing in wind projects. Upon some further digging though I found that expansion of new projects is also dependent on energy price (as there was large drop in the energy price for wind energy in 2013).
If we look at innovation, there is a strong positive relationship (p-value < 0.001) between the year in which the turbine was installed and the MW capacity of a wind turbine. Suggesting that newer wind turbines produce more power and are more efficient.
Looking at the map we can see which states have the best ranking for the lowest turbine density and lowest population density. This indicates that Wyoming, New Mexico, and South Dakota are the most suitable states to build new wind turbines. Note that the blank states do not meet the minimum requirement of having at least 1000 installed wind turbines.
If we look at the bar graph below, we can see which manufacturers were present in the best ranking states. Along with my analyses in SQL and based on average power per turbine I found that Vestas is the most suitable partner to expand projects in South Dakota and Wyoming, and Siemens GRE for expansion in New Mexico.
Tableau: Case study dashboard summarizing results and findings.
Github: SQL code for data preparation that I wrote in Snowflake.