fealt/shark-attacks-australia
🦈 'Where in Australia to build a "shark-free" family resort?' – Project 1 @ironhack Data Analytics Bootcamp
Data Analytics Bootcamp – Project 1
Where in Australia to build a 'shark-free' family resort?
⏰ Time's short?
👉 Click here to check my presentation, hosted by GitHub Pages ♡
Shark Attacks (data cleaning and manipulation with Pandas) is my first project at Ironhack's Data Analytics Bootcamp (2021). The given dataset was extremely messy and dirty, so the main Python challenge was to make it clean and usable. But before transforming the beast into a beauty, I was also challenged to build a story around a business question. The question, Where in Australia to build a 'shark-free' family resort?, ties in with my background in the construction industry and is also a big coding job: lots to clean, supporting datasets to find and, of course, fun! 👨🏻💻
Project main objectives
▫️ Use storytelling with data to answer a 'business' question.
▫️ Apply different cleaning and manipulation techniques to make a messy dataset usable.
Client
▫️ Shark-free Hotels & Resorts is a hotel chain that aims to be worldwide but is still missing a branch in Australia.
▫️ To date, all of its other hotels are built on "safe" beaches, with no shark sightings.
▫️ Main clientèle – all kinds of families, with/without kids.
Cleaning & Co.
▫️ Cleaned columns include:
`year`
`type (provoked/unprovoked)`
`fatal (y/n)`
`area`
`location`
`sex`
`age`
▫️ Developed a cleaning strategy for column `location` to get coordinates, applying `GeoPy`.
▫️ To support the analysis, 5 extra datasets were used:
1. Hotels in Australia, key findings:
- Top 3 States by number of hotels are New South Wales, Queensland and Victoria.
- The mean accommodation rate for Australia is around 65%, and almost all states are close to it.
2. Short-term visitors in Australia, key findings:
- Australia has seen remarkable growth in short-term visitors over the last 40 years.
- Over 200% rise from 1990 to 1997 and almost 170% rise from 2010 to 2018.
- For tourist growth in relation to total shark attacks, refer to the Jupyter notebook for more details.
3. Australia cities database.
4. Top 20 beaches in Australia (self-made dataset).
5. List of beaches in Australia (self-made dataset).
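The column cleaning itself lives in the Jupyter notebook. As a minimal sketch of the kind of normalization involved (not the notebook's code), the helper below coerces `Year` to numbers, keeps only the expected answers in `Type`, `Fatal (Y/N)` and `Sex`, and pulls the first number out of free-text ages. The raw column names are assumptions about the Kaggle file (some of its headers may carry trailing spaces, hence the strip), and `clean_attacks` / `_keep_allowed` are hypothetical names.

```python
import numpy as np
import pandas as pd


def _keep_allowed(col: pd.Series, allowed: set[str]) -> pd.Series:
    """Normalize text and keep only values from an allow-list (others -> NaN)."""
    # astype(str) turns NaN/None into "nan"/"None", which fail every allow-list.
    norm = col.astype(str).str.strip().str.lower()
    return norm.where(norm.isin(allowed))


def clean_attacks(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalizes a few messy columns of the raw dataset."""
    df = raw.copy()
    # Strip stray whitespace from headers, e.g. "Sex " -> "Sex".
    df.columns = df.columns.str.strip()

    # Year: coerce to numbers; 0 and unparsable values become missing.
    year = pd.to_numeric(df["Year"], errors="coerce").replace(0, np.nan)
    df["Year"] = year.astype("Int64")

    # Categorical columns: only the expected answers survive.
    df["Fatal (Y/N)"] = _keep_allowed(df["Fatal (Y/N)"], {"y", "n"}).str.upper()
    df["Sex"] = _keep_allowed(df["Sex"], {"m", "f"}).str.upper()
    df["Type"] = _keep_allowed(df["Type"], {"provoked", "unprovoked"})

    # Age: keep the first number in strings such as "30s" or "18 or 20".
    digits = df["Age"].astype(str).str.extract(r"(\d+)", expand=False)
    df["Age"] = pd.to_numeric(digits, errors="coerce")
    return df


# Made-up raw rows showing typical mess.
raw = pd.DataFrame({
    "Year": [2016, 0, "1990"],
    "Type": [" Unprovoked", "Provoked", "Invalid"],
    "Fatal (Y/N)": ["N", " y", "UNKNOWN"],
    "Sex ": ["M", "F ", "lli"],
    "Age": ["30s", None, 25],
})
print(clean_attacks(raw))
```

Values that fail the allow-lists become missing rather than being guessed, which keeps later percentages honest about unknowns.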
Analysis, Worldwide
▫️ Based on 230 years of available data, Australia is the country with the second-most recorded shark attacks in the world (1338), behind the USA (2229) and ahead of South Africa (579).
▫️ The top 3 countries account for 65% of all incidents.
▫️ 22% of incidents are fatal.
▫️ Almost 90% of the people attacked are male.
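Headline shares like these come down to counting. A minimal sketch, assuming a cleaned frame with `Country`, `Fatal (Y/N)` and `Sex` columns; `summary_shares` is a hypothetical helper and the toy rows are made up, not taken from the dataset. Note that `value_counts` drops missing values by default, so the denominators here exclude unknowns, which may differ from how the notebook computed its figures.

```python
import pandas as pd


def summary_shares(df: pd.DataFrame, top_n: int = 3) -> dict[str, float]:
    """Hypothetical helper: headline percentages from a cleaned attacks table."""
    # Share of all (known-country) incidents that fall in the top_n countries.
    counts = df["Country"].value_counts()
    top_share = counts.head(top_n).sum() / counts.sum()

    # normalize=True turns counts into fractions of the non-missing values.
    fatal_share = df["Fatal (Y/N)"].value_counts(normalize=True).get("Y", 0.0)
    male_share = df["Sex"].value_counts(normalize=True).get("M", 0.0)

    return {
        "top_countries_pct": round(100 * float(top_share), 1),
        "fatal_pct": round(100 * float(fatal_share), 1),
        "male_pct": round(100 * float(male_share), 1),
    }


# Made-up toy rows for illustration only.
toy = pd.DataFrame({
    "Country": ["Country A", "Country A", "Country B", "Country C", "Country D"],
    "Fatal (Y/N)": ["N", "Y", "N", "N", None],
    "Sex": ["M", "M", "F", "M", "M"],
})
print(summary_shares(toy))
# {'top_countries_pct': 80.0, 'fatal_pct': 25.0, 'male_pct': 80.0}
```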
Analysis, Australia
▫️ For 85% of all fatal attacks, it was possible to find the coordinates of the location.
▫️ The number of tourists has exploded over the last 40 years: a rise of over 200% from 1990 to 1997 and of almost 170% from 2010 to 2018.
▫️ Shark attacks rose too; the top 10 years by number of attacks include 2016, 2015, 2014, 2009, 2012, 2017 and 2018.
▫️ For a "small" state, Victoria has almost 20% of all hotels in Australia, and the best room occupancy rate of all states, over 70%.
▫️ Among the top 3 states, Victoria is the only one without a fatal shark attack in the past 40 years.
▫️ Lake Tyers Beach is a top 20 beach in Australia!
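Two of the Australia checks can be sketched as follows: the percentage rise between two years, 100 × (new − old) / old, and a filter for states with no fatal attack since a cutoff year. This is an illustration under assumptions: the column names `State`, `Year` and `Fatal (Y/N)`, both helper names, and all numbers below are made up.

```python
import pandas as pd


def pct_rise(series: pd.Series, start: int, end: int) -> float:
    """Percentage rise of a year-indexed series: 100 * (new - old) / old."""
    old, new = series.loc[start], series.loc[end]
    return float(100 * (new - old) / old)


def states_without_fatalities(attacks: pd.DataFrame, states: list[str], since: int) -> list[str]:
    """States from `states` with no fatal attack recorded in or after `since`."""
    fatal_recent = attacks[(attacks["Year"] >= since) & (attacks["Fatal (Y/N)"] == "Y")]
    affected = set(fatal_recent["State"])
    return [s for s in states if s not in affected]


# Made-up toy numbers, NOT the ABS or shark-attack figures.
visitors = pd.Series({1990: 100, 1997: 310})
print(round(pct_rise(visitors, 1990, 1997), 1))  # 210.0

attacks = pd.DataFrame({
    "State": ["State A", "State B", "State C", "State C"],
    "Year": [2015, 1995, 2012, 1975],
    "Fatal (Y/N)": ["Y", "Y", "N", "Y"],
})
print(states_without_fatalities(attacks, ["State A", "State B", "State C"], since=1981))  # ['State C']
```

The fatal attack for "State C" in 1975 falls before the cutoff, so that state still counts as free of fatalities for the chosen window.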
Conclusion
▫️ Despite shark attacks along the whole coast, there is a `shark free` area in the state of Victoria.
▫️ It's named `Lake Tyers Beach` and is also a top 20 beach in Australia! Ranked #16.
▫️ It is therefore a `safe` place for `Shark-free Hotels & Resorts` to start hosting in Australia.
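How exactly the notebook decides that an area is "shark-free" is not spelled out here. One way to make the idea concrete, offered as an assumption rather than the project's method, is to flag a beach as shark-free when no geocoded attack lies within some radius of it, using the haversine great-circle distance. The 50 km radius, the coordinates and the function names below are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def shark_free_beaches(
    beaches: dict[str, tuple[float, float]],
    attacks: list[tuple[float, float]],
    radius_km: float = 50.0,
) -> list[str]:
    """Beaches with no recorded attack within radius_km (hypothetical rule)."""
    return [
        name
        for name, (blat, blon) in beaches.items()
        if all(haversine_km(blat, blon, alat, alon) > radius_km for alat, alon in attacks)
    ]


# Made-up coordinates, for illustration only.
beaches = {"Beach A": (-37.0, 148.0), "Beach B": (-33.9, 151.3)}
attacks = [(-33.8, 151.2)]
print(shark_free_beaches(beaches, attacks))  # ['Beach A']
```

The radius is the key design choice: a small radius only excludes beaches with attacks right on them, while a large one also penalizes nearby stretches of coast.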
Deliverable files in this repository
- Cleaned final dataset (./assets): shark_au_df.csv
- Data analysis in a Jupyter notebook: project_01_shark_attack.ipynb
- Formal presentation (storytelling with data), done in HTML5, CSS3 and JavaScript: check it here
Data
- Given dataset
  - Global Shark Attacks: @kaggle.com.
- Extra datasets used
  - Number of movements Short-term Visitors arriving in Australia: Australian Bureau of Statistics.
  - Information on the supply of, and demand for, tourist accommodation facilities in Australia: Australian Bureau of Statistics.
  - Australia cities database: @kaggle.com.
- Created datasets based on
Tech
- Python @ Jupyter Notebook
- Pandas / Numpy
- Geopy / Nominatim (Python client for geocoding)
- Viz: seaborn / plotly
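The cleaning strategy for the `location` column is in the notebook; as a hedged sketch of the general idea only, messy location strings can be trimmed of noise and tried from most to least specific until the geocoder returns a hit. The cleaning rules, the query order and the helper names are assumptions. The geocoder is passed in as a callable so the sketch runs offline; the demo uses a fake lookup table with approximate, made-up values.

```python
import re
from types import SimpleNamespace
from typing import Callable, Optional, Protocol


class HasLatLon(Protocol):
    """Anything with latitude/longitude attributes, such as a geopy Location."""
    latitude: float
    longitude: float


def clean_location(text: str) -> str:
    """Drop parenthetical notes and vague prefixes such as 'near' or 'off'."""
    text = re.sub(r"\(.*?\)", "", text)  # remove "(north end)" style notes
    text = re.sub(r"^\s*(near|off)\s+", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip(" ,")


def candidate_queries(location: str, area: str, country: str = "Australia") -> list[str]:
    """Queries from most to least specific; the area alone is never used."""
    parts = [p.strip() for p in clean_location(location).split(",") if p.strip()]
    queries = [", ".join([*parts, area, country])] if parts else []
    queries += [f"{p}, {area}, {country}" for p in parts]  # each fragment alone
    return list(dict.fromkeys(queries))  # de-duplicate, keep order


def geocode_with_fallback(
    location: str, area: str, geocode: Callable[[str], Optional[HasLatLon]]
) -> Optional[tuple[float, float]]:
    """(lat, lon) from the first query the geocoder resolves, else None."""
    for query in candidate_queries(location, area):
        hit = geocode(query)
        if hit is not None:
            return (hit.latitude, hit.longitude)
    return None


# Offline demo with a fake geocoder (approximate, made-up lookup table).
known = {"Bondi Beach, New South Wales, Australia": SimpleNamespace(latitude=-33.89, longitude=151.27)}
print(geocode_with_fallback("Near Bondi Beach (north end), Sydney", "New South Wales", known.get))
# (-33.89, 151.27)
```

With geopy installed, the callable could be `RateLimiter(Nominatim(user_agent="your-app-name").geocode, min_delay_seconds=1)` (from `geopy.geocoders` and `geopy.extra.rate_limiter`, with a hypothetical user agent); Nominatim's usage policy asks for an identifying user agent and at most one request per second. Returning `None` instead of falling back to the state alone avoids assigning a whole state's coordinates to a single attack.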


