Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
A 2012 Harvard Business Review article called the data scientist role “The Sexiest Job of the 21st Century”. This raises the question: why is Data Science suddenly so “sexy”?
Why “Data Science”?
Since the early 21st century, the amount of data and information being created, captured, copied and consumed has grown at an exponential rate. Around 175 zettabytes of data are expected to be created in 2025, up from about 2 zettabytes in 2010.
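The jump from 2 to 175 zettabytes can be expressed as an implied yearly growth rate. The sketch below assumes steady compound growth between 2010 and 2025, which is a simplification (real growth was not uniform from year to year); the function name is illustrative.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures from the text: about 2 ZB in 2010 and about 175 ZB expected in 2025.
rate = compound_annual_growth_rate(start=2, end=175, years=2025 - 2010)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the result is roughly 35% per year, which means the yearly volume doubles about every 2.3 years.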
If you attempted to download 175 zettabytes at today's average internet connection speed, it would take 1.8 billion years. Even if every person in the world helped with the download, it would still take 81 days.
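The 1.8-billion-year figure is easy to sanity-check. This sketch assumes an average connection speed of 25 Mbit/s (an assumed value; the text does not state one) and decimal units, where 1 zettabyte is 10^21 bytes.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
ZETTABYTE = 10**21  # bytes, using the decimal SI prefix
ASSUMED_SPEED = 25e6  # assumption: 25 Mbit/s average connection speed


def download_time_years(size_bytes: float, speed_bits_per_second: float) -> float:
    """Years needed to download `size_bytes` at a constant `speed_bits_per_second`."""
    return size_bytes * 8 / speed_bits_per_second / SECONDS_PER_YEAR


years = download_time_years(175 * ZETTABYTE, ASSUMED_SPEED)
print(f"{years / 1e9:.2f} billion years")
```

With these assumptions the result comes out at about 1.77 billion years, in line with the rounded figure above.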
If you were to store 175 zettabytes on DVDs, your stack of DVDs would be long enough to circle Earth 222 times.
That is a lot of data!
Just imagine how many photos, videos and other pieces of content are created every second. In 2018, every minute:
- Twitter users sent 473,400 tweets
- Snapchat users shared 2 million photos
- Instagram users posted 49,380 pictures
- LinkedIn gained 120 new users
Furthermore:
- Google processes more than 40,000 searches every second, or 3.5 billion searches a day.
- 1.5 billion people are active on Facebook every day. That’s one-fifth of the world’s population.
- Two-thirds of the world’s population now own a mobile phone.
- In 2020, every person created an estimated 1.7 MB of data every second.
- In the last two years alone, an astonishing 90% of the world’s data has been created.
The zettabytes of data already stored, together with its continuing exponential growth, have led to a shortage of people who can analyse it. Data can be hidden gold for businesses and governments alike, providing useful insights and evidence for important decisions.
Data skills will only become more essential over time. That is why Data Science is the profession of the future.
What is “Data Science”?
There are a number of different areas or disciplines within Data Science.
Broadly, however, Data Science can be considered the intersection of:
- Computer Science
- Math & Statistics
- Domain Knowledge (knowledge about the industry the data or information comes from)
We say “broadly” because Data Scientists today need many more skills than these three areas.
Here are some technical skills and know-how required by Data Scientists:
- Data Mining
- Programming
- Statistics
- Visualizations
- Databases
- Data Engineering
- Big Data
- Data Processes
- Machine Learning
- Pattern Recognition
- Computer Vision
- Experimental Design
Soft skills are just as important, because Data Scientists must convey the insights from their work to others:
- Communicating
- Presenting
- Domain Knowledge
Data Science and Programming
Programming, one of the most important aspects of Data Science, underpins many of the technical skills listed above and lets Data Scientists tackle data challenges, questions, and tasks quickly and in an automated, repeatable way.
Python has been around for many years and is a Data Scientist’s main weapon for tackling complicated data analysis. One of the main reasons is the large number of community-driven packages built for data-specific tasks.
What is a library? A library is a collection of saved code that someone else has written for you. You can import various bits of code from a library to complete a specific task, so that you don’t have to write everything from scratch.
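As a small illustration, the sketch below imports ready-made functions from `statistics`, a library that ships with Python, instead of writing the formulas by hand. The page-view numbers are hypothetical example data.

```python
# Import ready-made functions from Python's built-in `statistics` library
# instead of writing the formulas ourselves.
from statistics import mean, median, stdev

# Hypothetical example data: daily page views for one week.
page_views = [120, 135, 150, 110, 170, 160, 145]

print("Mean:", mean(page_views))
print("Median:", median(page_views))
print("Standard deviation:", round(stdev(page_views), 2))
```

Each imported function is one line of code for us, even though someone else had to write and test the underlying logic. Data libraries such as pandas and NumPy follow the same pattern on a much larger scale.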
Data Science vs. Data Analytics?
Data analytics focuses on answering specific questions to support better business decisions. It uses existing information to uncover actionable insights, and it targets specific areas with specific goals.
Data science, by contrast, combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It focuses on discovering new questions that you may not have known needed answering, in order to drive innovation.
The Data Science Toolbox
Some typical tools and technologies a Data Scientist should know are:
- Programming languages such as Python, R, SQL, Java, Julia, and Scala
- For statistics, mathematics, algorithms, modeling, and data visualization, Data Scientists usually use pre-existing packages and libraries including: scikit-learn, TensorFlow, PyTorch, pandas, NumPy, and Matplotlib.
- For reproducible research and reporting, Data Scientists commonly use notebooks and frameworks such as Jupyter and JupyterLab.
- Relational (RDBMS), NoSQL, and big data systems for storing and querying data, such as MySQL, PostgreSQL, Amazon Redshift, Snowflake, MongoDB, Redis, Hadoop, and HBase.
- Cloud service providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
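To show how two tools from this list work together, the sketch below uses Python's built-in `sqlite3` module to run a SQL query. SQLite is a stand-in chosen only because it ships with Python; with a server database such as PostgreSQL or MySQL you would use a separate driver package. The `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, standing in for a production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0), ("East", 200.0)],  # hypothetical data
)

# Let SQL do the aggregation, then work with the result in Python.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
for region, total in rows:
    print(f"{region}: {total:.2f}")
conn.close()
```

Pushing the aggregation into SQL keeps the data transfer small; Python then handles whatever comes next, such as plotting or modeling.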
Why become a Data Scientist?
As mentioned in the introduction, a 2012 Harvard Business Review article called the data scientist role "The Sexiest Job of the 21st Century", and there is still no sign that demand for Data Scientists will abate in the coming years. As more and more data becomes accessible, big tech companies are no longer the only ones that need Data Scientists. The growing demand for Data Science professionals across industries and companies of all sizes is running up against a shortage of qualified candidates to fill open positions.
If you want to learn more about Data Science and start your career in one area of this promising field, visit our website to find out more about our Data Science Bootcamp, in which you will learn all the necessary tools and technologies in just 12 weeks.