The most important data skills (and online data courses) in 2022

March 16, 2022

With so many exciting data science tools and techniques to explore these days, it can be hard to start building the right learning path for you. Thankfully, Matt Dancho of Business Science put together a great list with the most important data science skills for 2022.

A Valuable Data Skills Table

If a picture is worth 1,000 words, then Matt's table is worth 50,000 dollars because he believes that mastering these skills will boost your annual salary by $50k. We've added links to learning resources for many of his topic categories so that you can start to strengthen key areas today.

Plan | Skills
Machine Learning | Supervised Classification, Supervised Regression, Unsupervised Clustering, Dimensionality Reduction, Local Interpretable Model Explanation - H2O Automatic Machine Learning, parsnip (XGBoost, SVM, Random Forest, GLM), K-Means, UMAP, recipes, lime
Data Visualization | Interactive and Static Visualizations, ggplot2 and plotly
Data Wrangling & Cleaning | Working with outliers, missing data, reshaping data, aggregation, filtering, selecting, calculating, and many more critical operations, dplyr and tidyr packages
Data Preprocessing & Feature Engineering | Preparing data for machine learning, Engineering Features (dates, text, aggregates), recipes package
Time Series | Working with date/datetime data, aggregating, transforming, visualizing time series, timetk package
Forecasting | ARIMA, Exponential Smoothing, Prophet, Machine Learning (XGBoost, Random Forest, GLMnet, etc.), Deep Learning (GluonTS), Ensembles, Hyperparameter Tuning, Scaling to 1000s of forecasts, modeltime package
Text | Working with text data, stringr
NLP | Machine learning, Text Features
Functional Programming | Making reusable functions, sourcing code
Iteration | Loops and Mapping, using purrr package
Reporting | R Markdown, Interactive HTML, Static PDF
Applications | Building Shiny web applications, flexdashboard, Bootstrap
Deployment | Cloud (AWS, Azure, GCP), Docker, Git
Databases | SQL (for data import), MongoDB (for apps)

Online Data Courses by Skill and Tool

Don’t feel like narrowing down all the available options on your own? No problem. Below are 40 free and highly recommended online data courses broken down by skill category and tool. 

1. Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that empowers us to build complex predictive models that do not require explicit programming for each possible outcome.
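To make "no explicit programming for each possible outcome" concrete, here is a minimal sketch of a nearest-neighbor classifier in plain Python. It is not taken from Matt's table (which lists R tools such as parsnip and H2O); the toy data, feature names, and the `predict` helper are invented for illustration.

```python
from math import dist

# Toy labelled examples: (hours_studied, hours_slept) -> outcome. Values are invented.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(point, examples=training_data):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(examples, key=lambda ex: dist(ex[0], point))
    return nearest_label

# No rule for this exact input was ever written; the answer comes from the examples.
prediction = predict((7.0, 6.0))
print(prediction)
```

Adding more labelled examples changes the predictions without touching the code, which is the practical difference from hand-written rules.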


2. Data Visualization

Data visualization is the process of turning underlying data into summary graphics. The purpose may be to tease out insights during exploratory analysis or package visuals in a way that effectively informs and influences others. There are many visualization design choices that can be constructed in static or interactive form.


3. Data Wrangling & Cleaning

Data wrangling is the process of changing the structure of a given dataset into a more desirable format. Data cleaning is the process of identifying and correcting potential issues in a dataset that may negatively impact an analysis or process. Typical issues to address include missing data, outliers, and corrupt values.
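As a rough illustration of the cleaning step, the sketch below treats out-of-range values as missing and fills every gap with the median. The data, the valid range of 0 to 120, and the `clean_ages` helper are assumptions made for this example; in the R ecosystem named above, dplyr and tidyr would usually handle this kind of work.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value and 950 is a likely entry error.
raw_ages = [34, 29, None, 41, 950, 38, None, 45]

def clean_ages(values, lower=0, upper=120):
    """Treat values outside [lower, upper] as missing, then fill gaps with the median."""
    in_range = [v if v is not None and lower <= v <= upper else None for v in values]
    observed = [v for v in in_range if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in in_range]

cleaned = clean_ages(raw_ages)
print(cleaned)
```

Median imputation is only one possible choice; dropping the affected rows or flagging them for review can be more appropriate depending on the analysis.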


4. Data Preprocessing & Feature Engineering

Data preprocessing is controlling for issues discovered during data cleaning by creating a set of instructions that will result in a dataset that is ready for further analysis. Feature engineering is the process of selecting from a dataset the relevant variables in their optimal form to be used as inputs in a predictive model.
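Date features are one of the feature types listed in the table. A minimal sketch of that idea, using only the Python standard library, might look like this; the `date_features` helper and the chosen feature names are hypothetical.

```python
from datetime import date

def date_features(d):
    """Turn a date into simple numeric inputs a model can use."""
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": d.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
    }

features = date_features(date(2022, 3, 16))
print(features)
```

A raw date is hard for most models to use directly, while features like day of week can capture weekly patterns.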


5. Time Series

Time series analysis is the process of tracking change over time. Data is generally analyzed across equally spaced intervals such as hour, day, week, month, quarter, or year.
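Aggregating to a coarser interval is a common first step. The sketch below rolls invented daily observations up to monthly totals; the data and the `monthly_totals` helper are assumptions for illustration and stand in for what a package like timetk does in R.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily observations: (date, units sold).
daily_sales = [
    (date(2022, 1, 30), 10),
    (date(2022, 1, 31), 12),
    (date(2022, 2, 1), 9),
    (date(2022, 2, 2), 11),
    (date(2022, 3, 1), 15),
]

def monthly_totals(observations):
    """Sum values into (year, month) buckets, returned in chronological order."""
    totals = defaultdict(int)
    for day, value in observations:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))

totals = monthly_totals(daily_sales)
print(totals)
```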


6. Forecasting

Forecasting is the ability to generate a model that makes predictions about future outcomes along a time horizon. Forecast models are generally based on historic patterns in the variable of interest or a set of input variables from which there are underlying relationships.
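Exponential smoothing, one of the methods in the table, is simple enough to sketch in a few lines. This version is the basic form with no trend or seasonality; it starts the level at the first observation, which is one common convention, and the series and `alpha` value are made up.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the one-step-ahead forecast after smoothing `series`.

    Each new level is a weighted average of the latest observation and the
    previous level; `alpha` controls how quickly older data is forgotten.
    """
    level = series[0]
    for observation in series[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level

history = [100, 110, 105, 115]
forecast = simple_exponential_smoothing(history, alpha=0.5)
print(forecast)
```

A higher `alpha` makes the forecast react faster to recent values, while a lower one smooths out noise.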


7. Text

Text analysis is the ability to derive meaning from unstructured text data, which is growing exponentially thanks to applications such as social media, blogging, and chat. Processing text data involves adding structure to the data so that we can more easily apply algorithms that digest themes and sentiments more efficiently.
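One simple way of "adding structure" is to split text into word tokens and count them. The sketch below does this with the Python standard library; the sample sentence, the token pattern, and the `word_counts` helper are assumptions for illustration, and stringr would play a similar role in R.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = word_counts("Great course. Great examples, great pace!")
print(counts.most_common(1))
```

Counts like these are the raw material for text features used in later modeling steps.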

8. Natural Language Processing (NLP)

Natural Language Processing (NLP) attempts to help computers understand and confidently respond to human language stored as text data or audio samples.


9. Functional Programming & Iteration

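Functional programming, as opposed to object-oriented programming, is a declarative coding approach that builds programs by composing reusable functions rather than by changing the state of objects. Iteration applies the same operation to every element of a collection, either with loops or by mapping a function over the data.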
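Here is a small sketch of a reusable function combined with mapping, written in Python rather than with the purrr package mentioned in the table. The temperature readings and the `to_celsius` helper are invented for the example.

```python
from functools import reduce

def to_celsius(fahrenheit):
    """Pure function: the result depends only on the input value."""
    return (fahrenheit - 32) * 5 / 9

readings_f = [32, 212, 50]

# Mapping applies the same function to every element, with no manual loop counter.
readings_c = list(map(to_celsius, readings_f))

# reduce folds the list into a single value, here the highest reading.
warmest = reduce(max, readings_c)
print(readings_c, warmest)
```

Because `to_celsius` has no side effects, it can be reused and tested on its own.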


10. Reporting

There is no value from an insight that stays buried in a raw data file or highlighted in someone's exploratory spreadsheet. Reporting is the process of turning key findings into digestible pieces of information in visually engaging ways. Some purposes might be to share business intelligence internally or position data-driven thought leadership externally.


11. Applications

Applications are interactive environments for people to access, engage with, and contribute to data systems.


12. Deployment

If you can't put your analysis or application into production, it isn't going to do many people much good. Deployment is the process of publishing data-driven tools or systems so that they can be easily accessed by intended users and easily maintained by developers.


13. Databases

A database is an organized system for data storage and access. It generally contains many individual data tables that represent specific data sources. These tables can be created, read, updated, and deleted with SQL commands.
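The four operations can be tried without installing a database server by using the `sqlite3` module that ships with Python. This is only a sketch: the `courses` table, its columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create a table and insert two rows.
cur.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT, free INTEGER)")
cur.executemany(
    "INSERT INTO courses (title, free) VALUES (?, ?)",
    [("Intro to SQL", 1), ("Advanced Forecasting", 0)],
)

# Update one row, then delete another.
cur.execute("UPDATE courses SET free = 1 WHERE title = ?", ("Advanced Forecasting",))
cur.execute("DELETE FROM courses WHERE title = ?", ("Intro to SQL",))
conn.commit()

# Read what is left.
rows = cur.execute("SELECT title, free FROM courses").fetchall()
print(rows)
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting mistakes and SQL injection.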


Where to go from here?

If you don't see the tool or technique you want to learn in the categories above, head to our updated search page where you can filter nearly two thousand data science courses and learning resources with something for every skill level and career aspiration. 
