
8 data science projects to build your resume

A strong data science resume includes a variety of projects. Find out which data science project types employers are looking for and how to present them on your resume.

Tailoring a resume to a data science position is no easy task, but it is necessary, as applicants need to submit a resume for any open data science position. A well-written resume is the most critical component of getting an interview for a job as a data scientist.

A good data science resume should be brief -- typically, just one page long, unless the applicant has many years of experience. The sections of the data science resume should include:

  • Resume objective
  • Experience
  • Education
  • Certifications
  • Skills
  • Projects
  • Publications

These sections help applicants demonstrate their backgrounds and knowledge in relevant areas.

Organizations looking to hire data scientists expect candidates to have either some previous work experience or, alternatively, data science-related projects. Job seekers transitioning to careers in data science right from college, switching careers or seeking different types of data science jobs can use projects to show prospective employers they have the necessary skills to do the work. A data science project portfolio should include three to five projects that showcase the applicant's relevant skills.

Here are eight data science projects to build your resume.

Sentiment analysis

Today, data-driven companies use sentiment analysis to identify customers' attitudes about their products or services. Sentiment analysis is the automated process of determining if opinions toward a product or service are positive, negative or neutral. These opinions are normally expressed in pieces of text, such as reviews or comments.

The objective of sentiment analysis is to help a company figure out the answers to questions such as:

  • Why don't customers like the product or service?
  • Why isn't the product or service hitting its target sales goals?
  • How can the product or service be changed so more customers like it?
  • What factors affect customer sentiment toward the product or service, e.g., quality, quantity, price or something else?

Customer opinions can range from positive to negative, and the responses can be classed either as binary (positive or negative) or as multiple categories -- e.g., excited, angry, happy, sad or another emotion.

This sentiment analysis data science project could be implemented in the R language, using the "janeaustenr" package as the data set. For this project, the job candidate will use general-purpose lexicons, including:

  • Loughran, which is used for financial text.
  • Bing, which labels words as positive or negative.
  • AFINN, a list of words rated for valence with an integer between minus five and plus five.

The applicant can then build a word cloud to display the results.
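The lexicon-based approach described above can be sketched in a few lines. The example below is written in Python rather than R, and it is only an illustration: the tiny `TOY_LEXICON` is a made-up stand-in in the AFINN style (word mapped to an integer valence), and the function names are hypothetical. A real project would load a full lexicon and a real text corpus.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon in the AFINN style: word -> integer valence in [-5, 5].
# A real project would load a full lexicon instead of these made-up entries.
TOY_LEXICON = {
    "love": 3,
    "great": 3,
    "good": 2,
    "slow": -1,
    "bad": -3,
    "terrible": -4,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token that appears in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def label_text(text, lexicon=TOY_LEXICON):
    """Map the summed score to positive, negative or neutral."""
    score = score_text(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_word_counts(texts, lexicon=TOY_LEXICON):
    """Count how often each lexicon word occurs; these counts could size a word cloud."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t in lexicon)
    return counts


# Made-up example reviews.
reviews = [
    "I love this product, the quality is great",
    "Terrible support and slow delivery",
    "It arrived on Tuesday",
]
labels = [label_text(review) for review in reviews]
```

Summing word scores ignores negation and context ("not good" still scores as positive), which is one reason a portfolio project might compare several lexicons or move on to a trained model.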

Real-time face detection

Face detection, a method to distinguish a person's face from other parts of the body and the background, is a relatively simple undertaking and can be considered a beginner-level project.

The objective of face detection is to determine if there are any faces in an image or video. If there is more than one face in the image or video, each face is enclosed by a bounding box. A job applicant should be able to build a simple face detector using Python. Building a program that detects faces is a great way to get started with computer vision.

The library used for this project is the Open Source Computer Vision Library (OpenCV), an open source computer vision and machine learning library with a focus on real-time applications.
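A simple detector of this kind might look like the sketch below. It assumes the `opencv-python` package and uses one of the pretrained Haar cascade files bundled with it; the choice of a Haar cascade, the parameter values and the function names are assumptions for illustration, not something the project prescribes.

```python
def boxes_to_corners(boxes):
    """Convert (x, y, w, h) boxes into (top_left, bottom_right) corner pairs."""
    return [
        ((int(x), int(y)), (int(x + w), int(y + h)))
        for (x, y, w, h) in boxes
    ]


def detect_faces(image_path, output_path):
    """Detect faces in an image file and save a copy with bounding boxes drawn.

    Returns the number of faces found.
    """
    # Imported here so the helper above works without OpenCV installed.
    import cv2

    # Frontal-face Haar cascade shipped with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # The cascade works on grayscale images.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw one green rectangle per detected face.
    for top_left, bottom_right in boxes_to_corners(boxes):
        cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 2)
    cv2.imwrite(output_path, image)
    return len(boxes)
```

For real-time use, the same detection step could be applied to each frame read from a webcam through `cv2.VideoCapture` instead of to a single image file.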

Face detection is one of the steps needed for facial recognition, which goes further by matching a detected face to a known, authorized person's name. Deep neural networks are widely regarded as the most effective method for facial recognition.

After a face is detected, deep learning can solve face recognition tasks using transfer learning with pretrained architectures such as VGG16, ResNet50 and FaceNet. These make it easier to build high-quality face recognition systems. Users can also build their own deep learning models from scratch. Face recognition models can be used in security and surveillance systems, for example.

Spam detection

Spam detection is a classic data science problem, as organizations need to monitor their communication channels for spam emails and messages to ward off data security threats. Google, Yahoo and other major email providers implement spam detection algorithms to handle the threats posed by spam emails.

Training a model to detect spam messages and spam emails is another project for data science applicants to use to build their resumes.

Project: Spam classification
Tools: scikit-learn, spaCy, NLTK, Python
Data set: SMS Spam Collection Dataset from Kaggle
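To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes spam filter written with only the Python standard library. It is a stand-in for what a library such as scikit-learn would provide, not the method the project requires; the class name, the tokenizer and the four training messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label), smoothed over the vocabulary.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / denominator)
        return score

    def predict(self, message):
        tokens = tokenize(message)
        return max(
            self.class_counts,
            key=lambda label: self._log_score(tokens, label),
        )


# Made-up training messages for illustration only, not taken from the Kaggle data set.
train_messages = [
    "Win a free prize now",
    "Free cash offer, claim your prize",
    "Are we still meeting for lunch today",
    "Can you send me the report before lunch",
]
train_labels = ["spam", "spam", "ham", "ham"]

spam_filter = NaiveBayesSpamFilter().fit(train_messages, train_labels)
```

Calling `spam_filter.predict("claim your free prize")` classifies a new message. On a real data set, the applicant would also hold out a test split and report metrics such as precision and recall rather than accuracy alone, since spam data sets are usually imbalanced.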

Data storytelling and visualization

Using data to provide insights, tell stories and convince people of something is an important part of a data science job. What good is doing a top-notch analysis if the CEO doesn't understand it or take action based on it?

This data science project should enable laypeople, such as hiring managers with little coding or statistical backgrounds, to draw the appropriate conclusions. Data visualization and communication skills are important for this project to show and explain the applicant's code.

One example is a data visualization project that uses ggplot2 (a data visualization package for the statistical programming language R) and its libraries to analyze certain parameters, such as the number of trips a Boston Uber driver makes in one day, one month, three months, six months or 12 months. The applicant would use the Uber pickups in Boston data set, for instance, and create visualizations for different time frames of the year. These reveal how time affects customer trips.

Project: Uber data analysis project in R
Language: R
Data set: Uber pickups in Boston
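Before any chart is drawn, the trips have to be grouped by time frame. The sketch below shows that aggregation step in Python rather than R, using only the standard library; the timestamps are hypothetical stand-ins for rows of a pickup data set, and the text bar chart is a placeholder for a real plotting library.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pickup timestamps standing in for rows of a ride data set.
pickups = [
    "2014-04-01 08:15:00",
    "2014-04-01 08:47:00",
    "2014-04-01 17:05:00",
    "2014-05-12 17:30:00",
    "2014-05-12 17:55:00",
    "2014-05-13 23:10:00",
]


def trips_by(timestamps, part):
    """Count trips per 'hour' or 'month' of the pickup time."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        counts[getattr(moment, part)] += 1
    return counts


def text_bar_chart(counts, label):
    """Render counts as a plain-text bar chart, one row per key."""
    rows = [
        f"{label} {key:>2} | {'#' * counts[key]} {counts[key]}"
        for key in sorted(counts)
    ]
    return "\n".join(rows)


hourly = trips_by(pickups, "hour")
monthly = trips_by(pickups, "month")
print(text_bar_chart(hourly, "hour"))
```

The same counts could be passed to a plotting library to produce the bar charts or heat maps a hiring manager would actually see.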

Recommender system

A recommender system uses a filtering process to offer users content based on their preferences. It takes information about the user as input, evaluates it using a machine learning model and returns recommendations, such as movies the user might like.

A movie recommendation can be based on input received from people who have seen a particular film. Their responses can classify a movie as funny, boring, interesting, exciting or even a waste of time.

There are two types of recommender systems:

  • Content-based system. This offers recommendations based on the data a user provides. The system generates a user profile based on that data, which it then uses to make suggestions to the user. As the user inputs more data or takes certain actions based on the recommendations, the recommendation engine becomes increasingly accurate. The recorded activity allows an algorithm to suggest movies that are similar to those the user liked in the past.
  • Collaborative system. This offers recommendations based on information about other users with similar viewing histories or preferences. Recording users' preferences enables a collaborative system to cluster similar users and provide recommendations based on the activities of users in the same group.

Netflix, for example, recommends movies or shows that are similar to a user's browsing history or movies that other users with similar browsing histories have watched in the past.
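The collaborative approach can be illustrated with a small user-based sketch in Python. The ratings below are hypothetical and are not drawn from MovieLens, and cosine similarity with a similarity-weighted sum is just one simple scoring choice among many.

```python
import math

# Hypothetical ratings (1-5) for illustration; not taken from the MovieLens data set.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 5, "Heat": 5, "Blade Runner": 5},
    "cleo": {"Up": 5, "Coco": 5, "Alien": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[movie] * b[movie] for movie in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank movies the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                # Ratings from more similar users count for more.
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Here `recommend("ana", ratings)` favors a movie rated highly by the user whose tastes most resemble Ana's. A content-based system would instead compare movie attributes, such as genre, against the user's own profile.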

Project: Movie recommendation system project in R
Language: R
Data set: MovieLens dataset

Optical character recognition

This data science project is great for beginners. Optical character recognition (OCR) uses an electronic or mechanical device to convert two-dimensional text data into machine-encoded text. Computer vision can be used to read the image or PDF file. After reading the image, use the pytesseract module (an OCR tool for Python) to extract the text data from it. Then convert the text data into a string that can be displayed in Python.
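A minimal sketch of that workflow follows. It assumes the third-party `pytesseract` and Pillow packages plus a system installation of the Tesseract engine; the function names and the whitespace cleanup step are illustrative additions, and converting PDF pages to images is not shown.

```python
import re


def normalize_ocr_text(raw_text):
    """Collapse runs of spaces and drop blank lines that OCR output often contains."""
    lines = [
        re.sub(r"[ \t]+", " ", line).strip()
        for line in raw_text.splitlines()
    ]
    return "\n".join(line for line in lines if line)


def image_to_text(image_path):
    """Run Tesseract OCR on an image file and return the cleaned text."""
    # Imported here because they are third-party packages; pytesseract also
    # needs the Tesseract OCR engine installed on the system.
    from PIL import Image
    import pytesseract

    raw_text = pytesseract.image_to_string(Image.open(image_path))
    return normalize_ocr_text(raw_text)
```

Calling `print(image_to_text("scan.png"))`, where `scan.png` is a placeholder file name, would display the recognized text as an ordinary Python string.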

Once data science job applicants thoroughly understand how OCR works and which tools it requires, they can tackle more complex problems, such as using sequence-to-sequence attention models to translate the text the OCR reads from one language into another.

Time series prediction

Time series prediction is the study of how metrics behave over time. The time series technique is commonly used in data science with a wide range of applications, including weather forecasting, predicting sales, analyzing annual trends and analyzing website traffic.

A sudden increase in traffic to a website can be a major problem for a company, as it can cause the site to load slowly or crash entirely. Predicting website traffic can enable the company to make better decisions to control the congestion.

Project: Web traffic time series forecasting
Tools: Google Cloud Platform
Algorithms: Recurrent neural networks (RNNs), long short-term memory (LSTM), autoregressive integrated moving average (ARIMA)-based techniques
Data set: The data set consists of 145,000 time series, representing the number of daily page views of different Wikipedia articles.
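Before reaching for the RNN or ARIMA models listed above, it often helps to establish a simple baseline to beat. The sketch below is such a baseline, not one of the listed algorithms: it compares a seasonal naive forecast with a moving average on hypothetical daily page views, assuming a weekly pattern. All numbers and function names are made up for illustration.

```python
def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast each future day as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def moving_average_forecast(history, horizon, window=7):
    """Forecast every future day as the mean of the last `window` observations."""
    recent = history[-window:]
    mean = sum(recent) / len(recent)
    return [mean] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily page views with a weekly pattern (two weeks of history).
history = [120, 130, 125, 128, 140, 200, 210,
           122, 131, 127, 130, 142, 205, 215]
actual_next_week = [121, 133, 126, 131, 143, 207, 214]

seasonal = seasonal_naive_forecast(history, horizon=7)
average = moving_average_forecast(history, horizon=7)

seasonal_error = mean_absolute_error(actual_next_week, seasonal)
average_error = mean_absolute_error(actual_next_week, average)
```

On data with a strong weekly cycle like this, the seasonal naive forecast tracks the weekend peaks while the moving average flattens them, so a more complex model is only worth presenting if it improves on the seasonal baseline's error.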

Data sources

One of the key decisions data science job applicants have to make is what data to analyze with any project.

Here are some websites where applicants can find data to work with.

  • Kaggle. The world's largest data science community that offers tools and resources to help users achieve their data science goals. Includes different types of data sets of varying sizes that users can download for free.
  • Data Portals. A comprehensive list of 590 (to date) open data portals from around the globe, each of which offers its own library of data sets. The data portal is curated by a group of open data experts, including representatives from local, regional and national governments and international organizations, such as the World Bank, and many nongovernmental organizations.
  • Data.gov. The home of the U.S. government's open data, which includes data, tools and resources for conducting research, developing web and mobile applications, and designing data visualizations.
  • Open Data on AWS. The Registry of Open Data on AWS makes it easy to find data sets publicly available through Amazon services.
  • Academic Torrents. A distributed system for sharing massive data sets. The site facilitates the storage of all the data used in research, including data sets and publications.

How to add data science projects to a resume

The best projects to showcase are ones that can be presented succinctly. A well-constructed description of the project can be presented in a few sentences to a paragraph.

When adding data science projects to a resume, applicants should include:

  • The name of the project.
  • A description of the role -- was this a personal effort or a team effort?
  • A brief explanation of the purpose of the project.
  • A couple of sentences about how the project was built.
  • The tools that were used.
  • What the project accomplished.
  • A sentence about how the same principle could apply in business.
  • A link to the project -- for example, on a website that offers data science job applicants the opportunity to showcase all their personal projects in depth.
  • A link to the code.

Although many recruiters and hiring managers will follow links and look at candidates' project presentations on their websites or portfolio sites, some will only look at a candidate's GitHub.

As such, applicants should know the basics of GitHub and be familiar with Git -- a version control system they can use to manage and keep track of their source code histories.

Data scientists are in high demand. Consequently, there's enormous potential for growth in this field for skilled professionals. To break into the field of data science, job applicants must impress prospective employers by showcasing their skills and expertise. They can demonstrate they have the necessary skills by adding data science projects to their resumes.
