
Bellabeat is a high-tech manufacturer of health-focused smart products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.

This project will focus on one of Bellabeat’s products and analyse smart device data to gain insight into how consumers are using their smart devices. The insights discovered will then help guide marketing strategy for the company.

The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively.

In general, analyzing smart device fitness data could help unlock new growth opportunities for the company. We can analyze smart device data to gain insight into how consumers are using their smart devices, and the insights discovered can help guide the company’s marketing strategy.

Current Bellabeat products: the Bellabeat app, Leaf, Time, Spring, and Bellabeat membership.

Step 1: Ask

Business Task:

Analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. Then select one Bellabeat product to apply these insights to and, using this information, establish high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.


Key stakeholders (Bellabeat executive team):

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.
  • Sando Mur: mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.

The Scope of Work (Scope-Of-Work-CaseStudy.pdf) can be viewed in the GitHub repository at:

Step 2: Prepare

Dataset: FitBit Fitness Tracker Data

Storage: The data is publicly hosted on Kaggle (a third-party provider). It will be downloaded and stored in a secure directory.


  • Data is contained in 18 csv files. Each file contains different features, e.g., daily activity, calories, heart rate, etc.
  • Data is in long format.
  • Data is structured.
  • Different datasets will have to be merged to improve analysis and insights
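The merging mentioned in the last bullet can be sketched as follows. This is only an illustration using the standard library, not the project's actual code: the column names (`Id`, `ActivityDate`, `SleepDay`, `TotalSteps`, `TotalMinutesAsleep`), the date formats, and all sample values are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Tiny in-memory stand-ins for two of the CSV files.
# Column names and values are assumptions for illustration only.
ACTIVITY_CSV = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8163
"""

SLEEP_CSV = """Id,SleepDay,TotalMinutesAsleep
1001,4/12/2016 12:00:00 AM,327
1002,4/12/2016 12:00:00 AM,412
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_id_and_date(activity_rows, sleep_rows):
    """Left-join sleep onto activity using (Id, calendar date) as the key."""
    sleep_by_key = {}
    for row in sleep_rows:
        day = datetime.strptime(row["SleepDay"], "%m/%d/%Y %I:%M:%S %p").date()
        sleep_by_key[(row["Id"], day)] = int(row["TotalMinutesAsleep"])

    merged = []
    for row in activity_rows:
        day = datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date()
        merged.append({
            "Id": row["Id"],
            "Date": day.isoformat(),
            "TotalSteps": int(row["TotalSteps"]),
            # None marks days with no matching sleep record.
            "TotalMinutesAsleep": sleep_by_key.get((row["Id"], day)),
        })
    return merged


merged_rows = merge_on_id_and_date(read_rows(ACTIVITY_CSV), read_rows(SLEEP_CSV))
for merged_row in merged_rows:
    print(merged_row)
```

A left join keeps every activity row even when a user did not log sleep that day, so gaps in tracking stay visible instead of silently shrinking the sample.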


  • Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.
  • Each file has a common ID and timestamp.
  • Individual reports can be parsed by export session ID (column A) or timestamp (column B).
  • The daily activity file has 940 observations.
  • Feature data is quantitative and continuous.
  • Time frame: the source states that the data ranges from 03.12.2016 to 05.12.2016 (March, April, May), but the actual data appears to run from 4/12/2016 to 5/12/2016 (April and May).
  • Sample size: the number of unique ID (user) values appears to differ for some files (for example, 28 and 33). The metadata states 30 users.
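The time-frame and sample-size checks in the last two bullets could be reproduced with a short per-file summary. The sketch below is a hedged illustration, not the project's code: the helper `summarize` is hypothetical, and the column names, date formats, and sample rows are assumptions.

```python
import csv
import io
from datetime import datetime


def summarize(csv_text, date_column, date_format):
    """Return the unique-user count and first/last date found in one CSV."""
    user_ids = set()
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        user_ids.add(row["Id"])
        dates.append(datetime.strptime(row[date_column], date_format).date())
    return {"users": len(user_ids), "start": min(dates), "end": max(dates)}


# Made-up sample rows standing in for two of the files (assumed columns).
ACTIVITY_CSV = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,5/12/2016,9000
1002,4/12/2016,8163
1003,4/20/2016,4500
"""

SLEEP_CSV = """Id,SleepDay,TotalMinutesAsleep
1001,4/12/2016 12:00:00 AM,327
1002,5/11/2016 12:00:00 AM,412
"""

summaries = {
    "activity": summarize(ACTIVITY_CSV, "ActivityDate", "%m/%d/%Y"),
    "sleep": summarize(SLEEP_CSV, "SleepDay", "%m/%d/%Y %I:%M:%S %p"),
}
for name, info in summaries.items():
    print(name, info["users"], info["start"], info["end"])
```

Running the same summary over every file makes mismatches in user counts or date ranges easy to spot side by side.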

Licensing, privacy, security, and accessibility:

  • The data is in the public domain, so there are no restrictions on accessibility and no privacy or security risks.
  • License: CC0: Public Domain.
  • Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.

Data Integrity/Credibility:

  • Dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016.
  • There appear to be no obvious problems with the data.
  • Data is cited: the source is Möbius (owner), Data Scientist at Healthcare Melbourne, Victoria, Australia.
  • The data does appear comprehensive.
  • The data does appear credible.
  • The data appears to be reliable - the data is mainly quantitative and measured scientifically so bias is less likely.
  • Is the data current? Given its age (2016), it may no longer be representative of current trends. A more up-to-date sample would be appropriate, as fitness trends and approaches may have changed.
  • Is the data unbiased and objective? Is it representative of the general population? The sex of the respondents is not stated, yet the business task concerns female customers and products, so it is not clear whether the sample is representative of the target population. Also, the data has been pre-processed.
  • Data is recorded over a two-month period. This is a short period and may be affected by factors such as season and holidays; for example, fitness habits may differ between summer and winter months.

Step 3: Process

Full report on processing the datasets (and final transformations) can be found here:

Note: data processing was carried out using Python and Jupyter Notebook.

Initial data files are located in the “data” folder. Processed/transformed datasets are located in the “clean data” folder.
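As a rough illustration of the raw-to-clean folder layout described above, the sketch below reads a file from a "data" folder and writes a de-duplicated copy to a "clean data" folder. The cleaning rule (dropping exact duplicate rows), the helper `clean_csv`, the file name, and the sample rows are all assumptions; the project's actual transformations are documented in the processing report.

```python
import csv
import tempfile
from pathlib import Path


def clean_csv(src: Path, dst: Path) -> int:
    """Copy src to dst, dropping exact duplicate rows; return rows written."""
    seen = set()
    written = 0
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            key = tuple(row)
            if key in seen:
                continue  # skip rows already written once
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written


# Demo in a throwaway directory that mirrors the two folder names.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "data"
    clean_dir = Path(tmp) / "clean data"
    raw_dir.mkdir()
    clean_dir.mkdir()

    raw_file = raw_dir / "sleep_sample.csv"  # hypothetical file name
    raw_file.write_text(
        "Id,SleepDay,TotalMinutesAsleep\n"
        "1001,4/12/2016,327\n"
        "1001,4/12/2016,327\n"
        "1002,4/12/2016,412\n"
    )
    rows_kept = clean_csv(raw_file, clean_dir / raw_file.name)
    cleaned_text = (clean_dir / raw_file.name).read_text()

print(rows_kept)
```

Keeping the raw files untouched and writing results to a separate folder means any cleaning step can be re-run or audited later.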

Steps 4/5: Analyse & Share

A comprehensive analysis of the data was carried out using two tools: Python in Jupyter Notebook, and R in RStudio. The analysis includes observations and opportunities for improvement.

For the comprehensive analysis of the smart device fitness data, check out the reports here:

For Python analysis in Jupyter Notebook:

For R analysis using RStudio:

Step 6: Act

Click here for Final Project Conclusions & Recommendations: