How to Obtain and Analyse Fitbit Sleep Scores
Smartwatches and other wearable devices have gained popularity over the past couple of years and have given rise to the cultural phenomenon of the “Quantified Self”. Devices such as the Apple Watch or Fitbit have made it possible for anyone to easily self-track and thereby quantify their lives in some way. Popular self-quantifications include calories burnt, steps walked during the day or quality of sleep.
In this article, I will focus on the latter, namely quality of sleep, using real life data from approximately one year of Fitbit usage. Fitbit provides users with a Sleep Score, which is supposed to be a measure of sleep quality. I will train and test different Machine Learning models using Python in an attempt to predict the Fitbit Sleep Score as accurately as possible while providing an explanation of how different metrics, such as minutes of REM sleep, affect the score.
The article is structured as follows:
Because there is a lot to cover, I split the article into three parts. Part 1 covers points 1 through 4 and focuses on getting the sleep data, preprocessing and visualising it. Part 2 covers points 5 through 10, i.e. actually building the Machine Learning models based on the preprocessed data from part 1. Part 3 covers the rest and is all about improving the models from part 2 to get the most accurate predictions possible.
What exactly is the Fitbit Sleep Score?
The Fitbit Sleep Score is best described through an example, so here are two screenshots of what the App provides to its users:
Sleep statistics provided by Fitbit
In the Fitbit App, users are given a Sleep Score (78 in this case), a graphical representation of the sleep stages across the sleep window, a breakdown of these sleep stages in both minutes and percent, and an estimated oxygen variation.
This in and of itself seems fairly straightforward. Fitbit just has some algorithm that takes the relevant sleep statistics, such as minutes spent in REM sleep, and spits out the Sleep Score.
To anyone with a Fitbit who has ever tried to understand patterns in their Sleep Score, however, it is clear that this is far from straightforward. The screenshots below show where the confusion comes from:
More sleep statistics provided by Fitbit
Comparing these sleep statistics to the first ones tells us the following:
- Time asleep is more than an hour longer
- Time spent in REM sleep is almost the same
- Time spent in deep sleep is a lot longer
Based on these observations, one would expect the second Sleep Score to be higher than the first one, but it is actually the same. What is going on here? What role do the different statistics play in the calculation of the Sleep Score? Is it possible to predict Sleep Scores yourself by only looking at the sleep statistics provided?
This article answers all those questions and provides a detailed walk-through of a Machine Learning project. I hope you enjoy it!
Getting the sleep data from Fitbit
Fitbit allows users to export sleep data in CSV files through their online dashboards. This process turned out to involve a bit of manual labor because Fitbit only allows a maximum of 31 days of data to be exported at a time. A few minutes later I had all the data and quickly combined them into one CSV file.
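The article does not say how the monthly exports were combined. One way to do it in pandas is sketched below. The two in-memory "files", their column names and their values are made-up placeholders so that the snippet runs on its own; with real exports you would pass file paths instead.

```python
import io
import pandas as pd

# Toy stand-ins for two monthly exports (made-up values, not real Fitbit data).
# With real files you would pass paths instead, for example the result of
# sorted(glob.glob("exports/*.csv")), where "exports/" is a hypothetical folder.
monthly_exports = [
    io.StringIO("End Time,Minutes Asleep\n2020-01-02 7:00AM,420\n"),
    io.StringIO("End Time,Minutes Asleep\n2020-02-02 6:45AM,405\n"),
]

# Read every export and stack the rows into a single DataFrame.
frames = [pd.read_csv(export) for export in monthly_exports]
combined = pd.concat(frames, ignore_index=True)

# Serialise to CSV text; pass a path such as "sleep_stats.csv" to write a file instead.
combined_csv_text = combined.to_csv(index=False)
print(combined)
```

Using `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.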
There was one problem. The manually exported CSV files included all of the sleep statistics (Minutes Asleep, Minutes Awake, Minutes REM Sleep, etc.) but did not include the actual Sleep Score. What the hell?!
After some digging, I discovered that there was another export option called “Lifetime Export”, which exports all the data Fitbit has collected on you ever since you started wearing their watch. You have to request this export from Fitbit before being able to download it and once approved you can download a zip folder with all sorts of different files. Included in that zip folder is a CSV file with additional sleep statistics, including the Sleep Score.
I saved the CSV file containing the sleep statistics as sleep_stats.csv and the CSV file containing the Sleep Scores as sleep_score.csv. Let’s move on to Python.
Data cleaning and preparation
This section explains how to get from the CSV file to a DataFrame that is ready to be used in Machine Learning models. In the process, I encountered some common problems that can arise when importing data into Python and I explain how to deal with them in order to end up with a neatly preprocessed data set.
After importing all the relevant libraries (see the full notebook for the libraries) the first step is to import the sleep data from the CSV files into Python using the pd.read_csv() function:
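A minimal sketch of the import step follows. To keep it runnable without the real exports, the two files are replaced by in-memory text with made-up values. The column names `timestamp`, `overall_score` and `composition_score` are assumptions for illustration and may differ from the actual Fitbit export.

```python
import io
import pandas as pd

# Toy stand-ins for sleep_stats.csv and sleep_score.csv (illustrative values only).
stats_csv = io.StringIO(
    "Start Time,End Time,Minutes Asleep,Minutes Awake\n"
    "2020-01-01 11:00PM,2020-01-02 7:00AM,420,60\n"
)
score_csv = io.StringIO(
    "timestamp,overall_score,composition_score\n"
    "2020-01-02T07:00:00Z,78,20\n"
)

# With the real files: pd.read_csv("sleep_stats.csv")
sleep_stats_data = pd.read_csv(stats_csv)

# usecols=[0, 1] keeps only the first two columns by position.
sleep_score_data = pd.read_csv(score_csv, usecols=[0, 1])

# head() shows the first five rows (fewer if the frame is shorter).
print(sleep_stats_data.head())
```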
I only import the first two columns of sleep_score.csv because they contain the date and the actual Sleep Score; all other relevant data is found in sleep_stats.csv. Let’s have a look at the first five rows in sleep_stats_data:
This is the first common problem: because of the way the CSV file is structured, the column names are in the first row. Here is one way to fix this problem:
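One possible fix, sketched on a toy frame that mimics the symptom (the real column names sit in row 0 and pandas has assigned placeholder headers), is to promote that row to the header and then drop it. The values are made up.

```python
import pandas as pd

# Toy frame reproducing the problem: the real header is stored as the first data row.
sleep_stats_data = pd.DataFrame(
    [
        ["Start Time", "End Time", "Minutes Asleep"],
        ["2020-01-01 11:00PM", "2020-01-02 7:00AM", "420"],
    ]
)

# Use the first row as column names, then remove it from the data.
header = list(sleep_stats_data.iloc[0])
sleep_stats_data = sleep_stats_data.iloc[1:].reset_index(drop=True)
sleep_stats_data.columns = header

print(sleep_stats_data)
```

If the stray line is known when reading the file, the `header` or `skiprows` arguments of `pd.read_csv` can avoid the problem at import time.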
Using the .info() function we can obtain a high-level summary of the data in the DataFrame, which in our case looks like this:
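As a small illustration on made-up data (not the author's 322-row data set), `.info()` prints the row count plus the non-null count and dtype of every column. The same non-null counts can be computed directly with `notna().sum()`.

```python
import numpy as np
import pandas as pd

# Toy frame: values are stored as strings (dtype "object") and one entry is missing.
sleep_stats_data = pd.DataFrame(
    {
        "Minutes Asleep": ["420", "75", "400"],
        "Minutes REM Sleep": ["90", np.nan, "85"],
    }
)

# Prints the number of entries, and per column the non-null count and dtype.
sleep_stats_data.info()

# The non-null counts as a Series, handy for programmatic checks.
non_null_counts = sleep_stats_data.notna().sum()
print(non_null_counts)
```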
Here, we encounter the second common problem: there are NaN values (missing data) in the last three columns. This is indicated by the fact that the above information summary tells us that there are 322 entries (rows) but for the last three columns the non-null count is only 287. Let’s have a look at the rows that contain missing data using the following code:
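A minimal sketch of such a filter, on made-up data, builds a boolean mask that is true for any row with at least one missing value:

```python
import numpy as np
import pandas as pd

# Toy frame: the second row is a short nap without sleep-stage data.
sleep_stats_data = pd.DataFrame(
    {
        "Minutes Asleep": [420, 75, 400],
        "Minutes REM Sleep": [90, np.nan, 85],
    }
)

# isnull() marks missing cells; any(axis=1) flags rows with at least one of them.
rows_with_missing = sleep_stats_data[sleep_stats_data.isnull().any(axis=1)]
print(rows_with_missing)
```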
If we look at the column Minutes Asleep or Start and End Time it becomes clear that these rows refer to afternoon naps that Fitbit recorded. Naps are too short for Fitbit to be able to reliably measure important sleep statistics and therefore we will drop all these rows from the data set:
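Dropping those rows can be sketched with `dropna()`, again on made-up data:

```python
import numpy as np
import pandas as pd

# Toy frame: the second row is a nap with no sleep-stage data.
sleep_stats_data = pd.DataFrame(
    {
        "Minutes Asleep": [420, 75, 400],
        "Minutes REM Sleep": [90, np.nan, 85],
    }
)

# Remove every row that contains a missing value and renumber the index.
sleep_stats_data = sleep_stats_data.dropna().reset_index(drop=True)
print(sleep_stats_data)
```

Note that `dropna()` removes a row if any column is missing. If only certain columns should count, the `subset` argument restricts the check.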
In the above data summary, we also encounter the third common problem, which is related to the first: all columns are of data type “object” but columns with index 2 to 8 should clearly be numerical, i.e. either of data type “int” or “float”. The reason these columns are of data type “object” is most likely because the column headers were initially placed in the first row, thereby causing the entire column to be classified as “object”. Let’s convert these columns to data type “float”:
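A sketch of the conversion on a small made-up frame is shown below. The toy frame has only two numeric columns, so the slice is `columns[2:]`. For the layout described above (columns 2 to 8) the slice would be `columns[2:9]`.

```python
import pandas as pd

# Toy frame in which every column is of dtype "object" (strings).
sleep_stats_data = pd.DataFrame(
    {
        "Start Time": ["2020-01-01 11:00PM", "2020-01-02 11:30PM"],
        "End Time": ["2020-01-02 7:00AM", "2020-01-03 6:45AM"],
        "Minutes Asleep": ["420", "390"],
        "Minutes Awake": ["60", "55"],
    }
)

# Select the numeric columns by position and cast them to float.
numeric_columns = sleep_stats_data.columns[2:]
sleep_stats_data = sleep_stats_data.astype({col: float for col in numeric_columns})

print(sleep_stats_data.dtypes)
```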
Let’s now have a look at the first few rows and the summary of sleep_score_data:
This DataFrame looks a lot better: the column headers were automatically recognised and there are no missing values.
For the purpose of further analyses, I would like to combine the two DataFrames into one, meaning I want to merge them. To ensure that the Sleep Scores end up in the row with the corresponding sleep statistics I need a column that is identical in both DataFrames, which will be used as the column to merge on.
In our case, both DataFrames have a column with some sort of timestamp. The sleep statistics DataFrame has a start and an end time and the Sleep Score DataFrame has a timestamp. Because a sleep score is always provided after awakening, the date that is relevant in the sleep statistics DataFrame is the end time and we can drop the start time. But there is one more issue: the format of the end time in the sleep statistics DataFrame is different from the format of the timestamp in the Sleep Score DataFrame. If we tried to merge the DataFrames on these columns, the rows would not be matched up. My solution was to create a “Date” column in both DataFrames that contains only the date, merge the DataFrames on those columns, drop the redundant columns and drop one row that contained a missing value after the merge. The following code accomplishes this:
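The steps just described can be sketched as follows. The two timestamp formats, the column names `timestamp` and `overall_score`, and all values are assumptions chosen for illustration. The real export may use different formats, in which case the `format` string needs adjusting. The sketch also assumes a left merge, which the text does not specify.

```python
import pandas as pd

# Toy sleep statistics: three nights (made-up values).
sleep_stats_data = pd.DataFrame(
    {
        "Start Time": ["2020-01-01 11:00PM", "2020-01-02 11:30PM", "2020-01-03 10:45PM"],
        "End Time": ["2020-01-02 7:00AM", "2020-01-03 6:45AM", "2020-01-04 7:15AM"],
        "Minutes Asleep": [420.0, 390.0, 450.0],
    }
)
# Toy Sleep Scores: only two of the three nights have a score.
sleep_score_data = pd.DataFrame(
    {
        "timestamp": ["2020-01-02T07:00:00Z", "2020-01-03T06:45:00Z"],
        "overall_score": [78, 85],
    }
)

# Reduce both timestamps to a plain calendar date to get a common merge key.
sleep_stats_data["Date"] = pd.to_datetime(
    sleep_stats_data["End Time"], format="%Y-%m-%d %I:%M%p"
).dt.date
sleep_score_data["Date"] = pd.to_datetime(sleep_score_data["timestamp"]).dt.date

# Merge on the date, drop the helper and date columns, then drop unmatched rows.
merged = sleep_stats_data.merge(sleep_score_data, on="Date", how="left")
sleep_data = (
    merged.drop(columns=["Start Time", "End Time", "timestamp", "Date"])
    .dropna()
    .reset_index(drop=True)
)
print(sleep_data)
```

In the toy data the third night has no matching score, so it ends up with a missing value after the merge and is removed by `dropna()`.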
The resulting combined DataFrame looks like this:
Merged and preprocessed data
I dropped all columns related to dates because this is not a time series analysis and we do not need the dates going forward. The number of awakenings is not provided by the Fitbit app and, because I want to predict Sleep Scores using only data that is available in the app, I dropped it as well.
With the combined and preprocessed DataFrame we can move on to some Exploratory Data Analysis.
Exploratory Data Analysis (EDA)
In this section I will use visualisations to provide a better understanding of the underlying data. These initial insights will be the foundation for later analyses.
First let’s have a look at the distribution of the Sleep Scores:
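A histogram is one common way to look at such a distribution. The sketch below uses matplotlib and ten made-up scores, not the author's data. It also computes the sample skewness, which is negative for a left-skewed distribution.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up Sleep Scores for illustration only.
scores = pd.Series([60, 70, 78, 80, 82, 84, 85, 86, 88, 90], name="Sleep Score")

# Histogram of the scores.
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(scores, bins=5, edgecolor="black")
ax.set_xlabel("Sleep Score")
ax.set_ylabel("Number of nights")
ax.set_title("Distribution of Sleep Scores")

# Negative skewness means the longer tail is on the left (low scores).
skewness = scores.skew()
print(f"mean = {scores.mean():.1f}, skewness = {skewness:.2f}")
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.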
The distribution of Sleep Scores is skewed to the left, which makes sense because bad nights of sleep are more likely to occur than exceptionally good ones, for reasons such as staying out late or having to get up extremely early. In addition, the average Sleep Score is already relatively high at 82 (out of 100), so there is little room for outliers that lie far above the mean.
Let’s also have a look at the relationship that each individual feature has with the Sleep Score to get a sense of which features might be important and what their relationships to the Sleep Score are. I have defined a function that takes as inputs a DataFrame that contains the target variable in the last column as well as the number of columns to be contained in the entire plot. The number of columns determines how many subplots there are in each row. Here is the function:
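A function matching this description could look like the sketch below. The name `plot_feature_relationships`, the choice of scatter plots and the toy data are assumptions; the original implementation may differ.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_relationships(df, num_cols):
    """Scatter-plot every feature against the target (the last column of df).

    num_cols sets how many subplots are placed in each row of the figure.
    """
    features = df.columns[:-1]
    target = df.columns[-1]
    num_rows = math.ceil(len(features) / num_cols)

    # squeeze=False always returns a 2-D array of axes, even for a single row.
    fig, axes = plt.subplots(
        num_rows, num_cols, figsize=(5 * num_cols, 4 * num_rows), squeeze=False
    )
    flat_axes = axes.ravel()

    for ax, feature in zip(flat_axes, features):
        ax.scatter(df[feature], df[target])
        ax.set_xlabel(feature)
        ax.set_ylabel(target)

    # Hide subplots in the last row that have no feature assigned.
    for ax in flat_axes[len(features):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig


# Usage on made-up data with four features and the target in the last column.
sleep_data = pd.DataFrame(
    {
        "Minutes Asleep": [380, 400, 420, 440, 460],
        "Minutes Awake": [70, 55, 60, 50, 45],
        "Minutes REM Sleep": [70, 85, 80, 95, 100],
        "Minutes Deep Sleep": [60, 65, 75, 70, 90],
        "Sleep Score": [74, 78, 80, 84, 88],
    }
)
fig = plot_feature_relationships(sleep_data, num_cols=3)
```

With four features and `num_cols=3`, the figure has two rows of three subplots, of which the last two are hidden.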
Calling this function with the sleep_data DataFrame and num_cols=3 as inputs results in the following plots:
Taken by themselves, Minutes Asleep and Minutes REM Sleep seem to have the strongest positive relationship with Sleep Score. Generally speaking this makes sense because more time asleep should be a positive thing when thinking about sleep quality and therefore Sleep Score. The same is true for more time spent in REM sleep.
To complete the picture about the relationships between the different features and Sleep Score let’s have a look at the correlation matrix:
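The correlation matrix itself can be computed with `DataFrame.corr()`, which uses Pearson correlation by default. The sketch below uses made-up values, so the resulting numbers say nothing about the real data.

```python
import pandas as pd

# Made-up data for illustration only.
sleep_data = pd.DataFrame(
    {
        "Minutes Asleep": [380, 400, 420, 440, 460],
        "Minutes REM Sleep": [70, 85, 80, 95, 100],
        "Sleep Score": [74, 78, 80, 84, 88],
    }
)

# Pairwise Pearson correlations between all numeric columns.
correlation_matrix = sleep_data.corr()

# Features ranked by their correlation with the target.
ranked = (
    correlation_matrix["Sleep Score"]
    .drop("Sleep Score")
    .sort_values(ascending=False)
)
print(correlation_matrix.round(2))
print(ranked.round(2))
```

Large off-diagonal values between two features in this matrix are the first hint of the multicollinearity discussed next.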
Indeed, Sleep Score has the highest correlation with Minutes REM Sleep, closely followed by Minutes Asleep. Another important thing to note is that many of the features are highly correlated. This makes sense because more time asleep should lead to more time spent in all stages of sleep and the features will tend to move together. While this may be an inevitable by-product of the nature of the features included here it could lead to multicollinearity issues down the road. More on this later.
Part 2 builds on the preprocessed data and the insights from the Exploratory Data Analysis to build a couple of different Machine Learning models that predict Sleep Scores. Part 2 can be found here:
Translated from: https://towardsdatascience.com/how-to-obtain-and-analyse-fitbit-sleep-scores-a739d7c8df85