
An interesting data visualization example: analyzing and visualizing Taylor Swift's song lyrics...

Published: 2024/9/15


Original article

Data Visualization and Analysis of Taylor Swift’s Song Lyrics

English study time

Taylor Swift

- She is the youngest person to single-handedly write and perform a number-one song on the Hot Country Songs chart published by Billboard magazine in the United States.

- Apart from that she is also the recipient of 10 Grammys, one Emmy Award, 23 Billboard Music Awards, and 10 Country Music Association Awards.

- song lyrics: the words of a song (歌词)

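Dataset

Lyrics of 96 songs from 6 Taylor Swift albums

7 columns

- artist: name of the artist

- album name

- track title: name of the song

- track number: position of the song within the album

- lyric: the lyrics, one line of the song per row

- line number: position of the line within the song

- year of release of the album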

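Main analyses

Exploratory data analysis

- Word count of the lyrics for each song and each album

- Change in word count over the years

- Frequency distribution of word counts

Text mining

Word cloud
Bigram network (I do not fully understand what this means yet)
Sentiment analysis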
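As a rough illustration of the bigram item above: a bigram is a pair of adjacent words, and a bigram network draws words as nodes and frequently occurring pairs as weighted edges. The sketch below only counts bigrams, in base R, on a made-up line that is not taken from the dataset. It is not the original article's method.

```r
# Made-up example line (an assumption for illustration, not from the dataset)
line <- "shake it off shake it off"

# Split into lower-case words on whitespace
words <- strsplit(tolower(line), "\\s+")[[1]]

# Pair each word with the word that follows it
bigrams <- paste(head(words, -1), tail(words, -1))

# Count how often each pair occurs; in a bigram network each pair would be
# an edge from the first word to the second, weighted by this count
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
print(bigram_counts)
```

Pairs that occur often, such as "shake it" here, become the thick edges of the network.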

The tool used is R.

Exploratory data analysis

I came across a new function here: str_count() from the stringr package. The example from its help page:

library(stringr)
fruit <- c("apple", "banana", "pear", "pineapple")
str_count(fruit, "a")
# Output: [1] 1 3 1 1

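It counts how many times a pattern matches in each string. For example:

str_count("A B C", "\\S+")

This returns 3, the number of runs of non-whitespace characters (that is, words) in the string "A B C". \S+ is a regular expression that I have not yet mastered; inside an R string the backslash must be doubled.

Read in the data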

lyrics <- read.csv("taylor_swift_lyrics_1.csv", header = TRUE)
head(lyrics)

Compute the length of each lyric line

library(stringr)
# Number of whitespace-separated words in each lyric line
lyrics$length <- str_count(lyrics$lyric, "\\S+")
head(lyrics)
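To check what the per-line word count does, here is a minimal base-R sketch that counts runs of non-whitespace characters without stringr. The helper name count_words and the toy lines are assumptions for illustration only. On plain strings like these it should agree with str_count(x, "\\S+").

```r
# Toy lyric lines (made up for illustration, not from the dataset)
toy_lines <- c("A B C", "hello   world", "", "one")

# Hypothetical helper: count runs of non-whitespace characters,
# i.e. whitespace-separated words, in each element of x
count_words <- function(x) {
  matches <- regmatches(x, gregexpr("\\S+", x, perl = TRUE))
  lengths(matches)
}

word_counts <- count_words(toy_lines)
print(word_counts)
```

Repeated spaces do not inflate the count, and an empty line counts as zero words.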

Compute the lyric length of each song

library(dplyr)
# Total word count per song
length_df <- lyrics %>%
  group_by(track_title) %>%
  summarise(length = sum(length))
head(length_df)
dim(length_df)
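The grouping step can be cross-checked on a small made-up table. This sketch uses base R's tapply() rather than dplyr, so it is a stand-in for the code above, not a copy of it. The toy data frame and its values are assumptions.

```r
# Toy data (made up): one row per lyric line, with its word count
toy <- data.frame(
  track_title = c("Song A", "Song A", "Song B", "Song C", "Song C", "Song C"),
  length      = c(5, 7, 3, 4, 4, 6)
)

# Total words per song: the same grouping idea as group_by() + summarise(sum())
totals <- tapply(toy$length, toy$track_title, sum)

# Rank songs from most to fewest words
ranked <- totals[order(totals, decreasing = TRUE)]
print(ranked)

# Regrouping must not change the overall number of words
stopifnot(sum(totals) == sum(toy$length))
```

The first entries of ranked correspond to a "top N" list, and the last entries to the songs with the fewest words.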

Item 1: the 10 songs with the most words

library(ggplot2)
# Sort by word count in descending order and keep the first 10 songs
Top10wordCount <- arrange(length_df, desc(length)) %>% slice(1:10)
ggplot(Top10wordCount, aes(x = reorder(track_title, length), y = length)) +
  geom_col(aes(fill = track_title)) +
  coord_flip() +
  ylab("Word count") + xlab("") +
  ggtitle("Top 10 songs in terms of word count") +
  theme_minimal() +
  theme(legend.position = "none")

The plot shows that the song with the most words is End Game, followed by Out of the Woods.

Item 2: the 10 songs with the fewest words

library(RColorBrewer)
# Sort by word count in ascending order and keep the first 10 songs
Top10wordCount <- arrange(length_df, length) %>% slice(1:10)
color <- rainbow(10)
ggplot(Top10wordCount, aes(x = reorder(track_title, -length), y = length)) +
  geom_col(aes(fill = track_title)) +
  coord_flip() +
  ylab("Word count") + xlab("") +
  # Title changed from the copied "Top 10" title to match this plot
  ggtitle("10 songs with the lowest word count") +
  theme_minimal() +
  scale_fill_manual(values = color) +
  theme(legend.position = "none")

The song with the fewest words is Sad Beautiful Tragic, released in 2012 on the album Red.

Item 3: frequency distribution of word counts

# Histogram of per-song word counts, a dashed line at the mean,
# and a density curve rescaled towards the count axis
ggplot(length_df, aes(x = length)) +
  geom_histogram(bins = 30, aes(fill = ..count..)) +
  geom_vline(aes(xintercept = mean(length)), color = "#FFFFFF", linetype = "dashed", size = 1) +
  geom_density(aes(y = 25 * ..count..), alpha = .2, fill = "#1CCCC6") +
  ylab("Count") + xlab("Length") +
  ggtitle("Distribution of word count") +
  theme_minimal()
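The factor 25 in the density layer is not explained in the article. A plausible reading, stated here as an assumption, is that it stands in for the histogram's bin width: a density curve matches bar heights measured in counts once it is multiplied by the number of observations and the bin width, and ggplot2's density count already includes the number of observations. The base-R sketch below shows that scaling on made-up numbers.

```r
set.seed(42)
# Made-up per-song word counts standing in for length_df$length (assumption)
x <- round(rnorm(96, mean = 350, sd = 120))

bins <- 30
# Approximate bin width; ggplot2 may place its bins slightly differently
binwidth <- diff(range(x)) / bins

# Kernel density estimate; the area under it is approximately 1
d <- density(x)

# Rescale the density to the histogram's count axis: density * n * binwidth
count_scale <- d$y * length(x) * binwidth

# The area under the rescaled curve should be close to n * binwidth
area <- sum(count_scale) * diff(d$x)[1]
print(c(binwidth = binwidth, area = area, expected = length(x) * binwidth))
```

If the real data's bin width differs much from 25, the density curve in the plot will sit visibly above or below the bars.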

Item 4: word count of each album

# Total word count per album, dropping rows with missing values
length_df_album <- lyrics %>%
  group_by(album, year) %>%
  summarise(length = sum(length)) %>%
  na.omit()
length_df_album
ggplot(length_df_album, aes(x = reorder(album, -length), y = length)) +
  geom_bar(stat = "identity", fill = "#1CCCC6") +
  ylab("Word count") + xlab("Album") +
  ggtitle("Word count based on albums") +
  theme_minimal()

Item 5: trend of per-album word count over time

# Line for the trend, with points sized by word count and colored by year
length_df_album %>%
  arrange(desc(year)) %>%
  ggplot(., aes(x = factor(year), y = length, group = 1)) +
  geom_line(colour = "#1CCCC6", size = 1) +
  ylab("Word count") + xlab("Year") +
  ggtitle("Word count change over the years") +
  theme_minimal() +
  geom_point(aes(x = factor(year), y = length, size = length, color = factor(year)), alpha = 0.5) +
  scale_size_continuous(range = c(5, 15)) +
  theme(legend.position = "none")

Item 6: word cloud

library("tm")
library("wordcloud")
lyrics_text <- lyrics$lyric
# Remove punctuation
lyrics_text <- gsub('[[:punct:]]+', '', lyrics_text)
# As written, this removes any letter immediately followed by one or more "1" characters
lyrics_text <- gsub("([[:alpha:]])1+", "", lyrics_text)
# Build a corpus, lower-case it and drop English stop words
docs <- Corpus(VectorSource(lyrics_text))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, stopwords("english"))
# Word frequencies across all lyric lines, most frequent first
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)
word_freqs <- sort(rowSums(m), decreasing = TRUE)
lyrics_wc_df <- data.frame(word = names(word_freqs), freq = word_freqs)
lyrics_wc_df <- lyrics_wc_df[1:300, ]
set.seed(1234)
wordcloud(words = lyrics_wc_df$word, freq = lyrics_wc_df$freq, min.freq = 1,
          scale = c(1.8, .5), max.words = 200, random.order = FALSE,
          rot.per = 0.15, colors = brewer.pal(8, "Dark2"))
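To make the preprocessing behind the word cloud easier to follow, here is a minimal base-R sketch of the same idea: clean the text, drop stop words, and count how often each remaining word occurs. The toy lyric lines and the tiny stop-word list are assumptions for illustration. tm applies its own tokenization defaults, so its counts need not match this sketch exactly.

```r
# Made-up lyric lines (not from the dataset)
toy_lyrics <- c("I knew you were trouble!", "Trouble, trouble, trouble")

# Hypothetical, deliberately tiny stop-word list
toy_stopwords <- c("i", "you", "were")

# Strip punctuation and lower-case the text
clean <- tolower(gsub("[[:punct:]]+", "", toy_lyrics))

# Split into words and drop empty strings and stop words
tokens <- unlist(strsplit(clean, "\\s+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% toy_stopwords)]

# Word frequencies, most frequent first: these drive word size in a word cloud
word_freqs <- sort(table(tokens), decreasing = TRUE)
print(word_freqs)
```

A word-cloud function then only needs the words and their frequencies, which is what the data frame lyrics_wc_df holds in the code above.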

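Sentiment analysis

I will come back and add the remaining parts when I have time.

You are welcome to follow my official account:

小明的数据分析笔记本 (Xiao Ming's Data Analysis Notebook)

The data used in this article can be downloaded from the link in the original article, or you can leave a message on my official account.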
