generate random or regular test data in R
生活随笔
How can you generate regular or irregular (random) test data in R?

Generating vectors of consecutive values

Examples:
1. Use the colon operator : to generate a vector of consecutive integers.
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> 1:10-2   # note: ':' has higher precedence than the (binary) minus operator
[1] -1 0 1 2 3 4 5 6 7 8
> 1:(10-2)
[1] 1 2 3 4 5 6 7 8
> a <- 1:10
> a
[1] 1 2 3 4 5 6 7 8 9 10
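A few more colon-operator cases, as a small sketch: ':' also counts down, unary minus binds tighter than ':' (unlike the binary minus above, so -2:2 means (-2):2), and a non-integer start produces a double rather than an integer vector.

```r
# Counting down with ':'
x <- 5:1            # 5 4 3 2 1
# Unary minus is applied before ':', so this is (-2):2
y <- -2:2           # -2 -1 0 1 2
# A non-integer start steps by 1 and stops before exceeding the end value
z <- 1.5:4          # 1.5 2.5 3.5
print(x); print(y); print(z)
print(class(1:10))  # "integer"
print(class(1.5:4)) # "numeric"
```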
2. Use the seq function to generate regularly spaced values; the step size can be specified.
> seq(1,10)
[1] 1 2 3 4 5 6 7 8 9 10
> seq(1,10,0.5)   # from=1 to=10 by=0.5
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
[16] 8.5 9.0 9.5 10.0
> seq(1,10,1)
[1] 1 2 3 4 5 6 7 8 9 10
> seq(from=1,to=10,by=1)   # by sets the step size
[1] 1 2 3 4 5 6 7 8 9 10
> seq(from=1,to=10,length.out=1)
[1] 1
> seq(from=1,to=10,length.out=100)
[1] 1.000000 1.090909 1.181818 1.272727 1.363636 1.454545 1.545455
[8] 1.636364 1.727273 1.818182 1.909091 2.000000 2.090909 2.181818
[15] 2.272727 2.363636 2.454545 2.545455 2.636364 2.727273 2.818182
[22] 2.909091 3.000000 3.090909 3.181818 3.272727 3.363636 3.454545
[29] 3.545455 3.636364 3.727273 3.818182 3.909091 4.000000 4.090909
[36] 4.181818 4.272727 4.363636 4.454545 4.545455 4.636364 4.727273
[43] 4.818182 4.909091 5.000000 5.090909 5.181818 5.272727 5.363636
[50] 5.454545 5.545455 5.636364 5.727273 5.818182 5.909091 6.000000
[57] 6.090909 6.181818 6.272727 6.363636 6.454545 6.545455 6.636364
[64] 6.727273 6.818182 6.909091 7.000000 7.090909 7.181818 7.272727
[71] 7.363636 7.454545 7.545455 7.636364 7.727273 7.818182 7.909091
[78] 8.000000 8.090909 8.181818 8.272727 8.363636 8.454545 8.545455
[85] 8.636364 8.727273 8.818182 8.909091 9.000000 9.090909 9.181818
[92] 9.272727 9.363636 9.454545 9.545455 9.636364 9.727273 9.818182
[99] 9.909091 10.000000
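With length.out, seq() spaces the values (to - from)/(length.out - 1) apart, which is where the 1.090909 spacing above comes from (9/99). The sketch below checks that and also uses a negative by to count down.

```r
# Counting down: a negative step
down <- seq(10, 1, by = -3)          # 10 7 4 1
# Five evenly spaced points on [0, 1]
grid <- seq(0, 1, length.out = 5)    # 0, 0.25, 0.5, 0.75, 1
step <- diff(grid)[1]                # (1 - 0) / (5 - 1) = 0.25
print(down); print(grid); print(step)

# The 100-point sequence from the transcript has spacing 9/99
s100 <- seq(from = 1, to = 10, length.out = 100)
print(all.equal(diff(s100)[1], 9 / 99))  # TRUE
```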
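3. Use scan to let the user type values in interactively (an empty line, i.e. pressing Enter at a blank prompt, ends the input).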
> scan()
1: 1
2: 2
3: 3
4: 4
5: 100
6:
Read 5 items
[1] 1 2 3 4 100
> a <- scan()
1: 1
2: 10
3: 100
4: 1000
5:
Read 4 items
> a
[1] 1 10 100 1000
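Typing values at a prompt is awkward inside a script. A sketch of a non-interactive alternative that uses scan()'s text argument to read the same values from a string; quiet = TRUE suppresses the "Read 4 items" message.

```r
# Read numbers from a string instead of the keyboard
a <- scan(text = "1 10 100 1000", quiet = TRUE)
print(a)

# what = character() reads strings; sep = "," splits on commas
labels <- scan(text = "a,b,c", what = character(), sep = ",", quiet = TRUE)
print(labels)
```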
4. Use rep to repeat the values of a vector several times; note the difference between the each and times arguments.
> a
[1] 1 10 100 1000
> rep(a,each=2)    # repeat each element 2 times
[1] 1 1 10 10 100 100 1000 1000
> rep(a,times=2)   # repeat the whole vector 2 times
[1] 1 10 100 1000 1 10 100 1000
> rep(a,each=4,length=10)   # length (partially matched to length.out) caps the length of the result
[1] 1 1 1 1 10 10 10 10 100 100
> rep(a,times=4,length=10)
[1] 1 10 100 1000 1 10 100 1000 1 10
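Three rep() variants not shown above, sketched briefly: a vector-valued times that repeats each element a different number of times, each combined with times (each is applied first), and rep_len() to recycle a vector to an exact length.

```r
a <- c(1, 10, 100, 1000)

# times with one count per element: element i is repeated times[i] times
r1 <- rep(a, times = c(1, 2, 3, 4))
# each is applied first, then the result is repeated 'times' times
r2 <- rep(a, each = 2, times = 2)
# Recycle a to exactly 6 elements
r3 <- rep_len(a, 6)
print(r1); print(r2); print(r3)
```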
5. Use the sequence function to generate several integer sequences, each starting at 1, concatenated into one vector.
> sequence(c(2,3,4,5))   # the sequences 1:2, 1:3, 1:4 and 1:5, concatenated
[1] 1 2 1 2 3 1 2 3 4 1 2 3 4 5
> sequence(2:5)          # same as above
[1] 1 2 1 2 3 1 2 3 4 1 2 3 4 5
> sequence(5)            # the sequence 1:5
[1] 1 2 3 4 5
> sequence(c(2,5))       # the sequences 1:2 and 1:5
[1] 1 2 1 2 3 4 5
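With its default arguments, sequence(nvec) produces the same values as concatenating seq_len() of each element of nvec. The sketch below compares the two.

```r
nvec <- c(2, 3, 4, 5)

s1 <- sequence(nvec)                   # built-in
s2 <- unlist(lapply(nvec, seq_len))    # 1:2, 1:3, 1:4, 1:5 glued together
print(s1)
print(all(s1 == s2))                   # TRUE
```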
6. Use gl to generate factors.
> gl(n=2, k=3, length=10)   # n: number of levels, k: repetitions of each level, length: total length
[1] 1 1 1 2 2 2 1 1 1 2
Levels: 1 2
> gl(n=2, k=3)
[1] 1 1 1 2 2 2
Levels: 1 2
> gl(n=2, k=3, labels=c("a", "b"))   # labels replace the numeric levels
[1] a a a b b b
Levels: a b
> gl(n=2, k=3, labels=c("a", "b", "c"))   # if n < length(labels), the extra label is kept as a level but never occurs in the data
[1] a a a b b b
Levels: a b c
> gl(n=2, k=9, labels=c("a", "b", "c"))
[1] a a a a a a a a a b b b b b b b b b
Levels: a b c
> gl(n=2, k=9, labels=c("a", "b", "c"), ordered=TRUE)   # ordered=TRUE creates an ordered factor
[1] a a a a a a a a a b b b b b b b b b
Levels: a < b < c
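Because an extra label becomes an unused level, summaries such as table() still report it with a count of zero. A small sketch that uses droplevels() to remove it:

```r
f <- gl(n = 2, k = 3, labels = c("a", "b", "c"))
print(table(f))        # a: 3, b: 3, c: 0 (the unused level is still counted)

g <- droplevels(f)     # keep only the levels that actually occur
print(levels(g))       # "a" "b"

# ordered = TRUE yields an ordered factor
print(is.ordered(gl(2, 3, ordered = TRUE)))  # TRUE
```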
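7. Use expand.grid() to create a data frame (data.frame). A data frame is a multi-column structure whose columns all have the same length but may have different types. Every combination of the three columns below is generated (the Cartesian product), giving a data frame of 2*2*2 = 8 rows: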
> expand.grid(h=c(60,80), w=c(100, 300), sex=c("Male", "Female"))
   h   w    sex
1 60 100   Male
2 80 100   Male
3 60 300   Male
4 80 300   Male
5 60 100 Female
6 80 100 Female
7 60 300 Female
8 80 300 Female
The following produces a data frame of 2*2*3 = 12 rows:
> expand.grid(h=c(60,80), w=c(100, 300), sex=c("Male", "Female", "non"))
    h   w    sex
1  60 100   Male
2  80 100   Male
3  60 300   Male
4  80 300   Male
5  60 100 Female
6  80 100 Female
7  60 300 Female
8  80 300 Female
9  60 100    non
10 80 100    non
11 60 300    non
12 80 300    non
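A sketch of two checks on the grid: the row count is the product of the input lengths, and the first column varies fastest. It also passes stringsAsFactors = FALSE so that sex stays a character column instead of the default factor.

```r
g <- expand.grid(h = c(60, 80), w = c(100, 300),
                 sex = c("Male", "Female", "non"),
                 stringsAsFactors = FALSE)

print(nrow(g))        # 12 = 2 * 2 * 3
print(class(g$sex))   # "character" because stringsAsFactors = FALSE
# h changes on every row, w every 2 rows, sex every 4 rows
print(g[2, ])         # h = 80, w = 100, sex = "Male"
```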
產(chǎn)生規(guī)則分布的測(cè)試數(shù)據(jù) :? 在統(tǒng)計(jì)學(xué)中,產(chǎn)生隨機(jī)數(shù)據(jù)是很有用的,R可以產(chǎn)生多種不同分布下的隨機(jī)數(shù)序列。 這些分布函數(shù)的形式為rfunc(n,p1,p2,...),其中func指概率分布函數(shù),n為生成數(shù)據(jù)的個(gè)數(shù),p1, p2, . . . 是分布的參數(shù)數(shù)值。 上面的表給出了每個(gè)分布的詳情和可能的缺省值(如果沒有給出缺省值,則意味著用戶必須指定參數(shù))。 大多數(shù)這種統(tǒng)計(jì)函數(shù)都有相似的形式,只需用d、p或者q去替代r ?(見下表),比如 :?
1. random generation: rfunc(n, p1, p2, ...)
2. probability density: dfunc(x, ...)
3. cumulative probability (the distribution function): pfunc(x, ...)
4. quantile function: qfunc(p, ...), with 0 < p < 1
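The r/d/p/q naming pattern, sketched with the normal distribution. set.seed() is used only so that the random draws are reproducible; the point of the last two lines is that p and q are inverses of each other.

```r
set.seed(42)                      # reproducible random draws
x <- rnorm(5, mean = 0, sd = 1)   # r: five random draws
d <- dnorm(0)                     # d: density at 0, equal to 1/sqrt(2*pi)
p <- pnorm(1.96)                  # p: P(X <= 1.96), about 0.975
q <- qnorm(p)                     # q: inverse of p, recovers 1.96
print(x); print(d); print(p); print(q)
```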
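The last two families, pfunc and qfunc, can be used to find P-values or critical values in statistical hypothesis tests. For example, the two-sided critical values of the normal distribution at the 5% significance level are: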
> qnorm(0.025)
[1] -1.959964
> qnorm(0.975)
[1] 1.959964
對(duì)于同一個(gè)檢驗(yàn)的單側(cè)臨界值,根據(jù)備擇假設(shè)的形式使用qnorm(0.05)或1 -qnorm(0.95) 一個(gè)檢驗(yàn)的P 值,比如自由度df = 1的?2= 3:84 :?
> 1 - pchisq(3.84, 1)
[1] 0.05004352
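A sketch of the same calculations with the significance level stored in a variable. The one-sided critical values are qnorm(alpha) for a lower-tail alternative and qnorm(1 - alpha) for an upper-tail one; lower.tail = FALSE returns the upper-tail probability directly, so no subtraction from 1 is needed.

```r
alpha <- 0.05

two_sided  <- qnorm(c(alpha / 2, 1 - alpha / 2))  # about -1.96 and 1.96
lower_tail <- qnorm(alpha)                        # H1 in the lower tail, about -1.645
upper_tail <- qnorm(1 - alpha)                    # H1 in the upper tail, about  1.645

# Upper-tail P-value of chi-square = 3.84 with df = 1
p_chisq <- pchisq(3.84, df = 1, lower.tail = FALSE)
print(two_sided); print(c(lower_tail, upper_tail)); print(p_chisq)
```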
Distribution            Function
Gaussian (normal)       rnorm(n, mean=0, sd=1)
exponential             rexp(n, rate=1)
gamma                   rgamma(n, shape, scale=1)
Poisson                 rpois(n, lambda)
Weibull                 rweibull(n, shape, scale=1)
Cauchy                  rcauchy(n, location=0, scale=1)
beta                    rbeta(n, shape1, shape2)
Student (t)             rt(n, df)
Fisher-Snedecor (F)     rf(n, df1, df2)
Pearson (χ²)            rchisq(n, df)
binomial                rbinom(n, size, prob)
multinomial             rmultinom(n, size, prob)
geometric               rgeom(n, prob)
hypergeometric          rhyper(nn, m, n, k)
logistic                rlogis(n, location=0, scale=1)
lognormal               rlnorm(n, meanlog=0, sdlog=1)
negative binomial       rnbinom(n, size, prob)
uniform                 runif(n, min=0, max=1)
Wilcoxon's statistics   rwilcox(nn, m, n), rsignrank(nn, n)
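Putting the table to work: a sketch of a reproducible test data frame that mixes several of the generators above. The column names and parameter values are arbitrary, chosen only for illustration.

```r
set.seed(123)   # same seed, same data on every run
n <- 100

test_data <- data.frame(
  id     = seq_len(n),                            # 1..n
  height = rnorm(n, mean = 170, sd = 10),         # normal
  wait   = rexp(n, rate = 0.5),                   # exponential, positive
  visits = rpois(n, lambda = 3),                  # Poisson counts
  passed = rbinom(n, size = 1, prob = 0.7) == 1,  # Bernoulli as logical
  score  = runif(n, min = 0, max = 100),          # uniform on [0, 100]
  group  = sample(c("A", "B", "C"), n, replace = TRUE)
)
str(test_data)
```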
[References] 1. help("seq")
Description:
Generate regular sequences. ‘seq’ is a standard generic with a default method. ‘seq.int’ is a primitive which can be much faster but has a few restrictions. ‘seq_along’ and ‘seq_len’ are very fast primitives for two common cases.

Usage:
seq(...)
## Default S3 method:
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)
seq.int(from, to, by, length.out, along.with, ...)
seq_along(along.with)
seq_len(length.out)

Arguments:
...: arguments passed to or from methods.
from, to: the starting and (maximal) end values of the sequence. Of length ‘1’ unless just ‘from’ is supplied as an unnamed argument.
by: number: increment of the sequence.
length.out: desired length of the sequence. A non-negative number, which for ‘seq’ and ‘seq.int’ will be rounded up if fractional.
along.with: take the length from the length of this argument.
....
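The help text points to seq_along() and seq_len() for common cases. One practical reason, sketched below: 1:length(x) misbehaves when x is empty, while seq_along(x) returns a zero-length vector, so a loop over it simply does not run.

```r
x <- numeric(0)              # an empty vector

bad  <- 1:length(x)          # 1:0 gives c(1, 0): two elements, a classic loop bug
good <- seq_along(x)         # integer(0): zero elements
print(bad); print(good)

print(seq_len(4))            # 1 2 3 4
for (i in seq_along(x)) print(i)   # body never runs because x is empty
```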
2. help('gl')
Description:
Generate factors by specifying the pattern of their levels.

Usage:
gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)

Arguments:
n: an integer giving the number of levels.
k: an integer giving the number of replications.
length: an integer giving the length of the result.
labels: an optional vector of labels for the resulting factor levels.
ordered: a logical indicating whether the result should be ordered or not.

Value:
The result has levels from ‘1’ to ‘n’ with each value replicated in groups of length ‘k’ out to a total length of ‘length’.

‘gl’ is modelled on the _GLIM_ function of the same name.
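The Value section above describes gl()'s output completely enough to rebuild it from rep_len() and factor(). The helper my_gl() below is hypothetical, written only to illustrate that description; it is not necessarily how base R implements gl().

```r
# Hypothetical helper: mirrors gl()'s documented signature and behaviour
my_gl <- function(n, k, length = n * k, labels = seq_len(n), ordered = FALSE) {
  # codes 1..n, each repeated k times, recycled out to 'length' values
  codes <- rep_len(rep(seq_len(n), each = k), length)
  # every label becomes a level, even ones that never occur
  factor(labels[codes], levels = labels, ordered = ordered)
}

print(my_gl(2, 3, 10))                             # 1 1 1 2 2 2 1 1 1 2
print(my_gl(2, 3, labels = c("a", "b", "c")))      # Levels: a b c
print(my_gl(2, 9, labels = c("a", "b", "c"), ordered = TRUE))
```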