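In data processing, removing duplicates to obtain the distinct values of a vector is a basic requirement. In R, the unique function does this.
A simple example: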
x <- c(8-1:5, 4:8, 6+0:5)
y <- unique(x)
# Output
> x
[1] 7 6 5 4 3 4 5 6 7 8 6 7 8 9 10 11
> y
[1] 7 6 5 4 3 8 9 10 11
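The output above shows that unique keeps each value at the position where it first appears rather than sorting the result. The following is a minimal sketch of that behaviour using the same x; the names y_alt and sorted_y are introduced here for illustration only and are not part of the original example.

```r
x <- c(8 - 1:5, 4:8, 6 + 0:5)
y <- unique(x)

# duplicated() flags every element that has already appeared earlier in x,
# so dropping the flagged elements gives the same result as unique(x)
y_alt <- x[!duplicated(x)]
print(identical(y, y_alt))

# unique() does not sort; call sort() explicitly when order matters
sorted_y <- sort(y)
print(sorted_y)
```

Sorting is a separate step, which is convenient when the order of first appearance is itself meaningful and should be preserved.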
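The same function is useful in real data work. Take a dataset of vehicle model years and fuel economy: to find out which years the dataset covers, apply unique to the year column, as in the example below. The output shows that the data span 1984 to 2021. Note that unique returns values in order of first appearance, not in sorted order, so 1986, for instance, shows up out of sequence at position 11. This makes it hard to tell at a glance whether any year is missing; sorting the result makes that check straightforward.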
# Read the data
ori_data <- read.csv("vehicles.csv", header = TRUE, stringsAsFactors = FALSE)
# Check which "year" values appear in the vehicle data
year_data <- unique(ori_data[, "year"])
first_year <- min(year_data)
last_year <- max(year_data)
# Output
> ori_data$year
[1] 1985 1985 1985 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985
[16] 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993
[31] 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993
[46] 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993
[61] 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993
[76] 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993
[91] 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993
[106] 1993 1993 1993 1993 1993 1993 1993 1993 1985 1985 1993 1993 1993 1993 1993
[121] 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993
[136] 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993
[151] 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993
[166] 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993
[181] 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993
[196] 1993 1993 1993 1993 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993
[211] 1993 1993 1993 1985 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 1985
[ reached getOption("max.print") -- omitted 42317 entries ]
> year_data
[1] 1985 1993 1994 1995 1996 1997 1998 1999 2000 2001 1986 2002 2003 2004
[15] 2005 2006 2007 2008 2009 2010 1984 1987 1988 1989 1990 1991 1992 2011
[29] 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
> first_year
[1] 1984
> last_year
[1] 2021
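The sorting check can be sketched as follows. Because vehicles.csv is not available here, the year_data vector below is transcribed by hand from the printed output above; in practice it would come from unique(ori_data[, "year"]). The names sorted_years, expected_years and missing_years are illustrative assumptions, not part of the original script.

```r
# Values transcribed from the printed year_data output (first-appearance order)
year_data <- c(1985, 1993:2001, 1986, 2002:2010, 1984, 1987:1992, 2011:2021)

# Sort so that the years read in sequence
sorted_years <- sort(year_data)
print(sorted_years)

# Every year we would expect between the earliest and latest observed year
expected_years <- seq(min(year_data), max(year_data))

# Years in the expected range that never occur in the data
missing_years <- setdiff(expected_years, year_data)
print(missing_years)
```

For the values printed above, missing_years is empty, meaning every year from 1984 to 2021 occurs at least once. Using setdiff avoids having to scan the sorted vector by eye.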