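In R, the function for extracting part of a string is substr(). Its Help page gives the description and usage below. As the name suggests, substr() only needs the original string x plus the start and stop character positions to do its job.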
substr():Substrings of a Character Vector
Description:Extract or replace substrings in a character vector.
Usage:substr(x, start, stop)
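A simple example follows. If stop is greater than the length of the original string, extraction ends at the last character of that string, as the "lue" result in yy shows.
x <- c("character", "value", "concerned")
y <- substr(x, 2, 4)
yy <- substr(x, 3, 7)
# Output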
> x
[1] "character" "value" "concerned"
> y
[1] "har" "alu" "onc"
> yy
[1] "aract" "lue" "ncern"
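As a small complement to the example above, the sketch below uses nchar() to make the clamping of stop visible, extracts the last three characters of each string, and shows the replacement form that the Help description mentions ("Extract or replace"). The variable z and the replacement text "XYZ" are made up for illustration only.

```r
x <- c("character", "value", "concerned")

# nchar() returns the length of each string: 9, 5, 9
nchar(x)

# A stop of 7 is effectively clamped to the string length for "value"
pmin(7, nchar(x))

# start and stop are vectorized, so the last three characters
# of every element can be taken in one call
last3 <- substr(x, nchar(x) - 2, nchar(x))
last3

# Replacement form: overwrite characters 1 to 3 in a copy of x
z <- x
substr(z, 1, 3) <- "XYZ"
z
```

Here last3 is "ter" "lue" "ned", and z becomes "XYZracter" "XYZue" "XYZcerned". The replacement form never changes the length of a string; it only overwrites the characters in the given range.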
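In real data processing, substr() is usually combined with other functions as the task requires. Take the classification of cars into manual and automatic transmissions. The raw data describes the specification in detail; in the data below, for example, automatic transmissions are further split into Automatic 3-spd and Automatic 4-spd. If the only goal is to separate manual from automatic, the values can be reduced to "Auto" and "Manual". So besides using substr() to pick out part of the string, ifelse() can be added to redefine the category labels and make the data clearer.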
x <- data.frame(ori_data$year, ori_data$trany)
y <- ifelse(substr(x$ori_data.trany, 1, 4) == "Auto", "Auto", "Manual")
# Output
> x
   ori_data.year  ori_data.trany
1           1985    Manual 5-spd
2           1985    Manual 5-spd
3           1985    Manual 5-spd
4           1985 Automatic 3-spd
5           1993    Manual 5-spd
6           1993 Automatic 3-spd
7           1993    Manual 5-spd
8           1993 Automatic 3-spd
9           1993    Manual 5-spd
10          1993 Automatic 4-spd
> y
[1] "Manual" "Manual" "Manual" "Auto" "Manual" "Auto" "Manual" "Auto"
[9] "Manual" "Auto"
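The classification above can also be tried without the original dataset. The sketch below rebuilds a stand-in for ori_data from the ten rows printed above (an assumption, since the real ori_data is not shown in full), stores the label as a new column named trany_type (a hypothetical name), and compares the substr() approach with base R's startsWith(), which performs the same prefix test without counting characters.

```r
# Hypothetical stand-in for ori_data, built from the ten rows shown above
ori_data <- data.frame(
  year  = c(1985, 1985, 1985, 1985, 1993, 1993, 1993, 1993, 1993, 1993),
  trany = c("Manual 5-spd", "Manual 5-spd", "Manual 5-spd", "Automatic 3-spd",
            "Manual 5-spd", "Automatic 3-spd", "Manual 5-spd", "Automatic 3-spd",
            "Manual 5-spd", "Automatic 4-spd"),
  stringsAsFactors = FALSE
)

# Same rule as above: the first four characters decide the label
ori_data$trany_type <- ifelse(substr(ori_data$trany, 1, 4) == "Auto",
                              "Auto", "Manual")

# Alternative prefix test with startsWith() (base R 3.3.0 or later)
alt <- ifelse(startsWith(ori_data$trany, "Auto"), "Auto", "Manual")

# Both approaches should agree on every row
identical(ori_data$trany_type, alt)

# Count how many rows fall into each category
table(ori_data$trany_type)
```

Note that under this rule every value that does not begin with "Auto" is labeled "Manual", so it is worth checking the distinct values of the raw column first if the data may contain other transmission types.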