Linux | Using the Command Line (CLI) for Data Processing and Exploration
COMMAND LINE INTERFACE
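CLI stands for Command Line Interface. Before graphical interfaces became widespread, it was the most widely used kind of user interface: the system screen with a black background and green text that we usually picture.
We are used to doing complex data manipulation and analysis in graphical tools (Excel or Jupyter), but many text-processing tasks are quicker and more convenient with a few commands. This article gives a basic introduction to some common data-processing scenarios; for the detailed usage and options of each command, please look them up yourself (for example in its man page or with a web search).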
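Getting Data
In Linux, curl is a command-line file transfer tool that works with URL syntax, and a very capable HTTP client in its own right. It supports both uploading and downloading, so it is a general-purpose transfer tool, although by convention people tend to call curl a download tool.
wget is small yet full-featured: it can resume interrupted downloads, supports both FTP and HTTP, and works through proxy servers with very little setup.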
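# Download data from a link
wget https://github.com/amanthedorkknight/fifa18-all-player-statistics/raw/master/2019/data.csv
# curl writes to standard output by default (-s hides the progress meter); add > file or -o file to save it
curl -s http://www.gutenberg.org/files/76/76-0.txt

Converting Data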
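in2csv is an important tool in the csvkit suite: it converts various structured formats (such as Excel) into CSV files. Because Excel files cannot be read directly in a Unix terminal, convert the xlsx file to CSV (comma-separated values) first.
# Convert an xlsx file to CSV; > is the redirection operator, which saves the previous command's output to a file
in2csv data/imdb-250.xlsx > data/imdb-250.csv

Viewing Data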
cat is the most commonly used command for viewing text, but it always prints the whole file; for larger data, use head or tail to view only the first or last n lines.
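head and tail can also be chained to show a slice from the middle of a file, or to skip the header line. The following is a minimal sketch; the function names (show_header, skip_header, show_lines, count_rows) and the tiny sample file are made up for illustration and are not part of the original article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built only on head, tail and wc.

# Print the first line of a file: a quick look at the column names.
show_header() { head -n 1 "$1"; }

# Print every line except the header: "tail -n +K" starts output at line K.
skip_header() { tail -n +2 "$1"; }

# Print lines FROM..TO (1-based, inclusive) by keeping the first TO lines
# and then the last TO-FROM+1 of those.
show_lines() { head -n "$3" "$1" | tail -n $(( $3 - $2 + 1 )); }

# Count data rows: total number of lines minus the header line.
count_rows() { echo $(( $(wc -l < "$1") - 1 )); }

# Usage on a tiny made-up file (not the FIFA dataset).
sample=$(mktemp)
printf '%s\n' 'id,name,age' '1,alice,30' '2,bob,25' '3,carol,41' > "$sample"
show_header "$sample"      # id,name,age
show_lines "$sample" 2 3   # 1,alice,30 then 2,bob,25
count_rows "$sample"       # 3
```

tail -n +2 is also a convenient way to drop a CSV header before piping the rows into tools such as sort or uniq.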
# Show only the first line, i.e. take a quick peek at the column names (head -n)
$ head -n 1 data.csv
,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,Value,Wage,Special,Preferred Foot,International Reputation,Weak Foot,Skill Moves,Work Rate,Body Type,Real Face,Position,Jersey Number,Joined,Loaned From,Contract Valid Until,Height,Weight,LS,ST,RS,LW,LF,CF,RF,RW,LAM,CAM,RAM,LM,LCM,CM,RCM,RM,LWB,LDM,CDM,RDM,RWB,LB,LCB,CB,RCB,RB,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,Acceleration,SprintSpeed,Agility,Reactions,Balance,ShotPower,Jumping,Stamina,Strength,LongShots,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause

# Show only the last 5 lines (tail -n)
$ tail -n 5 data.csv
18202,238813,J. Lundstram,19,https://cdn.sofifa.org/players/4/19/238813.png,England,https://cdn.sofifa.org/flags/14.png,47,65,Crewe Alexandra,https://cdn.sofifa.org/teams/2/light/121.png,€60K,€1K,1307,Right,1,2,2,Medium/ Medium,Lean,No,CM,22,"May 3, 2017",,2019,5'9,134lbs,42+2,42+2,42+2,44+2,44+2,44+2,44+2,44+2,45+2,45+2,45+2,44+2,45+2,45+2,45+2,44+2,44+2,45+2,45+2,45+2,44+2,45+2,45+2,45+2,45+2,45+2,34,38,40,49,25,42,30,34,45,43,54,57,60,49,76,43,55,40,47,38,46,46,39,52,43,45,40,48,47,10,13,7,8,9,€143K
18203,243165,N. Christoffersson,19,https://cdn.sofifa.org/players/4/19/243165.png,Sweden,https://cdn.sofifa.org/flags/46.png,47,63,Trelleborgs FF,https://cdn.sofifa.org/teams/2/light/703.png,€60K,€1K,1098,Right,1,2,2,Medium/ Medium,Normal,No,ST,21,"Mar 19, 2018",,2020,6'3,170lbs,45+2,45+2,45+2,39+2,42+2,42+2,42+2,39+2,40+2,40+2,40+2,38+2,35+2,35+2,35+2,38+2,30+2,31+2,31+2,31+2,30+2,29+2,32+2,32+2,32+2,29+2,23,52,52,43,36,39,32,20,25,40,41,39,38,40,52,41,47,43,67,42,47,16,46,33,43,42,22,15,19,10,9,9,5,12,€113K
18204,241638,B. Worman,16,https://cdn.sofifa.org/players/4/19/241638.png,England,https://cdn.sofifa.org/flags/14.png,47,67,Cambridge United,https://cdn.sofifa.org/teams/2/light/1944.png,€60K,€1K,1189,Right,1,3,2,Medium/ Medium,Normal,No,ST,33,"Jul 1, 2017",,2021,5'8,148lbs,45+2,45+2,45+2,45+2,46+2,46+2,46+2,45+2,44+2,44+2,44+2,44+2,38+2,38+2,38+2,44+2,34+2,30+2,30+2,30+2,34+2,33+2,28+2,28+2,28+2,33+2,25,40,46,38,38,45,38,27,28,44,70,69,50,47,58,45,60,55,32,45,32,15,48,43,55,41,32,13,11,6,5,10,6,13,€165K
18205,246268,D. Walker-Rice,17,https://cdn.sofifa.org/players/4/19/246268.png,England,https://cdn.sofifa.org/flags/14.png,47,66,Tranmere Rovers,https://cdn.sofifa.org/teams/2/light/15048.png,€60K,€1K,1228,Right,1,3,2,Medium/ Medium,Lean,No,RW,34,"Apr 24, 2018",,2019,5'10,154lbs,47+2,47+2,47+2,47+2,46+2,46+2,46+2,47+2,45+2,45+2,45+2,46+2,39+2,39+2,39+2,46+2,36+2,32+2,32+2,32+2,36+2,35+2,31+2,31+2,31+2,35+2,44,50,39,42,40,51,34,32,32,52,61,60,52,21,71,64,42,40,48,34,33,22,44,47,50,46,20,25,27,14,6,14,8,9,€143K
18206,246269,G. Nugent,16,https://cdn.sofifa.org/players/4/19/246269.png,England,https://cdn.sofifa.org/flags/14.png,46,66,Tranmere Rovers,https://cdn.sofifa.org/teams/2/light/15048.png,€60K,€1K,1321,Right,1,3,2,Medium/ Medium,Lean,No,CM,33,"Oct 30, 2018",,2019,5'10,176lbs,43+2,43+2,43+2,45+2,44+2,44+2,44+2,45+2,45+2,45+2,45+2,46+2,45+2,45+2,45+2,46+2,46+2,46+2,46+2,46+2,46+2,46+2,47+2,47+2,47+2,46+2,41,34,46,48,30,43,40,34,44,51,57,55,55,51,63,43,62,47,60,32,56,42,34,49,33,43,40,43,50,10,15,9

more and less load a file on demand and let you scroll through it, so they stay fast and flexible even on very large files. Highly recommended.
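Shortcuts: Ctrl+F scrolls forward one screen, Ctrl+B back one screen, j/k move down/up one line, g jumps to the first line, G to the last line, and / searches.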
# View text interactively, showing line numbers (-N) and the position in the file as a percentage (-m)
$ less -m -N data.csv
      1 ,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,Value,Wage,Special,Preferred Foot,International Reputation,Weak Foot,Skill Moves,Work Rate,Body
      2 0,158023,L. Messi,31,https://cdn.sofifa.org/players/4/19/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,94,94,FC Barcelona,https://cdn.sofifa.org/teams/2/l
      3 1,20801,Cristiano Ronaldo,33,https://cdn.sofifa.org/players/4/19/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,94,94,Juventus,https://c
0%

Manipulating Text
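Linux has three classic text-processing tools, its "three musketeers": grep, the handy text filter; sed, the helper for adding, deleting, finding and changing text; and awk, the king of formatted processing.
- grep: the handy text filter. Combined with regular expressions, it picks out the lines that match our conditions.
- sed: the helper for adding, deleting, finding and changing text; it processes the text one line at a time.
- awk: the king of formatted text processing.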
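Before turning to awk in detail, here is a minimal sketch of grep and sed on a tiny made-up CSV. The sample data and the function names (rows_with_hobby, drop_header, rename_hobby, nth_line) are hypothetical; only standard grep and sed options are used.

```shell
#!/usr/bin/env bash
# Made-up sample data: a header plus three rows (name,age,hobby).
data='name,age,hobby
alice,30,basketball
bob,25,singing
carol,41,basketball'

# grep: keep only lines matching a regular expression.
# -E enables extended regex; "$" anchors the match at the end of the line.
rows_with_hobby() { grep -E ",$1\$"; }

# grep -v inverts the match: here it drops the header line.
drop_header() { grep -v '^name,'; }

# sed processes the stream line by line:
# "1d" deletes line 1, "s/OLD/NEW/g" replaces every OLD on each remaining line.
rename_hobby() { sed -e '1d' -e "s/$1/$2/g"; }

# sed -n suppresses automatic printing; "Np" prints only line N.
nth_line() { sed -n "${1}p"; }

printf '%s\n' "$data" | rows_with_hobby basketball        # alice,30,basketball and carol,41,basketball
printf '%s\n' "$data" | rename_hobby basketball football  # data rows only, hobby renamed
printf '%s\n' "$data" | nth_line 2                        # alice,30,basketball
```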
awk [options] 'pattern{action}'
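awk matches each input line against pattern and runs action only on the lines that match (if pattern is omitted, action runs on every line). This makes it well suited to reformatting text line by line.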
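The pattern part is what selects lines; the action part says what to do with them. A minimal sketch on made-up data (the columns name,age,hobby and the function names are assumptions for illustration), including NR > 1 to skip the header so it never takes part in numeric comparisons or totals:

```shell
#!/usr/bin/env bash
# Made-up sample data: a header plus three rows (name,age,hobby).
data='name,age,hobby
alice,30,basketball
bob,25,singing
carol,41,basketball'

# Pattern only: NR > 1 skips the header, $2 > 40 keeps rows whose age exceeds 40.
# With no {action}, awk prints the whole matching line ($0).
older_than_40() { awk -F ',' 'NR > 1 && $2 > 40'; }

# Pattern + action: count data rows per hobby in an associative array,
# print the totals in END, then sort because "for (k in a)" order is unspecified.
count_by_hobby() {
  awk -F ',' 'NR > 1 { n[$3]++ } END { for (h in n) print h, n[h] }' | sort
}

# Running sum and count, averaged in END (guarding against empty input).
mean_age() {
  awk -F ',' 'NR > 1 { s += $2; c++ } END { if (c > 0) printf "%.1f\n", s / c }'
}

printf '%s\n' "$data" | older_than_40    # carol,41,basketball
printf '%s\n' "$data" | count_by_hobby   # basketball 2, then singing 1
printf '%s\n' "$data" | mean_age         # 32.0
```

The same NR > 1 pattern can be put in front of any awk action to leave the header line out.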
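# -F sets the field separator (a comma here); $n is the n-th field after splitting
# Split each line into columns on commas and print the first and third columns
$ awk -F ',' '{print $1,$3}' data1.csv
Name Hobby
Cai Xukun basketball
Xie Guangkun receiving goods
Yang Kun giving concerts
Chen Kun treasure hunting

# BEGIN runs before the first line is read, END runs after the last one
# Add a title at the top and a date at the end
$ awk -F ',' 'BEGIN{print "\tTitle: People"} {print $1,$3} END {print "\t\t\t---2019-04-14"}' data1.csv
        Title: People
Name Hobby
Cai Xukun basketball
Xie Guangkun receiving goods
Yang Kun giving concerts
Chen Kun treasure hunting
                        ---2019-04-14

# label is a variable; if/else performs the conditional test
# Add a new column: people older than 40 are marked "veteran", everyone else "young idol"
$ awk -F ',' '{if($2>40){label="veteran"}else{label="young idol"};print $0,label}' data1.csv
Name,Age,Hobby,Occupation veteran
Cai Xukun,21,basketball,school team player young idol
Xie Guangkun,64,receiving goods,Xiangyashan discipline committee member veteran
Yang Kun,47,giving concerts,singer veteran

Note that the header line is labelled too: "Age" is not a number, so awk compares it with 40 as a string, and "Age" sorts after "40". Adding the pattern NR>1 in front of the braces would skip the header.
- tr: translate (replace) characters, e.g. turn comma delimiters into tabs.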
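# tr reads only standard input, so redirect the file into it; '\t' is a tab
tr ',' '\t' < data.csv
- cut: show only a range of fields, e.g. the first 3 columns. -d sets the delimiter and -f selects fields; -c selects characters instead, so cut -c-3 would print only the first 3 characters of each line.
cut -d ',' -f 1-3 data.csv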
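A small sketch contrasting the tr and cut options mentioned above, on made-up data. The function names are hypothetical wrappers around standard tr and cut options.

```shell
#!/usr/bin/env bash
# Made-up sample data.
data='id,name,age
1,alice,30
2,bob,25'

# tr only reads standard input (no file argument); '\t' is a tab.
csv_to_tsv() { tr ',' '\t'; }

# tr -d deletes characters, e.g. the carriage returns of Windows line endings.
strip_cr() { tr -d '\r'; }

# cut -d sets the delimiter and -f picks fields: here columns 1 and 2.
first_two_cols() { cut -d ',' -f 1-2; }

# cut -c picks characters, not columns: the first 3 characters of each line.
first_three_chars() { cut -c 1-3; }

printf '%s\n' "$data" | csv_to_tsv          # id<TAB>name<TAB>age, ...
printf '%s\n' "$data" | first_two_cols      # id,name / 1,alice / 2,bob
printf '%s\n' "$data" | first_three_chars   # id, / 1,a / 2,b
```

Neither tool understands CSV quoting: in the FIFA data, a field such as "May 3, 2017" contains a comma, so cut -d ',' -f would split it in the wrong place. That is one reason to switch to csvkit's tools (such as csvcut) for real-world CSV.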
CSV CSV CSV !!!
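CSV is the daily headache of us data laborers. How can we handle CSV flexibly from the command line?
csvlook displays a CSV file as a formatted table; --max-rows limits the number of rows shown and --max-columns the number of columns.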
$ csvlook wine-red.csv --max-rows 5 --max-columns 5
| fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | ... |
| ------------- | ---------------- | ----------- | -------------- | --------- | --- |
| 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | ... |
| 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | ... |
| 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | ... |
| 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | ... |
| 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | ... |
| ... | ... | ... | ... | ... | ... |

A table usually consists of a header and a body of data, arranged in columns and rows. Correspondingly, there are three tools for working on these parts: body (Janssens 2014a), header (Janssens 2014c), and cols (Janssens 2014b). body applies a command to every line except the header, header prints or modifies the header line, and cols applies a command to a subset of columns.
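The idea behind body (run a command on everything except the first line) can be sketched as a small shell function. This is a hypothetical stand-in named body_sketch, not the tool cited above; unlike the original, it takes the command as separate arguments rather than one quoted string.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the idea behind `body`; it is NOT the
# original script and takes the command as separate arguments, not one string.
body_sketch() {
  local header
  IFS= read -r header || return   # consume exactly the first line (the header)
  printf '%s\n' "$header"          # print it unchanged
  "$@"                             # the command reads the remaining lines from stdin
}

# Uppercase the data rows but keep the header as is.
printf '%s\n' 'name,hobby' 'alice,basketball' 'bob,singing' |
  body_sketch tr '[:lower:]' '[:upper:]'
# name,hobby
# ALICE,BASKETBALL
# BOB,SINGING

# Sort the data rows while keeping the header on top.
printf '%s\n' 'name,age' 'carol,41' 'alice,30' | body_sketch sort
# name,age
# alice,30
# carol,41
```

The trick is that read consumes only the first line of the pipe, so whatever command follows sees just the data rows.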
# Show the column names
$ cat wine-red.csv | header
"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"

# Convert all lowercase letters to uppercase, leaving the header line untouched
$ < tips.csv body "tr '[a-z]' '[A-Z]'" | head -n 5 | csvlook
| bill | tip | sex | smoker | day | time | size |
| ----- | ---- | ------ | ------ | ---------- | ------ | ---- |
| 16.99 | 1.01 | FEMALE | False | 0001-01-07 | DINNER | 2 |
| 10.34 | 1.66 | MALE | False | 0001-01-07 | DINNER | 3 |
| 21.01 | 3.50 | MALE | False | 0001-01-07 | DINNER | 3 |
| 23.68 | 3.31 | MALE | False | 0001-01-07 | DINNER | 2 |

csvlook infers column types, which is why the smoker value NO is shown as False and the day SUN is parsed as the date 0001-01-07 (a Sunday); csvlook -I (--no-inference) turns this off.
- csvstat: compute per-column statistics such as maximum, minimum, mean, and number of unique values.
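When csvkit is not installed, a rough version of a few csvstat numbers for one column can be approximated with awk. A minimal sketch: col_stats is a hypothetical helper, and it assumes plain comma-separated input (no quoted fields containing commas), a header line, and a numeric column.

```shell
#!/usr/bin/env bash
# Hypothetical helper, a rough stand-in for part of what csvstat reports.
# Prints min, max, mean and number of distinct values of column COL in a
# comma-separated stream whose first line is a header.
# Assumes plain CSV (no quoted fields containing commas) and a numeric column.
col_stats() {
  awk -F ',' -v col="$1" '
    NR == 1 { next }                 # skip the header line
    {
      v = $col + 0                   # force a numeric value
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v; n++
      seen[$col]++                   # count each distinct raw value
    }
    END {
      if (n == 0) exit 1             # no data rows
      u = 0; for (k in seen) u++
      printf "min=%g max=%g mean=%g unique=%d\n", min, max, sum / n, u
    }'
}

# The bill and tip values of the four tips.csv rows shown in the csvlook output.
printf '%s\n' 'bill,tip' '16.99,1.01' '10.34,1.66' '21.01,3.50' '23.68,3.31' | col_stats 2
# min=1.01 max=3.5 mean=2.37 unique=4
```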
Finally, here is a Docker image that saves you the trouble of installing all the dependencies yourself.
$ docker pull datascienceworkshops/data-science-at-the-command-line
$ docker run --rm -it datascienceworkshops/data-science-at-the-command-line

Related Resources
- Data Science at the Command Line
- csvkit documentation: Tutorial
- The awk command, one of Linux's "three musketeers" of text processing
- awk: From Getting Started to Giving Up
- What is the difference between sed and awk?
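My knowledge is limited and this article may contain mistakes; corrections and discussion from readers are very welcome.
If you found it useful, please give it a like~~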