Scraping with R: An Overview
https://drive.google.com/open?id=1DCveWYpwlR4xfbySKVoikgdme5W71Pp57vyxSojr7XE
Open data
A URL to a dataset: https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.gz
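A minimal sketch of reading such a file in R. It assumes the archive holds a single gzip-compressed JSON document and that the jsonlite package is installed. The helper name `read_gz_json` is hypothetical, and the URL above may no longer be served, so the download step is left as a comment.

```r
library(jsonlite)

# Hypothetical helper: read a gzip-compressed JSON file from disk.
read_gz_json <- function(path) {
  con <- gzfile(path, open = "rt")   # decompress while reading as text
  on.exit(close(con))
  txt <- paste(readLines(con, warn = FALSE), collapse = "\n")
  fromJSON(txt)
}

# Usage (needs network access; assumes the URL listed above still responds):
# url  <- "https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.gz"
# dest <- tempfile(fileext = ".gz")
# download.file(url, dest, mode = "wb")
# bikes <- read_gz_json(dest)
# str(bikes, max.level = 1)          # inspect the top-level structure first
```

Inspecting the result with `str()` before extracting fields is a safer habit than assuming a particular layout.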
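What is a web API?
The system owner turns a database query into a URL that takes parameters. In other words, the system provides a parameterized URL so that the system itself, or third-party developers, can conveniently query the database from the front end.
Case 1: The user obtains data by imitating the way the system's own front end retrieves data from the back end.
Case 2: The system owner reviews applications and issues tokens, so that approved users can call its data-access methods.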
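The idea of "a URL with parameters" can be sketched as follows. The endpoint `https://example.com/api/search`, the parameter names, and the helper `build_api_url` are all hypothetical, chosen only for illustration. The commented lines hint at how the two cases differ in practice, assuming the httr package.

```r
# Hypothetical helper: turn a named list of parameters into a query URL.
build_api_url <- function(base_url, params) {
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(params), "=", values)
  paste0(base_url, "?", paste(pairs, collapse = "&"))
}

api_url <- build_api_url(
  "https://example.com/api/search",            # assumed endpoint
  list(keyword = "data science", page = 2)     # assumed parameters
)
print(api_url)

# Case 1 (imitating the front end): request the same URL the page itself calls.
# res <- httr::GET(api_url)
#
# Case 2 (token issued by the system owner): send the token with the request.
# Many services expect it in a header, but the exact scheme varies by service.
# res <- httr::GET(api_url, httr::add_headers(Authorization = paste("Bearer", token)))
```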
Rundown
Overview
What is a crawler?
slide: Crawler designs: Request and response
Types of Web response: An overview
Getting JSON from the Web
Seeking the URL
Reading data: Dcard, cnyes.com
Getting HTML from the web
Seeking the URL
Seeking the data node
Parsing HTML: PTT
Types
JSON as responses
Services that provide official APIs for third-party use
Facebook Ad Library API, https://www.facebook.com/ads/library/api
Twitter API, https://developer.twitter.com/
Instagram API
Rate limits and fetching the next page of data
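A minimal sketch of handling both concerns together: keep requesting the next page until the service reports none, and pause between requests. It assumes each response carries its rows in `data` and a pointer to the next page in `next_cursor`. Both field names, and the helper `fetch_all_pages`, are hypothetical, since real APIs name and structure these differently.

```r
# fetch_page: a function(cursor) returning list(data = <data.frame>, next_cursor = <value or NULL>).
# In real use it would wrap an HTTP request; here it is injected so the loop can be tested offline.
fetch_all_pages <- function(fetch_page, max_pages = 10, delay = 1) {
  results <- list()
  cursor <- NULL
  for (i in seq_len(max_pages)) {
    page <- fetch_page(cursor)
    results[[length(results) + 1]] <- page$data
    cursor <- page$next_cursor
    if (is.null(cursor)) break     # no further page reported
    Sys.sleep(delay)               # wait between requests to respect rate limits
  }
  do.call(rbind, results)
}

# Offline stand-in for a paginated API with two pages.
fake_pages <- list(
  list(data = data.frame(id = 1:2), next_cursor = "2"),
  list(data = data.frame(id = 3:4), next_cursor = NULL)
)
fake_fetch <- function(cursor) {
  fake_pages[[if (is.null(cursor)) 1 else as.integer(cursor)]]
}

all_rows <- fetch_all_pages(fake_fetch, delay = 0)
```

The `max_pages` cap is a deliberate safeguard so that a malformed cursor cannot cause an endless loop.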
Services that provide no official API but use AJAX and internal APIs to fetch data from the back-end database
104.com
Shinyi.com
cnyes.com
pchome search
Dcard
rent591.com
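When a site answers with JSON, the response body can usually be converted straight into a data frame. The sketch below assumes the jsonlite package and a response shaped as an array of objects. The field names are illustrative and do not come from any of the sites listed above.

```r
library(jsonlite)

# Assumed shape of a JSON response body: an array of post objects.
body <- '[
  {"id": 101, "title": "First post",  "likeCount": 12},
  {"id": 102, "title": "Second post", "likeCount": 3}
]'

# An array of flat objects is simplified into a data frame, one row per object.
posts <- fromJSON(body)
str(posts)

# Against a live service, fromJSON() also accepts a URL (hypothetical endpoint):
# posts <- fromJSON("https://example.com/api/posts?limit=30")
```

Finding the actual URL is typically done in the browser's developer tools, by watching the network requests the page makes while it loads data.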
HTML as responses
PTT
Books.com
ltn.com.tw
udn.com.tw
ibon address
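When a site answers with HTML, the data must be located in the document tree and extracted node by node. The sketch below assumes the rvest package (version 1.0 or later for `html_elements()` and `html_text2()`). The markup and the CSS selector `div.entry a` are made up for illustration, so each real site needs its own selector.

```r
library(rvest)

# Stand-in for a downloaded page; a real run would call read_html(<page URL>).
page_source <- '
<html><body>
  <div class="entry"><a href="/post/1.html">First title</a></div>
  <div class="entry"><a href="/post/2.html">Second title</a></div>
</body></html>'

doc   <- read_html(page_source)
links <- html_elements(doc, "div.entry a")   # seek the data nodes

entries <- data.frame(
  title = html_text2(links),                 # visible text of each link
  href  = html_attr(links, "href"),          # link target of each node
  stringsAsFactors = FALSE
)
print(entries)
```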