Scraping with R: An Overview

https://drive.google.com/open?id=1DCveWYpwlR4xfbySKVoikgdme5W71Pp57vyxSojr7XE

Open data

What is a web API?

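  • The system provider turns a database query into a URL that carries parameters. Put differently, the provider exposes a parameterized URL so that its own front end, or third-party developers, can conveniently query the database.

    • Case 1: The user obtains data by imitating the way the system's front end fetches data from the back end.

    • Case 2: The system provider reviews applications and issues a Token, giving approved users access to its data-retrieval method.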
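
The two cases above can be sketched in R with the httr package. This is a minimal sketch under stated assumptions: the endpoint `https://api.example.com/v1/posts`, the parameter names, the token placeholder, and the helper `fetch_with_token` are all hypothetical, and the Bearer header is only one common way a provider might expect a Token to be sent.

```r
library(httr)

# Hypothetical endpoint and parameters, for illustration only.
base_url <- "https://api.example.com/v1/posts"
params <- list(keyword = "rent", limit = "30")

# Case 1: the query is just a URL with parameters, the same kind of
# URL the site's own front end would request.
query_url <- modify_url(base_url, query = params)
print(query_url)

# Case 2: an approved user attaches the issued token to each request.
# The Authorization header used here is an assumption; each provider
# documents its own scheme.
token <- "YOUR_TOKEN"  # placeholder, not a real credential
fetch_with_token <- function(url, token) {
  GET(url, add_headers(Authorization = paste("Bearer", token)))
}

# Not run, because the endpoint above does not exist:
# resp <- fetch_with_token(query_url, token)
# stop_for_status(resp)
```

Only the URL construction runs here. The request itself is left commented out so the sketch works without network access.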

Rundown

Overview

  1. What is a crawler?

    1. slide: Crawler designs: Request and response

    2. Types of Web response: An overview

  2. Getting JSON from the Web

    1. Seeking the URL

    2. Reading data: Dcard, cnyes.com

  3. Getting HTML from the Web

    1. Seeking the URL

    2. Seeking the data node

    3. Parsing HTML: PTT

Types

JSON as responses

System services that provide official APIs for third-party use

System services that do not provide official APIs but use AJAX and internal APIs to fetch data from the back-end database

  • 104.com

  • Shinyi.com

  • cnyes.com

  • pchome search

  • DCard

  • rent591.com
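
When a site answers with JSON, reading it in R could look like the sketch below, which uses the jsonlite package. The payload is made up for illustration and does not reflect the actual response format of any site listed above.

```r
library(jsonlite)

# Made-up payload standing in for the body of an AJAX response.
json_text <- '[
  {"id": 1, "title": "First post", "likeCount": 12},
  {"id": 2, "title": "Second post", "likeCount": 5}
]'

# An array of flat objects is simplified into a data frame.
posts <- fromJSON(json_text)
str(posts)

# With a live endpoint, fromJSON() also accepts a URL (hypothetical here):
# posts <- fromJSON("https://example.com/api/posts?limit=30")
```

Real responses are often nested, so the useful records may sit one or two levels down inside the parsed list rather than at the top.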

HTML as responses

  • PTT

  • Books.com

  • ltn.com.tw

  • udn.com.tw

  • ibon address
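
When a site answers with HTML, the data has to be located by node and then extracted. A minimal sketch with the rvest package (version 1.0.0 or later is assumed, for `html_elements()` and `html_text2()`) follows. The HTML fragment and the `div.entry a` selector are invented for illustration and are not the real markup of any site listed above.

```r
library(rvest)

# Made-up HTML standing in for a downloaded list page.
html_doc <- '<html><body>
  <div class="entry"><a href="/post/1.html">First title</a></div>
  <div class="entry"><a href="/post/2.html">Second title</a></div>
</body></html>'

# Parse the document, then select the data nodes with a CSS selector.
page <- read_html(html_doc)
links <- html_elements(page, "div.entry a")

# Extract the visible text and the href attribute of each node.
result <- data.frame(
  title = html_text2(links),
  url = html_attr(links, "href"),
  stringsAsFactors = FALSE
)
print(result)
```

For a live page, `read_html()` would take the page URL instead of a string, and the selector would come from inspecting the page in the browser's developer tools.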
