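Building a Dynamic Web Scraper with Python Selenium and BeautifulSoup
Once you have data, you can put it to many uses, such as trend forecasting, machine learning, or data analysis. Obtaining that data efficiently is the first problem each of these applications has to solve, and web scraping is one way to do it.

Web scraping is the technique of extracting element data from a page's HTML source. Some pages need special handling, however. Social platforms require you to log in before any data can be scraped, and e-commerce sites, while not requiring a login, load more data dynamically only as you scroll. A scraper built for pages like these is called a dynamic web scraper.

How do you build one? This article uses the Python Selenium and BeautifulSoup packages to walk through the development of a dynamic web scraper. The key topics are: BeautifulSoup vs Selenium; installing Selenium and the WebDriver; installing BeautifulSoup; the Selenium get() method; Selenium element location; the Selenium send_keys() method; the Selenium execute_script method; the BeautifulSoup find_all() method; and the BeautifulSoup getText() method.

1. BeautifulSoup vs Selenium

Most people who build web scrapers have heard of BeautifulSoup. It parses HTML source and extracts the data in each tag, and its methods are easy to pick up. It cannot scrape dynamic pages on its own, though, because it has no methods for simulating user actions, such as entering an account and password to log in or scrolling the page, that cause a page to load its data dynamically.

This is where Selenium comes in. Designed for automated testing, it can simulate user actions, such as logging in or scrolling, and it can execute JavaScript code. These are the biggest differences between Selenium and BeautifulSoup. A dynamic scraper in Python can combine the two: use Selenium to get the page to load its data, then use BeautifulSoup's concise methods to extract what is needed.

This article applies that idea: it uses Selenium to log in to Facebook, goes to a fan page, runs JavaScript that scrolls the page so that more data loads dynamically, and then uses BeautifulSoup to scrape the post titles.

2. Installing Selenium and the WebDriver

First, install the Python Selenium package with the following command: $ pip…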
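The workflow outlined above (log in with Selenium, scroll so the page loads more content, then parse the result with BeautifulSoup) might be sketched roughly as follows. This is a minimal sketch, not the article's own code. It assumes Selenium 4 style locators (`find_element(By.ID, ...)`) and a Chrome driver available on the machine. The element ids `email` and `pass`, the `login` name, and the `h2` / `post-title` selector are hypothetical placeholders; the real page's markup has to be inspected and will likely differ.

```python
import time

from bs4 import BeautifulSoup


def scroll_down(driver, times=3, pause=1.0):
    """Scroll to the bottom of the page `times` times so more content loads."""
    for _ in range(times):
        # execute_script runs JavaScript in the context of the current page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new items


def extract_titles(html, tag="h2", class_name="post-title"):
    """Return the stripped text of every matching element in the HTML.

    The tag and class are hypothetical; adjust them to the target page.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [el.getText().strip() for el in soup.find_all(tag, class_=class_name)]


def scrape(login_url, page_url, email, password):
    """Log in, open the target page, scroll, and return the scraped titles.

    Selenium is imported here so the helpers above work without it installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)  # open the login page
        # Hypothetical locators: inspect the real page to find the right ones
        driver.find_element(By.ID, "email").send_keys(email)
        driver.find_element(By.ID, "pass").send_keys(password)
        driver.find_element(By.NAME, "login").click()
        time.sleep(3)  # crude wait for the login to finish; an explicit wait is more robust

        driver.get(page_url)  # go to the page whose posts should be scraped
        scroll_down(driver)
        return extract_titles(driver.page_source)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `scrape(login_url, page_url, email, password)` with real values would open a browser window, so the two helpers are kept separate: `scroll_down` only needs an object with an `execute_script` method, and `extract_titles` works on any HTML string.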
Python Virtual Environments: A Primer – Real Python
Take Odoo as an example for Docker Explained
Source: Odoo – Official Image | Docker…
How to install multiple versions or instances of Odoo on Ubuntu 16 using docker or virtual environment
A guide to installing Odoo with Docker and adding external modules to extend its functionality.
Install Docker Engine on Ubuntu
Estimated reading time: 10 minutes. Docker Desktop for Linux: Docker Desktop helps you build, share, and run containers easily on Mac and Windows as you…
Installation of Odoo 14 On Ubuntu
Key steps: add-apt-repository universe; apt update and upgrade; create odoo system user; create postgresql system user; apt install postgresql; create user in postgresql, i.e. odoo; install basic software python 3+…
Upgrade Odoo database and migration
Configure Odoo with Nginx as a Reverse Proxy
Prerequisites: Make sure that you have met the following prerequisites before continuing with this tutorial: You have Odoo installed; if not, you can find the instructions here. You have…
Install Odoo on Ubuntu 20.04 with Docker and Nginx
pandas.DataFrame.iterrows — pandas 1.4.1 documentation
DataFrame.iterrows(): Iterate over DataFrame rows as (index, Series) pairs. Yields: index (label or tuple of label), the index of the row, a tuple for a MultiIndex; data (Series), the data of…
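As a small illustration of the (index, Series) pairs that `iterrows()` yields, the sketch below iterates over a made-up two-row frame; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a labelled index
df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]}, index=["x", "y"])

rows = []
for index, row in df.iterrows():
    # `index` is the row label; `row` is a Series keyed by column name
    rows.append((index, row["name"], row["qty"]))
```

Because each row is returned as a single Series, dtypes are not preserved across columns, so `iterrows()` is best suited to small frames or one-off inspection.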









