Issue

I am trying to scrape each product page on the AliExpress website to get the number of comments, the number of photos published by customers, and each customer's country, and put them into a DataFrame.

I have written code that scrapes the customer country, but I don't know how to get the number of customer comments and the number of images. This is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

url = 'https://www.aliexpress.com/item/1005003801507855.html?spm=a2g0o.productlist.0.0.1e951bc72xISfE&algo_pvid=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad&algo_exp_id=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad-8&pdp_ext_f=%7B%22sku_id%22%3A%2212000027213624098%22%7D&pdp_pi=-1%3B40.81%3B-1%3B-1%40salePrice%3BMAD%3Bsearch-mainSearch'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)

wait = WebDriverWait(driver, 10)

driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.tab-content'))))
driver.get(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#product-evaluation'))).get_attribute('src'))

data=[]

while True:

    for e in driver.find_elements(By.CSS_SELECTOR, 'div.feedback-item'):

        try:
            country = e.find_element(By.CSS_SELECTOR, '.user-country > b').text
        except:
            country = None

       
        data.append({
            'country':country,
       
        })
    try:
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#complex-pager a.ui-pagination-next'))).click()
    except:
        break

pd.DataFrame(data).to_csv('filename.csv',index=False)

I would appreciate any help. Thank you!

Solution

If you want the number of comments/reviews, you can read the value from the element that displays it:

driver.find_element(By.XPATH, 'XPATH_OF_ELEMENT_TO_SCRAP')

To do so in your example, do this outside your loop:

# find_element returns a WebElement; .text gives its visible text
number_feedbacks = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]/div[1]').text
number_images = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]//label[1]/em').text

If you are not familiar with this function, please feel free to ask and I will explain where I found these XPaths. We can also locate the elements by ID.
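A WebElement's `.text` attribute returns the element's visible text, which may contain a label around the number rather than the number alone. As a minimal sketch, assuming the text looks something like `Feedback (1,234)` (that format is an assumption, not something confirmed for this page), a small helper can pull out the first integer. `parse_count` is a hypothetical name introduced here for illustration:

```python
import re


def parse_count(text):
    """Return the first integer found in text, or None if there is none.

    Commas are treated as thousands separators (an assumption about
    how the page formats large numbers).
    """
    match = re.search(r"\d+(?:,\d{3})*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Sample strings below are made up for illustration.
feedback_count = parse_count("Feedback (1,234)")
image_count = parse_count("57")
missing_count = parse_count("no digits here")
```

With Selenium, this could be applied as `parse_count(element.text)` on the elements located by the XPaths above.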

In your code it would be:

from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

url = 'https://www.aliexpress.com/item/1005003801507855.html?spm=a2g0o.productlist.0.0.1e951bc72xISfE&algo_pvid=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad&algo_exp_id=6d3ed61e-f378-43d0-a429-5f6cddf3d6ad-8&pdp_ext_f=%7B%22sku_id%22%3A%2212000027213624098%22%7D&pdp_pi=-1%3B40.81%3B-1%3B-1%40salePrice%3BMAD%3Bsearch-mainSearch'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)

wait = WebDriverWait(driver, 10)

driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.tab-content'))))
driver.get(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#product-evaluation'))).get_attribute('src'))

data=[]

# find_element returns a WebElement; .text gives its visible text
number_feedbacks = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]/div[1]').text
number_images = driver.find_element(By.XPATH, '//*[@id="transction-feedback"]//label[1]/em').text

print(f'number_feedbacks = {number_feedbacks}\nnumber_images = {number_images}')

while True:

    for e in driver.find_elements(By.CSS_SELECTOR, 'div.feedback-item'):

        try:
            country = e.find_element(By.CSS_SELECTOR, '.user-country > b').text
        except:
            country = None

        data.append({'country': country})
    try:
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#complex-pager a.ui-pagination-next'))).click()
    except:
        break

pd.DataFrame(data).to_csv('filename.csv',index=False)
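The question asks for the counts to end up in the DataFrame, while the code above only prints them. One possible way to carry them along, sketched here under assumptions, is to copy the two page-level values into every row before building the DataFrame. `attach_counts` is a hypothetical helper, and the sample rows and numbers are made up:

```python
def attach_counts(rows, number_feedbacks, number_images):
    """Return new row dicts with the page-level counts added to each one.

    The input rows are left unmodified.
    """
    return [
        {**row, "number_feedbacks": number_feedbacks, "number_images": number_images}
        for row in rows
    ]


# Illustrative rows in the same shape as the `data` list built in the loop.
rows = [{"country": "FR"}, {"country": None}]
enriched = attach_counts(rows, 120, 45)
```

Passing `enriched` to `pd.DataFrame(...)` in place of `data` would then give one column per count alongside `country`. Since the counts are the same for every review on the page, storing them once in a separate one-row table is an equally reasonable choice.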

Source: [FIXED] How to scrap each product page (comments and custumer country) ~ PythonFixing
