Web Scraping using Selenium with Python

John Doe


1. Install Geckodriver

Go to the geckodriver releases page on GitHub (https://github.com/mozilla/geckodriver/releases), find the latest version of the driver for your platform, and download it. For example, on 64-bit Linux:

wget https://github.com/mozilla/geckodriver/releases/download/v0.35.0/geckodriver-v0.35.0-linux64.tar.gz

Extract the file with:

tar -xvzf geckodriver*

Make it executable:

chmod +x geckodriver

Move the driver into a directory that is already on your PATH, such as /usr/local/bin, so Selenium can find it without an explicit path:

sudo mv geckodriver /usr/local/bin/
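
To confirm that the driver really is on PATH now, you can look it up from Python the same way a shell would. This is a minimal sketch using only the standard library's shutil.which; the function name find_geckodriver is hypothetical and not part of Selenium.

```python
import os
import shutil
from typing import Optional


def find_geckodriver(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the geckodriver executable, or None if not found.

    `path` overrides the PATH environment variable (handy for testing);
    None means "search the real PATH".
    """
    # shutil.which checks each directory on PATH for an executable named geckodriver
    return shutil.which("geckodriver", path=path)


if __name__ == "__main__":
    location = find_geckodriver()
    if location is None:
        print("geckodriver was not found on PATH")
    else:
        print(f"geckodriver found at {location}")
```

If this prints the "not found" message, the mv step did not put the driver in a PATH directory, or the file is not executable.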


2. Install Selenium

pip install selenium beautifulsoup4

The beautifulsoup4 package provides the bs4 module that the sample code imports.
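
Which Selenium version you end up with matters: Selenium 4.10 and later no longer accept the executable_path argument, so the driver path has to be passed through a Service object instead. A small sketch for checking this, assuming Python 3.8+ (for importlib.metadata); the helper names installed_version and accepts_executable_path are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def accepts_executable_path(selenium_version: str) -> bool:
    """True for Selenium releases before 4.10, which still took executable_path=..."""
    # Only the major and minor numbers matter, e.g. "4.9.1" -> (4, 9)
    major, minor = (int(part) for part in selenium_version.split(".")[:2])
    return (major, minor) < (4, 10)


if __name__ == "__main__":
    v = installed_version("selenium")
    if v is None:
        print("selenium is not installed")
    else:
        style = "executable_path=..." if accepts_executable_path(v) else "service=Service(...)"
        print(f"selenium {v}: pass the driver path with {style}")
```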


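3. Sample code

The view below is a Django class-based view: it loads the search page in headless Firefox, takes the rendered HTML from page_source, and extracts the text of the div whose id is "x" with BeautifulSoup.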

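from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from django.conf import settings
from django.http import JsonResponse
from django.views import View
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.firefox.service import Service


class TestView(View):
    def post(self, request, *args, **kwargs):
        query = kwargs["query"]  # captured from the URL pattern
        # Encode the query so spaces and characters such as & or # stay inside q=
        url = f"https://www.vought.com/search?q={quote_plus(query)}"
        opts = FirefoxOptions()
        opts.add_argument("--headless")  # run Firefox without opening a window
        # Selenium 4.10+ no longer accepts executable_path; pass the driver path via Service
        service = Service(executable_path=settings.SELENIUM_PATH)
        browser = webdriver.Firefox(options=opts, service=service)
        try:
            browser.get(url)
            html = browser.page_source  # HTML of the page as currently rendered
        finally:
            browser.quit()  # close the browser even if loading fails
        soup = BeautifulSoup(html, "html.parser")
        target = soup.find("div", {"id": "x"})
        # find() returns None when no such div exists
        text = target.text if target is not None else ""
        return JsonResponse({"result": text}, json_dumps_params={"ensure_ascii": False},
                            content_type="application/json; charset=utf-8")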
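
The view above reads page_source right after get(), which can miss content that the site's JavaScript inserts after the page has loaded. Below is a Django-free sketch that first waits for the target element with Selenium's WebDriverWait and an explicit timeout. The function names, the 10-second default timeout, and the default id "x" are assumptions for illustration. Selenium and BeautifulSoup are imported inside the functions so that the URL helper can be used (and tested) without a browser.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://www.vought.com/search"  # search page from the sample view


def build_search_url(query: str) -> str:
    # urlencode percent-encodes the value, so "&" or "#" cannot break the q= parameter
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


def extract_div_text(html: str, div_id: str) -> Optional[str]:
    """Return the text of <div id=div_id>, or None if the page has no such div."""
    from bs4 import BeautifulSoup

    target = BeautifulSoup(html, "html.parser").find("div", {"id": div_id})
    return target.get_text() if target is not None else None


def scrape(query: str, div_id: str = "x", driver_path: Optional[str] = None,
           timeout: float = 10.0) -> Optional[str]:
    """Load the search page in headless Firefox and return the target div's text."""
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.service import Service
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    opts = webdriver.FirefoxOptions()
    opts.add_argument("--headless")
    # With no explicit path, Selenium locates geckodriver itself (e.g. on PATH)
    service = Service(executable_path=driver_path) if driver_path else None
    browser = webdriver.Firefox(options=opts, service=service)
    try:
        browser.get(build_search_url(query))
        try:
            # Block until the div exists in the DOM, or give up after `timeout` seconds
            WebDriverWait(browser, timeout).until(
                EC.presence_of_element_located((By.ID, div_id)))
        except TimeoutException:
            return None
        return extract_div_text(browser.page_source, div_id)
    finally:
        browser.quit()  # always release the browser process
```

For example, scrape("homelander") would return the div's text, or None if it never appeared within the timeout. Keeping the browser work in scrape() and the parsing in extract_div_text() also makes the view easy to reuse: it only has to call scrape() and wrap the result in a JsonResponse.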

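* SELENIUM_PATH examples (SELENIUM_PATH is a custom setting in the Django project's settings.py that holds the full path to the geckodriver executable):

Linux: /usr/local/bin/geckodriver

Windows: B:\path\to\geckodriver-v0.30.0-win64\geckodriver.exe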
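
One way to define SELENIUM_PATH in settings.py so the same project runs on both platforms is to pick the path when Django starts. This is a sketch: the two paths are the examples above, the helper name selenium_path_for is hypothetical, and macOS ("Darwin") is assumed to use the Linux-style location.

```python
import platform

# Example locations from this article; replace them with where geckodriver actually lives.
LINUX_GECKODRIVER = "/usr/local/bin/geckodriver"
WINDOWS_GECKODRIVER = r"B:\path\to\geckodriver-v0.30.0-win64\geckodriver.exe"


def selenium_path_for(system: str) -> str:
    """Map a platform.system() value ("Windows", "Linux", "Darwin", ...) to a driver path."""
    return WINDOWS_GECKODRIVER if system == "Windows" else LINUX_GECKODRIVER


# Read by the view as settings.SELENIUM_PATH
SELENIUM_PATH = selenium_path_for(platform.system())
```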


Ref.

I use Selenium in Python and tried to run the webdriver function default_browser = webdriver.Firefox(), which raised this exception: WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

