How to fight Coronavirus using Selenium Web Scraping?
On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a pandemic. It pointed to more than 118,000 cases of the disease in more than 110 countries and territories around the world, and to the growing possibility of further global spread.
We can use web scraping to collect and correlate COVID-19 data, which helps in deciding what measures to take to slow its spread.
Web scraping:
Web scraping is a method of extracting data from a source website so that it can be manipulated and analyzed. Every website contains data that can be viewed through a browser, but most sites offer no direct way to download it, which leaves manual copying and pasting as the only option. Doing that for all the data is tedious. Instead, we can use web scraping to collect the website's data automatically.
Robots.txt:
You cannot scrape every website you come across. Some websites do not permit automated access to their data, and they state their rules in a robots.txt file at the root of the site. If a site disallows access, you should not scrape it. For an example, see www.twitter.com/robots.txt
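As a minimal sketch of checking these rules from Python, the standard library's urllib.robotparser can read a robots.txt file and answer whether a URL may be fetched. The robots.txt content, the "my-scraper" user agent, and the example.com URLs below are made up for illustration so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the example runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "my-scraper" and the example.com URLs are placeholders.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/coronavirus/"))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", "https://example.com/private/data"))
```

For a live site you would instead call set_url() with the site's robots.txt address and then read() before calling can_fetch().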
Web scraping techniques:
Python is a popular choice for web scraping, and you can use libraries such as Selenium, Beautiful Soup, and pandas.
Let us go through selenium python web scraping.
Web scraping process:
- Make a request to the page's URL using the requests module.
- Retrieve the HTML content as text.
- Examine the HTML and extract the data. To inspect the HTML, right-click in the web browser and select the Inspect option.
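The three steps above can be sketched roughly as follows. The function and class names (fetch_html, CellCollector, extract_rows) and the sample HTML with its numbers are assumptions made for illustration, and the standard library's html.parser stands in for a fuller parsing library.

```python
from html.parser import HTMLParser


def fetch_html(url: str) -> str:
    """Steps 1 and 2: request the page and return its HTML as text."""
    import requests  # third-party; imported here so the parsing below runs without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


class CellCollector(HTMLParser):
    """Step 3: collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html_text: str) -> list:
    """Return the table rows that contain at least one <td> cell."""
    collector = CellCollector()
    collector.feed(html_text)
    return [row for row in collector.rows if row]


# Made-up sample page so the parsing step can be tried offline.
SAMPLE_HTML = (
    "<table><tr><th>Country</th><th>Cases</th></tr>"
    "<tr><td>India</td><td>100</td></tr></table>"
)

if __name__ == "__main__":
    print(extract_rows(SAMPLE_HTML))
```

This approach only works when the data is present in the HTML that the server returns; pages that build their tables with JavaScript need a real browser, which is where Selenium comes in.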
To serve our purpose let us execute web scraping using selenium and python.
Setup selenium:
Selenium is an open-source browser automation tool that is widely used for testing. Download and install it.
Web drivers:
Web drivers let Python control the browser through interactions at the OS level. They use the browser's built-in automation support, so the web driver must be installed and reachable through the operating system's PATH variable (only needed for manual installation) before it can operate the browser.
Download the driver that matches the browser you use, such as Chrome, Firefox, or Safari.
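To check whether a manually installed driver is reachable through PATH, a small sketch like the one below may help. The function name find_driver is an assumption; "chromedriver" is the executable name of the Chrome driver, and you would pass "geckodriver" for Firefox.

```python
import shutil
from typing import Optional


def find_driver(executable: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_driver()
    if path:
        print(f"Driver found at {path}")
    else:
        print("Driver not found on PATH; add its folder to PATH first.")
```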
VS Code has a “Terminal” tab that opens an internal terminal inside the editor, which is very useful for keeping everything in one place.
Two more things are needed before we start: a virtual environment, and the selenium package that talks to the web driver. Type these commands into your terminal.
- pip3 install virtualenv
- virtualenv venv
- source venv/bin/activate
- pip3 install selenium
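If you are unsure whether the environment is really active, a quick check from Python is sketched below. The function name in_virtual_environment is an assumption made for illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when Python is running inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the system installation.
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
```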
The virtual environment setup is complete. We are good to go for the next step.
Executing the code:
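You need to create a class and add methods to it.
Name the class, create it, and start the driver in its constructor.
from selenium import webdriver

class Coronavirus():
    def __init__(self):
        # Start a Chrome browser session controlled by Selenium
        self.driver = webdriver.Chrome()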
After saving this code, go to the terminal and run the following command.
python -i coronavirus.py
This command runs the file and then leaves Python in interactive mode, so it works as a sandbox. A new browser window will open, and we can begin issuing commands. If you want to experiment, you can type commands at the prompt instead of writing them directly into your source code; just use bot instead of self.
In terminal:
bot = Coronavirus()
bot.driver.get('https://www.worldometers.info/coronavirus/')
Source code:
self.driver.get('https://www.worldometers.info/coronavirus/')
Once the browser has loaded the website, we locate the table as follows.
Xpath:
XPath is a path expression syntax for locating a node in the DOM. An XPath expression finds a node either with an absolute path that starts from the root element or with a relative path that can match anywhere in the document.
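To illustrate the difference between absolute and relative paths without opening a browser, here is a small sketch using the standard library's xml.etree.ElementTree. It supports only a limited subset of XPath (for example, no contains()), and the sample page and its numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page that mimics the structure of a statistics table.
SAMPLE_PAGE = """
<html><body>
  <table id="main_table_countries">
    <tbody>
      <tr><td>India</td><td>100</td></tr>
      <tr><td>Italy</td><td>200</td></tr>
    </tbody>
  </table>
</body></html>
"""

root = ET.fromstring(SAMPLE_PAGE)

# Absolute-style path: every step from the root element is spelled out.
first_cell = root.find("./body/table/tbody/tr/td")

# Relative path: match the table anywhere in the document by its id attribute.
table = root.find(".//table[@id='main_table_countries']")

# Find the cell whose text is 'India', then step up to its parent row with '..'.
india_row = table.find(".//td[.='India']/..")
india_values = [cell.text for cell in india_row]

if __name__ == "__main__":
    print(first_cell.text)
    print(india_values)
```

The absolute path breaks as soon as the page layout changes, while the relative path keeps working as long as the id stays the same, which is why relative paths are usually preferred for scraping.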
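# find_element(By.XPATH, ...) replaces find_element_by_xpath, which recent Selenium releases removed
from selenium.webdriver.common.by import By  # place this import at the top of the file
table = self.driver.find_element(By.XPATH, '//*[@id="main_table_countries"]/tbody[1]')
Now find the row for the country whose data you want.
# The leading dot keeps the search inside the table
country_element = table.find_element(By.XPATH, ".//td[contains(text(), 'India')]")
# Step up to the parent <tr> that holds all of the country's cells
row = country_element.find_element(By.XPATH, "./..")
Then split the row text and store each value in its own variable.
data = row.text.split(" ")
total_cases = data[1]
new_cases = data[2]
total_deaths = data[3]
new_deaths = data[4]
active_cases = data[5]
total_recovered = data[6]
serious_critical = data[7]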
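Indexing the split list by hand is fragile, so one possible refinement is to map the values to named fields and convert them to numbers. The sketch below assumes the row text has exactly one single-word country name followed by seven non-empty columns; the helper names (to_int, parse_row) and the sample numbers are made up for illustration.

```python
# Column names in the order used by the variables above.
FIELDS = ["total_cases", "new_cases", "total_deaths", "new_deaths",
          "active_cases", "total_recovered", "serious_critical"]


def to_int(text: str) -> int:
    """Convert a cell such as '1,251' or '+180' to an integer."""
    return int(text.replace(",", "").replace("+", ""))


def parse_row(row_text: str) -> dict:
    """Split one table row into the country name and its numeric fields."""
    data = row_text.split(" ")
    if len(data) != len(FIELDS) + 1:
        raise ValueError(f"expected {len(FIELDS) + 1} columns, got {len(data)}")
    stats = {name: to_int(value) for name, value in zip(FIELDS, data[1:])}
    return {"country": data[0], **stats}


if __name__ == "__main__":
    # Made-up numbers, not real statistics.
    print(parse_row("India 1,251 +180 32 +3 1,117 102 0"))
```

Raising an error on an unexpected column count makes it obvious when the site changes its table layout, instead of silently reporting values from the wrong column.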
Email alert:
To send email through Gmail's SMTP server, go to your Google Account settings, open “App Passwords,” create a new password, and use it in the short function below.
We also build the template for the email we will receive.
import smtplib

def send_mail(country_element, total_cases, new_cases, total_deaths, new_deaths, active_cases, total_recovered, serious_critical):
    # All arguments must be strings, so pass country_element.text rather than the element itself
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()  # upgrade the connection to TLS before logging in
    server.ehlo()
    server.login('email', 'password')  # your address and the app password
    subject = 'Coronavirus stats in your country today!'
    body = ('Today in ' + country_element +
            '\nThere is new data on coronavirus:' +
            '\nTotal cases: ' + total_cases +
            '\nNew cases: ' + new_cases +
            '\nTotal deaths: ' + total_deaths +
            '\nNew deaths: ' + new_deaths +
            '\nActive cases: ' + active_cases +
            '\nTotal recovered: ' + total_recovered +
            '\nSerious, critical cases: ' + serious_critical +
            '\nCheck the link: https://www.worldometers.info/coronavirus/')
    msg = f"Subject: {subject}\n\n{body}"
    server.sendmail('Coronavirus', 'email', msg)  # sender, recipient address, message
    print('Hey Email has been sent!')
    server.quit()
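As an alternative to assembling the message by string concatenation, the standard library's email.message.EmailMessage can build the headers and body for you. This is only a sketch: the function name build_message, the addresses, and the numbers are placeholders made up for illustration.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, country: str, stats: dict) -> EmailMessage:
    """Build the alert email; stats maps a label such as 'Total cases' to its value."""
    msg = EmailMessage()
    msg["Subject"] = "Coronavirus stats in your country today!"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"Today in {country}", "There is new data on coronavirus:"]
    lines += [f"{label}: {value}" for label, value in stats.items()]
    lines.append("Check the link: https://www.worldometers.info/coronavirus/")
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    # Placeholder addresses and made-up numbers.
    message = build_message("me@example.com", "me@example.com", "India",
                            {"Total cases": 1251, "New cases": 180})
    print(message)
```

A message built this way can be sent with server.send_message(message) in place of the server.sendmail(...) call.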
By executing this code, you will receive updates on the coronavirus outbreak as email alerts.
Conclusion:
Web scraping techniques like these help in analyzing how diseases spread across the globe. I suggest you use this tool to stay informed and protect yourself from this dreadful disease.