Part 3 – Have 75 Magazines to Flip an Article to and Tired of the Wash, Rinse, and Repeat? Exploring Web Automation, Web Scraping, and Selenium vs BeautifulSoup

Part 3 – Have 75 Magazines to Flip an Article to and Tired of the Wash, Rinse, and Repeat? Exploring Web Automation, Web Scraping, and Selenium vs BeautifulSoup

Once again, I do not want to assume everything works the same for each industry. I am not fortunate enough to place a single article in a magazine and have it flourish. I have to put it in every magazine I possibly can. This was my biggest frustration when I joined Flipboard. So, for me, when I want to post an article to a magazine, I have to click on create a flip, insert link, write something interesting up top, then select a magazine and click add. So obviously, in my mind, I thought it was crazy to have to repeat that process for every magazine you wanted to post to. Why not do it once and then pop up a box with all your magazines and check all the ones you want to post to?

Let’s face it, this is a feature Flipboard could easily implement. I can’t pretend to know the answer why they don’t. So, like most people without answers, I’m left with conspiracy theories. It could be as simple as Flipboard feeling it needs user involvement for their concept to work or as dubious as creating an unlevel playing field.

Everyone is familiar with Captchas; A websites attempt to make sure you are human. This probably makes most people think of bots as something nefarious. Yes, like most things created for good, they can be used for evil. We are simply looking at web automation to do simple tedious tasks for us. For example, flipping your post to multiple magazines. I am going to assume that Parade Magazine isn’t manually flipping their material. I would assume they use RSS. This is the method I would assume Flipboard would hopefully eventually implement. RSS feeds are a powerful tool for automating the distribution and consumption of web content, enabling users and systems to stay updated with minimal manual intervention. So, for now, without that as an option, we can look at automating the process ourselves or paying someone to do it. In order for us to be able to do this, we must be able to extract data from the website. The first method we would want to use is via an API Key. By using an API when available, you leverage a more robust, efficient, and legally compliant method to access the data you need. Unfortunately, I do not believe Flipboard is providing its users with an API Key. This is where web scraping comes in and how Selenium and BeautfulSoup can achieve this.  Selenium is an open-source automation testing tool that supports multiple programming languages. It allows you to simulate user interactions with a web page, such as clicking buttons, filling out forms, and navigating through pages. On the other hand, BeautifulSoup is a Python library used for parsing HTML and XML documents. It creates a parse tree that makes it easy to extract data from HTML Combining Selenium and BeautifulSoup allows you to leverage the strengths of both tools, making it possible to scrape complex and dynamic websites efficiently.

I apologize, this code is a hot mess. The purpose of this program can be found in the sub routine parse recipe. With a given URL I want the title, description and image. I use Selenium to load the page and extract the HTML and parse the extracted HTML with BeautifulSoup to get the title, description and image URL. The rest of the program is me inputting a text file with websites I use and then writing to a text file with the title and description and saving the downloaded images to a folder. The reason for the overuse of error checking is the fact that sometimes images are replaced with a social media copy. This is an issue you wouldn’t have with an API Key. Also, keep in mind if the structure of a web page changes, most likely your code will need to change. This is why API is more stable. There are places to get free API Keys with limited usage. Microlink API and NewsAPI for example.

Johnny Watson

Hi, my name is Johnny Watson and I am currently helping promote food & drink recipes on Flipboard. I also have a passion for computer science and the ins and outs of food on our health. So both health and technology coming soon.