Looking for a junior developer who wants to learn and improve their web scraping and back-end development skills. The developer will have full autonomy on the project, but at the same time will receive support and guidance from a senior developer.
This project is a content aggregator that receives parsing requests from a queue (Amazon SQS). A request can take one of three forms:
- a sitemap (XML), in which case the app has to crawl all the links in the sitemap and gather their content and metadata;
- an RSS feed, in which case the app crawls all the links in the feed, as in the sitemap case;
- a specific URL, in which case the app only crawls that specific page and no other linked pages.
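To give applicants a feel for the three request forms, here is a minimal sketch of how a request could be turned into a list of links to crawl. It is written in Python only for brevity (the real app will be PHP/Symfony), and the `kind` and `payload` arguments and the `links_to_crawl` name are assumptions made for illustration, not the actual format of the SQS messages.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def links_to_crawl(kind: str, payload: str) -> list[str]:
    """Return the URLs to crawl for one parsing request.

    kind    -- hypothetical request type: "sitemap", "rss" or "url"
    payload -- the XML document for sitemap/rss, or the URL itself for "url"
    """
    if kind == "url":
        # A specific URL: crawl only that page, no linked pages.
        return [payload.strip()]

    root = ET.fromstring(payload)

    if kind == "sitemap":
        # Every <loc> element in the sitemap is a page to crawl.
        return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]

    if kind == "rss":
        # Only the <link> of each <item>, not the channel's own <link>.
        return [
            link.text.strip()
            for link in root.findall("./channel/item/link")
            if link.text
        ]

    raise ValueError(f"unknown request kind: {kind!r}")
```

The sketch ignores things the real app will have to handle, such as sitemap index files, Atom feeds and malformed XML.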
From the crawled pages, the app needs to gather some information, such as the page title, the final location in case of redirects, the content of meta tags, the schema.org JSON data, a representative image (picked based on specific criteria), the age of the content (we'll discuss some techniques for determining it) and a few other data points that we'll discuss in private.
Also, for some of the crawled pages, based on certain criteria, the app will need to gather extra information: some from the page itself, some from sources like Lighthouse, a screenshot grabbed with Selenium (also an API), and data fetched from other external APIs.
The collected data needs to be stored in a database and, in some instances, compared with the data already stored there.
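As an illustration of the per-page extraction, the following sketch pulls the title, the meta tags and the schema.org data out of an HTML string using only the Python standard library. It assumes the schema.org data is embedded as JSON-LD in `<script type="application/ld+json">` tags, which is one common way of publishing it; the class and function names are hypothetical, and redirects, image selection and content age are left out.

```python
import json
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects the <title>, <meta> tags and JSON-LD blocks of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}       # meta name/property -> content
        self.json_ld = []    # decoded JSON-LD documents
        self._capture = None  # "title" or "json_ld" while inside such a tag
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._capture, self._buffer = "title", []
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._capture, self._buffer = "json_ld", []
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag == "title" and self._capture == "title":
            self.title = text
            self._capture = None
        elif tag == "script" and self._capture == "json_ld":
            try:
                self.json_ld.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # skip invalid JSON-LD instead of failing the whole page
            self._capture = None


def extract_metadata(html: str) -> dict:
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "meta": parser.meta, "json_ld": parser.json_ld}
```

Calling `extract_metadata(html)` returns a plain dictionary that could then be stored or compared with what is already in the database.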
The same project also needs a frontend interface (we already have a coded design for it). It will allow content from the database to be browsed and searched, based on specific criteria.
Skills that are not mandatory but would be nice to have (you'll learn them on the job):
- Symfony 5 experience;
- PostgreSQL;
- Docker;
- API integration;
- API development;
- data scraping;
- cURL and similar tools;
- Selenium;
- AWS;
- Git and Git workflows.
The tech stack is PostgreSQL for the database, PHP 8 with Symfony 5.3 for the backend, Docker for development and testing, Sentry for error management, GitHub for versioning and AWS for deployment. The stack is not negotiable but, as mentioned above, if you want to learn it, don't be afraid to apply without prior experience with it.
Although the project is complex and some tasks will be beyond a junior's capabilities, we're looking specifically for juniors who want to improve their skills and gain experience on real-life projects. The deadlines and work hours are flexible, and you'll be able to ask a senior for help at any point.
Posted On: June 02, 2021 12:38 UTC
Category: Back-End Development
Skills: PostgreSQL, Laravel, CodeIgniter, PHP, Symfony, MySQL, HTTP, Web Crawling, Web Scraper, Web Crawler, Selenium, API, Yii2
Country: Romania
Project ID: 3176020