1001 Freelance Projects
Latest Projects from Freelance Marketplaces
Today is: 04-May-2024 19:20 GMT
Project title: Fix error in Python-based web scraper with GUI
Posted by: External project from PeoplePerHour
Started: 24-Apr-2024 19:27 GMT
Description: Hello Freelancers,
I'm searching for a developer familiar with web scraping and Python to fix an existing web scraper that scrapes product data from category links on the Italian e-commerce website www.yeppon.it.

The script basically works, but it throws an error at certain points when scraping. I think this is because of a slight change in the structure of the website, which causes an error when the script tries to scrape a product's text description.

The goal of this project is to fix the errors so the script works like it used to again, scraping product data from given category URLs on the website and writing the data to CSV files. I think this won't be too much of an effort, because it is basically this one error that needs to be located and fixed; everything else still seems to work fine. Price can be discussed.

Some facts:

1. The web scraper is based on Python with a GUI. Its final version comes as an exe file (therefore I can't attach it to the project description; I will send it in the messages or work stream).
2. It scrapes certain product data (like product name, price, description, image links) from category links that can be entered into the GUI. The GUI also has some input fields; these are just for fixed strings, which are written as entered into the CSV files that contain the product data.
3. The scraper technically still works; however, it gives an error when scraping certain categories. You can check this by running the tool, filling out the input fields with the data explained in the "Instructions" tab of the tool and then starting the scrape. It will produce this error (which can be found in the log file):

--------------------------------------------------------------------------------------------------------------
2024-04-24 12:44:24,765:ERROR:'descriptionHtml'
Traceback (most recent call last):
File "async_scraper.py", line 739, in scrape
description_html = pdata["pageProps"]["product"][
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'descriptionHtml'
2024-04-24 12:44:24,765:INFO: Finally
2024-04-24 12:44:24,765:INFO:


No data found!


------------------------------------------------------------------------------------------------------------------
4. The error seems to occur when scraping a product's text description. The text description consists of three possible elements:
- bullet points formatted as a ul element
- a text description which is cleaned (HTML code removed or replaced)
- data scraped from a table on the website and put into a given HTML structure
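
The traceback above points at a direct dictionary lookup of 'descriptionHtml' that fails when a product has no such key. One possible way to handle this is a tolerant lookup that falls back to an empty description. This is only a sketch: the helper name `get_description_html`, the empty-string fallback and the sample dictionaries are assumptions for illustration, not the original code, and the real page data may be shaped differently.

```python
from typing import Any, Mapping


def get_description_html(pdata: Mapping[str, Any]) -> str:
    """Return the product's description HTML, or "" when it is absent.

    Hypothetical helper: the key names follow the traceback in the log,
    everything else is an assumption.
    """
    page_props = pdata.get("pageProps") or {}
    product = page_props.get("product") or {}
    # .get() avoids the KeyError from the log when 'descriptionHtml' is missing.
    return product.get("descriptionHtml") or ""


# Illustrative data only; the real structure on the website may differ.
with_description = {"pageProps": {"product": {"descriptionHtml": "<p>Text</p>"}}}
without_description = {"pageProps": {"product": {"name": "Example"}}}

print(repr(get_description_html(with_description)))
print(repr(get_description_html(without_description)))
```

With a fallback like this, a product without a description would still be written to the CSV file, just with an empty description field, instead of stopping the whole run.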



The tool was developed by a freelancer from PPH for a colleague of mine. Unfortunately, I haven't been able to reach my colleague for quite some time to ask for all the details or the freelancer's name, so I am posting this publicly.


Scraping some categories will result in the error mentioned above, for example:
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/asciugabiancheria
or
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/frigoriferi

Others work just fine, like:
https://www.yeppon.it/c/telefonia/smartphone/smart-phone
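
To narrow down which categories trigger the error, the scrape step could be run once per category URL and the outcome recorded. The sketch below assumes a callable that scrapes one category; `check_categories` and `fake_scrape` are hypothetical names, and `fake_scrape` is only a stand-in that imitates the behavior described above without touching the network.

```python
from typing import Callable, Dict, List

CATEGORY_URLS: List[str] = [
    "https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/asciugabiancheria",
    "https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/frigoriferi",
    "https://www.yeppon.it/c/telefonia/smartphone/smart-phone",
]


def check_categories(urls: List[str], scrape: Callable[[str], None]) -> Dict[str, str]:
    """Run `scrape` on every URL and record 'ok' or the missing key."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            scrape(url)
            results[url] = "ok"
        except KeyError as exc:
            # str() of a KeyError is the quoted key, e.g. 'descriptionHtml'.
            results[url] = f"missing key: {exc}"
    return results


def fake_scrape(url: str) -> None:
    """Stand-in for the real scraper (assumption): fails for the reported categories."""
    if "grandi-elettrodomestici" in url:
        raise KeyError("descriptionHtml")


report = check_categories(CATEGORY_URLS, fake_scrape)
for url, status in report.items():
    print(status, url)
```

In a real check, `fake_scrape` would be replaced by the actual scraping function from async_scraper.py.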




I am attaching the files I have about this project from my colleague. As I can't attach exe or rar files, I have attached:

- a first version of the Python code (a beta version that gives another error, which is solved in the final exe file; it is not the final code and is just meant to give you an impression), as well as the code of the GUI and the requirements. These are async_scraper.txt, gui.txt and requirements.txt
Project ID: 3382569
Project category:
Project budget:
Copyright © 2005-2022 1001 Freelance Projects