We are a software company. For one of our projects we need to download information from a website containing articles about medical topics. The website contains approximately 10,000 HTML pages of a paged listing of articles in Czech. The listing contains the article titles, and each title links to a detail HTML page with the article text.
We need someone to produce wget and other scripts that download the titles of all articles, parse the links from those titles, download the detail pages of the articles, and distill the text that is shown on each page. The listing pages and the detail pages mostly share the same structure, which allows the work to be automated. This does not hold in 100% of cases, however: there may be several types of structure, so some attention may be needed to distill the correct information.
The result of this work will be a set of static HTML files. You can view the expected structure at https://fomenot.com/z/dwld24/main.html. That is, the result will contain the content of each article separated into paragraphs of normal text and captions (nothing else, no images or other text). We only want the main text of the article that is visible to the user on screen, with no other text or HTML content. A second result will be the raw HTML output for each of the detail pages.
To accept the output, we will run our own check of the result. If we find errors, we will give examples of them and we will expect the vendor to fix all such errors in the result, not just the examples. If there are only a few errors, we may not be able to find them, and that is acceptable. But if we find any, we will require them to be corrected. We expect the raw HTML files to be 100% error free (for these we will not give examples, we will simply demand that they be fixed). For the text-based results we will give examples before demanding fixes.
You can find an example of such a source page here: https://www.idnes.cz/onadnes/zdravi/2 It shows a list of articles, each with a link leading to the detail page, followed by a paging control that loads more articles from the next page. This is NOT the page we need to download, only a similar one. We include the example only so that you understand the task.
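To make the task concrete, here is a minimal Python sketch of the two parsing steps described above: collecting the detail links from a listing page, and distilling paragraphs and captions from a detail page. It uses only the standard library. The tag names (`h3` for titles, `p` for text, `figcaption` for captions) and the function names are assumptions for illustration only; the real site may use a different structure, or several structures, and the download step itself (for example with wget) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect hrefs of <a> tags nested inside a title element."""

    def __init__(self, base_url, title_tag="h3"):
        super().__init__()
        self.base_url = base_url
        self.title_tag = title_tag  # assumption: tag that wraps each title
        self.depth = 0              # > 0 while inside a title element
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.title_tag:
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the listing page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == self.title_tag and self.depth > 0:
            self.depth -= 1


class TextDistiller(HTMLParser):
    """Keep the text of paragraph and caption elements, drop the rest."""

    # assumption: <p> holds normal text, <figcaption> holds captions
    KEEP = {"p": "text", "figcaption": "caption"}

    def __init__(self):
        super().__init__()
        self.current = None  # tag of the block currently being read
        self.buffer = []
        self.blocks = []     # list of (kind, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            # Collapse runs of whitespace and skip empty blocks.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append((self.KEEP[tag], text))
            self.current = None


def extract_article_links(listing_html, base_url, title_tag="h3"):
    """Return absolute URLs of the detail pages linked from title elements."""
    parser = LinkCollector(base_url, title_tag)
    parser.feed(listing_html)
    parser.close()
    return parser.links


def distill_text(detail_html):
    """Return (kind, text) pairs, where kind is 'text' or 'caption'."""
    parser = TextDistiller()
    parser.feed(detail_html)
    parser.close()
    return parser.blocks
```

A real solution would likely need one set of rules per page structure found on the site, plus a check that flags pages matching none of them so they can be reviewed by hand.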
Let us know if you could do it and for what price. We will provide the real links to the selected candidate.