1001 Freelance Projects
Latest Projects from Freelance Marketplaces
Today is: 17-May-2024 07:11 GMT
View Project
View this project in detail (Note: you will be redirected to external marketplace)
Project title: Local Council Chatbot utilising Llama2 and dataset of PDF docs.
Posted by: External project from PeoplePerHour
Started: 07-Mar-2024 15:33 GMT
Description: Full stack developer with relevant experience in AWS services and LLM deployment.
Description
Develop an MVP that provides a chat interface allowing users to query a dataset of local council documents, including minutes and policy documents. The dataset contains all information relating to the purpose, policies, news, and decision making of that council.
The dataset would contain approximately 100 PDF documents, and the chatbot would return meaningful and coherent answers to user prompts, while providing reference links to the documents from which the information in the response is taken.
The client acknowledges the current limitations of LLMs in returning responses from queries across multiple documents, especially given current token limits and processing cost restrictions. A developer is sought who can embed metadata in the text so that techniques such as RAG can extract snippets of data from multiple documents relating to the query and collate them into a response to the user, while adhering to token limits.
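As a rough illustration of collating snippets from several documents while staying under a token limit, here is a minimal sketch. The `Snippet` fields, the `pack_context` helper, and the word-count stand-in for a tokenizer are all assumptions made for the example, not part of the brief; a real build would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str        # snippet text retrieved for the query
    source_url: str  # link to the document the snippet came from
    score: float     # retrieval similarity, higher is more relevant


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def pack_context(snippets: list[Snippet], token_budget: int) -> tuple[str, list[str]]:
    """Greedily keep the highest-scoring snippets that fit within the budget."""
    kept: list[Snippet] = []
    used = 0
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = approx_tokens(snip.text)
        if used + cost > token_budget:
            continue  # skip snippets that would overflow; smaller ones may still fit
        kept.append(snip)
        used += cost
    context = "\n\n".join(f"[{i}] {s.text}" for i, s in enumerate(kept, start=1))
    # De-duplicate links while preserving order of first appearance.
    sources = list(dict.fromkeys(s.source_url for s in kept))
    return context, sources
```

The returned `sources` list is what the chat interface could show as reference links alongside the answer.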
Objective
Develop an automated semantic text analysis pipeline that processes and analyses textual data extracted from documents using Llama2. This pipeline enriches text with metadata for deeper insights and enables semantic search capabilities through a user-friendly interface. This stage of the project is for an MVP system, leveraging AWS services such as Textract for text extraction, a text categorising stage with a simple-to-use GUI, all-mpnet-base-v2 for embedding, and Postgres with a vector extension.
This job posting is for the MVP stage only, but we must be mindful of the stage two development and facilitate rapid and straightforward scalability in any stage one MVP processes.
System Overview
The solution encompasses AWS services for storage and processing, a custom interface for metadata enrichment, all-mpnet-base-v2 for generating text embeddings, Postgres and a vector extension for efficient storage and retrieval of vectors, and a custom-built web interface for user interaction. RAG will be implemented with as broad a context as possible to the model across a large document set.
Phase 1: MVP Stage
1. Document Storage and Processing Trigger
Tool: Amazon S3.
Process: Upload documents (PDFs initially) to designated S3 buckets. Documents will be renamed in accordance with a set naming convention, and details of each document entered into the database. The upload triggers the subsequent text extraction process. For test purposes the uploads will be made manually, and at later stages a web scraper will be added that automatically places PDF documents into relevant S3 buckets.
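One way the upload trigger might be handled is to read the bucket and object key out of the S3 event notification that fires on upload. The sketch below assumes the standard `Records[].s3` notification layout; the function name `uploaded_pdfs` and the PDF-only filter are illustrative choices, not requirements from the brief.

```python
from urllib.parse import unquote_plus


def uploaded_pdfs(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for PDF objects named in an S3 event notification."""
    found = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications (spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".pdf"):
            found.append((bucket, key))
    return found
```

Each returned pair could then be handed to the text extraction step and recorded in the database.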
2. Text Extraction
- Tool: AWS Textract.
- Process: Text is extracted from uploaded PDF documents and temporarily stored in S3 buckets to facilitate further processing.
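Textract returns its results as a list of `Blocks`, where blocks of type `LINE` carry the recognised text. A minimal sketch of turning such a list into per-page text follows; the helper name `lines_by_page` is an assumption, and combining paginated results from an asynchronous job is left to the caller.

```python
from collections import defaultdict


def lines_by_page(blocks: list[dict]) -> dict[int, str]:
    """Join Textract LINE blocks into one text string per page."""
    pages: dict[int, list[str]] = defaultdict(list)
    for block in blocks:
        if block.get("BlockType") == "LINE":
            # "Page" may be absent for single-page results, so default to page 1.
            pages[block.get("Page", 1)].append(block["Text"])
    return {page: "\n".join(lines) for page, lines in sorted(pages.items())}
```

Keeping the page number with the text makes it possible to cite a page as well as a document in later responses.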
3. Text Enrichment
Developer to advise on best method of adding labels / categories to the text, via an easy to use interface. Labels to be added at a granular level to allow the return of text snippets from within the chunks of data, but with relevant metadata. The purpose of this is to provide context to the LLM in formulating responses from a broad range of documents without exceeding the token limit.
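The brief leaves the enrichment method open, so the following is only one possible shape for it: split each document into paragraph-level chunks, let every chunk inherit document-level labels, and let a reviewer add finer labels through the interface. The `Chunk` structure, the label names, and the bracketed header format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    position: int  # paragraph index within the document
    text: str
    labels: dict[str, str] = field(default_factory=dict)


def chunk_document(doc_id: str, text: str, doc_labels: dict[str, str]) -> list[Chunk]:
    """Split on blank lines; every chunk starts with a copy of the document-level labels."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [Chunk(doc_id, i, p, dict(doc_labels)) for i, p in enumerate(paragraphs)]


def label_chunk(chunk: Chunk, **labels: str) -> Chunk:
    """Apply reviewer-supplied labels on top of the inherited ones."""
    chunk.labels.update(labels)
    return chunk


def as_embedding_input(chunk: Chunk) -> str:
    """Prefix the chunk text with its labels so the metadata travels with the snippet."""
    header = "; ".join(f"{k}: {v}" for k, v in sorted(chunk.labels.items()))
    return f"[{header}]\n{chunk.text}" if header else chunk.text
```

Because the labels are attached per chunk, a retrieved snippet arrives with its own context and the surrounding document does not need to be sent to the model.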
4. Text Vectorization
• Embedding tool: all-mpnet-base-v2
• LLM: Amazon SageMaker (using LLaMA 2).
Process: The text is processed with all-mpnet-base-v2 to generate vector embeddings, capturing semantic information for advanced analysis and search functionalities.
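The vectorization step could be sketched as a small batching wrapper around whatever encoder is used. The `embed_in_batches` name, the `Encoder` type, and the batch size are assumptions; the encoder is passed in so the sketch does not depend on a particular library.

```python
from typing import Callable, Sequence

# Any callable that maps a batch of texts to one vector per text.
Encoder = Callable[[Sequence[str]], Sequence[Sequence[float]]]


def embed_in_batches(texts: Sequence[str], encode: Encoder, batch_size: int = 32) -> list[list[float]]:
    """Encode texts batch by batch and return one plain-float vector per text."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend([float(x) for x in vec] for vec in encode(batch))
    return vectors
```

With the sentence-transformers package installed, `SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode` could be passed as `encode`; that model produces 768-dimensional vectors.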
5. Vector Storage
Tool: Postgres with a vector extension
Process: Text vectors are stored in the database, allowing for efficient management and retrieval of vectorized data for semantic searches.
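Assuming the vector extension is pgvector (the brief does not name one), storage and search might look roughly like the SQL below, held here as Python strings. The table and column names are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver.

```python
EMBEDDING_DIM = 768  # output size of all-mpnet-base-v2

# Hypothetical schema; table and column names are illustrative only.
CREATE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id         bigserial PRIMARY KEY,
    source_url text NOT NULL,
    content    text NOT NULL,
    labels     jsonb NOT NULL DEFAULT '{{}}',
    embedding  vector({EMBEDDING_DIM}) NOT NULL
);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT source_url, content, embedding <=> %(query)s::vector AS distance
FROM chunks
ORDER BY distance
LIMIT %(k)s;
"""


def to_vector_literal(values) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

The query embedding would be passed as `to_vector_literal(vector)` for the `query` parameter, and the returned `source_url` values supply the reference links for the response.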
6. Front-end Web Application and Search Functionality
Front-end Technology:
• React.js.
Key Features:
• Semantic search input and results display.
• Email input field for collecting contact information for marketing purposes, forwarding to the client's email address.
• Homepage containing descriptive marketing text.
3 pages total: home page, interaction page, contact page, plus a pop up with GDPR info. Graphics provided as template guidance.
Back-end Technology:
• Python with FastAPI.
7. Fine Tuning
Allow for fine tuning based on a series of questions and responses to be provided by the client, until such point that coherent responses to queries are achieved.
Phase 2: Full Automation and Scaling
Beyond the scope of this job.
Notes:
The developer is to provide guidance and feedback on the capabilities of the technologies and is free to provide their own guidance and suggestions. However, the functionality of the system in providing coherent responses based on text snippets drawn from a large dataset is both the challenge and the absolute requirement.
Please only bid with your full and final price. Placeholders will not be accepted.
Completion within approximately two weeks.

Please respond by explaining how you would handle the text enrichment.
Project ID: 3374653
Project category:
Project budget:
Last Projects / Browse Projects
  Project Started
Mid-Level Creative/Design Contractors Needed
Category: Advertising, Bulk Marketing, Facebook Marketing, Internet Marketing, Telemarketing
Budget: $750 - $1500 AUD
17-May-2024
04:03 GMT
Graphic Design Collaboration Platform Development
Category: HTML, JavaScript, MySQL, PHP, Web Design
Budget: ₹12500 - ₹37500 INR
17-May-2024
04:03 GMT
PDF to Word Typing
Category: Data Entry, Excel, PDF, Powerpoint, Word
Budget: ₹750 - ₹1250 INR
17-May-2024
04:02 GMT
Asynchronous Multiplayer Unity Game Development
Category: Game Design, Game Development, IPhone, Mobile App Development, Unity 3D
Budget: $2 - $8 USD
17-May-2024
04:01 GMT
WordPress Frontend Design for Ecommerce
Category: Elementor, Graphic Design, HTML, Web Design, WordPress
Budget: ₹1000 - ₹2000 INR
17-May-2024
03:59 GMT
Car Repair Workshop Ticket System
Category: Graphic Design, HTML, MySQL, PHP, Web Design
Budget: $10 - $30 USD
17-May-2024
03:58 GMT
Satirical Political Cartoons for Publication
Category: Caricature & Cartoons, Graphic Design, Illustration, Visual Arts
Budget: $750 - $1500 USD
17-May-2024
03:58 GMT
LikeChess - Dashboard
Category: HTML, JavaScript, MySQL, PHP, Web Design
Budget: ₹1500 - ₹12500 INR
17-May-2024
03:57 GMT
PDF Data Extraction into Excel
Category: Data Entry, Data Processing, Excel, PDF, Visual Basic
Budget: ₹12500 - ₹37500 INR
17-May-2024
03:57 GMT
Advanced PHP CRM Chatbot Development
Category: ChatGPT, Magento 2, PHP, Software Architecture, Web Design
Budget: ₹100 - ₹400 INR
17-May-2024
03:56 GMT
Finishing an existing Magento 2 TakePayments Integration Enhancements -- 4
Category: ECommerce, HTML, Magento, PHP, Web Design
Budget: $8 - $15 USD
17-May-2024
03:55 GMT
Oil & Gas UI Developer Needed
Category: Graphic Design, HTML, JavaScript, PHP, Web Design
Budget: ₹12500 - ₹37500 INR
17-May-2024
03:53 GMT
High-Accuracy Photo-to-Text AI Training
Category: Android, IPhone, Mobile App Development
Budget: $750 - $1500 USD
17-May-2024
03:53 GMT
Promotional Live-Action Video for Young Adults
Category: Animation, Video Editing, Video Production, Video Services, Videography
Budget: $10 - $30 USD
17-May-2024
03:50 GMT
CommonCrawl.org Search Engine Development
Category: Java, JavaScript, MySQL, PHP, Software Architecture
Budget: $250 - $750 USD
17-May-2024
03:50 GMT
Copyright © 2005-2022 1001 Freelance Projects