I'm a lawyer seeking an experienced developer to create an AI agent that can automatically collect, process, and integrate court decisions from Croatian legal websites into our database. The agent should be able to:
1. Scrape data from specific Croatian legal websites, including https://e-oglasna.pravosudje.hr/
2. Navigate through search interfaces and handle dynamic content
3. Download and process PDF documents containing court decisions
4. Extract relevant information from these documents using NLP techniques
5. Categorize and index the decisions based on predefined legal areas and keywords
6. Integrate the processed information into our existing legal database
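For illustration only, here is a minimal sketch of what the categorization step (item 5) might look like. The area names, the Croatian keyword stems, and the `categorize` function are hypothetical placeholders rather than our actual taxonomy, and a production solution would likely replace simple stem matching with a trained classifier.

```python
import re
import unicodedata

# Hypothetical legal areas mapped to Croatian keyword stems.
# The real list of areas and keywords would be supplied by the client.
LEGAL_AREAS = {
    "bankruptcy": ["stečaj"],
    "enforcement": ["ovrh", "ovršn"],
    "inheritance": ["ostavin", "nasljed"],
}


def categorize(text, areas=LEGAL_AREAS):
    """Return the legal areas whose keyword stems appear in the text.

    Matching is case-insensitive and prefix-based, so inflected Croatian
    forms such as "stečajnog" still match the stem "stečaj".
    """
    # Normalize to NFC so that composed and decomposed diacritics compare equal.
    normalized = unicodedata.normalize("NFC", text).casefold()
    words = re.findall(r"\w+", normalized)
    matched = []
    for area, stems in areas.items():
        if any(word.startswith(stem) for word in words for stem in stems):
            matched.append(area)
    return matched


if __name__ == "__main__":
    sample = "Rješenje o otvaranju stečajnog postupka"
    print(categorize(sample))
```

Prefix matching is used here because Croatian is heavily inflected, so exact keyword matching would miss most grammatical forms of a term.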
Required Skills and Experience:
- Proficient in Python, with expertise in web scraping libraries (e.g., Scrapy, Selenium)
- Experience with PDF processing libraries (e.g., PyPDF2, pdfminer)
- Strong background in Natural Language Processing (NLP) using libraries like NLTK or spaCy
- Familiarity with database management and indexing (e.g., SQL, Elasticsearch)
- Experience in developing AI/ML models for text classification and information extraction
- Knowledge of web technologies and ability to handle dynamic content and CAPTCHAs
- Understanding of data privacy and security best practices
- Ability to work with Croatian language text (knowledge of Croatian is a plus but not mandatory)
- Experience with legal documents or similar text-heavy domains is advantageous
Deliverables:
1. A fully functional AI agent meeting the above requirements
2. Comprehensive documentation and user guide
3. Source code with clear comments
4. A report detailing the methodology, challenges, and potential improvements
Please provide examples of similar projects you've worked on, especially those involving web scraping, PDF processing, or legal document analysis. Include your estimated timeline and budget for this project.
Note: The successful candidate must be willing to sign a non-disclosure agreement due to the sensitive nature of legal data.