I'm currently seeking a data scientist for a project focused on matching two datasets stored in Google Data Studio.
Project Overview:
· Datasets: We have two datasets - one with 28 million rows.
· Task: Design and implement an effective matching solution between the two datasets.
· Aspects: Data preprocessing, feature engineering, and composite key creation for matching.
· Daily Updates: This task needs to be performed daily.
· Platform: Data is stored on Google Data Studio.
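For illustration only, here is a minimal sketch of what composite key creation might look like. It is not our actual pipeline: the field names (`name`, `postcode`, `birth_date`) and both helper functions are hypothetical, and the normalization rules are assumptions that would need tuning against the real data.

```python
import hashlib
import re
import unicodedata


def normalize(value):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    if value is None:
        return ""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit, or space with a space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def composite_key(record, fields=("name", "postcode", "birth_date")):
    """Join the normalized fields and hash them into a fixed-width key.

    The field list is a hypothetical example, not the real schema.
    """
    joined = "|".join(normalize(record.get(f)) for f in fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


a = {"name": "José  Álvarez", "postcode": "AB1-2CD", "birth_date": "1990-01-01"}
b = {"name": "jose alvarez", "postcode": "ab1 2cd", "birth_date": "1990 01 01"}
same_key = composite_key(a) == composite_key(b)
```

Hashing the joined string keeps the key a fixed width, which can help when joining tens of millions of rows. An exact-match key like this will miss typos and reordered names, so candidates are welcome to propose fuzzier approaches.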
Key Responsibilities:
· Ensure data consistency and cleanliness.
· Develop strategies for effective feature creation.
· Create a robust composite key for matching.
· Incremental Matching:
  · Implement solutions to handle daily dataset updates.
  · Develop strategies to optimize matching performance.
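To make the incremental matching idea concrete, here is a rough sketch under stated assumptions: the reference dataset is indexed once by key, and each day only the new rows are looked up against that index. All names here (`simple_key`, `build_index`, `match_increment`, and the `id`, `name`, `postcode` fields) are hypothetical, and at 28 million rows the same logic would more likely run as a join inside the data platform than in memory.

```python
def simple_key(row):
    # Hypothetical stand-in key: trimmed, lowercased name plus postcode.
    return f"{row['name'].strip().lower()}|{row['postcode'].strip().lower()}"


def build_index(reference_rows, key_fn=simple_key):
    """Map key -> reference id; built once for the large dataset."""
    return {key_fn(row): row["id"] for row in reference_rows}


def match_increment(key_index, new_rows, key_fn=simple_key):
    """Match only today's rows against the existing index."""
    matched, unmatched = [], []
    for row in new_rows:
        target_id = key_index.get(key_fn(row))
        if target_id is None:
            unmatched.append(row["id"])
        else:
            matched.append((row["id"], target_id))
    return matched, unmatched


reference = [{"id": "R1", "name": "Ada Lovelace", "postcode": "W1"}]
index = build_index(reference)

todays_rows = [
    {"id": "N1", "name": " ada lovelace", "postcode": "w1"},
    {"id": "N2", "name": "Alan Turing", "postcode": "SK9"},
]
matched, unmatched = match_increment(index, todays_rows)
```

The point of the sketch is that the daily cost scales with the size of the update, not with the full dataset. Unmatched rows can be queued for a slower fuzzy pass.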
I'm particularly interested in speaking with data scientists who have successfully completed similar projects in the past. If you have relevant experience, I'd love to hear about it and discuss your approach. Please feel free to share an example and explain the methods you used in your opening response.