De Canaria
About Us
About the job
Company
We are a technology product company specializing in higher education and employment analytics. We use state-of-the-art natural language processing (NLP) techniques, machine learning, and AI algorithms to process large-scale data from higher education, employment markets, and other domains, helping our clients choose and prioritize programs, markets, and investments.
Benefits
- High-end laptop
- Annual Performance Bonus (up to 2x salary)
- Fully remote (with option to work from on-site office in Istanbul)
- Work with global teams (US, Turkey, India)
- Founders and employees with experience at FAANG and other major technology companies (Google, Facebook, Amazon, eBay, Microsoft), and from top institutions (Stanford, Columbia, Caltech, Harvard, MIT)
- Access to a workstation with 256 GB of memory, 48 GB of GPU RAM, and 32 cores
- Work on text data that is 100x bigger than Wikipedia
- Additional private insurance and monthly food allowance
Job Description
We are looking for a Data Scientist to contribute to our cloud-based web scraping and data processing platform for the employment market. We will build thousands of web scrapers, data pipelines, machine-learning applications, and SaaS products for our clients in higher education, talent management, recruitment, and investment firms.
The Data Scientist will
- Analyze millions of data points with unstructured text to find insights
- Optimize the geolocation, deduplication and data pulling frequency strategies for the thousands of web scrapers
- Train state-of-the-art NLP models to generate document embeddings for similarity matching (via approximate nearest neighbor (ANN) algorithms such as HNSW)
- Apply MLOps best practices when building machine learning and data processing pipelines, with high unit and integration test coverage
- Collaborate with a cross-functional team of consultants, analysts, software engineers, data engineers, and others to help build new products and features
- Write clean, portable, and well-documented code
- Use Docker, Kubernetes, and third-party cloud technologies (Google Cloud Platform) for ML workloads (processing, training, and inference) whenever applicable
- Communicate findings, analyses, and technical concepts simply to a range of executive audiences
- Build and maintain a large-scale unstructured (NoSQL) database
- Build pipelines to clean and maintain data for machine-learning applications
This work must be accompanied by unit and integration tests, alerting, and monitoring dashboards, as applicable.
Must Have
- Ability to write clean, well-documented code
- Experience with a scripting language (Python, Ruby, etc.)
- Knowledge of SQL, NoSQL, and Linux operating systems
- Familiarity with building data pipelines with unit and integration tests
- Knowledge of data modeling and building data infrastructure
- Bachelor's or Master's degree in Computer Science or Electrical & Electronics Engineering
Nice to Have
- Experience using public cloud technologies (GCP, AWS, Azure) for data processing, storage, and modeling pipelines (e.g., Dataflow, BigQuery, SageMaker)
- Familiarity with business intelligence tools such as Looker, Tableau, Qlik, and Power BI
- Familiarity with batch data processing technologies (Spark, Scala, Hive, etc.)
Hiring Process
- Take-home project (3 days to complete)
- Chat with our Head of People (45 mins)
- Interview with Senior Engineer - project and technical interview (45 mins)
- Interview with Engineer - technical interview (45 mins)