
Senior Data Scientist

San Francisco Bay Area
Mid-Senior level

Do you like what you see here?

Are you an expert in the Python data science ecosystem (e.g. Jupyter, pandas, sklearn, huggingface, langchain, cleanlab) and in working with massive databases (e.g. SQL, Spark, Databricks)?

Do you love developing high-quality software (not just doing modeling) in cloud environments (e.g. AWS)?

Are you an excellent communicator (excited to talk to customers, strong writer)?

If your answer to all of the above is a resounding yes and you have 5+ years of experience doing these things in industry, then we'd love to have you join our startup building the future of Data-Centric AI!

At Cleanlab, you’ll get to:

  • Help companies across all industries solve their Data/AI/Analytics challenges using novel Data-Centric AI technology invented by our team.
  • Help develop new capabilities for our Data-Centric AI platform that can automatically diagnose and fix issues in most image, text, and structured/tabular datasets. Our software uses AI to increase the value of a company’s existing data (i.e. their core asset).
  • Learn about interesting data science problems that customers are grappling with, and prototype effective solutions. See the direct impact of your work as it helps close six- and seven-figure sales deals.
  • Publish innovative blog posts and open-source code.

What we’re looking for

  • An expert in the Python data science ecosystem (e.g. Jupyter, pandas, sklearn, huggingface, langchain, cleanlab) and in working with massive databases and cloud data stores (e.g. SQL optimization, Spark, Databricks, AWS, GCP, Azure)
  • An excellent communicator with strong writing skills who is excited to meet with and support enterprise customers
  • The best candidates will be extremely strong in coding, data science, and the modern Data/AI stack, and will also love working with people and supporting them.


  • Work with our customers and help them solve their data science problems using Cleanlab technology (across all data types / industries / ML problems).
  • Prototype new capabilities for our Data-Centric AI platform and get early feedback from customers.
  • Publish blog posts, tutorials, and other public content to help other teams solve problems similar to the ones you are working on.

Note: this role is not a traditional Data Science position. You will also work in Customer Success as a Solutions Architect, helping our customers develop solutions to their Data/AI challenges. The job will be approximately 20% sales demos, 20% customer success (answering customer questions), 30% data scientist (analyzing data and sharing results), and 30% solutions architect (building prototype solutions, fixing problems in customer solutions).


This is a senior role! Candidates must have at least 5 years of industry work experience as a Data Scientist or Solutions Architect.

  • Python (pandas, scikit-learn, numpy, Jupyter)
  • Databases and other data stores (e.g. Databricks, Snowflake, Redshift)
  • AWS
  • Git

Bonus Qualifications

We will favor candidates who are:

  • Experienced in a Solutions Architect style role, dealing with datasets from other (big) companies and solving their data science challenges.
  • Knowledgeable about the enterprise sales cycle and effective customer success.
  • Full-stack data scientists able to develop high-quality software (not just doing modeling) and properly productionize it (e.g. Docker, CI/CD) in cloud environments (e.g. AWS, SageMaker).
  • Holders of a PhD in a data-related discipline, or authors of well-read blogs/articles or popular GitHub repos.

Bonus skills:

  • SageMaker, MLflow
  • Databricks, Snowflake, Redshift
  • ELT/ETL tools
  • Cleanlab or other data-centric AI tools
  • LangChain

Learn more about the role and benefits here:

Key information

Posted 4 months ago
