DataJobs.io

Job Description

The Senior Data Analyst role in New York City offers a hybrid work arrangement and centers on SQL, PySpark, Python with Pandas, and machine learning initiatives. The position is open to local candidates only, because the interview is a two-hour onsite session in NYC. Compensation is USD 65 per hour.

Responsibilities

  • Perform SQL-based data extraction and analysis
  • Process data with PySpark to support large-scale workflows
  • Carry out analyses using Python and Pandas
  • Derive actionable insights from sizable datasets
  • Develop and evaluate machine learning models
  • Deliver analytics and reporting to inform decisions

Requirements

  • Strong SQL query writing and optimization
  • Proficiency in PySpark
  • Python programming skills
  • Experience with Pandas for data manipulation
  • Background in machine learning applications
  • Ability to write, execute, and troubleshoot SQL queries
  • Data extraction and integration capabilities
  • Experience creating joins, aggregations, and filters
  • Basic to intermediate data transformations
  • Building and maintaining data pipelines
  • Large-scale data processing and optimization
  • Data cleansing and preparation
  • Data transformation in distributed environments
  • Exploratory data analysis and data manipulation with Pandas
  • Exposure to classification, regression, and clustering models
  • Model evaluation and performance measurement

Technologies

  • SQL
  • PySpark
  • Python
  • Pandas
  • Power BI
  • Databricks
  • Google Cloud Platform (GCP)

Benefits

  • 401(k)
  • Dental insurance
  • Health insurance
  • Vision insurance

Interview Details

  • In-person interview only, no video interviews
  • Candidate must be local to attend a two-hour onsite session in NYC
  • Live SQL and PySpark coding exercises will be conducted
  • Candidates' hands-on coding skills should be thoroughly validated prior to submission

Team Structure

  • 10 Data Engineers
  • 6 Data Scientists

Certifications

  • Highly Preferred: Databricks Data Engineer Certification
  • Acceptable Post-Hire: Databricks Certification
  • Acceptable Post-Hire: Google Cloud (GCP) Certification

Priority Technical Skill: PySpark

  • Building and maintaining data pipelines
  • Large-scale data processing
  • Data transformation and optimization
  • Distributed data environments

Python and Pandas

  • Data cleansing and preparation
  • Data transformation
  • Exploratory data analysis
  • Data manipulation with Pandas

Machine Learning Focus

  • Classification models
  • Regression models
  • Clustering techniques
  • Model evaluation and performance measurement
