Senior Data Analyst - SQL/Machine Learning/AI/Power BI
Job Description
The Senior Data Analyst role in New York City offers a hybrid work arrangement and centers on SQL, PySpark, Python with Pandas, and machine learning initiatives. The position is open to local candidates only, because the hiring process includes a two-hour onsite interview in NYC. Compensation is USD 65 per hour.
Responsibilities
- Perform SQL-based data extraction and analysis
- Process data with PySpark to support large-scale workflows
- Carry out analyses using Python and Pandas
- Derive actionable insights from sizable datasets
- Develop and evaluate machine learning models
- Deliver analytics and reporting to inform decisions
Requirements
- Strong SQL query writing and optimization
- Proficiency in PySpark
- Python programming skills
- Experience with Pandas for data manipulation
- Background in machine learning applications
- Ability to write, execute, and troubleshoot SQL queries
- Data extraction and integration capabilities
- Experience creating joins, aggregations, and filters
- Basic to intermediate data transformations
- Building and maintaining data pipelines
- Large-scale data processing and optimization
- Data cleansing and preparation
- Data transformation in distributed environments
- Exploratory data analysis and data manipulation with Pandas
- Exposure to classification, regression, and clustering models
- Model evaluation and performance measurement
Technologies
- SQL
- PySpark
- Python
- Pandas
- Power BI
- Databricks
- Google Cloud Platform (GCP)
Benefits
- 401(k)
- Dental insurance
- Health insurance
- Vision insurance
Interview Details
- In-person interview only, no video interviews
- Candidate must be local to attend a two-hour onsite session in NYC
- Live SQL and PySpark coding exercises will be conducted
- Candidates' hands-on coding skills should be thoroughly validated before submission
Team Structure
- 10 Data Engineers
- 6 Data Scientists
Certifications
- Highly Preferred: Databricks Data Engineer Certification
- Acceptable Post-Hire: Databricks Certification
- Acceptable Post-Hire: Google Cloud (GCP) Certification
Priority Technical Skill: PySpark
- Building and maintaining data pipelines
- Large-scale data processing
- Data transformation and optimization
- Distributed data environments
Python and Pandas
- Data cleansing and preparation
- Data transformation
- Exploratory data analysis
- Data manipulation with Pandas
Machine Learning Focus
- Classification models
- Regression models
- Clustering techniques
- Model evaluation and performance measurement