
Hi, I am

Pengrui Zeng

Traveler and

Data Scientist


Data is Language

World is Canvas

I was born in China, spent 7 years in Australia completing high school and my undergraduate studies, and later earned a Master's degree in the United States of America.

I have crossed 5 continents and flown over 250,000 kilometers. In the rhythm of departures and arrivals, I have come to see the character of cities, the warmth of cultures, and gradually found my own way of relating to the world. To me, travel is more than movement. It is a way of understanding life, connecting with others, and reshaping myself.


My Journey

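"Chengdu • Warmth of Everyday Life"

"Adelaide • Sea Breeze and Open Country"

"Brisbane • Gold Coast"

"Pittsburgh • Lights of the City of Bridges"

"Los Angeles • La La Land"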

Chengdu

2000.12~2016.08
I was born here, in my hometown wrapped in mist and spice. Morning teahouses hum softly, hotpot steam rises into humid air, and the rhythm of life moves gently, never rushed. This is where my first memories formed, where family stories and bamboo shadows shaped who I am.

Adelaide

2016.09~2018.12
I came here alone at fifteen, carrying more courage than certainty. In the quiet streets, wide skies, and ocean breeze, Adelaide gently shaped my teenage years. It was here that I completed high school, and the chapter of growing up found its rhythm: steady, simple, and unforgettable.

Brisbane

2019.02~2023.12
Under endless sunshine and jacaranda blooms, my journey in Computer Science and Data Science began. Lecture halls, late-night coding, and river reflections blended into a season of curiosity and ambition. This city gave structure to my dreams and logic to my imagination.

Pittsburgh

2023.09~2024.04
A city of bridges and steel, where intellect feels tangible. Here, at a world-leading Data Science institution, I sharpened my craft: models, systems, algorithms, and ideas colliding in disciplined rigor. In the cold air by the rivers, I learned how to think deeper and build stronger.
Los Angeles

2024.12~2025.04
For six months after graduation, I lived among palm trees and golden sunsets. Life felt cinematic: highways stretching toward possibility, creativity in every corner. I loved the openness here, the sense that reinvention is not just allowed, but expected.
ROOTED IN THE WORLD BELOW,
AIMING FOR THE STARS BEYOND.
EDUCATION
CMU
Carnegie Mellon University

Sep 2023 - Dec 2024

Master of Information Systems, Data Science
GPA: 3.8 / 4.0

Focus: Data Science, Machine Learning, Artificial Intelligence.

UQ
University of Queensland

Feb 2019 - Dec 2022

Bachelor of Computer Science, Data Science
GPA: 3.88 / 4.0

Focus: Data Structures and Algorithms, Database Management, Statistics, Mathematics.

WORK EXPERIENCE


Intel

AI Data Scientist

May 2025 - Present

Python
SQL
HTML
CSS
PyTorch
Spark
Data Mining
A/B Testing
Machine Learning
MLOps
Jenkins
GitHub Actions
Prometheus
Grafana
AWS
Microsoft Azure
Docker
Automation
Tableau / PowerBI

AI / ML Empowerment

Capacity Planning Reengineering: Leveraged multi-source data and AI/ML orchestration models to redesign capacity planning workflows, improving planning logic, responsiveness, and scalability across scenarios.

Strategic Validation: Designed and executed A/B tests to compare alternative planning strategies, measuring uplift via key KPIs (e.g., forecast accuracy, utilization variance, service-level risk).

Decision Support Systems: Built Power BI dashboards and internal websites that translate model outputs into clear insights and trade-offs, supporting more confident strategic decisions by executive leadership.


ML-Driven Forecasting

Multi-Model Framework: Developed a robust forecasting framework integrating SARIMAX, Prophet, and GBR to enable long-term global capacity and demand planning.

Advanced Segmentation: Applied Gaussian Mixture Models (GMM) for regional segmentation combined with advanced feature engineering.

Impact: Successfully deployed models through an end-to-end CI/CD ML pipeline with auto-retraining capabilities, boosting forecast accuracy by 20%.


Scalable Data Architecture

Distributed ETL Systems: Designed and built scalable ETL pipelines using Python, Azure Data Factory, and Databricks.

Big Data Processing: Leveraged Spark for TB-scale data processing and Airflow for automated workflow orchestration.

Digital Transformation: Drove the standardization and digital transformation of Intel’s global supply chain data infrastructure.


MSA Safety

Data Analyst Intern

May 2024 - Dec. 2024

Python
SQL
Machine Learning
Spark
MLOps
ETL Pipelines
Microsoft Azure
SAP ERP
AWS Glue
Tableau / PowerBI

Global Procurement Analytics

Large-Scale Analysis: Utilized Python and SQL to analyze 10M+ rows of global procurement data to drive strategic sourcing decisions.

Supplier Segmentation: Applied K-Means clustering for supplier segmentation and conducted A/B testing to validate strategies.

Impact: Achieved a 10% cost reduction and a 15% reduction in procurement cycle time through data-driven optimization.


Machine Learning Optimization

Model Fine-Tuning: Fine-tuned Random Forest models on large-scale datasets to improve supplier selection accuracy.

Decision Support: Enhanced sourcing decision quality by integrating predictive analytics into the supplier evaluation process.


BI Architecture and ETL

Power BI Optimization: Restructured Power BI schemas and optimized complex DAX queries to improve dashboard performance.

Pipeline Engineering: Developed streamlined ETL pipelines for automated data flow and monitoring, significantly accelerating refresh speeds and operational efficiency.


LVMH

Data Scientist Intern

Feb. 2023 - Jul. 2023

Python
Data Mining
SQL
Machine Learning
Time Series Forecasting (SARIMAX)
Model Optimization
AWS
User Segmentation
MLOps
Evaluation Metrics
A/B Testing
Performance Marketing Analytics
Google Ads
TikTok Ads
Cloud-Based Deployment
Tableau / PowerBI

User Segmentation and Marketing Strategy

High-Precision Targeting: Analyzed 60M+ rows of user data, applying ML models (LR, GBDT, KNN, K-Means) and A/B testing to optimize multi-channel acquisition strategies.

Engagement Growth: Successfully boosted ad targeting precision and user engagement by 15%.

Campaign Analytics: Evaluated 9 major ad campaigns using multivariate and logistic regression, increasing operational efficiency by 34% while reducing costs by 15%.


ML-Powered Forecasting (Dior)

Inventory Optimization: Deployed advanced cloud-based SARIMAX models serving Dior’s operations across China, powering inventory management for 395 stores.

Accuracy Improvement: Achieved a 10.5% boost in forecasting accuracy during high-volume promotional periods.


Model Optimization and Performance

Algorithm Tuning: Enhanced model performance via cross-validation (AUC, KS metrics), grid search, and feature engineering.

Conversion Uplift: Achieved a 28% increase in CTR (Click-Through Rate) and an 8% increase in CVR (Conversion Rate) through iterative model refinement.


Signify

Data Expert Intern

Dec. 2021 - Sep. 2022

Python
SAP ERP
Data Mining
Automation
Optimization Algorithms
Excel
Power BI

Process Automation

Efficiency Boost: Created 30+ Python automation scripts to streamline manual supply chain processes.

Time Savings: Saved 60+ work hours monthly, freeing up resources for strategic tasks.

Digital Transformation: Advanced the department’s digital transformation through efficient data management practices and automated workflows.


Supply Chain Analytics

Anomaly Detection: Designed material consumption models and applied data mining on logistics and warehousing data to detect anomalies.

Inventory Optimization: Addressed inventory imbalances through predictive analysis, streamlining supply chain operations.

Impact: Raised customer satisfaction by 5% by ensuring smoother logistics and inventory availability.

PROJECT EXPERIENCE

CMU x Netflix: Large Scale Personalized Movie Recommendations

Aug 2024 – Dec 2024

API and Real-Time Processing

  • High-Throughput API: Implemented a real-time recommendation API using Kafka and Flask, capable of handling over 50M user interactions.
  • Stream Processing: Designed the system to process user events in real-time to update recommendations dynamically.

Scalable Database Architecture

  • Big Data Storage: Engineered a robust MySQL database storing 30M+ records (~112GB).
  • Performance Optimization: Applied multithreading for efficient batch processing and log parsing.
  • Database Tuning: Optimized storage performance through advanced partitioning and indexing techniques.

MLOps and CI/CD

  • Automated Pipelines: Integrated Jenkins and GitHub Actions to establish a robust CI/CD pipeline.
  • Monitoring: Deployed Prometheus and Grafana to monitor system performance and detect data drift in real-time.
Python
PyTorch
Spark
MySQL
Flask
Apache Kafka
Multithreading
System Architecture
Batch Processing
Log Parsing
Partitioning
Indexing
Jenkins
GitHub Actions
Prometheus
Grafana

CMU x TCS: Multimodal Emotion Understanding and Calibration with LLMs

Jan 2024 – May 2024

Transformer-Based NLP

  • Semantic Understanding: Built an advanced emotion classification system designed for deep semantic understanding and multi-class text classification.
  • Benchmarking: Benchmarked performance against traditional baselines such as Word2Vec and TF-IDF to verify the accuracy gains of the transformer-based architecture.

LLM Fine-Tuning

  • Optimization: Optimized DistilBERT with advanced training strategies to balance performance and computational efficiency.
  • Model Comparison: Conducted rigorous performance comparisons with BERT models to validate the efficacy of fine-tuning strategies.

MLOps Deployment

  • Inference API: Deployed a lightweight, high-performance inference API using Flask and RESTful services.
  • Scalability: Designed the deployment architecture to handle real-time requests efficiently.
Python
PyTorch
Spark
Transformer
BERT
DistilBERT
Word2Vec
TF-IDF
Fine-tuning
Multi-class Classification Metrics
Flask
RESTful API

CMU x General Electric: Recognition and Fault Detection System Project

Sep 2023 – Dec 2023

Deep Learning and Computer Vision

  • Real-Time Inference: Utilized YOLOv5 with multi-scale training techniques to achieve robust object detection.
  • Performance Tuning: Reduced inference latency to 45 ms/frame using TensorRT, enabling real-time fault detection.

System Integration and Edge AI

  • Streaming Architecture: Configured an optimized model pipeline integrated with Kafka-based streaming for continuous data processing.
  • Impact: Implemented edge validation mechanisms that successfully reduced downtime by 10%, enhancing operational reliability.
Python
PyTorch
Spark
YOLO
TensorRT
Kafka
Edge-side Validation
Inference
Automation
System Architecture

BUSINESS AND COLLABORATION

LinkedIn
GitHub
Notion
Instagram
Message
Groundfloor