About Me

I love to dig into data and find the story it has to tell.

My background as an author gave me a unique superpower: I can take complex, messy data and turn it into a clear, concise narrative that anyone can understand. After all, a great story is only effective when it reaches the right audience.

My goal is to help companies make smarter decisions by creating reports and visualizations that are not just accurate, but also genuinely easy to read and act upon. I specialize in the full analytical workflow, leveraging SQL for robust data modeling and sometimes even Python for analysis, then delivering clear insights through Tableau, Power BI, and Excel.

I'm currently focused on open-source contributions to strengthen my portfolio in real-world environments as I search for a full-time analytics position.


Skills

Excel | Tableau | Power BI | SQL | Python

  • Strategic Data Analysis - 3+ years

  • App Testing & Debugging - 5+ years

  • Business Process Improvement - 5+ years

  • Generative AI - 3+ years

  • Critical Thinking - 5+ years

  • Storytelling - 9+ years


Professional Certifications

CompTIA Data+

October 2025

Data Analyst Career Training

October 2025


Featured Projects

Market Pressure & Royalty Risk Analysis in Kindle Publishing

Excel | Pivot Tables | Charts

Strategic Decision-Making:
This analysis leverages descriptive statistics and data visualization on 3,800+ titles to show that independent Sci-Fi/Fantasy authors must price their catalogs 46.5% below major publishers to compete, directly demonstrating the economic constraint on per-unit revenue.


Reproducible Regression Workflow

SQL | Python | GitHub

End-to-End Data Analysis:
This project's workflow is fully documented on GitHub for version control, showcasing expertise in SQL for data cleaning (Outlier Detection) and Python for subsequent modeling and visualization.


Interactive Dashboard Design

Power BI | DAX | Tableau

Visualization Tool Versatility:
This project demonstrates expertise in both Power BI (using DAX logic) and Tableau, showcasing the ability to build and translate complex visualizations across leading Business Intelligence platforms.

Interactive Dashboard Design & Segmentation Analysis

Certain biometric measurements can predict others in iris flowers, but the predictive strength varies by species, necessitating segmentation.

Power BI Dashboard

This analysis first used Power BI to develop highly filterable and interactive dashboards. The project centered on categorical segmentation to visually verify linear relationships and demonstrate predictive capability.

Using a foundational dataset of iris biometric measurements, the project focuses on one core question:

How can categorical segmentation and dynamic visualization tools be used to identify strong positive correlations that enable the predictive estimation of one biometric feature based on another?

Key Finding: Segmentation by species confirmed three distinct morphological groups, allowing the establishment of a strong positive correlation between key measurements. The analysis demonstrated the ability to leverage this linear relationship for reliable predictive estimation.

🛠️ Methodology & Technical Execution

  • Categorical Segmentation: The initial analysis confirmed that the three species (Iris setosa, I. versicolor, and I. virginica) are morphologically distinct, justifying the segmentation of all correlation and predictive analyses on a species-by-species basis.

  • Correlation & Predictive Modeling: Established a clear positive correlation between paired measurements (e.g., petal length and petal width), demonstrating the linear trend that allows one dimension to be used to predict the other.

  • Interactive Design: The dashboard was engineered using interactive slicers that allow users to filter data by specific measurement ranges, updating the trend lines and visualizations in real-time to reinforce the positive correlation principle.

  • Boundary Testing: Deliberately filtered against the positive trend to surface and visually confirm outliers or natural variation, reinforcing the critical role of exception handling in data analysis.

  • Tool Versatility: The entire analysis and dashboard design were successfully replicated in Tableau, proving platform versatility in reporting.
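The segmentation and correlation steps above were built in Power BI and Tableau. As a rough code analogue, here is a minimal Python sketch of the same idea: group measurements by species, compute a Pearson correlation within each group, and fit a least-squares line that estimates one measurement from another. The rows and the function names (`pearson_r`, `fit_line`, `correlation_by_species`) are made up for illustration and are not taken from the project.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (species, petal_length_cm, petal_width_cm) rows, for illustration only.
ROWS = [
    ("setosa", 1.4, 0.2), ("setosa", 1.5, 0.2), ("setosa", 1.3, 0.3), ("setosa", 1.7, 0.4),
    ("versicolor", 4.7, 1.4), ("versicolor", 4.0, 1.3), ("versicolor", 4.5, 1.5), ("versicolor", 3.3, 1.0),
    ("virginica", 6.0, 2.5), ("virginica", 5.1, 1.9), ("virginica", 5.9, 2.1), ("virginica", 5.6, 1.8),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def correlation_by_species(rows):
    """Segment rows by species, then compute the correlation inside each segment."""
    groups = defaultdict(lambda: ([], []))
    for species, length, width in rows:
        groups[species][0].append(length)
        groups[species][1].append(width)
    return {species: pearson_r(xs, ys) for species, (xs, ys) in groups.items()}


if __name__ == "__main__":
    for species, r in correlation_by_species(ROWS).items():
        print(f"{species}: r = {r:.2f}")
```

Computing the coefficient per species, rather than once over all rows, is what keeps the large between-species differences from inflating the apparent strength of the relationship.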

This video provides a full walk-through of the Interactive Dashboard Design & Segmentation Analysis, demonstrating key features like species-specific segmentation, filtering, and predictive correlations, all visualized with Tableau.

Tableau Dashboard Demo

Data Sources:

UCI Machine Learning. (2016). Iris Species. Kaggle.

Reproducible Linear Regression Analysis Workflow

Certain biometric measurements can predict body mass in penguins, but the predictive strength varies by species, necessitating segmentation.

Final Python Visualization of R² Matrix

This SQL and Python analysis details a fully documented, end-to-end analytical workflow focused on data integrity, outlier detection, and establishing a reliable linear predictive model for body mass.

Using a curated dataset of biometric measurements, the project focuses on one core question:

Can a single, easily measurable biometric feature reliably predict body mass across morphologically distinct species, and what is the effect of data outliers on model accuracy?

Key Finding: The relationship strength varies significantly by species, confirming the necessity of a segmented approach. The Gentoo species exhibited the strongest predictive relationship (r = 0.70), leading to a reliable linear model with an R² value of 0.50, meaning flipper length explains 50% of the variation in their body mass.
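The two Gentoo figures are consistent with each other if the reported 0.70 is read as a Pearson correlation coefficient, which is an assumption here. For a linear regression with a single predictor, the coefficient of determination equals the square of that correlation:

```latex
R^2 = r^2 \quad\Rightarrow\quad R^2 \approx (0.70)^2 = 0.49 \approx 0.50
```

Going the other way, the square root of 0.50 is about 0.707, which rounds to the reported 0.70.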

🛠️ Methodology & Technical Execution

  • Data Cleaning: Used SQL with interquartile range (IQR) fences for robust outlier detection and removal, ensuring the downstream model was trained on high-integrity data.

SQL Code for Identifying Outliers

  • Modeling & Analysis: Utilized Python (Pandas) to transform the clean dataset and implement the linear regression model. Visualizations, including box plots, histograms, and heat maps, were generated using Matplotlib and Seaborn, successfully establishing a quantifiable positive correlation between key variables.

Python Code for Creating Histogram

Resulting Histogram

  • Visualization & Results: Used Python (Matplotlib/Seaborn) to visualize model fit, residuals, and the final established correlation.

Python for Visualizing Model Fit and Residuals

  • Version Control: The complete process, including all cleaning scripts and modeling notebooks, is hosted and managed via GitHub to demonstrate best practices in documentation and collaboration.
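The project's outlier step was written in SQL, and that code is not reproduced on this page. For readers who want to see the IQR-fence idea itself, here is a minimal Python sketch. It assumes the conventional 1.5 × IQR multiplier and the "inclusive" quartile method, neither of which is confirmed by the project. The body-mass values and the function names are hypothetical.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice and an assumption in this sketch.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    # Hypothetical body masses in grams, with one implausible entry.
    masses = [4300, 4500, 4700, 4900, 5000, 5100, 5300, 5500, 9000]
    print(iqr_fences(masses))
    print(remove_outliers(masses))
```

In a segmented analysis like this one, the fences would normally be computed per species, since a mass that is ordinary for one species can be extreme for another.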

Data Sources:

Pandey, P. (2020). Palmer Archipelago (Antarctica) penguin data. Kaggle.

Market Pressure & Royalty Risk Analysis in Kindle Publishing

Independent genre authors face an impossible challenge: determining competitive pricing while simultaneously protecting crucial royalty tiers.

Excel Dashboard with Charts and Pivot Tables

This Excel analysis provides a concise comparison of pricing strategies and their direct financial implications for Independent (Indie) versus Big Publishers in the Science Fiction and Fantasy eBook market on Kindle.

Using a new, curated dataset of 3,807 titles from Amazon.com, this project focuses on one core question:

How does Indie Publisher pricing directly influence their royalty tier, and what percentage of titles fall into the lower (35%) royalty category due to competitive pricing?

Key Finding: Self-published SF/F authors must price their entire catalog significantly lower to compete, with an average price 46.5% below Big Publishers. This widespread price suppression, rather than targeted use of the sub-$2.99 tier, is the dominant factor limiting their revenue potential per unit sold.
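The royalty-tier question can be sketched in a few lines of Python. The project itself was done in Excel, so this is only an illustration. It assumes the 70% royalty option applies to list prices from $2.99 to $9.99 and that everything outside that band earns 35%. It ignores delivery fees and other eligibility rules. The prices and function names are hypothetical.

```python
def royalty_rate(list_price, low=2.99, high=9.99):
    """Return 0.70 inside the assumed 70% price band, otherwise 0.35."""
    return 0.70 if low <= list_price <= high else 0.35


def share_in_lower_tier(prices):
    """Fraction of titles whose list price puts them in the 35% tier."""
    lower = sum(1 for price in prices if royalty_rate(price) == 0.35)
    return lower / len(prices)


if __name__ == "__main__":
    # Hypothetical list prices, not taken from the dataset.
    prices = [0.99, 2.99, 3.99, 4.99, 4.99, 5.99, 9.99, 12.99]
    print(f"{share_in_lower_tier(prices):.0%} of titles fall in the 35% tier")
```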

Metric | Indie | Big Publishers | Key Insight
Average Price | $5.40 | $10.08 | Indie prices are 46.5% lower than the average Big Publisher price.
Median Price | $4.99 | $9.99 (Median of Medians) | The price point at which 50% of Indie books are cheaper is half the Big Publisher median.
Minimum Price | $0.00 | $0.00 to $2.99 (Range) | Indie offers free books, a practice also seen in some Big Publisher rows, but the overall floor is often higher for Big Publishers.
Interquartile Range (IQR) | $2.00 | $1.00 to $6.00 (Range) | Indie prices exhibit a very tight distribution around the median, indicating less price variance across their titles. Big Publishers show much higher variability.
Q1 Price (25th Percentile) | $3.99 | $8.28 (Average Q1) | 75% of Indie titles are priced above $3.99, whereas the average price for 75% of Big Publisher titles is above $8.28.

Table of Key Descriptive Statistics
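The headline price gap is a simple relative difference, sketched here in Python using the two averages from the table. With those rounded averages the result comes out to about 46.4%, so the reported 46.5% presumably reflects unrounded averages in the workbook. That explanation is an assumption. The function name is hypothetical.

```python
def percent_below(price, reference):
    """How far `price` sits below `reference`, as a percentage of `reference`."""
    return (reference - price) / reference * 100


if __name__ == "__main__":
    indie_avg, big_avg = 5.40, 10.08        # averages from the table
    indie_median, big_median = 4.99, 9.99   # medians from the table
    print(f"Average gap: {percent_below(indie_avg, big_avg):.1f}%")
    print(f"Median ratio: {indie_median / big_median:.2f}")
```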

🛠️ Methodology & Technical Execution

Data Preparation and Cleaning
The analysis of pricing and publisher trends began with a raw dataset of over 130,000 rows spanning 31 unique genres/categories, originally scraped from Amazon.com in 2023. To create a clean, relevant sample in Excel, the following structured filtering and data preparation steps were executed:

Not All Ebooks Are the Same

  • Scope Filtering: The dataset was initially filtered to include only titles categorized under Science Fiction and Fantasy, removing all other genres to maintain market focus.

  • Publisher Exclusion: The scope was further refined by removing publishers whose content was not strictly text-format: Yen Press (comics/manga) and Games Workshop (LitRPG/tabletop guides).

  • Statistical Threshold: Publishers with fewer than 10 titles remaining in the filtered genre were removed to ensure statistical relevance and reliable aggregation for the descriptive measures.

  • Outlier Removal (Price): Observations with prices of $20.00 or higher were removed. This systematic truncation addressed high-priced outliers (up to $76.33) that were identified as complete series bundles, large box sets, or third-party reference texts, which would otherwise skew the central tendency metrics (e.g., average and median price) of individual titles.
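The four filtering steps above were carried out in Excel. As a rough equivalent, they can be expressed as a small Python pipeline. The field names (`genre`, `publisher`, `price`), the genre label, and the function name `prepare` are hypothetical and may not match the actual dataset columns.

```python
from collections import Counter

EXCLUDED_PUBLISHERS = {"Yen Press", "Games Workshop"}  # non-text-format content
MIN_TITLES = 10     # publishers with fewer remaining titles are dropped
PRICE_CAP = 20.00   # prices at or above this are treated as bundles or box sets


def prepare(rows, genre="Science Fiction & Fantasy"):
    """Apply the four filtering steps in order to a list of row dicts.

    Each row is assumed to have 'genre', 'publisher', and 'price' keys.
    """
    # 1. Scope filtering: keep only the target genre.
    scoped = [r for r in rows if r["genre"] == genre]
    # 2. Publisher exclusion: drop the named publishers.
    scoped = [r for r in scoped if r["publisher"] not in EXCLUDED_PUBLISHERS]
    # 3. Statistical threshold: require at least MIN_TITLES titles per publisher.
    counts = Counter(r["publisher"] for r in scoped)
    scoped = [r for r in scoped if counts[r["publisher"]] >= MIN_TITLES]
    # 4. Outlier removal: drop prices of $20.00 or higher.
    return [r for r in scoped if r["price"] < PRICE_CAP]
```

The order matters. Because the title-count threshold is applied before the price cap, a publisher is judged on its in-genre titles, including any high-priced bundles that are removed afterwards.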

The resulting dataset of 3,807 rows, focused on core Science Fiction and Fantasy titles, was then used to compare the Independent Publisher (Amazon.com Services LLC) against the major publishing houses.

This analysis was driven by a direct business need: after publishing 12 novels in the SFF genre, I suspected that market conditions, rather than work quality, were limiting growth. The data confirmed this suspicion, quantifying the severity of price suppression and directly informing the strategic decision to exit the oversaturated publishing market.

If you're an independent genre author who appreciated this analysis, please feel free to connect with me on LinkedIn.

Data Sources:

Saniczka, A. (2023). Amazon Kindle Books Dataset 2023 (130k+ Books). Kaggle.

Kaylor, T. (2025). Sci-Fi/Fantasy Kindle Prices 2023 (3700+ Books). Kaggle.