I love to dig into data and find the story it has to tell.
My background as an author gave me a unique superpower: I can take complex, messy data and turn it into a clear, concise narrative that anyone can understand. After all, a great story is only effective when it reaches the right audience.
My goal is to help companies make smarter decisions by creating reports and visualizations that are not just accurate, but also genuinely easy to read and act upon. I specialize in the full analytical workflow, leveraging SQL for robust data modeling and sometimes even Python for analysis, then delivering clear insights through Tableau, Power BI, and Excel.
I'm currently focused on open-source contributions to strengthen my portfolio in real-world environments as I search for a full-time analytics position.
Excel | Tableau | Power BI | SQL | Python
Strategic Data Analysis - 3+ years
App Testing & Debugging - 5+ years
Business Process Improvement - 5+ years
Generative AI - 3+ years
Critical Thinking - 5+ years
Storytelling - 9+ years
October 2025
Market Pressure & Royalty Risk Analysis in Kindle Publishing
Strategic Decision-Making:
This analysis leverages descriptive statistics and data visualization on 3,800+ titles to show that independent Sci-Fi/Fantasy authors must price their catalogs 46.5% below major publishers to compete, directly demonstrating the economic constraint on per-unit revenue.
Interactive Dashboard Design & Segmentation Analysis
Certain biometric measurements can predict others in iris flowers, but the predictive strength varies by species, necessitating segmentation.

Power BI Dashboard
This analysis first used Power BI to develop highly filterable and interactive dashboards. The project centered on categorical segmentation to visually verify linear relationships and demonstrate predictive capability.
Using a foundational dataset of iris biometric measurements, the project focuses on one core question:
How can categorical segmentation and dynamic visualization tools be used to identify strong positive correlations that enable the predictive estimation of one biometric feature based on another?
Key Finding: Segmentation by species confirmed three distinct morphological groups, allowing the establishment of a strong positive correlation between key measurements. The analysis demonstrated the ability to leverage this linear relationship for reliable predictive estimation.
🛠️ Methodology & Technical Execution
Categorical Segmentation: The initial analysis confirmed that the three species (Iris setosa, I. versicolor, and I. virginica) are morphologically distinct, justifying the segmentation of all correlation and predictive analyses on a species-by-species basis.
Correlation & Predictive Modeling: Established a clear positive correlation between paired measurements (e.g., petal length and petal width), demonstrating the linear trend that allows one dimension to be used to predict the other.
Interactive Design: The dashboard was built with interactive slicers that let users filter data by specific measurement ranges, updating the trend lines and visualizations in real time to reinforce the positive correlation principle.
Boundary Testing: Intentionally tested selections that run against the positive trend to surface and visually confirm outliers or natural variation, reinforcing the critical role of exception handling in data analysis.
Tool Versatility: The entire analysis and dashboard design were successfully replicated in Tableau, proving platform versatility in reporting.
This video provides a full walk-through of the Interactive Dashboard Design & Segmentation Analysis, demonstrating key features like species-specific segmentation, filtering, and predictive correlations, all visualized with Tableau.
Tableau Dashboard Demo
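The species-by-species correlation check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the dashboard's actual logic (that lives in Power BI and Tableau). The helper names `pearson_r` and `correlation_by_group` are hypothetical, and the sample rows are made up rather than taken from the iris dataset.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_group(rows):
    """Compute Pearson r separately for each group in (group, x, y) rows."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        groups[group][0].append(x)
        groups[group][1].append(y)
    return {group: pearson_r(xs, ys) for group, (xs, ys) in groups.items()}


# Made-up illustrative rows: (group label, measurement x, measurement y).
rows = [
    ("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 6.0),
    ("B", 1.0, 1.0), ("B", 2.0, 3.0), ("B", 3.0, 2.0),
]
result = correlation_by_group(rows)
for group, r in result.items():
    print(f"group {group}: r = {r:.2f}")
```

Computing the coefficient per group rather than over the pooled data is the point of the segmentation: a pooled value can hide the fact that one group follows the trend closely while another does not.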
Data Sources:
UCI Machine Learning. (2016). Iris Species. Kaggle.
Reproducible Linear Regression Analysis Workflow
Certain biometric measurements can predict body mass in penguins, but the predictive strength varies by species, necessitating segmentation.

Final Python Visualization of R² Matrix
This SQL and Python analysis details a fully documented, end-to-end analytical workflow focused on data integrity, outlier detection, and establishing a reliable linear predictive model for body mass.
Using a curated dataset of biometric measurements, the project focuses on one core question:
Can a single, easily measurable biometric feature reliably predict body mass across morphologically distinct species, and what is the effect of data outliers on model accuracy?
Key Finding: The relationship strength varies significantly by species, confirming the necessity of a segmented approach. The Gentoo species exhibited the strongest predictive relationship (correlation coefficient r = 0.70), leading to a reliable linear model with an R² value of 0.50, meaning flipper length explains 50% of the variation in their body mass.
🛠️ Methodology & Technical Execution
Data Cleaning: Used SQL with interquartile range (IQR) fences for robust outlier detection and removal, so that the downstream model was fit on high-integrity data.

SQL Code for Identifying Outliers
Modeling & Analysis: Utilized Python (Pandas) to transform the clean dataset and implement the linear regression model. Visualizations, including box plots, histograms, and heat maps, were generated using Matplotlib and Seaborn, successfully establishing a quantifiable positive correlation between key variables.

Python Code for Creating Histogram

Resulting Histogram
Visualization & Results: Used Python (Matplotlib/Seaborn) to visualize model fit, residuals, and the final established correlation.

Python for Visualizing Model Fit and Residuals
Version Control: The complete process, including all cleaning scripts and modeling notebooks, is hosted and managed via GitHub to demonstrate best practices in documentation and collaboration.
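The cleaning and modeling steps above can be sketched end to end in plain Python. This is a hedged illustration, not the project's SQL or notebook code: the function names are hypothetical, the 1.5 multiplier is the conventional Tukey rule (the write-up does not state which multiplier was used), and the sample pairs are invented rather than drawn from the penguin dataset. For a one-predictor linear model, R² equals the square of the Pearson correlation, which is why a correlation of 0.70 corresponds to an R² of about 0.49, in line with the 0.50 reported above.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented (predictor, response) pairs; the last one is an obvious outlier.
pairs = [(1, 10), (2, 11), (3, 12), (4, 13), (5, 14), (6, 50)]

# Step 1: drop rows whose response falls outside the IQR fences.
lower, upper = iqr_fences([y for _, y in pairs])
clean = [(x, y) for x, y in pairs if lower <= y <= upper]

# Step 2: fit the line on the cleaned rows and score it.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"kept {len(clean)} of {len(pairs)} rows, R^2 = {r2:.2f}")
```

In a segmented analysis, both steps would be run once per species so that each group gets its own fences and its own fitted line.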
Data Sources:
Pandey, P. (2020). Palmer Archipelago (Antarctica) penguin data. Kaggle.
Market Pressure & Royalty Risk Analysis in Kindle Publishing
Independent genre authors face an impossible challenge: determining competitive pricing while simultaneously protecting crucial royalty tiers.

Excel Dashboard with Charts and Pivot Tables
This Excel analysis provides a concise comparison of pricing strategies and their direct financial implications for Independent (Indie) versus Big Publishers in the Science Fiction and Fantasy eBook market on Kindle.
Using a new, curated dataset of 3,807 titles from Amazon.com, this project focuses on one core question:
How does Indie Publisher pricing directly influence their royalty tier, and what percentage of titles fall into the lower (35%) royalty category due to competitive pricing?
Key Finding: Self-published SF/F authors must price their entire catalog significantly lower to compete, with an average price 46.5% below Big Publishers. This widespread price suppression, rather than targeted use of the sub-$2.99 tier, is the dominant factor limiting their revenue potential per unit sold.
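The royalty-tier question can be expressed as a small Python sketch. It is an illustration only, since the project itself was done in Excel. The function names are hypothetical, the price list is invented, and the tier bounds are an assumption: the write-up mentions only the 35% tier and the $2.99 threshold, while the $9.99 upper bound reflects the commonly published Kindle rule for the 70% option. Delivery fees and other eligibility conditions are ignored.

```python
def royalty_rate(list_price, low=2.99, high=9.99):
    """Return 0.70 inside the assumed 70% price band, otherwise 0.35."""
    return 0.70 if low <= list_price <= high else 0.35


def share_in_lower_tier(prices):
    """Fraction of titles whose list price lands in the 35% tier."""
    lower = sum(1 for price in prices if royalty_rate(price) == 0.35)
    return lower / len(prices)


# Invented list prices, for illustration only.
sample_prices = [0.99, 2.99, 4.99, 12.99]
share = share_in_lower_tier(sample_prices)
print(f"{share:.0%} of sample titles fall in the 35% tier")
```

Multiplying `list_price` by `royalty_rate(list_price)` gives a rough per-unit royalty, which is the quantity that price suppression puts under pressure even when a title stays inside the 70% band.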
| Metric | Indie | Big Publishers | Key Insight |
|---|---|---|---|
| Average Price | $5.40 | $10.08 | Indie prices are 46.5% lower than the average Big Publisher price. |
| Median Price | $4.99 | $9.99 (Median of Medians) | The Indie median is roughly half the Big Publisher median. |
| Minimum Price | $0.00 | $0.00 to $2.99 (Range) | Indie offers free books, a practice also seen in some Big Publisher rows, but the overall floor is often higher for Big Publishers. |
| Interquartile Range (IQR) | $2.00 | $1.00 to $6.00 (Range) | Indie prices exhibit a very tight distribution around the median, indicating less price variance across their titles. Big Publishers show much higher variability. |
| Q1 Price (25th Percentile) | $3.99 | $8.28 (Average Q1) | 75% of Indie titles are priced at or above $3.99, whereas the average first quartile across Big Publishers is $8.28. |
Table of Key Descriptive Statistics
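The headline price gap is a simple relative difference, sketched below in Python. The helper name `percent_below` is hypothetical. With the rounded averages from the table the result is roughly 46.4%; the 46.5% quoted in the analysis presumably comes from the unrounded averages, which is an assumption here.

```python
def percent_below(value, reference):
    """How far `value` sits below `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100


indie_avg = 5.40           # rounded Indie average price from the table
big_publisher_avg = 10.08  # rounded Big Publisher average price from the table

gap = percent_below(indie_avg, big_publisher_avg)
print(f"Indie average is {gap:.1f}% below the Big Publisher average")
```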
🛠️ Methodology & Technical Execution
Data Preparation and Cleaning
The analysis of pricing and publisher trends began with a raw dataset of 31 unique genres/categories and over 130,000 rows of data originally scraped from Amazon.com in 2023. To create a clean, relevant sample in Excel, the following structured filtering and data preparation steps were executed:
Not All Ebooks Are the Same
Scope Filtering: The dataset was initially filtered to include only titles categorized under Science Fiction and Fantasy, removing all other genres to maintain market focus.
Publisher Exclusion: The scope was further refined by removing publishers whose content was not strictly text-format: Yen Press (comics/manga) and Games Workshop (LitRPG/tabletop guides).
Statistical Threshold: Publishers with fewer than 10 titles remaining in the filtered genre were removed to ensure statistical relevance and reliable aggregation for the descriptive measures.
Outlier Removal (Price): Observations with prices of $20.00 or higher were removed. This systematic truncation addressed high-priced outliers (up to $76.33) that were identified as complete series bundles, large box sets, or third-party reference texts, which would otherwise skew the central tendency metrics (e.g., average and median price) of individual titles.
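The four preparation steps above can be sketched as a small Python pipeline. The actual work was done in Excel, so this is only an illustration of the order of operations. The column names (`category`, `publisher`, `price`), the function name `prepare_sample`, the genre label, and the sample rows are all hypothetical.

```python
from collections import Counter

# Publishers removed because their content is not strictly text-format.
EXCLUDED_PUBLISHERS = {"Yen Press", "Games Workshop"}


def prepare_sample(rows, genre, min_titles=10, price_cap=20.00):
    """Apply the scope, publisher, threshold, and price filters in order."""
    # 1. Scope filtering: keep only the target genre.
    kept = [r for r in rows if r["category"] == genre]
    # 2. Publisher exclusion: drop the non-text publishers.
    kept = [r for r in kept if r["publisher"] not in EXCLUDED_PUBLISHERS]
    # 3. Statistical threshold: drop publishers with too few remaining titles.
    counts = Counter(r["publisher"] for r in kept)
    kept = [r for r in kept if counts[r["publisher"]] >= min_titles]
    # 4. Outlier removal: drop prices at or above the cap.
    return [r for r in kept if r["price"] < price_cap]


# Invented rows; min_titles is lowered to 2 so the tiny sample is meaningful.
rows = [
    {"category": "SFF", "publisher": "P1", "price": 4.99},
    {"category": "SFF", "publisher": "P1", "price": 5.99},
    {"category": "SFF", "publisher": "P1", "price": 25.00},
    {"category": "SFF", "publisher": "P2", "price": 3.99},
    {"category": "SFF", "publisher": "Yen Press", "price": 6.99},
    {"category": "SFF", "publisher": "Yen Press", "price": 7.99},
    {"category": "Romance", "publisher": "P1", "price": 2.99},
]
sample = prepare_sample(rows, genre="SFF", min_titles=2)
print(len(sample), "rows remain")
```

The order matters: in this sketch the title-count threshold is applied before the price cap, following the sequence listed above, so a publisher is judged on its filtered-genre catalog before bundles and box sets are removed.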
The resulting dataset of 3,807 rows, focused on core Science Fiction and Fantasy titles, was then used to compare the Independent Publisher (Amazon.com Services LLC) against the major publishing houses.
This analysis was driven by a direct business need: after publishing 12 novels in the SFF genre, I suspected that market conditions, rather than work quality, were limiting growth. The data confirmed this suspicion, quantifying the severity of price suppression and directly informing the strategic decision to exit the oversaturated publishing market.
If you're an independent genre author who appreciated this analysis, please feel free to connect with me on LinkedIn.
Data Sources:
Saniczka, A. (2023). Amazon Kindle Books Dataset 2023 (130k+ Books). Kaggle.
Kaylor, T. (2025). Sci-Fi/Fantasy Kindle Prices 2023 (3700+ Books). Kaggle.