References

Abela, Andrew. 2020. “Choosing a Good Chart.” https://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html.
Adhikari, Ani, John DeNero, and David Wagner. 2022. Computational and Inferential Thinking: The Foundations of Data Science. 2nd Ed. https://inferentialthinking.com/chapters/intro.html.
Akidau, Tyler, Slava Chernyak, and Reuven Lax. 2018. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing. O’Reilly Media.
Alley, Michael. 2013. The Craft of Scientific Presentations, 2nd Ed. Springer Verlag, New York.
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias—Risk Assessment in Criminal Sentencing.” ProPublica, May. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Appel, Gil, Juliana Neelbauer, and David A. Schweidel. 2023. “Generative AI Has an Intellectual Property Problem.” Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem.
Awati, Kailash, and Simon Buckingham Shum. 2015. “Big Data Metaphors We Live By.” Medium. https://towardsdatascience.com/big-data-metaphors-we-live-by-98d3fa44ebf8.
Bashir, Noman, Priya Donti, James Cuff, Sydney Sroka, Marija Ilic, Vivienne Sze, Christina Delimitrou, and Elsa Olivetti. 2024. “The Climate and Sustainability Implications of Generative AI.” An MIT Exploration of Generative AI.
Bass, Len, Paul Clements, and Rick Kazman. 2012. Software Architecture in Practice. Addison-Wesley Professional.
Behar, Roberto, Pere Grima, and Lluís Marco-Almagro. 2013. “Twenty-Five Analogies for Explaining Statistical Concepts.” The American Statistician 67: 44–48.
Benson, H., J. A. Dusek, J. B. Sherwood, P. Lam, C. F. Bethea, W. Carpenter, S. Levitsky, et al. 2006. “Study of the Therapeutic Effects of Intercessory Prayer (STEP) in Cardiac Bypass Patients: A Multicenter Randomized Trial of Uncertainty and Certainty of Receiving Intercessory Prayer.” American Heart Journal 151 (4): 934–42.
Berreby, David. 2024. “As Use of A.I. Soars, so Does the Energy and Water It Requires.” https://e360.yale.edu/features/artificial-intelligence-climate-energy-emissions.
Borne, Kirk. 2021. “Data Profiling–Having That First Date with Your Data.” Medium. https://medium.com/codex/data-profiling-having-that-first-date-with-your-data-2e05de50fca7.
Box, G. E. P., and D. R. Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society. Series B (Methodological) 26 (2): 211–52. http://www.jstor.org/stable/2984418.
Box, George E. P. 1976. “Science and Statistics.” Journal of the American Statistical Association 71 (356): 791–99.
Box, George E. P., and Norman R. Draper. 1987. Empirical Model-Building and Response Surfaces. John Wiley & Sons, New York.
Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24: 123–40.
———. 2001a. “Random Forests.” Machine Learning 45: 5–32.
———. 2001b. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth, Pacific Grove, CA.
Burgess, Matt. n.d. “Strava’s Heatmap Was a ‘Clear Risk’ to Security, UK Military Warned.” https://www.wired.co.uk/article/strava-heat-maps-military-app-uk-warning-security.
Burkov, Andriy. 2020. Machine Learning Engineering. True Positive Inc.
Buschmann, Frank, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal. 1996. Pattern-Oriented Software Architecture: A System of Patterns. John Wiley & Sons.
Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. “SMOTE: Synthetic Minority over-Sampling Technique.” J. Artif. Int. Res. 16 (1): 321–57.
Chen, Cheng-Tao, and Jiang Zhang. 2014. “Lambda Architecture for Cost-Effective Batch and Speed Big Data Processing.” Proceedings of the 2014 IEEE International Congress on Big Data, 133–40.
Church, Karen. 2023. “The Most Underrated Skill in Data Science: Communication.” Medium. https://medium.com/intercom-rad/the-most-underrated-skill-in-data-science-communication-7ed2fab82801.
Cleveland, William S. 2001. “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” International Statistical Review / Revue Internationale de Statistique 69 (1): 21–26. http://www.jstor.org/stable/1403527.
Cox, M., and D. Ellsworth. 1997. “Application-Controlled Demand Paging for Out-of-Core Visualization.” In Proceedings. Visualization ’97 (Cat. No. 97CB36155), 235–44. https://doi.org/10.1109/VISUAL.1997.663888.
Cressie, N. 2023. “Robodebt Not Only Broke the Law of the Land – It Also Broke Laws of Mathematics.” The Conversation. https://theconversation.com/robodebt-not-only-broke-the-laws-of-the-land-it-also-broke-laws-of-mathematics-201299.
Cunningham, Adam. 2024. “Probability Playground.” Amstat News, 26–28.
Davenport, Thomas H., and D. J. Patil. 2012. “Data Scientist: The Sexiest Job of the 21st Century?” Harvard Business Review. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century.
———. 2022. “Is Data Scientist Still the Sexiest Job of the 21st Century?” Harvard Business Review. https://hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century.
Debortoli, Stefan, Oliver Müller, and Jan vom Brocke. 2014. “Comparing Business Intelligence and Big Data Skills.” Business & Information Systems Engineering 6 (5): 289–300.
Derr, Janice. 2000. Statistical Consulting: A Guide to Effective Communication. Duxbury Press, Brooks/Cole, Pacific Grove, CA.
Erl, Thomas. 2005. Service-Oriented Architecture: Concepts, Technology, and Design. Pearson Education India.
Feinberg, William E. 1971. “Teaching the Type I and Type II Errors: The Judicial Process.” The American Statistician 25 (3): 30–32. http://www.jstor.org/stable/2683322.
Fielding, Roy Thomas. 2000. “Architectural Styles and the Design of Network-Based Software Architectures.” PhD thesis, University of California, Irvine. https://ics.uci.edu/~fielding/pubs/dissertation/top.htm.
Fielding, Roy, and Julian Reschke. 2014a. “Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing.” RFC 7230. Internet Engineering Task Force (IETF). https://doi.org/10.17487/RFC7230.
———. 2014b. “Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content.” RFC 7231. Internet Engineering Task Force (IETF). https://doi.org/10.17487/RFC7231.
Fischhoff, Baruch. 1975. “Hindsight Is Not Equal to Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty.” Journal of Experimental Psychology: Human Perception and Performance 1: 288–99.
Fowler, Martin. 2002. Patterns of Enterprise Application Architecture. Addison-Wesley Professional.
Fowler, Martin, and James Lewis. 2014. “Microservices.” ThoughtWorks. https://martinfowler.com/articles/microservices.html.
Gardner, Howard. 1983. Frames of Mind: The Theory of Multiple Intelligences. Basic Books, New York.
Gelman, A., and A. Unwin. 2013. “Infovis and Statistical Graphics: Different Goals, Different Looks.” Journal of Computational and Graphical Statistics 22: 2–28. https://www.tandfonline.com/doi/full/10.1080/10618600.2012.761137.
Godsey, Brian. 2017. Think Like a Data Scientist. Manning Publications. https://www.oreilly.com/library/view/think-like-a/9781633430273/.
Graybill, Franklin A. 1976. Theory and Application of the Linear Model. Duxbury Press, North Scituate, Massachusetts.
———. 1983. Matrices with Applications in Statistics, 2nd Ed. Wadsworth International Group, Belmont, CA.
Grue, Lars, and Arvid Heiberg. 2006. “Notes on the History of Normality–Reflections on the Work of Quetelet and Galton.” Scandinavian Journal of Disability Research 8 (4): 232–46.
Heath, Chip, and Dan Heath. 2007. Made to Stick. Why Some Ideas Survive and Others Die. Random House, New York.
Heathcote, James A. 1995. “Why Do Old Men Have Big Ears?” BMJ 311 (7021): 1668. https://doi.org/10.1136/bmj.311.7021.1668.
Hern, Alex. 2024. “TechScape: How Cheap, Outsourced Labour in Africa Is Shaping AI English.” The Guardian. https://www.theguardian.com/technology/2024/apr/16/techscape-ai-gadgest-humane-ai-pin-chatgpt.
Hohpe, Gregor, and Bobby Woolf. 2003. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Professional.
Huff, Darrell. 1954. How to Lie with Statistics. W.W. Norton & Company, New York.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R, 2nd Ed. Springer. https://www.statlearning.com/.
Karmel, Allison. 2020. “Machine Learning in the Cloud with Kubernetes.” Communications of the ACM 63 (4): 40–41.
Kleppmann, Martin. 2017. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media.
Kuchel, Louise, and Susan Rowland. 2023. “Rhetoric, Influence, and Persuasion.” In Teaching Science Students to Communicate: A Practical Guide, edited by Louise Kuchel and Susan Rowland, 11–19. Springer Verlag, New York.
Kurgan, Lukasz A, and Petr Musilek. 2006. “A Survey of Knowledge Discovery and Data Mining Process Models.” The Knowledge Engineering Review 21 (1): 1–24.
Larson, Jeff, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. “How We Analyzed the COMPAS Recidivism Algorithm.” ProPublica, May. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
Lockett, Will. 2024. “Intel Admits AI Decreases Productivity.” https://medium.com/predict/intel-admits-ai-decreases-productivity-226681d1af18.
Longnecker, Nancy. 2023. “Good Science Communication Considers the Audience.” In Teaching Science Students to Communicate: A Practical Guide, edited by Louise Kuchel and Susan Rowland, 21–30. Springer Verlag, New York.
Loukides, Mike, Hilary Mason, and D. J. Patil. 2018. Ethics and Data Science. O’Reilly Media. https://resources.oreilly.com/examples/0636920203964/.
Lu, Jie, Anjin Liu, Fan Dong, Feng Gu, Joao Gama, and Guangquan Zhang. 2019. “Learning Under Concept Drift: A Review.” IEEE Transactions on Knowledge and Data Engineering 31 (12): 2346–63.
Mallick, Satya, and Sunita Nayak. 2018. “Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN).” https://learnopencv.com/number-of-parameters-and-tensor-sizes-in-convolutional-neural-network/.
Mariscal, Gonzalo, Oscar Marban, and Covadonga Fernandez. 2010. “A Survey of Data Mining and Knowledge Discovery Process Models and Methodologies.” The Knowledge Engineering Review 25 (2): 137–66.
Martinez-Plumed, Fernando, Lidia Contreras-Ochando, César Ferri, Peter Flach, José Hernández-Orallo, Meelis Kull, Nicolas Lachiche, and Marı́a José Ramı́rez-Quintana. 2021. “CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories.” IEEE Transactions on Knowledge and Data Engineering 33 (8): 3048–61.
Marz, Nathan, and James Warren. 2015. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications Co.
Messerli, F. H. 2012. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” New England Journal of Medicine 367: 1562–64.
Analytics at Meta. 2023. “Data Engineering at Meta: High-Level Overview of the Internal Tech Stack.” Medium. https://medium.com/@AnalyticsAtMeta/data-engineering-at-meta-high-level-overview-of-the-internal-tech-stack-a200460a44fe.
Newman, Sam. 2015. Building Microservices: Designing Fine-Grained Systems. O’Reilly Media.
Nicoletti, Leonardo, and Dina Bass. 2023. “Humans Are Biased. Generative AI Is Even Worse.” Bloomberg Technology + Equality. https://www.bloomberg.com/graphics/2023-generative-ai-bias/.
Nowinski, Christopher J., Samantha C. Bureau, Michael E. Buckland, Maurice A. Curtis, Daniel H. Daneshvar, Richard L. M. Faull, Lea T. Grinberg, et al. 2022. “Applying the Bradford Hill Criteria for Causation to Repetitive Head Impacts and Chronic Traumatic Encephalopathy.” Frontiers in Neurology 13. https://doi.org/10.3389/fneur.2022.938163.
Oliver, Carol A. 2023. “The Social Brain and the Neuroscience of Storytelling.” In Teaching Science Students to Communicate: A Practical Guide, edited by Louise Kuchel and Susan Rowland, 21–30. Springer Verlag, New York.
OpenWeather Ltd. 2025. “OpenWeatherMap API Documentation.” https://openweathermap.org/api.
Paleyes, Andrei, Raoul-Gabriel Urma, and Neil D Lawrence. 2022. “Challenges in Deploying Machine Learning: A Survey of Case Studies.” ACM Computing Surveys 55 (6): 1–29.
Pallets Projects. 2025. “Flask: A Python Microframework.” https://flask.palletsprojects.com/.
Papazoglou, Mike P, Paolo Traverso, Schahram Dustdar, and Frank Leymann. 2007. “Service-Oriented Computing: Concepts, Characteristics and Directions.” Proceedings of the Fourth International Conference on Web Information Systems Engineering, 3–12.
Patruno, Luigi. 2020. “The Ultimate Guide to Deploying Machine Learning Models.” MLinProduction.com. https://mlinproduction.com/deploying-machine-learning-models/.
Pearl, Judea, and Dana MacKenzie. 2018. The Book of Why. Basic Books, New York.
Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2025. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research. https://scikit-learn.org/.
Peng, Roger D., Sean Kross, and Brooke Anderson. 2020. Mastering Software Development in R. https://bookdown.org/rdpeng/RProgDA/.
Provost, Foster, and Tom Fawcett. 2013. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O’Reilly Media.
Reitz, Kenneth, and Python Software Foundation. 2024. “Requests: HTTP for Humans.” https://requests.readthedocs.io/.
Richards, Mark, and Neal Ford. 2020. Fundamentals of Software Architecture: An Engineering Approach. O’Reilly Media.
Richardson, Chris. 2018. Microservices Patterns: With Examples in Java. Manning Publications.
Richardson, Leonard, and Sam Ruby. 2007. RESTful Web Services. O’Reilly Media. http://restfulwebapis.org/rws.html.
Rouis, Yesmine. 2023. “A Guide to MLOps with Airflow and MLflow.” https://medium.com/thefork/a-guide-to-mlops-with-airflow-and-mlflow-e19a82901f88.
Schabenberger, O., and Francis J. Pierce. 2001. Contemporary Statistical Models for the Plant and Soil Sciences. CRC Press, Boca Raton.
Sculley, D., Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015a. “Hidden Technical Debt in Machine Learning Systems.” In Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2, 2503–11. NIPS’15. Cambridge, MA, USA: MIT Press.
Sculley, D., Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015b. “Hidden Technical Debt in Machine Learning Systems.” Advances in Neural Information Processing Systems 28: 2503–11.
Shearer, Colin. 2000. “The CRISP-DM Model: The New Blueprint for Data Mining.” Journal of Data Warehousing 5 (4): 13–22.
Sherman, J., and W. J. Morrison. 1949. “Adjustments of an Inverse Matrix Corresponding to Changes in the Elements of a Given Column or a Given Row of the Original Matrix.” Annals of Mathematical Statistics 20.
Silver, Nate. 2012. The Signal and the Noise: Why so Many Predictions Fail–but Some Don’t. Penguin Books.
Snow, John. 1855. On the Mode of Communication of Cholera, 2nd Ed. John Churchill, London. https://archive.org/stream/b28985266#page/n3/mode/2up.
Spiegelhalter, David. 2021. The Art of Statistics. How to Learn from Data. Basic Books.
Suresh, H., and J. Guttag. 2021. “A Framework for Understanding Sources of Harm Throughout the Machine Learning Life Cycle.” https://arxiv.org/pdf/1901.10002.pdf.
Tanenbaum, Andrew S, and Maarten Van Steen. 2016. Distributed Systems: Principles and Paradigms. Pearson.
Tent, M. B. W. 2006. The Prince of Mathematics: Carl Friedrich Gauss. CRC Press, Boca Raton, FL.
Tigani, Jordan. 2023. “Big Data Is Dead.” MotherDuck blog. https://motherduck.com/blog/big-data-is-dead/.
Trewin, D., N. Fisher, and N. Cressie. 2023. “The Robodebt Tragedy.” Significance 20 (6): 18–21. https://academic.oup.com/jrssig/article/20/6/18/7457250.
Tufte, E. 1983. The Visual Display of Quantitative Information. Graphics Press.
———. 2001. The Visual Display of Quantitative Information, 2nd Ed. Graphics Press.
Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics 33 (1): 1–67.
———. 1977. Exploratory Data Analysis. Pearson.
———. 1993. “Graphic Comparisons of Several Linked Aspects: Alternatives and Suggested Principles.” Journal of Computational and Graphical Statistics 2 (1): 1–33. http://www.jstor.org/stable/1390951.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000–6010. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Wang, Zi. 2019. “Predicting Time to Cook, Arrive, and Deliver at Uber Eats.” https://www.infoq.com/articles/uber-eats-time-predictions/.
Wilke, Claus O. 2019. Fundamentals of Data Visualization. O’Reilly Media. https://clauswilke.com/dataviz/.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer Verlag.
Woodbury, M. 1950. Inverting Modified Matrices. Memorandum No. 42. Statistical Research Group, Princeton University.
Yeo, In-Kwon, and Richard A. Johnson. 2000. “A New Family of Power Transformations to Improve Normality or Symmetry.” Biometrika 87 (4): 954–59. https://doi.org/10.1093/biomet/87.4.954.
Zewe, Adam. 2025. “Explained: Generative AI’s Environmental Impact.” https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117.
Zhou, Yue, Yue Yu, and Boyuan Ding. 2020. “Machine Learning Operations (MLOps): Overview, Definition, and Architecture.” IEEE Access 8: 140238–53.

From the project part

A Very Short History Of Data Science by Gil Press, Forbes, 2013

The History of Data Science and Pioneers You Should Know

Data Science 101: Life Cycle of a Data Science Project, Abraham Musa, Medium, 2021

Solving the Last Mile Problem for Data Science Project Success, Bill Waid, Forbes, 2019

Meet Airbnb’s official party pooper, who reduced partying by 5% in two years, CNBC, Sept. 19, 2023

From the data part

Junk Charts Trifecta Checkup: The Definitive Guide, by Kaiser Fung.

Fundamentals of Data Visualization, by Claus O. Wilke, O’Reilly Media, 2019.

The science behind data visualization, by Graham Odds on Creative Blog, August 8, 2013

Designing Against Bias in Machine Learning and AI by David Corliss, AMSTAT News, September 2023

Big data ethics and 10 controversial data science experiments, by Sabrina Dominguez, Data Science Dojo, May 2018

University lecturer slams ‘sexist’ Google Translate as gender neutral languages are translated into English, Daily Mail.com, March 24, 2021

Regression to the mean: what it is and how to deal with it, by Barnett, A.G., van der Pols, J.C., and Dobson, A.J., International Journal of Epidemiology, 34(1), 2005.

Mathematical Statistics and Data Analysis, 2nd ed. John A. Rice, Duxbury Press, Belmont, CA, 1995