References

Abela, Andrew. 2020. “Choosing a Good Chart.” https://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html.
Adhikari, Ani, John DeNero, and David Wagner. 2022. Computational and Inferential Thinking: The Foundations of Data Science. 2nd Ed. https://inferentialthinking.com/chapters/intro.html.
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias—Risk Assessment in Criminal Sentencing.” ProPublica, May. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Appel, Gil, Juliana Neelbauer, and David A. Schweidel. 2023. “Generative AI Has an Intellectual Property Problem.” Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem.
Bashir, Noman, Priya Donti, James Cuff, Sydney Sroka, Marija Ilic, Vivienne Sze, Christina Delimitrou, and Elsa Olivetti. 2024. “The Climate and Sustainability Implications of Generative AI.” An MIT Exploration of Generative AI.
Benson, H., J. A. Dusek, J. B. Sherwood, P. Lam, C. F. Bethea, W. Carpenter, S. Levitsky, et al. 2006. “Study of the Therapeutic Effects of Intercessory Prayer (STEP) in Cardiac Bypass Patients: A Multicenter Randomized Trial of Uncertainty and Certainty of Receiving Intercessory Prayer.” American Heart Journal 151 (4): 934–42.
Berreby, David. 2024. “As Use of A.I. Soars, so Does the Energy and Water It Requires.” https://e360.yale.edu/features/artificial-intelligence-climate-energy-emissions.
Borne, Kirk. 2021. “Data Profiling–Having That First Date with Your Data.” Medium. https://medium.com/codex/data-profiling-having-that-first-date-with-your-data-2e05de50fca7.
Box, G. E. P., and D. R. Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society. Series B (Methodological) 26 (2): 211–52. http://www.jstor.org/stable/2984418.
Box, George E. P. 1976. “Science and Statistics.” Journal of the American Statistical Association 71 (356): 791–99.
Box, George E. P., and Norman R. Draper. 1987. Empirical Model-Building and Response Surfaces. John Wiley & Sons, New York.
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231.
Burgess, Matt. n.d. “Strava’s Heatmap Was a ‘Clear Risk’ to Security, UK Military Warned.” https://www.wired.co.uk/article/strava-heat-maps-military-app-uk-warning-security.
Cleveland, William S. 2001. “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” International Statistical Review / Revue Internationale de Statistique 69 (1): 21–26. http://www.jstor.org/stable/1403527.
Cox, M., and D. Ellsworth. 1997. “Application-Controlled Demand Paging for Out-of-Core Visualization.” In Proceedings. Visualization ’97 (Cat. No. 97CB36155), 235–44. https://doi.org/10.1109/VISUAL.1997.663888.
Cressie, N. 2023. “Robodebt Not Only Broke the Laws of the Land – It Also Broke Laws of Mathematics.” The Conversation. https://theconversation.com/robodebt-not-only-broke-the-laws-of-the-land-it-also-broke-laws-of-mathematics-201299.
Cunningham, Adam. 2024. “Probability Playground.” Amstat News, 26–28.
Davenport, Thomas H., and D. J. Patil. 2012. “Data Scientist: The Sexiest Job of the 21st Century?” Harvard Business Review. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century.
———. 2022. “Is Data Scientist Still the Sexiest Job of the 21st Century?” Harvard Business Review. https://hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century.
Debortoli, Stefan, Oliver Müller, and Jan vom Brocke. 2014. “Comparing Business Intelligence and Big Data Skills.” Business & Information Systems Engineering 6 (5): 289–300.
Gelman, A., and A. Unwin. 2013. “Infovis and Statistical Graphics: Different Goals, Different Looks.” Journal of Computational and Graphical Statistics 22: 2–28. https://www.tandfonline.com/doi/full/10.1080/10618600.2012.761137.
Godsey, Brian. 2017. Think Like a Data Scientist. Manning Publications. https://www.oreilly.com/library/view/think-like-a/9781633430273/.
Grue, Lars, and Arvid Heiberg. 2006. “Notes on the History of Normality–Reflections on the Work of Quetelet and Galton.” Scandinavian Journal of Disability Research 8 (4): 232–46.
Heathcote, James A. 1995. “Why Do Old Men Have Big Ears?” BMJ 311 (7021): 1668. https://doi.org/10.1136/bmj.311.7021.1668.
Hern, Alex. 2024. “TechScape: How Cheap, Outsourced Labour in Africa Is Shaping AI English.” The Guardian. https://www.theguardian.com/technology/2024/apr/16/techscape-ai-gadgest-humane-ai-pin-chatgpt.
Huff, Darrell. 1954. How to Lie with Statistics. W.W. Norton & Company, New York.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R, 2nd Ed. Springer. https://www.statlearning.com/.
Kurgan, Lukasz A, and Petr Musilek. 2006. “A Survey of Knowledge Discovery and Data Mining Process Models.” The Knowledge Engineering Review 21 (1): 1–24.
Larson, Jeff, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. “How We Analyzed the COMPAS Recidivism Algorithm.” ProPublica, May. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
Lockett, Will. 2024. “Intel Admits AI Decreases Productivity.” https://medium.com/predict/intel-admits-ai-decreases-productivity-226681d1af18.
Loukides, Mike, Hilary Mason, and D. J. Patil. 2018. Ethics and Data Science. O’Reilly Media. https://resources.oreilly.com/examples/0636920203964/.
Lu, Jie, Anjin Liu, Fan Dong, Feng Gu, João Gama, and Guangquan Zhang. 2019. “Learning Under Concept Drift: A Review.” IEEE Transactions on Knowledge and Data Engineering 31 (12): 2346–63.
Mallick, Satya, and Sunita Nayak. 2018. “Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN).” https://learnopencv.com/number-of-parameters-and-tensor-sizes-in-convolutional-neural-network/.
Mariscal, Gonzalo, Oscar Marban, and Covadonga Fernandez. 2010. “A Survey of Data Mining and Knowledge Discovery Process Models and Methodologies.” The Knowledge Engineering Review 25 (2): 137–66.
Martinez-Plumed, Fernando, Lidia Contreras-Ochando, César Ferri, Peter Flach, José Hernández-Orallo, Meelis Kull, Nicolas Lachiche, and María José Ramírez-Quintana. 2021. “CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories.” IEEE Transactions on Knowledge and Data Engineering 33 (8): 3048–61.
Messerli, F. H. 2012. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” New England Journal of Medicine 367: 1562–64.
Meta, Analytics at [Analytics at Meta]. 2023. “Data Engineering at Meta: High-Level Overview of the Internal Tech Stack.” Medium. https://medium.com/@AnalyticsAtMeta/data-engineering-at-meta-high-level-overview-of-the-internal-tech-stack-a200460a44fe.
Nicoletti, Leonardo, and Dina Bass. 2023. “Humans Are Biased. Generative AI Is Even Worse.” Bloomberg Technology + Equality. https://www.bloomberg.com/graphics/2023-generative-ai-bias/.
Nowinski, Christopher J., Samantha C. Bureau, Michael E. Buckland, Maurice A. Curtis, Daniel H. Daneshvar, Richard L. M. Faull, Lea T. Grinberg, et al. 2022. “Applying the Bradford Hill Criteria for Causation to Repetitive Head Impacts and Chronic Traumatic Encephalopathy.” Frontiers in Neurology 13. https://doi.org/10.3389/fneur.2022.938163.
Pearl, Judea, and Dana MacKenzie. 2018. The Book of Why. Basic Books, New York.
Peng, Roger D., Sean Kross, and Brooke Anderson. 2020. Mastering Software Development in R. https://bookdown.org/rdpeng/RProgDA/.
Provost, Foster, and Tom Fawcett. 2013. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O’Reilly Media.
Shearer, Colin. 2000. “The CRISP-DM Model: The New Blueprint for Data Mining.” Journal of Data Warehousing 5 (4): 13–22.
Silver, Nate. 2012. The Signal and the Noise: Why so Many Predictions Fail–but Some Don’t. Penguin Books.
Snow, John. 1855. On the Mode of Communication of Cholera, 2nd Ed. John Churchill, London. https://archive.org/stream/b28985266#page/n3/mode/2up.
Spiegelhalter, David. 2021. The Art of Statistics: How to Learn from Data. Basic Books.
Suresh, H., and J. Guttag. 2021. “A Framework for Understanding Sources of Harm Throughout the Machine Learning Life Cycle.” https://arxiv.org/pdf/1901.10002.pdf.
Tent, M. B. W. 2006. The Prince of Mathematics: Carl Friedrich Gauss. CRC Press, Boca Raton, FL.
Tigani, Jordan. 2023. “Big Data Is Dead.” MotherDuck blog. https://motherduck.com/blog/big-data-is-dead/.
Trewin, D., N. Fisher, and N. Cressie. 2023. “The Robodebt Tragedy.” Significance 20 (6): 18–21. https://academic.oup.com/jrssig/article/20/6/18/7457250.
Tufte, E. 1983. The Visual Display of Quantitative Information. Graphics Press.
———. 2001. The Visual Display of Quantitative Information, 2nd Ed. Graphics Press.
Tukey, John W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics 33 (1): 1–67.
———. 1977. Exploratory Data Analysis. Pearson.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000–6010. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer Verlag.
Zewe, Adam. 2025. “Explained: Generative AI’s Environmental Impact.” https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117.

From the project part

A Very Short History Of Data Science by Gil Press, Forbes, 2013

The History of Data Science and Pioneers You Should Know

Data Science 101: Life Cycle of a Data Science Project, Abraham Musa, Medium, 2021

Solving the Last Mile Problem for Data Science Project Success, Bill Waid, Forbes, 2019

The Ultimate Guide to Deploying Machine Learning Models, Luigi Patruno, 2020

Meet Airbnb’s official party pooper, who reduced partying by 5% in two years, CNBC, Sept. 19, 2023

From the data part

Junk Charts Trifecta Checkup: The Definitive Guide, by Kaiser Fung.

Fundamentals of Data Visualization, by Claus O. Wilke, O’Reilly Media, 2019.

The science behind data visualization, by Graham Odds on Creative Bloq, August 8, 2013

Designing Against Bias in Machine Learning and AI by David Corliss, AMSTAT News, September 2023

Big data ethics and 10 controversial data science experiments, by Sabrina Dominguez, Data Science Dojo, May 2018

University lecturer slams ‘sexist’ Google Translate as gender neutral languages are translated into English, Daily Mail.com, March 24, 2021

Regression to the mean: what it is and how to deal with it, by Barnett, A.G., van der Pols, J.C., and Dobson, A.J., International Journal of Epidemiology, 34(1), 2005.

Causality, 2nd ed., Judea Pearl, Cambridge University Press, 2009

Matrices with Applications in Statistics, 2nd ed., Franklin A. Graybill, Wadsworth International Group, Belmont, CA., 1983

Theory and Application of the Linear Model, Franklin A. Graybill, Duxbury Press, North Scituate, Massachusetts, 1976

Adjustments of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix. Sherman, J., and Morrison, W.J. Ann. Math. Stat., 20, 621, 1949

Inverting modified matrices. Woodbury, M., Memorandum No. 42, Statistical Research Group, Princeton University, 1950.

Mathematical Statistics and Data Analysis, 2nd ed. John A. Rice, Duxbury Press, Belmont, CA, 1995