Data science examined: AI and data in detail

image: ©WANAN YOSSINGKUM | iStock

Dr Clare Walsh, Director of Education at the Institute of Analytics, highlights data science in a discussion that also includes comments on artificial intelligence (AI)

Near the end of 2024, decision intelligence is recognised across industries, with 98% of employers now seeking digital and data skills amongst graduates (1), whatever their degree course. As data science becomes mainstream, aligning data roles with broad business strategies is more important than ever, particularly as the field evolves.

One of the primary hurdles is the gap between advancements in the “syntax” of data science, such as the growing range of algorithms, and the “semantics”, or the interpretation and meaning, of data. While algorithms have evolved rapidly, refining data for accuracy and meaning remains complex.

Much web data is blocked or spoofed, and some of it is attributable to weaponised false-data campaigns or is simply an artefact of more bots operating on the web. The lack of standardised practices, such as agreed taxonomies for evaluating the quality of web-scraped data, hampers clarity, and the field has made little progress towards a shared language around data quality that would build trust and consistency.

AI and data practices

Explainable AI (XAI) is another area where progress has disappointed despite significant investment. Ironically, the rise of generative AI may have set XAI efforts back. Generative models lack standardised benchmarks, especially for non-text applications, and validation methods such as red-teaming are still underdeveloped.

Some recent advances, like those from DeepMind, have sparked optimism in model validation, but much remains to be done. The greatest uptake of data solutions has been in sectors like healthcare, where, in the absence of clear guidance within our own field, data scientists can fall back on sector-established guidelines (such as clinical trials) and formalised testing and approval.

Legislation to govern AI and data practices is emerging in the European Union and Colorado, in addition to the existing GDPR provision under Article 22, promising a more proactive approach than the unregulated growth of social media in the early 2000s. However, it remains unclear how these laws will be interpreted in court. Without clear guidelines on acceptable practices, some data activities could be easily misrepresented.

For instance, while “removing bias” from datasets is often presented as an achievable goal, some data segments will inevitably appear disadvantaged due to natural statistical variance. Without nuanced guidelines, the general public may misinterpret these findings, potentially seeing discrimination where none exists. The data science community must define what constitutes acceptable versus unfair bias to address these concerns and build trust.
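
To make the point about natural variance concrete, here is a minimal Python sketch. It assumes a hypothetical setting in which every segment shares exactly the same underlying approval rate, so any gap between segments is produced by chance alone. The function name, segment count, segment size and rate are illustrative assumptions, not figures from any real dataset.

```python
import random


def simulate_segment_rates(n_segments=20, segment_size=50, true_rate=0.5, seed=0):
    """Return the observed success rate of each segment.

    Every segment is drawn with the SAME true_rate (an assumption of this
    sketch), so differences between segments reflect sampling noise only.
    """
    rng = random.Random(seed)  # fixed seed keeps the illustration repeatable
    rates = []
    for _ in range(n_segments):
        successes = sum(rng.random() < true_rate for _ in range(segment_size))
        rates.append(successes / segment_size)
    return rates


rates = simulate_segment_rates()
spread = max(rates) - min(rates)
print(f"lowest segment rate:  {min(rates):.2f}")
print(f"highest segment rate: {max(rates):.2f}")
print(f"gap due to chance:    {spread:.2f}")
```

No segment is treated differently in this simulation, yet the lowest-scoring segment will typically sit visibly below the highest. Without guidance on how much variation to expect, a gap of this kind can be misread as discrimination.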

Generative AI (Gen AI), introduced in 2017, has had limited influence on most practising data scientists. Foundation models, designed to predict the next word or token in a sequence, have yet to prove themselves as reliable revenue generators. While search engines paired successfully with advertising and online retail has flourished, Gen AI has yet to find a similar revenue-driving foundation. Despite nearly $1 trillion invested in Gen AI to date, the technology has produced only around $1 billion in returns. This discrepancy highlights the need for more practical, monetisable applications of Gen AI.

Retrieval-augmented generation (RAG) models could change this, and data scientists will likely be called on to help businesses integrate these tools. Although interfaces like ChatGPT simplify generative AI usage, they do not guarantee effective or safe application. As data science tools become more accessible, the need for skilled professionals becomes even more critical.

While platforms like Alteryx enable machine learning through simple point-and-click interfaces, misuse by inexperienced users could lead to significant errors. Dimensionality reduction, imputation, and data distribution assumptions are key areas where small missteps can have outsized impacts on outcomes. For example, assuming data is normally distributed can create serious misrepresentations, especially as truly normal distributions are rare in real-world datasets. The development of easy-to-use tools has given rise to “amateur” data scientists and self-appointed AI consultants, highlighting the need for experienced professionals to make an even stronger case for safe and accurate practices.
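
As a rough illustration of the normality point, the sketch below draws skewed (log-normal) values, which are always positive, and applies the familiar "mean plus or minus two standard deviations" rule that presumes a bell curve. The function name, sample size and distribution parameters are assumptions chosen for illustration only.

```python
import random
import statistics


def two_sigma_check(n=10_000, seed=42):
    """Apply a normal-theory 2-sigma band to skewed, strictly positive data."""
    rng = random.Random(seed)  # fixed seed keeps the illustration repeatable
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    lower, upper = mean - 2 * sd, mean + 2 * sd
    return {
        "min": min(data),
        "lower": lower,
        "upper": upper,
        # Share of observations falling outside each end of the band
        "share_below": sum(x < lower for x in data) / n,
        "share_above": sum(x > upper for x in data) / n,
    }


result = two_sigma_check()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under a normal distribution, roughly 2.3% of values would fall beyond each end of the band. Here the lower bound is expected to land below zero, a value the data can never take, so every outlier sits in the upper tail and a symmetric summary misdescribes the data.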

Has there been a surge in data science programmes?

The educational landscape is responding well to rising demand, with a surge in data science programmes. Those programmes cover foundational skills, but many graduates now find themselves entering roles that demand specialisation, and may lack adequate preparation for such tasks. HR departments often struggle to assess data science CVs, further complicating the path for early-career data scientists – particularly those who find themselves the sole data experts within their organisations. Enhanced onboarding and mentorship would help these professionals thrive and contribute effectively to their roles.

For research-focused graduates, the path forward can be equally challenging. In the UK, the private sector lags behind academic institutions in research output, which has led to a migration of our top talent to countries offering better research and career opportunities. Addressing this imbalance could foster a stronger data science ecosystem within the UK and ensure that top talent finds pathways to contribute within the country (2).

Will data science integrate more deeply into work processes?

In the year ahead, we anticipate data science will become even more deeply integrated into daily work processes. With a growing number of professionals, the field is well-positioned to tackle persistent challenges, from improving data quality standards to establishing ethical guidelines for AI practices. Building stronger connections among data professionals, legal frameworks, and organisational leaders will be essential in shaping a future where data science is not only valuable but also managed responsibly in the workplace.

1. https://26055784.fs1.hubspotusercontent-eu1.net/hubfs/26055784/Third%20party%20events/Digital-GME-The%20Skills%20Gap.pdf?utm_campaign=IHEF&utm_medium=email&_hsenc=p2ANqtz-80lySVgjKuvXChYKmEpkRKY3DaTiuzSi4T9QdEhQwHqsmgjyGY4sUK72PGvs2kg9YAL7umDyRj6KYayQYPw0RLBktaK2qsgDPGSh06afI1CixlJac&_hsmi=85912103&utm_content=85912103&utm_source=hs_automation

2. https://www.ukonward.com/wp-content/uploads/2022/08/Rocket-Science.pdf
