How to navigate the post-ChatGPT world as a data scientist

Jingting Cher, Deputy Director, Data Science, SP Group

Full disclaimer: these views are solely mine and do not represent any organisations I may be affiliated with. Also, this is 100% human written, and no AI was involved.

These are exciting and challenging times to be a data scientist. Interest in data science and artificial intelligence (AI) has never been more widespread, and expectations have never been higher. Since its public launch, ChatGPT has set the record for the fastest user growth by reaching 100 million active users in two months, fascinated even AI sceptics, and spurred countless conversations and innovative use cases.

Essentially, ChatGPT is a breakthrough on many levels. For the vast majority of people, it is the first time they can interact with AI directly. For data scientists, the emergent abilities of large language models (LLMs) have opened up a vast range of potential applications. For corporates, ChatGPT has sparked a tech arms race that is making an increasing number of LLMs available for commercial use.

Data scientists are their organisations' AI gatekeepers and subject-matter experts, so this profoundly affects how they work and communicate with business leaders and stakeholders. Stakeholders used to think they knew nothing about AI, but now they think they know what it is. Yet generative AI enabled by LLMs differs considerably from traditional AI in many areas, including data, hardware, usage, and risks. Balancing optimism against limitations is therefore crucial if data scientists are to start their organisations on the generative AI journey with the right expectations.

So how should data scientists navigate this post-ChatGPT world? First, stay up to date with the latest AI news. There is plenty of advice online on how to do so, from following technical blogs to social media channels. With the AI world moving so fast, learning to distil the relevant information and stay on top of strategic discussions requires a strong technical foundation in LLMs and natural language processing (NLP). Key concepts include transformers, word embeddings, vector databases, and prompt engineering.
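To make three of these concepts concrete, here is a minimal sketch of how embeddings, vector search, and prompt construction fit together. Everything in it is an illustrative assumption rather than a real system: `embed` is a toy word-count vector standing in for a learned embedding model, the in-memory `index` list stands in for a vector database, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy stand-in for an embedding: one word count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical mini knowledge base; a real one would hold company documents.
documents = [
    "transformers use attention to model word context",
    "vector databases store embeddings for similarity search",
    "prompt engineering shapes the instructions given to a model",
]
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})

# In-memory list standing in for a vector database: (text, vector) pairs.
index = [(doc, embed(doc, vocabulary)) for doc in documents]

def retrieve(query):
    """Return the stored document most similar to the query."""
    query_vector = embed(query, vocabulary)
    best_doc, _ = max(index, key=lambda item: cosine(query_vector, item[1]))
    return best_doc

def build_prompt(question):
    """Prompt engineering in miniature: place retrieved context ahead of the question."""
    context = retrieve(question)
    return f"Context: {context}\nQuestion: {question}\nAnswer using only the context."

print(build_prompt("how do vector databases search embeddings"))
```

In practice the word counts would be replaced by vectors from a trained model and the list by a dedicated store, but the flow of embedding the query, finding the nearest document, and assembling the prompt stays the same.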

Data scientists should also know how organisations can leverage LLMs and generative AI. Calling ChatGPT APIs with prompt engineering may be the fastest solution, but it can lead to data security issues and production costs that scale out of proportion. Enterprise generative AI services offered by cloud providers may alleviate some of these problems, but they have only just been rolled out and will have teething issues. Developing in-house LLMs requires the most effort and expertise, but with the increasing prevalence of open-source pre-trained LLMs and innovative ways to fine-tune them using limited resources, the technical and data barriers are now low enough that even data scientists with no NLP experience in small organisations can start building one.

If this has not already happened, data scientists should start engaging other stakeholders in the organisation, such as business, corporate functions, design, and engineering, to identify and prioritise use cases based on data availability, business value, and risk appetite. Given that generative AI has demonstrated capabilities that can scale over a wide range of tasks, it is possible to explore multiple use cases simultaneously. Still, the emphasis should be on quick experimentation and proof-of-value rather than hard deliverables and KPIs.

Last but most importantly, act now, start small, experiment broadly, fail fast, and learn faster. Generative AI offers the potential to develop a foundation AI that can serve multiple business needs, boost productivity, and unlock new opportunities, and it brings the need to establish guardrails and governance to manage risks. Both mean that existing corporate AI strategies and roadmaps have to be revised iteratively as data scientists and stakeholders learn to adopt generative AI for their business.

Regardless of how (over)hyped ChatGPT is, one thing is certain: LLMs and generative AI have redefined how users interact with AI, opening up a world of new business opportunities. Data scientists should lead their organisations in exploring this new AI world or risk falling into obscurity as the AI revolution passes them by.
