Lead the AI revolution with data
August 22, 2023 | Paul Morgan | Azure, Data, Thought Leadership

Trying to deliver on AI without a data strategy that addresses data quality challenges is like putting the cart before the horse, writes Paul Morgan, head: data, analytics and AI at Altron Karabina.

ChatGPT exploded onto the global landscape at the end of 2022, and now artificial intelligence (AI) is the hot topic on everyone's lips. Discussions abound on how well AI can write a blog post, code a website or drive a vehicle, and whether it should.

According to Statista, the global AI market is valued at $142.3 billion in 2023, while McKinsey found that the adoption of AI-driven solutions has doubled since 2017. These are impressive figures, but what has become apparent to adopters of AI is that without a reliable mechanism to collate, clean and pre-process the data that powers AI engines, the expected benefits are unlikely to materialise.

Judson Althoff, executive vice-president and chief commercial officer at Microsoft, puts it this way: "As leaders look to embrace AI, it becomes more critical than ever to prioritize having a data-driven business, fortified with digital and cloud capabilities. This approach will help organizations leverage generative AI as an accelerant to transformation."

AI algorithms are trained on large sets of unstructured and structured data in the hope of providing insights for decision-making processes, but the trainers need to be sure that this source material is accurate, unbiased and appropriate. ChatGPT is a good example: the OpenAI large language model was trained on an extraordinarily large body of text (570GB of documents from Reddit, Wikipedia, CommonCrawl, GitHub and other sources) and can consolidate query responses into summaries of information that are highly pertinent to the user.
However, to ensure that the source texts were suitable for business consumption, human labellers were used to clean and curate the source data sets, red-flagging documents that contained misogynist, abusive, racist or other unacceptable content, as well as disinformation. In contrast, Microsoft didn't follow this approach in 2016 when it released its Twitter-trained chatbot, Tay, and then had to strongarm the bot off the playing field after 16 hours of embarrassing and offensive tweets.

Clean data is necessary in both AI and traditional analytics, and it is generally accepted that there are at least six dimensions of data quality that together earn the "clean" label: accuracy, completeness, consistency, validity, integrity and uniqueness. While data quality tools have been available for decades to assist in the process, a considerable amount of work, both human and automated, is still required to guarantee high levels of data quality. It goes without saying that there are now also AI tools available that can help improve the quality of your data.

Clean data is also essential for traditional data science use cases. It is very difficult to identify customer segments when you don't know whether the 50 Paul Morgans on your mailing list are the same person, five different people or 40 different people, and forecasting potential revenue improvements from one customer is very different from forecasting it from 40. Equally meaningless is forecasting cash flow if contracted supplier payment terms are left blank on your financial system or ERP supplier records.

Data quality and stewardship have long been important success factors for delivering accurate historical analysis, but they become even more important when making accurate predictions about the future and identifying possible insights. It is very easy to go off on a completely incorrect tangent if an AI-generated insight or prediction is based on incorrect data.
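To make the uniqueness and completeness dimensions concrete, here is a minimal sketch of the two checks described above: collapsing duplicate "Paul Morgan" mailing-list entries and flagging supplier records with blank payment terms. The record fields, normalisation rule and function names are illustrative assumptions, not a prescribed method or any particular tool's API.

```python
# Illustrative sketch only: field names and normalisation rules are assumptions.

def normalise(record):
    """Key a customer record on lower-cased, whitespace-stripped name and email."""
    return (record["name"].strip().lower(), record["email"].strip().lower())

def dedupe(records):
    """Uniqueness check: collapse records sharing a normalised key (first wins)."""
    seen = {}
    for rec in records:
        seen.setdefault(normalise(rec), rec)
    return list(seen.values())

def missing_fields(records, required):
    """Completeness check: return records where any required field is blank."""
    return [r for r in records if any(not r.get(f, "").strip() for f in required)]

customers = [
    {"name": "Paul Morgan",  "email": "paul@example.com"},
    {"name": "paul morgan ", "email": "Paul@Example.com"},
    {"name": "Jane Dlamini", "email": "jane@example.com"},
]
unique = dedupe(customers)
print(len(unique))  # prints 2 - the two "Paul Morgan" variants collapse into one

suppliers = [
    {"name": "Acme",   "payment_terms": "30 days"},
    {"name": "Globex", "payment_terms": ""},
]
flagged = missing_fields(suppliers, ["payment_terms"])
print([s["name"] for s in flagged])  # prints ['Globex']
```

In practice real duplicate detection also needs fuzzy matching (typos, nicknames, reordered names), which is where dedicated data quality tooling earns its keep, but the principle of normalising records and testing each quality dimension is the same.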
Model bias is another area where care must be taken with the data used to train algorithms. There are many real-life stories of AI output discriminating against people based on race, gender, religion and other demographics. This is not easy to overcome if you have inadequate data available for certain groupings when training AI models, but if you cannot remove the bias from the training data, then you need to test the outputs for discernible bias at the end of the process.

In the rush to use AI for real business value, let's not forget that both humans and AI models need good samples of historical or training data sets to learn from. At Altron Karabina we have been solving customer data challenges for over two decades. Feel free to contact our data team to discuss the data quality and cleanliness challenges your AI initiatives face, or any other data issues.

This article was originally published on ITOnline, 22 August 2023.

Paul is Head of Azure Platforms (which includes Data & Analytics, AI and Software Engineering) at Altron Karabina. Paul is responsible for the largest business unit within Altron Karabina, with a team of over 70 consultants working to deliver innovative customer solutions on the Microsoft Azure platform. Paul and his team successfully implement hundreds of projects a year, across four continents, and have had the great honour of being awarded the Microsoft Data & Analytics Partner of the Year in South Africa seven times as at 2022.

Tags: Artificial Intelligence, Data Analytics