(110h) Data Centric AI—How to Build the Software 2.0 Stack to Maximize ROI on Unstructured Data

Conference

AIChE Spring Meeting and Global Congress on Process Safety

Year

2023

Proceeding

2023 Spring Meeting and 19th Global Congress on Process Safety

Group

Industry 4.0 Topical Conference

Session

Poster Session: Industry 4.0/Analytics & AI

Time

Monday, March 13, 2023 - 5:00pm to 7:00pm

Authors

Van Meurer, M. - Presenter

Data-Centric AI: the way to handle unstructured data and develop great AI models

Data-centric AI (DCAI) is the practice of systematically engineering the data used to build AI systems. AI is made of both code and data. Historically, AI has focused primarily on code, with researchers building ever more sophisticated models on fixed datasets. Think of GPT-3, with its 175 billion parameters and an estimated $12 million for a single training run. But the real-world experience of those who put models into production shows that when you are trying to improve a model, it is often the quality of the data, and iterating on it, that makes an AI project succeed or fail. Quality means two things: for a model to perform well, you need data that is both clean and diverse.
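The following Python sketch shows the two quality dimensions on a small labeled dataset. It checks for one kind of unclean data, where the same input carries conflicting labels, and computes one crude diversity signal, the share of examples per label. The function names and image file names are hypothetical. Real projects would need richer checks than exact-duplicate matching and label counts.

```python
from collections import Counter, defaultdict


def find_label_conflicts(examples):
    """Return inputs that appear more than once with different labels.

    `examples` is a list of (input, label) pairs; inputs must be hashable.
    """
    labels_by_input = defaultdict(set)
    for x, y in examples:
        labels_by_input[x].add(y)
    return {x: sorted(ys) for x, ys in labels_by_input.items() if len(ys) > 1}


def class_shares(examples):
    """Fraction of examples carrying each label (a crude diversity check)."""
    counts = Counter(y for _, y in examples)
    total = sum(counts.values())
    return {y: n / total for y, n in counts.items()}


# Hypothetical labeled images of plant equipment.
data = [
    ("pump_A.jpg", "corroded"),
    ("pump_A.jpg", "ok"),        # same image, conflicting label: not clean
    ("pump_B.jpg", "ok"),
    ("valve_C.jpg", "ok"),
]

print(find_label_conflicts(data))  # {'pump_A.jpg': ['corroded', 'ok']}
print(class_shares(data))          # {'corroded': 0.25, 'ok': 0.75}
```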

Set up a Software 2.0 stack

Starting a company today without machine learning is like starting a company ten years ago without software. AI is Software 2.0. In Software 1.0, a human instructs the machine line by line. In Software 2.0, a neural network learns which rules are needed to produce the desired outcome. Software 2.0 is king when the algorithm itself is difficult to design explicitly, as in object detection in images. If you recognize Software 2.0 as a new and emerging programming paradigm, and DCAI as agile for Software 2.0, then you need to set up a Software 2.0 stack.
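This Python sketch contrasts the two paradigms on a toy task: deciding whether a temperature reading is "hot". In the Software 1.0 version, a human writes the rule. In the Software 2.0 version, the rule's single parameter is chosen from labeled examples. A real Software 2.0 system would train a neural network with far more parameters, so this one-parameter learner is a deliberately tiny stand-in. All names and values are hypothetical.

```python
def is_hot_v1(temperature_c: float) -> bool:
    # Software 1.0: a human writes the rule explicitly.
    return temperature_c > 30.0


def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Software 2.0 in miniature: choose the rule's parameter from labeled data.

    Each observed temperature is tried as a threshold; the one that misclassifies
    the fewest labeled samples wins (ties go to the first candidate tried).
    """
    best_t, best_errors = samples[0][0], len(samples) + 1
    for t, _ in samples:
        errors = sum((x > t) != label for x, label in samples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labeled readings: (temperature in degrees Celsius, labeled "hot"?)
labeled = [(12.0, False), (18.5, False), (24.0, False), (27.5, True), (33.0, True)]
threshold = learn_threshold(labeled)


def is_hot_v2(temperature_c: float) -> bool:
    # Software 2.0: same interface, but the rule came from the data.
    return temperature_c > threshold


print(threshold)                          # 24.0
print(is_hot_v1(26.0), is_hot_v2(26.0))   # False True
```

With these labels, the learned threshold is 24.0. A 26 degree reading is therefore "hot" for the learned rule but not for the hand-written one. To change the Software 2.0 behavior, you change the data, not the code.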

Find the right IDE

To deliver good software, developers write code in a dedicated application called an IDE (Integrated Development Environment). An IDE is the Microsoft Word of code. A good IDE is designed to help write good code, not just a lot of code. It supports interactive, iterative development, with a short time between development and testing. It also provides syntax highlighting, debuggers, profilers, go-to-definition, Git integration, and more. In the Software 2.0 stack, programming is done by accumulating, massaging, and cleaning datasets. To switch to Software 2.0, you need a Software 2.0 IDE that helps with all of the workflows involved in accumulating, visualizing, cleaning, labeling, and sourcing datasets. It bubbles up images that the network suspects are mislabeled. It assists labeling by seeding labels with the network's predictions. It suggests useful examples to label based on the uncertainty of the network's predictions. It shares knowledge and enforces the reuse of data across the organization.
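The three assistive behaviors of a Software 2.0 IDE (flagging suspected mislabels, seeding labels with predictions, and suggesting examples to label by uncertainty) can be sketched as simple functions over a model's predicted class probabilities. The sketch assumes those probabilities are already available, with `probs[i][k]` as the predicted probability that example `i` has class `k`. The thresholds are hypothetical defaults, and dedicated tools use more careful methods. The logic is the same, though:

- A low probability for the human-given label suggests a mislabel.
- High confidence allows a pre-filled label.
- High predictive entropy marks an example worth a human's attention.

```python
import math


def flag_suspected_mislabels(labels, probs, threshold=0.2):
    """Indices whose human-given label receives a low predicted probability."""
    return [i for i, (y, p) in enumerate(zip(labels, probs)) if p[y] < threshold]


def seed_labels(probs, min_confidence=0.9):
    """Pre-fill each label with the top prediction when the model is confident.

    None means 'no seed: a human labels this example from scratch'.
    """
    seeded = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        seeded.append(best if p[best] >= min_confidence else None)
    return seeded


def entropy(p):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def pick_for_labeling(probs, k):
    """Uncertainty sampling: the k examples with the highest predictive entropy."""
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for four images and two classes (0 = ok, 1 = defect).
probs = [[0.95, 0.05], [0.10, 0.90], [0.55, 0.45], [0.05, 0.95]]
labels = [0, 1, 0, 0]  # human labels; the last one disagrees strongly with the model

print(flag_suspected_mislabels(labels, probs))  # [3]
print(seed_labels(probs))                       # [0, 1, None, 1]
print(pick_for_labeling(probs, 2))              # [2, 1]
```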

DCAI reduces development time

One of the key success factors of the software industry in recent years has been agile. It all started in the spring of 2000, when a group of 17 software developers, including Martin Fowler, met in Oregon to discuss how they could speed up development in order to bring new software to market faster. They made development and testing concurrent activities, allowing more communication between developers, managers, testers, and customers. Agile reportedly improved cost-effectiveness, productivity, quality, cycle time, and customer satisfaction by 30% to 100%. Data-centric AI is the agile of AI. Labeling, model training, and model diagnostics can work in parallel and directly influence the data used for the AI system. This removes the unnecessary trial-and-error time spent on improving the model while inconsistent data is left unchanged, and it can make development up to 10x faster.
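The following Python sketch shows the data-centric loop. The model code stays fixed while training, diagnostics, and expert review repeatedly act on the data. For brevity, the model is a nearest-centroid classifier on a single feature rather than a neural network. `expert_review` is a hypothetical stand-in for a subject-matter expert and is assumed to know that the true rule is x > 25. The values are toy data with one deliberate labeling error.

```python
from statistics import mean


def fit_centroids(samples):
    """The fixed 'model': a nearest-centroid classifier (one mean per label)."""
    labels = {y for _, y in samples}
    return {label: mean(x for x, y in samples if y == label) for label in labels}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def expert_review(x):
    """Hypothetical subject-matter expert; the true rule is assumed to be x > 25."""
    return x > 25.0


def improve_data(samples, max_rounds=5):
    """Data-centric loop: retrain, diagnose, and fix labels; the model code never changes."""
    samples = list(samples)
    for _ in range(max_rounds):
        model = fit_centroids(samples)                        # 1. train
        disputed = [i for i, (x, y) in enumerate(samples)
                    if predict(model, x) != y]                # 2. diagnose
        changed = False
        for i in disputed:                                    # 3. expert reviews the data
            x, y = samples[i]
            corrected = expert_review(x)
            if corrected != y:
                samples[i] = (x, corrected)
                changed = True
        if not changed:                                       # nothing left to fix
            break
    return samples


# Toy data: (feature value, label); 44.0 was mislabeled False by mistake.
noisy = [(10.0, False), (14.0, False), (18.0, False), (22.0, False),
         (28.0, True), (32.0, True), (36.0, True), (40.0, True), (44.0, False)]
cleaned = improve_data(noisy)

print(predict(fit_centroids(noisy), 27.0))    # False
print(predict(fit_centroids(cleaned), 27.0))  # True
```

Correcting the single mislabeled example moves the decision boundary from about 27.8 to 26.0 without touching the model code. The boundary here is the midpoint between the two class means. This kind of disagreement-based review only surfaces labels that the model disputes. Mislabels that happen to agree with the model need other checks.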

Set up a human-in-the-loop culture

Revolutionary change is neither linear nor constant. It is the chaos that disturbs an organization and leads to the reshaping of its culture. DCAI means bringing human intelligence to machine learning and leveraging human expertise to train good AI. To do so, you need to put humans right at the center, in a human-in-the-loop machine learning process. This is not trivial. It will change your development processes. It will change the structure of your organization. It may change your business model. It will involve forming a new vision and a new mission. If you do it, you will improve the consistency, accuracy, transparency, and safety of your models. To succeed, start small. Execute DCAI pilot projects to gain momentum. Build a multidisciplinary in-house DCAI team of subject matter experts, ML engineers, and data quality managers. Provide broad DCAI training.
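This minimal Python sketch shows the routing step of a human-in-the-loop process. It assumes the model returns a label and a confidence score for each item. Confident predictions are accepted automatically, and the rest go to a review queue. Each expert decision is then kept as new training data. The class name, the 0.8 threshold, and the item names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HumanInTheLoop:
    """Route low-confidence predictions to a human and keep their answers for retraining."""

    confidence_threshold: float = 0.8          # assumed, tunable cut-off
    review_queue: list = field(default_factory=list)
    training_set: list = field(default_factory=list)

    def route(self, item, label, confidence):
        """Accept confident predictions; queue the rest for an expert (returns None)."""
        if confidence >= self.confidence_threshold:
            return label
        self.review_queue.append((item, label, confidence))
        return None

    def record_review(self, item, expert_label):
        """Store the expert's decision as training data and remove the item from the queue."""
        self.review_queue = [q for q in self.review_queue if q[0] != item]
        self.training_set.append((item, expert_label))


hitl = HumanInTheLoop()
print(hitl.route("weld_017.png", "defect", 0.97))  # defect
print(hitl.route("weld_018.png", "ok", 0.55))      # None (queued for review)
hitl.record_review("weld_018.png", "defect")
print(hitl.training_set)                           # [('weld_018.png', 'defect')]
print(hitl.review_queue)                           # []
```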

Topics

Computing and Systems Engineering
