
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ar.player.fm/legal.

LW - The Great Data Integration Schlep by sarahconstantin

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Great Data Integration Schlep, published by sarahconstantin on September 13, 2024 on LessWrong.
This is a little rant I like to give, because it's something I learned on the job that I've never seen written up explicitly.
There are a bunch of buzzwords floating around regarding computer technology in an industrial or manufacturing context: "digital transformation", "the Fourth Industrial Revolution", "Industrial Internet of Things".
What do those things really mean?
Do they mean anything at all?
The answer is yes, and what they mean is the process of putting all of a company's data on computers so it can be analyzed.
This is the prerequisite to any kind of "AI" or even basic statistical analysis of that data; before you can start applying your fancy algorithms, you need to get that data in one place, in a tabular format.
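To make "in one place, in a tabular format" concrete, here is a minimal Python sketch under invented assumptions: two machines each keep their own readings in a nested structure, and the goal is a single long-format table with one row per machine, timestamp, and variable. The machine names (`press_01`, `oven_02`), variable names, and values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-machine logs: each machine keeps its own readings,
# keyed by timestamp, in its own local structure.
press_log = {
    "2024-09-01T08:00": {"pressure_bar": 5.1, "temp_c": 61.0},
    "2024-09-01T08:05": {"pressure_bar": 5.3, "temp_c": 62.5},
}
oven_log = {
    "2024-09-01T08:00": {"temp_c": 180.2},
}


@dataclass(frozen=True)
class Row:
    machine: str
    timestamp: str
    variable: str
    value: float


def to_long_table(logs: dict[str, dict[str, dict[str, float]]]) -> list[Row]:
    """Flatten {machine: {timestamp: {variable: value}}} into one list of rows."""
    rows = []
    for machine, log in logs.items():
        for ts, readings in log.items():
            for variable, value in readings.items():
                rows.append(Row(machine, ts, variable, value))
    # Sort so the combined table reads end to end in time order.
    rows.sort(key=lambda r: (r.timestamp, r.machine, r.variable))
    return rows


table = to_long_table({"press_01": press_log, "oven_02": oven_log})
for row in table:
    print(row)
```

The long format is one common choice because adding a machine or a variable adds rows rather than columns; a wide table with one column per variable is an equally valid target once the set of variables is settled.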
Wait, They Haven't Done That Yet?
In a manufacturing context, a lot of important data is not on computers.
Some data is not digitized at all, but literally on paper: lab notebooks, QA reports, work orders, etc.
Other data is "barely digitized", in the form of scanned PDFs of those documents. Fine for keeping records, but impossible to search or analyze statistically. (A major aerospace manufacturer, from what I heard, kept all of the results of airplane quality tests in the form of scanned handwritten PDFs of filled-out forms. Imagine trying to compile trends in quality performance!)
Still other data is siloed inside machines on the factory floor. Modern, automated machinery can generate lots of data - sensor measurements, logs of actuator movements and changes in process settings - but that data is literally stored in that machine, and only that machine.
Manufacturing process engineers, for nearly a hundred years, have been using data to inform how a factory operates, generally using a framework known as statistical process control. However, in practice, much more data is generated and collected than is actually used. Only a few process variables get tracked, optimized, and/or used as inputs to adjust production processes; the rest are "data exhaust", to be ignored and maybe deleted.
In principle the "excess" data may be relevant to the facility's performance, but nobody knows how, and they're not equipped to find out.
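For readers unfamiliar with statistical process control, its simplest textbook tool is the control chart: a tracked variable is compared against limits derived from its own history. The sketch below computes limits for an individuals (I-MR) chart, estimating sigma as the average moving range divided by the constant d2 = 1.128. The measurements are made up, and real SPC practice layers additional run rules on top of these basic limits.

```python
from statistics import mean

D2_N2 = 1.128  # bias-correction constant d2 for moving ranges of size 2


def individuals_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for an individuals (I-MR) control chart."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_N2  # short-term sigma estimate
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(points: list[float], lcl: float, ucl: float) -> list[int]:
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]


# Hypothetical baseline measurements of one tracked process variable.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
lcl, center, ucl = individuals_limits(baseline)

new_points = [10.1, 10.7, 9.3, 10.0]
print(f"LCL={lcl:.3f} CL={center:.3f} UCL={ucl:.3f}")
print("out of control at:", out_of_control(new_points, lcl, ucl))
```

With these made-up numbers, the second and third new points fall outside the limits.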
This is why manufacturing/industrial companies will often be skeptical about proposals to "use AI" to optimize their operations. To "use AI", you need to build a model around a big dataset. And they don't have that dataset.
You cannot, in general, assume it is possible to go into a factory and find a single dataset that is "all the process logs from all the machines, end to end".
Moreover, even when that dataset does exist, there often won't be even the most basic built-in tools to analyze it. In an unusually modern manufacturing startup, the M.O. might be "export the dataset as .csv and use Excel to run basic statistics on it."
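The "export as .csv and run basic statistics" workflow can be sketched in a few lines of standard-library Python. The column names and values below are hypothetical, the export is inlined as a string so the example runs on its own, and blank cells are skipped rather than treated as zero.

```python
import csv
import io
from statistics import mean, stdev

# Hypothetical machine export, inlined so the sketch is self-contained.
EXPORT = """timestamp,line_speed_m_min,temp_c
2024-09-01T08:00,12.0,61.0
2024-09-01T08:05,12.5,62.5
2024-09-01T08:10,11.5,
2024-09-01T08:15,12.0,60.5
"""


def column_summary(text: str) -> dict[str, dict[str, float]]:
    """Count, mean, stdev, min and max for every numeric column, skipping blanks."""
    reader = csv.DictReader(io.StringIO(text))
    columns: dict[str, list[float]] = {}
    for record in reader:
        for name, raw in record.items():
            if name == "timestamp" or not raw:
                continue  # skip the time index and empty cells
            columns.setdefault(name, []).append(float(raw))
    return {
        name: {
            "n": len(vals),
            "mean": mean(vals),
            "stdev": stdev(vals) if len(vals) > 1 else 0.0,
            "min": min(vals),
            "max": max(vals),
        }
        for name, vals in columns.items()
    }


for name, stats in column_summary(EXPORT).items():
    print(name, stats)
```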
Why Data Integration Is Hard
In order to get a nice standardized dataset that you can "do AI to" (or even "do basic statistics/data analysis to") you need to:
1. obtain the data
2. digitize the data (if relevant)
3. standardize/"clean" the data
4. set up computational infrastructure to store, query, and serve the data
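Steps 3 and 4 can be illustrated with a small, hypothetical example: two sources disagree on temperature units and timestamp formats, one reading is exported twice, and the cleaned rows are loaded into an in-memory SQLite database for querying. The record layout, timestamp formats, and table schema are assumptions chosen for illustration; step 2 (digitizing paper) is not covered, and exact-match deduplication is a deliberate simplification.

```python
import sqlite3
from collections.abc import Iterable, Iterator
from datetime import datetime

# Hypothetical raw records: (machine, timestamp text, temperature, unit).
# The two sources disagree on units and on timestamp format.
raw_records = [
    ("press_01", "2024-09-01 08:00", 140.0, "F"),
    ("press_01", "2024-09-01 08:00", 140.0, "F"),  # same reading exported twice
    ("oven_02", "09/01/2024 08:00", 180.0, "C"),
    ("oven_02", "09/01/2024 08:05", 182.0, "C"),
]

# Assumed source formats; a real integration would discover these per source.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M", "%m/%d/%Y %H:%M")


def parse_timestamp(text: str) -> str:
    """Normalize any known timestamp format to ISO 8601."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature reading to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit: {unit!r}")


def clean(records: Iterable[tuple[str, str, float, str]]) -> Iterator[tuple[str, str, float]]:
    """Step 3: standardize units and timestamps, then drop exact duplicates."""
    seen: set[tuple[str, str, float]] = set()
    for machine, ts, value, unit in records:
        row = (machine, parse_timestamp(ts), round(to_celsius(value, unit), 2))
        if row not in seen:
            seen.add(row)
            yield row


# Step 4: store the cleaned rows somewhere queryable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (machine TEXT, ts TEXT, temp_c REAL, PRIMARY KEY (machine, ts))"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", clean(raw_records))
conn.commit()

for machine, avg_temp in conn.execute(
    "SELECT machine, AVG(temp_c) FROM readings GROUP BY machine ORDER BY machine"
):
    print(machine, avg_temp)
```

The primary key on `(machine, ts)` makes the database itself reject any duplicate that slips past the cleaning step, which is one way to keep the standardization rules and the storage layer consistent.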
Data Access Negotiation, AKA Please Let Me Do The Work You Paid Me For
Obtaining the data is a hard human problem.
That is, people don't want to give it to you.
When you're a software vendor to a large company, it's not at all unusual for it to be easier to make a multimillion-dollar sale than to get the data access necessary to actually deliver the finished software tool.
Why?
Partly, this is due to security concerns. There will typically be strict IT policies about what data can be shared with outsiders, and what types of network permissions are kosher.
For instance, in the semiconduc...