OpenAI harvested 1,000,000 hours of YouTube

+ Opera's AI, goodbye Google Podcasts

Those ancient statues you see in museums, all pristine and white? Turns out, they were originally decked out in all sorts of bright colours.

Scientists have found bits of pigment clinging to the marble, proving that what we often think of as classical elegance was actually a full-on colour blast. Our image of ancient Greek and Roman art as this monochromatic affair is way off the mark.

TODAY:

  • AI Data, ever wonder how they train the bots?

  • Tech and AI news weekly roundup!

  • Around the web: US lawmakers agreed on limits to data harvesting, and Musk hints at landing a Starship booster.

  • Guidde saves you hours creating employee resources!

DATA HARVESTING

In 2021, OpenAI hit a wall: they'd used up pretty much all the useful English text from the internet to train their AI.

They needed more data.

So, they came up with Whisper, a tool to turn YouTube video audio into text.

This raised some eyebrows since using YouTube content like this could be against the rules.

Despite the concerns, OpenAI went ahead and used Whisper to transcribe over a million hours of YouTube videos.

GPT-4: powered by sneezing panda

This massive amount of data helped develop GPT-4, the model behind the latest version of their ChatGPT chatbot and a step up from its predecessor.

While this was happening, a few key points emerged:

  • OpenAI's move to use YouTube audio for AI training was clever but controversial. It highlighted the tricky balance between pushing tech forward and sticking to ethical guidelines.

  • Tech giants like Google and Meta are also fishing for data to improve their AI. Their aggressive tactics show just how far companies will go to feed their AI systems, sometimes stretching or ignoring rules.

  • As real, usable data gets harder to find, the idea of making up data—synthetic data—has been discussed. It's a creative fix but comes with its own set of problems, like making sure the made-up data is useful and accurate.

Data hunger games

As OpenAI and others raced to advance their AI, Google and Meta didn't sit back either.

They too looked for more and more data, even if it meant bending some rules.

For example, Google broadened its terms of service, hinting that it might use data from Google Docs and other services, which sparked privacy worries.

The drive for more data boils down to a simple idea: the more data AI has, the smarter it gets.

This has led to AI models learning from trillions of words and aiming to mimic human abilities.

But as the pool of available data dries up, the focus might shift to synthetic data—AI-generated data.

This could sidestep some legal issues but opens up new challenges in making sure this data helps rather than hinders AI progress.

Maybe AI should lay off the YouTube conspiracy theories for training data. Just a thought.

Should tech companies be allowed to use data from different sources?

Vote for live results and see results + opinions from yesterday at the bottom of the email.

PRESENTED BY GUIDDE

We all have that one colleague who keeps asking the same thing over and over again, no matter how many times we explain it to them.

It’s time you let AI do the explaining instead of you. Guidde is an AI-powered tool that helps you explain the most complex tasks in seconds with AI-generated documentation:

  • Turn boring documentation into stunning visual guides

  • Save valuable time by creating video documentation 11x faster

  • Use it to document workflows for your teammates, share insights across your company, train and onboard new hires, and much more

Simply click capture on our browser extension and the app will automatically generate step-by-step video guides complete with visuals, voiceover and calls to action.

Guidde is used and trusted by 20,000+ users. And we’re rated 5/5 stars on the Google Chrome store.

On This Day

8th April 1990 - On this day, ABC aired the first episode of David Lynch’s “Twin Peaks”, immersing viewers in the mystery of “Who Killed Laura Palmer?” and marking a pivotal moment in TV history.

AI Tools to Start Your Week

Tabnine: An AI assistant designed for developers, speeding up code delivery and ensuring code safety by offering intelligent code completions based on the context of your work.

Kittl: A user-friendly design platform powered by AI, allowing users to create stunning designs effortlessly. It's designed to help both amateurs and professionals improve their design skills and output.

Success.ai: Targets marketing and sales professionals by offering access to 700M+ B2B leads, with features for unlimited emails, automated warmups, and advanced AI-powered writing.

venturefy: Utilises AI to codify and verify corporate relationships, acting as a "blue-check" for business by enabling users to quickly identify and build trust with potential business partners.

Pitchnhire: Powerful AI-based hiring software that helps manage job pipelines, assessments, video interviews, and other recruitment funnels, streamlining the recruitment process.

AI + TECH

We have a tech and AI roundup this week, where we’re looking at how things are changing in the tech world, from the end of Google Podcasts to the rise of smarter gadgets.

Here’s the lowdown:

Google Podcasts is wrapping up - did anyone even use it?

Meanwhile, AI is making our devices more intelligent and tailored to our needs.

Opera is leading a cool move by letting you use AI models right in your browser—keeping things private and under your control.

Everything’s getting more personalised

This makes your browsing better and puts you in the driver's seat of your online world.

Brave is also in on the action, bringing its AI chatbot, Leo, to iPhone users, making sure you get smart help without giving away your privacy.

This is a big leap in making AI helpful yet secure.

Key points to note:

  • Opera and Brave are changing the game with AI in browsers, making sure you stay in control and your data stays private.

  • The Chicago Humanities Festival is the place to be for anyone curious about how AI and creativity can play together. Expect interesting talks from folks who are pushing the boundaries.

  • Making tech your own is a big theme, with personalised experiences from AI browser tricks to custom home screens.

On the fun side, the Chicago Humanities Festival is gearing up to blend tech talks with creative sparks, promising great insights on AI’s role in creativity.

Plus, the web’s buzzing with cool stuff, from deep dives into Wikipedia to debates on AI’s real impact versus the hype.

You just never get bored with AI and Tech, like ever.

Mindstream Picks

Cathie Wood's firm, ARK Invest, inadvertently owns a Bitcoin Puppet worth $15,000, raising questions about its origin and connection to their ARKB ETF.

Elon Musk hinted at SpaceX's plan to land a Starship booster as early as its fifth flight, aiming to boost flight rates and vehicle performance.

US lawmakers reached a bipartisan agreement on draft data privacy legislation, aiming to limit tech companies' data collection and grant individuals control over personal information.

Apple now permits retro console game emulators on its App Store globally, echoing Google's support for emulators on Android devices.

Don’t Miss - Two US lawmakers announced a bipartisan agreement on draft data privacy legislation, aiming to limit tech companies' collection of consumer data and grant individuals control over their personal information, including the ability to prevent its sale or request deletion.

AI Art

Artwork submitted by Mindstream reader Roland S: “A castle made of rainbows”

We get a lot of submissions, but we do look at every single one! So please don’t hesitate to send us your art.

Yesterday’s Poll

“Do you use Midjourney for image generation?”

Yes, it's the best! 70%

No, I'm all about DALL-E! 30%

Readers’ Opinions

“It's the best for photorealistic images. Shame it's still on Discord 😔” - peter@so

“I've never found a way to use it. Oromotnit on discord and I never get a response, and I didn't want to sit forever on a wait list to pay either. So my understanding of mid journey is it's inaccessible. Whereas I just use Dalle in my chatgpt subscription.” - soulfiremage

Submit your opinions in our daily poll for the chance to be featured!

How's that Monday treating you? We hope we made your day a little bit brighter!

❤️ We need your feedback to make our newsletter better.

📣 Refer our newsletter to your friends!

🚀 Advertise in our newsletter to reach 120,000+ founders, engineers, and content creators.

Rate Us Today!
