Data Identification
It’s the Data…
Yes, a play on the age-old “It’s the economy, stupid!”. But as I think about the Jor-El system I want to create, I keep going back to the idea that it all starts with the data: what we can build on and form knowledge from.
This section will cover the data I plan to use, the media types selected, the sources used to collect the media, and what specifically each media type offers as it relates to our data.
Before we start, let’s touch on the goal again…which you can find here and read at your own leisure.
What should the data help answer
The data I would like to extract from each media type are descriptors that will help build knowledge around the subject. Things like:
- What does the subject like/dislike
- What does the subject look/looked like
- What does the subject think about life/love/sports/food/etc
- Childhood memories
- Schools they attended
- Past loves
You get the idea…
Once extracted, this data will serve as a basis to query and, I hope, to build new ideas on top of.
The Media Types
I didn’t so much narrow down which media types to select as realize that these are the only types of media available. Things like bank statements, medical records, and funny TikToks simply fall into one of the media type buckets identified below.
For clarity, I’ve used the below media types:
- Text
- Images
- Audio
- Video
Media Types and Data Association
So, what do these media types bring to the table? When we home in on media created by the subject or about the subject, we can extract a wealth of foundational info. For example, using social media we can extract anything from what the person ate last night, in the form of a “post” on Facebook/Twitter, to where the subject has visited, in the form of an image or video posted on those platforms.
Using videos and images, we can extract what the subject looked like, how the subject sounds, and even the mannerisms the subject used when speaking, something I feel will make the system more relatable when talking with it.
Moving deeper into media-info extraction, we can extract sentiment from text or audio to determine general feelings about a subject matter. We can even create tag clouds of the general topics in the text/audio/video, which we can then use to link similar media in the system: the building of a knowledge graph.
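To make the tag-cloud linking idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the media ids, the sample text, the tiny stopword list, and the `top_tags` and `link_media` helpers are all hypothetical, and plain word counting stands in for whatever topic extraction the real system ends up using.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "was", "again"}

def top_tags(text, n=5):
    """Return up to n of the most frequent non-stopword tokens as tags."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(n)}

def link_media(items, min_shared=1):
    """Connect media items that share at least min_shared tags."""
    tags = {media_id: top_tags(text) for media_id, text in items.items()}
    edges = {}
    for a, b in combinations(sorted(tags), 2):
        shared = tags[a] & tags[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared  # the edge label is the set of shared tags
    return edges

# Hypothetical media items: id -> text (a post, a video transcript, a blog).
media = {
    "post-1": "Tacos for dinner again, best tacos in Austin",
    "video-7": "Walking around Austin looking for tacos",
    "blog-3": "Thoughts on learning piano as an adult",
}
edges = link_media(media)
```

Here the post and the video end up linked through their shared tags, while the piano blog stays unconnected. Each edge is one small piece of the knowledge graph.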
The Mapping
The list below outlines the media types, the media elements to use, and what each provides. It’s a start and any feedback is welcome.
- Text
  - In the form of (Media Elements):
    - Blogs
    - Social media comments
    - Social media comments on other users’ comments/articles
    - Social media emoji reactions to comments/articles
    - Social media shares
    - Social media image descriptions/comments
    - Scanned documents (from the subject or about the subject)
    - Converted audio to text
    - Video to audio to text
    - Bank transactions/statements
    - Medical records
  - Will provide:
    - General information about the person (name, age, DOB, where they lived, favorite X, married, etc.)
    - What they dislike/hate/like/enjoy/etc.
    - Ideas/thoughts on specific subject matter, from comments about the subject or emoji reactions to an article or another person’s comment/share/video/audio
    - Descriptors of how they look
      - Color of hair, eyes, etc.
    - How they connect things together
      - A tag cloud might produce one thing while their comments produce something else
      - info + info = idea?
    - Locations visited
    - How they feel about specific topics (sentiment analysis)
- Images
  - In the form of (Media Elements):
    - Social media posted images
    - Social media images posted about the subject (friends’ posted images)
    - Online photo albums
    - User provided/uploaded images
    - Reactions to images (emoji/comments – see Text section)
  - Will provide:
    - Information about the person
      - What they look like at a specific time (age)
      - Who they know (people in the image)
      - Visited locations
    - Items they might know about (image analysis)
      - What’s in it
      - Who’s in it
      - Sentiment (can you do this? Are people happy? Sad? Mad?)
- Video
  - In the form of (Media Elements):
    - User uploaded videos of the subject or by the subject
    - Social media videos of the subject or by the subject
    - Social media videos shared by the subject
    - Videos liked/emoji reactions
  - Will provide:
    - 3D model data
      - What the subject looks like
      - How the subject moves (facial expressions, posture, hand gestures, movement while talking)
    - Image analysis – see Images section
    - Audio – see Audio section
    - Text – see Text section
- Audio
  - In the form of (Media Elements):
    - Video – see Video section
    - Uploaded audio recordings by the subject or about the subject
    - Playlists/watchlists
  - Will provide:
    - What the subject sounds like
    - How they talk
    - Feeling about a specific subject (audio analysis)
    - Text – see Text section
    - What the subject likes to watch
    - What music the subject liked/likes to listen to
I’ll continue to flesh out this section as I home in on each media type. It’s a start.
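As a rough illustration of how the mapping could be held in code, the Python sketch below stores an abbreviated version of it as a dictionary. The key names (`provides`, `derives`), the shortened descriptor labels, and the `all_provides` helper are my own assumptions, not a settled design. The `derives` entries stand in for the “see X section” cross-references, so video inherits whatever images, audio, and text provide.

```python
# Abbreviated, hypothetical encoding of the media type mapping.
MEDIA_MAP = {
    "text": {
        "provides": {"general information", "likes and dislikes",
                     "locations visited", "sentiment"},
        "derives": [],
    },
    "images": {
        "provides": {"appearance", "known people", "locations visited"},
        "derives": [],
    },
    "audio": {
        "provides": {"voice", "speech style", "music taste"},
        "derives": ["text"],  # audio can be converted to text
    },
    "video": {
        "provides": {"3d model data", "appearance", "movement"},
        "derives": ["images", "audio", "text"],  # frames, soundtrack, transcript
    },
}

def all_provides(media_type, seen=None):
    """Collect what a media type provides, directly or via derived types."""
    seen = set() if seen is None else seen
    if media_type in seen:  # guard against circular cross-references
        return set()
    seen.add(media_type)
    entry = MEDIA_MAP[media_type]
    result = set(entry["provides"])
    for derived in entry["derives"]:
        result |= all_provides(derived, seen)
    return result

video_descriptors = all_provides("video")
```

With this shape, asking what a video can tell us also returns the voice, sentiment, and location descriptors it picks up from its audio, image, and text layers.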
Sources
Moving on. Where do all these media elements come from? As I’ve pointed out in the mapping above, much of this will come from social media, but I will also create an upload tool to allow a user to upload media about the subject.
Social Media
- YouTube
- Flickr
Other Sources
- Personal Blogs
User Provided
- Upload tools
Final thoughts
As I’m organizing the collected data, I’m wondering if anyone has created an info-gathering chart. For example, from a blog post you can extract different levels of information.
Example:
- Level one – text
- Level two – subject / general topic (tag clouds?)
- Level three – sentiment
- Level four – how it connects to other things (semantics/ontology)
- Level five – how that comment, when connected to other pieces, forms a better picture of an opinion
Yes, I need to organize my thoughts on the above better….
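One possible way to organize the five levels is a single record per media item, sketched below in Python. The `ExtractionRecord` class, its field names, and the `opinion_by_topic` helper are hypothetical, and the sentiment scores are hand-assigned placeholders rather than the output of any real analysis.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExtractionRecord:
    media_id: str
    text: str                                  # level one: the raw text
    topics: set = field(default_factory=set)   # level two: subject / general topic
    sentiment: Optional[float] = None          # level three: -1.0 (negative) to 1.0 (positive)
    links: set = field(default_factory=set)    # level four: ids of related records

def opinion_by_topic(records):
    """Level five: average the sentiment per topic across all records."""
    scores = {}
    for record in records:
        if record.sentiment is None:  # skip records with no sentiment yet
            continue
        for topic in record.topics:
            scores.setdefault(topic, []).append(record.sentiment)
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}

# Hypothetical records with placeholder sentiment scores.
records = [
    ExtractionRecord("post-1", "Best tacos in town", {"tacos"}, 1.0),
    ExtractionRecord("post-2", "Tacos were fine, parking was awful",
                     {"tacos", "parking"}, 0.5, {"post-1"}),
    ExtractionRecord("post-3", "No comment yet", {"parking"}),
]
opinions = opinion_by_topic(records)
```

Averaging one score across every topic in a record is crude (the parking complaint and the taco praise share a score here), but it shows how separate comments could roll up into a picture of an opinion.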
Thanks for reading. Time for the Data Ingestion piece!
Armando Padilla