Big Data analytics involves many types of data, which fall into three broad categories: structured data, semi-structured data and unstructured data.
Structured Data
Any data that can be stored, accessed and processed in a fixed format is termed structured data.
A Student table in a database is an example of structured data:
Student_ID | Student_Name    | Gender | Department
2365       | Rajesh Kulkarni | Male   | Finance
3398       | Pratibha Joshi  | Female | Marketing
7465       | Shushil Roy     | Male   | HR
7500       | Shubhojit Das   | Male   | Finance
7699       | Priya Sane      | Female | Finance
Characteristics of Structured Data:
- Data conforms to a data model and has an easily identifiable structure
- Data is stored in the form of rows and columns (example: a relational database)
- Data is well organized, so the definition, format and meaning of the data are explicitly known
- Data resides in fixed fields within a record or file
- Similar entities are grouped together to form relations or classes
- Entities in the same group have the same attributes
- Data is easy to access and query, so it can be readily used by other programs
- Data elements are addressable, so they are efficient to analyze and process
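As an illustration of why structured data is easy to query, here is a minimal Python sketch that loads the Student table above into an in-memory SQLite database with the standard sqlite3 module. The table name `student` and the snake_case column names are assumptions made for this example only.

```python
import sqlite3

# In-memory SQLite database. The schema (column names and types) is fixed
# before any data arrives, which is what makes this data "structured".
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        gender       TEXT NOT NULL,
        department   TEXT NOT NULL
    )
    """
)

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance"),
    (3398, "Pratibha Joshi", "Female", "Marketing"),
    (7465, "Shushil Roy", "Male", "HR"),
    (7500, "Shubhojit Das", "Male", "Finance"),
    (7699, "Priya Sane", "Female", "Finance"),
]
# Every row supplies exactly the same four fields, in the same order.
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

# Because each field is addressable by name, a query can filter on it directly.
finance = conn.execute(
    "SELECT student_name FROM student WHERE department = ? ORDER BY student_id",
    ("Finance",),
).fetchall()
finance_names = [name for (name,) in finance]
print(finance_names)  # → ['Rajesh Kulkarni', 'Shubhojit Das', 'Priya Sane']

# Aggregation is equally direct: count students per department.
counts = dict(
    conn.execute("SELECT department, COUNT(*) FROM student GROUP BY department")
)
print(counts)
```

No parsing or guessing is needed: the program relies on the known column names to select and group the data.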
Unstructured Data
Any data with an unknown form or structure is classified as unstructured data. Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner. One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites. Other types of unstructured data include images, audio and video files.
Characteristics of Unstructured Data
- Data neither conforms to a data model nor has any structure.
- Data cannot be stored in the form of rows and columns as in databases
- Data does not follow any semantics or rules
- Data lacks any particular format or sequence
- Data has no easily identifiable structure
- Due to the lack of an identifiable structure, it cannot be used easily by computer programs
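To illustrate why free text is hard for programs to use directly, here is a minimal Python sketch. The memo text and the set of department names are hypothetical, and tokenizing plus keyword matching is only one very basic way of imposing structure on text; practical text analytics usually relies on more robust natural language processing.

```python
import re
from collections import Counter

# A hypothetical free-text memo: there are no fields, columns or tags,
# only a sequence of characters.
memo = (
    "Reminder: the Finance team meets on Monday. "
    "Marketing will share survey responses; Finance will review the budget."
)

# There is no "department" field to query, so the program must impose its
# own structure, here by splitting the text into lower-case word tokens.
tokens = re.findall(r"[a-z]+", memo.lower())
word_counts = Counter(tokens)
print(word_counts["finance"])  # → 2

# Answering "which departments are mentioned?" needs outside knowledge
# (a hypothetical list of department names) and a matching heuristic.
known_departments = {"finance", "marketing", "hr"}
mentioned = sorted(set(tokens) & known_departments)
print(mentioned)  # → ['finance', 'marketing']
```

Even this simple question depends on assumptions the text itself does not state, such as which words count as department names.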
Examples of Unstructured Data:
- Web pages
- Images (JPEG, GIF, PNG, etc.)
- Videos
- Memos
- Reports
- Word documents and PowerPoint presentations
- Surveys
Semi-structured Data
Semi-structured data is data that does not conform to a formal data model but still has some structure.
Characteristics of Semi-structured Data
- Data does not conform to a data model but has some structure.
- Data cannot be stored in the form of rows and columns as in databases
- Semi-structured data contains tags and elements (metadata), which are used to group data and describe how the data is stored
- Similar entities are grouped together and organized in a hierarchy
- Entities in the same group may or may not have the same attributes or properties
- Data does not contain sufficient metadata, which makes automation and management of the data difficult
- The size and type of the same attribute may differ between entities in a group
- Due to the lack of a well-defined structure, it cannot be used easily by computer programs
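The following minimal Python sketch, using the standard library's xml.etree.ElementTree module, illustrates these characteristics on a small XML document. The document, its tag names and the e-mail address are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags such as <student> and <name> are metadata
# that label each value, and the nesting forms a hierarchy. The two <student>
# entities do not carry the same set of child elements.
doc = """
<students>
  <student id="2365">
    <name>Rajesh Kulkarni</name>
    <department>Finance</department>
  </student>
  <student id="3398">
    <name>Pratibha Joshi</name>
    <department>Marketing</department>
    <email>pratibha@example.com</email>
  </student>
</students>
"""

root = ET.fromstring(doc.strip())
records = []
for student in root.findall("student"):
    # There is no fixed column list, so collect whichever child elements
    # this entity happens to have; each record may end up with different keys.
    record = {"id": student.get("id")}
    for child in student:
        record[child.tag] = child.text
    records.append(record)

for record in records:
    print(sorted(record))
# → ['department', 'id', 'name']
# → ['department', 'email', 'id', 'name']
```

The tags make each value self-describing, but a program still has to cope with attributes that are present for some entities and missing for others.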
Examples of Semi-structured Data
- E-mails
- XML and other markup languages
- TCP/IP packets
- Zipped files
- Integration of data from different sources
- Web pages
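E-mail shows the mix well: its header lines are tagged fields, while its body is free text. A minimal sketch using Python's standard email module follows; the message content and addresses are hypothetical.

```python
from email import message_from_string

# A hypothetical e-mail: the header lines are tagged "Name: value" pairs,
# while the body after the blank line is free text with no further structure.
raw = (
    "From: priya.sane@example.com\n"
    "To: finance-team@example.com\n"
    "Subject: Budget review\n"
    "\n"
    "Hi all, please send your comments on the draft budget by Friday.\n"
)

msg = message_from_string(raw)

# Header fields can be looked up by their tag, much like named fields.
print(msg["Subject"])  # → Budget review
print(msg["From"])     # → priya.sane@example.com

# The body, by contrast, is just a string; any meaning in it would have to be
# extracted with text-processing techniques.
body = msg.get_payload()
print(body.strip())
```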
Dr. Preyal Sanghavi
Associate Professor
RBIMS