Big Data: Types of data used in Analytics

The data involved in Big Data analytics falls into three broad types: structured data, semi-structured data and unstructured data.

Structured Data

Any data that can be stored, accessed and processed in a fixed format is termed ‘structured’ data.

A Student table in a database is an example of structured data:

Student_ID  Student Name      Gender  Department
2365        Rajesh Kulkarni   Male    Finance
3398        Pratibha Joshi    Female  Marketing
7465        Shushil Roy       Male    HR
7500        Shubhojit Das     Male    Finance
7699        Priya Sane        Female  Finance


Characteristics of Structured Data:

  • Data conforms to a data model and has an easily identifiable structure
  • Data is stored in the form of rows and columns, for example in a database table
  • Data is well organized, so the definition, format and meaning of the data are explicitly known
  • Data resides in fixed fields within a record or file
  • Similar entities are grouped together to form relations or classes
  • Entities in the same group have the same attributes
  • Data is easy to access and query, so it can be easily used by other programs
  • Data elements are addressable, which makes them efficient to analyze and process
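Because structured data lives in fixed, named fields, it can be queried directly. As a minimal sketch, the Python snippet below loads the Student table into an in-memory SQLite database with the standard `sqlite3` module and asks for the students in the Finance department. The table name `student` and the lowercase column names are assumptions made for illustration; they are not part of the original example.

```python
import sqlite3

# Rows from the Student table: (Student_ID, Student Name, Gender, Department)
STUDENTS = [
    (2365, "Rajesh Kulkarni", "Male", "Finance"),
    (3398, "Pratibha Joshi", "Female", "Marketing"),
    (7465, "Shushil Roy", "Male", "HR"),
    (7500, "Shubhojit Das", "Male", "Finance"),
    (7699, "Priya Sane", "Female", "Finance"),
]


def finance_students():
    """Load the table into an in-memory SQLite database and query it."""
    conn = sqlite3.connect(":memory:")
    try:
        # Fixed schema: every row has the same typed fields
        # (table and column names are hypothetical).
        conn.execute(
            "CREATE TABLE student ("
            "student_id INTEGER PRIMARY KEY, name TEXT, gender TEXT, department TEXT)"
        )
        conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", STUDENTS)
        # Fields are addressable by name, so filtering is a one-line query.
        rows = conn.execute(
            "SELECT name FROM student WHERE department = ? ORDER BY student_id",
            ("Finance",),
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(finance_students())
```

Running the script prints `['Rajesh Kulkarni', 'Shubhojit Das', 'Priya Sane']`.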

Unstructured Data

Any data with an unknown form or structure is classified as unstructured data. Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites. Other types of unstructured data include images, audio and video files.

Characteristics of Unstructured Data:

  • Data neither conforms to a data model nor has any structure
  • Data cannot be stored in the form of rows and columns as in databases
  • Data does not follow any semantics or rules
  • Data lacks any particular format or sequence
  • Data has no easily identifiable structure
  • Due to the lack of an identifiable structure, it cannot be used easily by computer programs

Examples of Unstructured Data:

  • Web pages
  • Images (JPEG, GIF, PNG, etc.)
  • Videos
  • Memos
  • Reports
  • Word documents and PowerPoint presentations
  • Surveys
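To see why unstructured data is hard for programs to use, consider pulling student IDs out of a free-text memo. The sketch below assumes, purely for illustration, that any standalone four-digit number is a student ID; both the memo text and that rule are hypothetical. With no schema to say where the ID lives, a program has to fall back on heuristics like this one.

```python
import re

# A hypothetical free-text memo: the same kind of facts as a table row,
# but with no fields telling a program where each value is.
MEMO = (
    "Hi team, Priya Sane (ID 7699) has moved to Finance. "
    "Please also note that student 3398 is still in Marketing."
)

# Assumed heuristic: a student ID is any standalone 4-digit number.
ID_PATTERN = re.compile(r"\b\d{4}\b")


def extract_student_ids(text):
    """Guess student IDs in free text by pattern matching."""
    return [int(match) for match in ID_PATTERN.findall(text)]


if __name__ == "__main__":
    print(extract_student_ids(MEMO))
    print(extract_student_ids("Budget review scheduled for 2024."))
```

`extract_student_ids(MEMO)` returns `[7699, 3398]`, but `extract_student_ids("Budget review scheduled for 2024.")` returns `[2024]`: the year is mistaken for an ID, which is the kind of error a fixed-format table avoids.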

Semi-structured Data

Semi-structured data is data that does not conform to a data model but still has some structure.

Characteristics of Semi-structured Data:

  • Data does not conform to a data model but has some structure
  • Data cannot be stored in the form of rows and columns as in databases
  • Semi-structured data contains tags and elements (metadata) that are used to group data and describe how the data is stored
  • Similar entities are grouped together and organised in a hierarchy
  • Entities in the same group may or may not have the same attributes or properties
  • Data does not contain sufficient metadata, which makes automation and management of the data difficult
  • The size and type of the same attribute may differ between entities in a group
  • Due to the lack of a well-defined structure, it cannot be used easily by computer programs
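These characteristics can be seen in a small XML document. In this hypothetical sketch, the tags act as metadata and both records belong to the same `<students>` group, yet only the second record carries an `<email>` element. Python's standard `xml.etree.ElementTree` module parses the hierarchy, and each record becomes a dictionary holding whatever fields it happens to have. The element names and the e-mail address are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags describe the data, but the two <student>
# entities do not share the same set of child elements.
XML_DATA = """
<students>
  <student id="2365">
    <name>Rajesh Kulkarni</name>
    <department>Finance</department>
  </student>
  <student id="3398">
    <name>Pratibha Joshi</name>
    <department>Marketing</department>
    <email>pratibha@example.com</email>
  </student>
</students>
"""


def load_students(xml_text):
    """Turn each <student> element into a dict of whatever fields it has."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for student in root.findall("student"):
        record = {"id": int(student.get("id"))}
        # Child tags vary per entity, so collect them generically.
        for child in student:
            record[child.tag] = child.text
        records.append(record)
    return records


if __name__ == "__main__":
    for record in load_students(XML_DATA):
        print(record)
```

Because the records do not share the same keys, downstream code has to check for optional fields, for example with `record.get("email")`, instead of relying on a fixed schema.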

Examples of Semi-structured Data:

  • E-mails
  • XML and other markup languages
  • TCP/IP packets
  • Zipped files
  • Integration of data from different sources
  • Web pages

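E-mails show why semi-structured data sits between the other two types. The sketch below uses Python's standard `email` module to split a hypothetical message into its header fields, which can be handled like named attributes, and its body, which remains free text. The addresses and message content are invented for illustration.

```python
from email import message_from_string

# A hypothetical raw e-mail: header lines follow a fixed "Name: value" layout,
# while the body after the blank line is free text.
RAW_EMAIL = (
    "From: registrar@example.com\n"
    "To: finance-dept@example.com\n"
    "Subject: Department transfer\n"
    "\n"
    "Priya Sane has moved to Finance from next semester.\n"
)


def split_email(raw):
    """Return the structured headers and the unstructured body separately."""
    msg = message_from_string(raw)
    headers = {key: value for key, value in msg.items()}
    body = msg.get_payload()
    return headers, body


if __name__ == "__main__":
    headers, body = split_email(RAW_EMAIL)
    print(headers["Subject"])
    print(body)
```

The headers can be filtered and sorted like table columns, while extracting facts from the body still needs the kind of heuristics used for unstructured text.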
Dr. Preyal Sanghavi

Associate Professor

RBIMS
