Information Discovery from Semi-structured Record Sets on the Web (Paperback)

In this book, we develop two frameworks for extracting semi-structured data records from Web pages. The first is a record segmentation search tree framework: we design a new search structure, the Record Segmentation Tree (RST), and propose several efficient pruning strategies over it to identify the records in a given Web page. The second is the DOM Structure Knowledge Oriented Global Analysis (Skoga) framework, which performs a global analysis of the DOM structure to robustly detect different kinds of data records and record regions. Finally, we present a third framework that uses the detected data records to automatically populate existing Wikipedia categories. It takes as seed input a few existing entities collected automatically from a particular Wikipedia category and explores their attribute infoboxes for clues, which it uses to discover more entities for that category along with the attribute content of the newly discovered entities.

R1,465


Product Details

General

Imprint: Lap Lambert Academic Publishing
Country of origin: United States
Release date: February 2014
Availability: Expected to ship within 10 - 15 working days
First published: February 2014
Dimensions: 229 x 152 x 7 mm (L x W x T)
Format: Paperback - Trade
Pages: 124
ISBN-13: 978-3-659-20611-5
Barcode: 9783659206115
LSN: 3-659-20611-3


