
Data Clustering

Example of how data clustering promotes clarity, from a PhD thesis by Matthias Scholtz

 

In many applications, data arrives in a constant stream, such as telephone records, multimedia data and financial transactions. Data stream clustering algorithms aim to construct an accurate clustering from a sequence of points while using memory and time efficiently. Common components, versions and heuristics for these algorithms include:

  • CURE - especially good for non-uniform clusters and outliers; it represents each cluster by several well-scattered points and shrinks them toward the cluster centroid
  • BIRCH - incrementally clusters incoming points by constructing a hierarchical data structure that minimizes the I/O required
  • STREAM - achieves a constant-factor approximation of the k-median problem in a single pass using only a small amount of space
  • COBWEB - incrementally clusters using a classification tree as its hierarchical clustering model
  • C2ICM - selects objects as cluster seeds and assigns non-seed objects to the seed with the highest coverage to construct a flat partitioning cluster structure
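
The shrinking step mentioned for CURE can be illustrated with a minimal sketch. It assumes the scattered representative points have already been chosen and that the target they move toward is the cluster centroid; the function names and the shrink factor `alpha` are illustrative, and this is only one step, not a full CURE implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Return the coordinate-wise mean of a non-empty set of points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def shrink_toward(representatives: Sequence[Point], center: Point,
                  alpha: float) -> List[Point]:
    """Move each representative a fraction alpha of the way toward center.

    alpha = 0 leaves the points unchanged; alpha = 1 collapses them onto
    the center.
    """
    return [
        tuple(r_i + alpha * (c_i - r_i) for r_i, c_i in zip(r, center))
        for r in representatives
    ]


# Hypothetical cluster: the four corners of a square, used as representatives.
cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
center = centroid(cluster)                      # (2.0, 2.0)
shrunk = shrink_toward(cluster, center, 0.5)    # corners pulled halfway in
```

Pulling the representatives inward reduces the influence of points near the cluster boundary, which is one reason the approach tolerates outliers.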

To clarify the notation, the input to stream clustering is a sequence of points in a metric space together with an integer k. The output is a set of k centers chosen so that the sum of the distances from each data point to its nearest center is minimized. This notation gives popular cluster-analysis techniques like k-means and k-medoids clustering their names.
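
As a rough illustration of that objective, the sketch below computes the cost of a candidate set of centers, assuming Euclidean distance as the metric. The function name `k_median_cost` and the sample points are hypothetical; finding the centers that minimize this cost is the hard part and is not shown.

```python
import math
from typing import Sequence

Point = Sequence[float]


def k_median_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum, over all points, of the distance to the nearest center."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(math.dist(p, c) for c in centers) for p in points)


# Two well-separated groups of two points each.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]

good_cost = k_median_cost(points, [(0, 0), (10, 10)])  # one center per group
poor_cost = k_median_cost(points, [(5, 5)])            # a single shared center
```

Here `good_cost` is 2.0, since each group contributes one point at distance 0 and one at distance 1, while `poor_cost` is far larger because every point is far from the single center.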

Clustering matters because it helps uncover models and patterns in masses of seemingly patternless data. With the rise of data mining, research in the field has flourished.

 

Icon credit EEPROM Eagle

Unix and Sleuth

This project needs to be done on a UNIX machine using the Sleuth forensic tools. If you are using your own machine, you need to install the Sleuth Kit forensic tools (http://www.sleuthkit.org) on your machine. This week, you need to use the Sleuth tools to carry out the following tasks on the FAT undelete image from http://d

Database Constraints

Business requirements are enforced by implementing database constraints on tables and columns. The database constraints available include the following: PRIMARY KEY, FOREIGN KEY or REFERENTIAL INTEGRITY, NOT NULL, UNIQUE, CHECK. Give a business requirement and the constraint that could be implemented to enforce it. Ex

Computer Networks

Explain the benefits of network segmentation. Describe the different transport mechanisms included with TCP/IP. Explain each mechanism's approach to connection establishment and termination. Enumerate the applications that use TCP, the ones that use UDP, and the reasons why they use one or the other.

Text Editor: Create a File on a Newly Formatted Floppy Disk

1. Using a text editor, create a file that is between 5,000 and 6,000 bytes long on a newly formatted floppy disk. Calculate the file directory and FAT entries for the type of disk used, and check your calculations using the absolute sector program sector.asm. 2. Write a program that will perform the DOS DIR command for a disk

Web Design Standards

Search the Internet for two Web sites relating to Web design standards. Complete the following in your discussion cluster: Create a list of 10 Web design standards with your cluster. To create this list, discuss the standards on each site and determine the 10 your cluster feels to be the most appropriate for effective Web d

SQL statements and databases

This must be done in SQL Server 2005. In the first exercise, the Class field in the Part table should be a string of size 5, and not an int. 1- Write a statement that creates a table named Part, with an Id field as an int identity primary key (PK), a SupplierId int field, a Description string field of size 25, a Count int f

Database Usage Memorandum

Please help me so I can complete the following: I have to prepare a two to three page memorandum (350 words per page) analyzing the use of databases in my organization. Include which database software is used (Microsoft, Informix, Oracle, etc.). Conclude by proposing improvements. For large organizations, restrict the scop

Server-Side Scripting Languages

Need assistance with the attached problem. There are several server-based scripting languages available, offering a wide array of features and complexity to web engineers. Briefly review the emergence of such languages and recommend any two as your development tool. Does Perl offer any unique feature compared to JavaScript and P

Networking

Please help in answering the 2 questions below. Thank you. 1. Network Media XYZ Corp. is planning a new network. Engineers in the design shop must be connected to the accountants and salespeople in the front office, but all routes between the two areas must traverse the shop floor, where arc welders and metal-stamping equipm

Fault Tolerance and Backups

What is the difference between fault tolerance and disaster recovery? How does a network administrator decide which backup method to implement?

eBay database does not have referential integrity.

Though there is great risk in implementing a database in this fashion, eBay gained an extreme performance boost because the database didn't have to work as hard to ensure that the data "conformed." So my question is, if the database does not have referential integrity to keep the data clean, how does eBay ensure that the data

Data Mining

Differentiate between the following terms: A. Independent data mart and dependent data mart B. Fact table and dimension table C. OLTP and OLAP Chapter 7 1. Differentiate between the following terms: A. Validation data and test set data B. Positive correlation and negative correlation C. Control group and experi

Oracle9i

1. You want to make a report of table attributes. This report consists of a series of queries on data dictionary views in which you specify the table name, and the queries return details about the table. Your goal is to have information on the report that is similar (in content, not format) to the information you see when you

Information Security (6 multiple choice) questions

Is there anyone who can help me better understand the 6 multiple choice questions that are attached? Please help if you can. I feel that 5 credits for these questions is a very reasonable compensation for the review. There are 6 multiple choice questions which I have answered. I am requesting someone with knowledge i

Exporting and Importing Data

Discuss different methods of exporting and importing, with an emphasis on efficiency and avoiding data corruption or misplacement.

SQL Server 2000 Databases Management

1. What is the difference between complete and differential backups? 2. Explain the meaning of each of the transaction levels supported by SQL Server. 3. Explain the difference among the simple, full, and bulk-logged recovery models. 4. What is the difference between clustered and nonclustered indexes?

Use the Internet or computer magazines to investigate one of the following DBMSs

Use the Internet or computer magazines to investigate one of the following DBMSs: DB2, SQL Server, MySQL, Oracle, or Sybase. Then prepare a report that explains how the DBMS handles two of the following distributed database functions: deadlock, fragmentation, replication, the data dictionary or log, and distributed queries.

SQL

I am seeking help with solving several SQL statements. I am specifically looking for the code that goes with these problems. 6) A wide world importers company tracks its order information in a database that includes two tables: Order and LineItem. See table structures below: CREATE TABLE dbo.Order ( OrderID int NOT NULL,

Computer Human Interaction

Go to (http://www.open-video.org) and find the video clip about digital jewelry by typing chi in the search field (it should be on page 4). Write up a brief (3-5 paragraph) summary of what the video clip is demonstrating or what problem it is trying to solve. Be sure to identify the target audience and discuss whether the s