Incorporating Communication Outcomes into the Computer Science Curriculum | Counting the Number of Occurrences of Each Word in a Text File - an Experimental Comparison of Implementations

Title

Counting the Number of Occurrences of Each Word in a Text File - an Experimental Comparison of Implementations

Course

Data Structures

Abstract

There are many ways to count the number of occurrences of each distinct word in a text file. In this assignment, students implement several of these methods, including self-adjusting lists, dictionaries (the TreeMap class), and sorting, and compare their efficiency.
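As one illustration of the sorting approach named above, the following minimal sketch sorts a copy of the word array and then counts runs of equal neighbors. The class name `SortCounter`, the method name `countBySorting`, and the "word count" output format are assumptions made for this example; they are not taken from the assignment handout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: counting word occurrences by sorting.
class SortCounter {
    // Returns one "word count" string per distinct word, in sorted order.
    static List<String> countBySorting(String[] words) {
        String[] sorted = words.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);               // O(n log n) comparisons
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // After sorting, equal words are adjacent, so scan to the end of the run.
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            result.add(sorted[i] + " " + (j - i));
            i = j;
        }
        return result;
    }
}
```

The sort dominates the running time; the counting pass that follows is linear in the number of words.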

The key feature of this assignment is that each student is expected to write a report that details and interprets the results of experiments comparing the runtimes of the different implementations. The result is a professional-quality report that includes a description of the experimental design along with tables and charts.
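One possible way to gather the runtime data for such a report is sketched below. It assumes wall-clock timing with `System.nanoTime()`, a few untimed warm-up runs, and the median of several trials as the reported figure; the class name `TimingHarness` and these design choices are illustrative assumptions, not requirements stated in the assignment.

```java
import java.util.Arrays;

// Hypothetical sketch: timing one implementation over repeated trials.
class TimingHarness {
    // Runs the task `warmups` times untimed, then `trials` times timed,
    // and returns the median elapsed time in nanoseconds
    // (the upper median when `trials` is even).
    static long medianNanos(Runnable task, int warmups, int trials) {
        if (trials < 1) {
            throw new IllegalArgumentException("trials must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();                    // lets the JIT compile hot code first
        }
        long[] samples = new long[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[trials / 2];
    }
}
```

Reporting a median rather than a single run reduces the influence of occasional outliers such as garbage-collection pauses.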

Students are expected to know how to implement doubly-linked lists and navigate the Java API. They are also expected to understand asymptotic complexity (big-oh notation).
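To make the doubly-linked list prerequisite concrete, here is a minimal sketch of a self-adjusting list that uses the move-to-front heuristic. The heuristic itself, the choice to insert new words at the front, and all names (`MoveToFrontCounter`, `add`, `count`) are assumptions made for illustration; the assignment may specify a different adjustment rule.

```java
// Hypothetical sketch: word counting with a move-to-front doubly-linked list.
class MoveToFrontCounter {
    private static final class Node {
        final String word;
        int count = 1;
        Node prev, next;
        Node(String word) { this.word = word; }
    }

    private Node head;   // most recently accessed word
    private int size;    // number of distinct words

    // Records one occurrence of the word; worst case O(n) per call.
    void add(String word) {
        for (Node n = head; n != null; n = n.next) {
            if (n.word.equals(word)) {
                n.count++;
                moveToFront(n);
                return;
            }
        }
        // Assumption: unseen words are inserted at the front.
        Node fresh = new Node(word);
        fresh.next = head;
        if (head != null) head.prev = fresh;
        head = fresh;
        size++;
    }

    private void moveToFront(Node n) {
        if (n == head) return;
        // Unlink n from its current position (n.prev is non-null here).
        n.prev.next = n.next;
        if (n.next != null) n.next.prev = n.prev;
        // Relink n as the new head.
        n.prev = null;
        n.next = head;
        head.prev = n;
        head = n;
    }

    int count(String word) {
        for (Node n = head; n != null; n = n.next) {
            if (n.word.equals(word)) return n.count;
        }
        return 0;
    }

    int distinctWords() { return size; }

    String front() { return head == null ? null : head.word; }
}
```

Frequently repeated words tend to stay near the head, so the linear search for them is short even though the worst case remains linear in the number of distinct words.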

Author

Matt Stallmann

Genre

coding,
reading design description,
experimental design,
gathering experimental data,
charting,
interpreting data,
writing a report

Assignment Duration

Two Weeks

Communication Skill

reading, writing

Technical Skill

linear data structures,
program design,
object-oriented language features,
standard library integration,
big-oh analysis

Workplace Scenario

When analyzing a piece of text, it is sometimes useful to count the number of times each word appears and to identify the words that occur most often. One might, for example, process Twitter traffic or text messages among a specific group of people and, after filtering out words that are common in all English text (a, an, the, ...), figure out what the primary subject of the conversation is. In the workplace a developer may be asked to explore a variety of implementations of a frequently used system utility and write a report describing the advantages and disadvantages of each, with emphasis on efficiency.
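The scenario above (filter out common words, then find the words that occur most often) might look like the following sketch. The three-word stop list, the tokenization rule, the tie-breaking order, and the names `TopWords` and `mostFrequent` are all assumptions chosen for the example; it also assumes Java 9 or later for `Set.of`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: most frequent words after stop-word filtering.
class TopWords {
    // Assumed, deliberately tiny stop list; a real one would be much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Returns up to k (word, count) entries, highest count first,
    // with ties broken alphabetically.
    static List<Map.Entry<String, Integer>> mostFrequent(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: a word is a run of ASCII letters or apostrophes.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries.subList(0, Math.min(k, entries.size()));
    }
}
```

For example, `TopWords.mostFrequent(messageText, 10)` would return the ten most frequent non-stop words of a hypothetical `messageText` string.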

Additional Information

This assignment can be used even if the students are not familiar with binary search tree implementations, as long as they can figure out how to use the TreeMap API.
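As a rough indication of how little of the TreeMap API is needed, the sketch below counts words using only the constructor and `merge`. The class and method names are hypothetical, and the input is assumed to be already split into words.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: word counting with java.util.TreeMap.
class TreeMapCounter {
    // Returns a map from each distinct word to its number of occurrences,
    // with keys kept in sorted order by the TreeMap.
    static Map<String, Integer> countWords(Iterable<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // Inserts 1 for an unseen word, otherwise adds 1 to the stored count.
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each `merge` call takes O(log n) time in the number of distinct words, which is the figure students would compare against the list-based and sorting-based implementations.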

The assignment can be adapted to C++, whose standard library (STL) also provides an ordered map class, std::map.

Files

2010-8-project-1.html

Collection