Title
Counting the Number of Occurrences of Each Word in a Text File - an Experimental Comparison of Implementations
Course
Data Structures
Abstract
There are many ways to count the number of occurrences of each distinct word in a text file. In this assignment, students implement several of these methods, including self-adjusting lists, dictionaries (the TreeMap class), and sorting, and compare their efficiency.
The key feature of this assignment is that each student is expected to write a report detailing and interpreting the results of experiments that compare the runtimes of the different implementations. The result is a professional-quality report that describes the experimental design and presents the data in tables and charts.
Students are expected to know how to implement doubly-linked lists and how to navigate the Java API. They are also expected to understand complexity analysis (big-oh notation).
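As an illustration of the self-adjusting list mentioned above, the following is a minimal sketch of a move-to-front word counter built on a doubly-linked list. The class name MoveToFrontCounter and its methods are hypothetical and are not taken from the assignment handout; move-to-front is only one possible self-adjusting heuristic, and the assignment may specify a different one.

```java
/** Hypothetical sketch: word counter backed by a move-to-front doubly-linked list. */
class MoveToFrontCounter {
    /** One list node per distinct word. */
    private static final class Node {
        final String word;
        int count = 1;
        Node prev, next;
        Node(String word) { this.word = word; }
    }

    private Node head;   // most recently accessed word
    private int size;    // number of distinct words

    /** Records one occurrence of word; a word already present moves to the front. */
    void add(String word) {
        for (Node n = head; n != null; n = n.next) {
            if (n.word.equals(word)) {
                n.count++;
                moveToFront(n);
                return;
            }
        }
        insertAtFront(new Node(word));   // first occurrence of this word
        size++;
    }

    /** Returns the number of occurrences recorded for word (0 if absent). */
    int count(String word) {
        for (Node n = head; n != null; n = n.next) {
            if (n.word.equals(word)) return n.count;
        }
        return 0;
    }

    int distinctWords() { return size; }

    /** Returns the word currently at the front, or null if the list is empty. */
    String front() { return head == null ? null : head.word; }

    private void moveToFront(Node n) {
        if (n == head) return;
        // Unlink n; n.prev is non-null because n is not the head.
        n.prev.next = n.next;
        if (n.next != null) n.next.prev = n.prev;
        insertAtFront(n);
    }

    private void insertAtFront(Node n) {
        n.prev = null;
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
    }

    public static void main(String[] args) {
        MoveToFrontCounter counter = new MoveToFrontCounter();
        for (String w : "the cat and the hat and the bat".split(" ")) counter.add(w);
        System.out.println("the=" + counter.count("the")
                + ", distinct=" + counter.distinctWords());
    }
}
```

Each add is linear in the list length in the worst case, but frequently repeated words tend to stay near the front, which is the kind of behavior a runtime experiment could measure.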
Author
Matt Stallmann
Genre
coding,
reading design description,
experimental design,
gathering experimental data,
charting,
interpreting data,
writing a report
Assignment Duration
Two Weeks
Communication Skill
reading, writing
Technical Skill
linear data structures,
program design,
object oriented language features,
standard library integration,
big-oh analysis
Workplace Scenario
When analyzing a piece of text, it is sometimes useful to count the number of times each word appears and to identify the words that occur most often. One might, for example, process Twitter traffic or text messages among a specific group of people and, after filtering out words that are common in all English text (a, an, the, ...), determine the primary subject of the conversation. In the workplace, a developer may be asked to explore a variety of implementations of a frequently used system utility and write a report describing the advantages and disadvantages of each, with emphasis on efficiency.
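The scenario above (count words, drop common ones, report the most frequent) can be sketched as follows. This is an assumption-laden illustration only: the class TopWords, the short STOP_WORDS set, and the tokenizing rule are all hypothetical, and a HashMap is used purely for brevity rather than being one of the implementations the assignment compares. It assumes Java 9 or later for Set.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: most frequent words after stop-word filtering. */
class TopWords {
    // Illustrative stop-word list only; a real list would be much longer.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "and", "of", "to", "in", "is");

    /** Returns up to k (word, count) entries, highest count first, ties alphabetical. */
    static List<Map.Entry<String, Integer>> topWords(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumed tokenizing rule: lowercase, split on anything but letters and apostrophes.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        String text = "The cat and the hat: the cat sat in a hat, the end.";
        for (Map.Entry<String, Integer> e : topWords(text, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```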
Additional Information
This assignment can be used even if the students are not familiar with binary search tree implementations, as long as they can figure out how to use the TreeMap API.
The assignment can be adapted to C++, which also has a map class in the STL.
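For students who have not yet seen binary search trees, the TreeMap-based counter might look roughly like the sketch below. The class name TreeMapCounter and the method countWords are hypothetical; the only library calls used are the java.util.TreeMap constructor and Map.merge (available since Java 8).

```java
import java.util.TreeMap;

/** Hypothetical sketch: word counting with java.util.TreeMap. */
class TreeMapCounter {
    /** Returns a map from each distinct word to its number of occurrences. */
    static TreeMap<String, Integer> countWords(String[] words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = countWords("b a b c a b".split(" "));
        // TreeMap iterates in sorted key order.
        counts.forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

Because TreeMap keeps its keys ordered, each update costs O(log n) for n distinct words, which gives students a point of comparison against the list-based and sorting-based implementations.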
Citation
Matt Stallmann, “Counting the Number of Occurrences of Each Word in a Text File - an Experimental Comparison of Implementations,” Incorporating Communication Outcomes into the Computer Science Curriculum, accessed May 18, 2020, http://cs-comm.lib.muohio.edu/items/show/55.