Supervisors: Iadh Ounis (Dept of Computing Science) and Andrzej Huczynski (Dept of Business and Management)
Roughly speaking, a computer-based essay marker is a program that, given a text in a given language, returns feedback to the author explaining how the text can be improved. Such a system could help students to practise and to get very quick feedback. It could also allow instructors and lecturers to gather valuable feedback, in the form of statistical data, about how well the students are doing. The aim of this project is to build a computer-based essay marker that integrates with Strathclyde University's Web sites, so that it can be used by large classes of students.
This project is quite challenging as it brings together several computing science disciplines, including information retrieval (for the marking and feedback functionalities), information management systems (for the storage and statistical functionalities), interactive systems (for the ergonomic and user interaction functionalities) and Web technologies (for use on the WWW).
Last year, a group of students built a basic computer-based essay marker in Java. Their approach was to hold all the information needed to mark the essays and provide the feedback in an Oracle database. Given an essay, the software calculates the grade, provides feedback and finally updates the database with the relevant information.
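The mark, feedback and store cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the marking method of the existing software is not described here, so the key-term coverage scheme, the `KEY_TERMS` set, the function names and the `results` table are all hypothetical. Python and an in-memory SQLite database are used for brevity in place of Java and Oracle.

```python
import sqlite3

# Hypothetical marking scheme: key terms an essay is expected to mention.
KEY_TERMS = {"motivation", "leadership", "culture"}


def mark_essay(text, key_terms=KEY_TERMS):
    """Return (grade out of 100, feedback) based on key-term coverage."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    found = key_terms & words
    missing = sorted(key_terms - words)
    grade = round(100 * len(found) / len(key_terms))
    if missing:
        feedback = "Consider discussing: " + ", ".join(missing)
    else:
        feedback = "All expected topics are covered."
    return grade, feedback


def record_result(conn, student, grade, feedback):
    """Store the result so that class-level statistics can be computed later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(student TEXT, grade INTEGER, feedback TEXT)"
    )
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?)", (student, grade, feedback)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the Oracle database
grade, feedback = mark_essay("Good leadership shapes the culture of a firm.")
record_result(conn, "student-1", grade, feedback)
average = conn.execute("SELECT AVG(grade) FROM results").fetchone()[0]
```

Keeping the results in a database is what makes the statistical feedback for lecturers possible: class-wide figures such as the average grade reduce to a single query.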
This work needs to be taken on in a number of ways:
Supervisor: Iadh Ounis
There are many search engines on the Web (e.g. Altavista, Google, Lycos). However, the quality of their results is unsatisfactory, and their coverage of the Web (the proportion of the Web that they index) is poor. One solution to this problem is to use a metasearch engine. A metasearch engine is a Web server that sends a given query to several search engines, collects their answers and presents them to the user. This gives better coverage of the Web while simplifying the user's interaction.
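The core of a metasearch engine, as described above, can be sketched as a function that forwards the query to each engine and merges the ranked answers. This is a minimal sketch, not a prescribed design: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed lists (a real system would issue HTTP requests), and the merging rule, a sum of reciprocal ranks, is just one simple choice among many.

```python
from collections import defaultdict


# Hypothetical stand-ins for real search engines: each takes a query and
# returns a ranked list of URLs.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]


def metasearch(query, engines):
    """Send the query to every engine and merge the ranked answers."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            # A URL scores more the higher it is ranked; a URL returned by
            # several engines accumulates score and is listed only once.
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


results = metasearch("conceptual graphs", [engine_a, engine_b])
```

In this toy run the URL returned by both engines comes first, which illustrates how combining engines can improve both coverage and ranking.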
The aim of this project is to develop something similar to Vivisimo, but the results should be provided to the users using information visualisation techniques (rather than the traditional sequential list of relevant documents). The final system should include the following functionalities:
Supervisor: Iadh Ounis
Although the majority of Web content is in English, the Web also shows great promise as a source of multilingual content. Such multilingual data can be useful in Cross-Language Information Retrieval (CLIR) on the Web. In addition to the classical information retrieval tasks, CLIR requires that the query (or the documents) be translated from one language to another. CLIR is becoming a hot topic in the Web community: it is estimated that by 2005, 78% of Internet users will be non-English speakers. One of the objectives of a CLIR system is to remove the language barrier.
In this project, we will investigate a new translation approach based on the use of parallel texts. Hence, one of the aims of this project is to automatically find parallel translated documents on the Web. One possible idea is to develop an intelligent agent to "mine" sites where bilingual text is known to be available. The Web is a great source of translation examples: many sites are bilingual, mostly in English and one other language. Automatically extracting parallel text from the Web is therefore an interesting Web application. Translational equivalence between words could then be automatically detected on the basis of the parallel documents obtained, and used for CLIR purposes.
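One way to detect translational equivalence from parallel text is to count how often a source word and a target word occur in aligned sentences. The sketch below assumes the parallel documents have already been found and aligned sentence by sentence, and scores word pairs with the Dice coefficient. Both the choice of measure and the toy English-French sentences are assumptions made for illustration, not part of the project specification.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word of the pair.
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest Dice score for `word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(sorted(candidates), key=candidates.get)


# Toy aligned English-French sentences (invented for illustration).
pairs = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]
scores = dice_scores(pairs)
```

With these three sentence pairs, `best_translation("house", scores)` selects "maison", because the two words always appear together. A query term could then be translated with such a table before being passed to the retrieval system.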
The aim of this project is to develop a Cross-Language Information Retrieval system based on parallel texts. The final system will be built on top of the available SMART information retrieval system and should include the following specific functionalities:
Supervisor: Iadh Ounis
Note: Mainly Suitable for SE students
XML is a well-established universal standard, which has mainly been used for the exchange of information between different applications and data repositories. However, while XML is fine for the purpose for which its ancestor SGML was originally designed, i.e. specifying the formats of documents, it is currently no more than a kludgy language for all other aspects. Indeed, for specifying anything other than the formats of documents (e.g. semantics, mathematics, or anything else that has a rich set of operators), the syntax of LISP or Conceptual Graphs is vastly superior.
Conceptual Graphs (CGs) are a popular and simple knowledge representation formalism developed by John Sowa in the 1980s. A conceptual graph is a bipartite graph with two kinds of nodes, called concepts and conceptual relations. The nodes are linked by arcs. CGs have a great deal more to offer than XML: they have a methodology for building larger structures of contexts that can express natural language semantics, Petri nets, UML-like diagrams, and many other kinds of information in a way that is (1) more readable for humans and (2) more efficient for certain kinds of graph-processing algorithms.
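The bipartite structure can be made concrete with a small data structure in which every arc joins a conceptual relation to a concept, never two nodes of the same kind. This is a minimal sketch only: the class and method names are hypothetical, they are not taken from any existing CG API, and Python is used for brevity.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: two nodes with the same label stay distinct
class Concept:
    type_label: str        # e.g. "Cat"
    referent: str = ""     # optional individual, e.g. "Felix"


@dataclass(eq=False)
class Relation:
    type_label: str        # e.g. "On"
    args: tuple            # the concepts this relation links, in order


@dataclass
class ConceptualGraph:
    concepts: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_concept(self, type_label, referent=""):
        concept = Concept(type_label, referent)
        self.concepts.append(concept)
        return concept

    def add_relation(self, type_label, *args):
        # Bipartite constraint: a relation may only be linked to concepts
        # that already belong to this graph.
        for arg in args:
            if not isinstance(arg, Concept) or arg not in self.concepts:
                raise ValueError("relation arguments must be concepts of this graph")
        relation = Relation(type_label, tuple(args))
        self.relations.append(relation)
        return relation

    def arcs(self):
        """Return every arc as a (relation, concept) pair."""
        return [(r, c) for r in self.relations for c in r.args]


# "A cat is on a mat": two concept nodes joined by one relation node.
g = ConceptualGraph()
cat = g.add_concept("Cat")
mat = g.add_concept("Mat")
on = g.add_relation("On", cat, mat)
```

Here `g.arcs()` yields two arcs, one from the relation On to each concept. Trying to link a relation directly to another relation raises a `ValueError`, which is how the sketch enforces the bipartite property.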
CGs have been developed as a conceptual schema language for information and knowledge interchange between IT systems that require a structured representation for logic. CGIF (Conceptual Graphs Interchange Format) is currently being developed as an ISO standard for implementers of IT systems that use CGs as an internal representation or as an external representation for interchange with other IT systems. The external (graphical) representations are readable by humans and may also be used in communication between humans or between humans and machines.
The goal of this project is to develop an API specification that provides an implementation-independent interface for manipulating conceptual graphs, i.e. a set of tools written in Java that exchange conceptual graphs using the CGIF standard file format, including:
A good starting point will be the excellent NOTIO free API developed at the University of Waterloo by Finnegan Southey and demonstrated last August at ICCS 2001.