Manfred Kuechler, Hunter College (CUNY)
Version: April 2000
The Web as a Research Tool
in the Social Sciences
-- Outline --
Scope of the presentation
-
Focus on the WWW proper, not
-
just any use of "computers" in social science research, though computers
are useful for the storage, management, and analysis of large amounts of
data of any kind, both numerical and non-numerical (textual); hence any
research and scholarly writing -- even in social philosophy -- will
benefit from using "computers"
-
just any use of an Internet service (e-mail, ftp, telnet, news groups,
bulletin boards, chat rooms, videoconferencing, etc.), though the lines
are becoming increasingly blurred, with the "Web" providing convenient
user interfaces
-
Differentiation between using the Web to access
-
information about research and
-
information as input for the actual process of discovery or data collection
-
Focus on empirically grounded (but not necessarily quantitative) social research
-
Mixture of
-
links to sites that are generally useful starting points to devise
an effective use of the Web for any specific research project and
-
prototypical examples drawn from my own research
The four major benefits
Textual data
Textual documents are important primary sources for a wide range of research
topics, especially in the areas of social and political change, public
policy, and institutional and organizational analysis.
-
Government/legislative documents
-
Legal documents (court decisions)
-
Historical documents (culture & history)
-
Documents produced by (non-governmental) organizations and institutions
-
Newspapers and Radio/TV online editions
Sample projects
A. INTERNATIONAL INTERDISCIPLINARY STUDY OF NATION BUILDING FOR KOREAN UNIFICATION
-
Monitoring of political, economic, and social developments by daily check
on (English language) Korean papers and news services, e.g.,
-
Retrieval of government documents, e.g.,
B. GERMAN ELECTIONS 1998
-
Monitoring news of the major television network (video clips)
-
Checking web sites of parties and retrieving campaign information in advance
of a fact-finding trip to Germany (including personal interviews with political
leaders from all major parties), and monitoring the post-election process
of coalition negotiations and government formation, e.g., SPD, CDU, FDP,
Greens, PDS, DVU, Reps.
The next two items are examples of quantitative data, discussed in the
next section:
-
Retrieving detailed information on final returns from the official
site ("Bundeswahlleiter")
-
Monitoring and retrieving publications on public opinion polling
Sources for Quantitative Data
These include census data and other official statistics as well as public
opinion (survey) data. They may be available as full data sets (in various
formats) or as tables; they may be embedded in a narrative or presented as
just a set of numbers. The ultimate site for keeping track of Web sources
supplying quantitative data is maintained at the University of California
at San Diego.
Sample sites for (mostly) tabular data:
Sample sites for data sets/question banks:
A bit of both and some from the next section:
-
Data Ferret (Federal Electronic Research Review and Extraction Tool) from
the US Census Bureau and the BLS
Remote statistics
This is a logical extension of traditional data archives on the one hand
and data in pre-arranged tables (whether in old-fashioned printed volumes
or delivered via the Web) on the other. The user conveys his/her need for
a specific statistical analysis (be it a cross-tabulation or a bar chart)
via convenient Web forms to a remote server. There, this request is
automatically translated into the language of a statistical analysis
program (like SAS or SPSS) and the job is run. The results (output) are
sent to the requester via the Web.
This allows for great flexibility as to exactly what statistical analysis
is performed, and it frees the user from needing access to, or even knowing
how to run, statistical software. In addition, there are several sites that
provide statistical maps on demand.
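The translation step described above can be illustrated with a minimal sketch. It assumes a hypothetical Web form that supplies a row variable and a column variable, and it turns them into SPSS command syntax for a cross-tabulation. The function name, the whitelist, and the variable names are assumptions made for illustration; they do not describe the interface of any particular site.

```python
# Hypothetical list of variables that the remote archive exposes.
ALLOWED_VARIABLES = {"vote", "region", "age_group", "education"}


def build_crosstab_syntax(row_var, col_var, percentages=True):
    """Translate form input into an SPSS CROSSTABS command.

    Input is checked against a whitelist so that arbitrary text typed
    into a Web form never reaches the statistical program.
    """
    for name in (row_var, col_var):
        if name not in ALLOWED_VARIABLES:
            raise ValueError(f"unknown variable: {name!r}")
    cells = "COUNT ROW" if percentages else "COUNT"
    return (
        "CROSSTABS\n"
        f"  /TABLES={row_var} BY {col_var}\n"
        f"  /CELLS={cells}."
    )


if __name__ == "__main__":
    # The server would hand this text to the statistics package and
    # send the resulting table back to the user's browser.
    print(build_crosstab_syntax("vote", "region"))
```

The user only picks variables from a form; the server, not the user, needs to know the command language.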
Literature searches
These have been made much easier, much faster, and much more efficient through
the use of the Web in at least four ways:
-
more convenient access/interface to traditional computerized catalogs,
e.g.,
-
commercial super bookstores on the Internet, e.g.,
-
innovative search and delivery services (download of full text) of traditional
journals, e.g.,
"Licensed resources" can only be accessed via a CUNY (Hunter)
IP address, i.e., from an on campus computer or via a proxy
server.
-
online (refereed) journals, e.g.,
Usage Problems
-
Searching/Monitoring
-
Picking the right search engine or directory
-
Site monitoring
Help/Advice:
Many of the links above lead to directories of one kind or another.
However, at times, these links may not provide a suitable starting point.
Then, a general "search engine" should be considered. There are a great many
such search engines, and lately the distinction between "search engines"
and "directories" has become rather fuzzy in that most sites are now a bit
of both.
For a long time, my personal favorite has been Altavista.
But lately, I have had very good experience with Google
-- a recent addition to the field of search engines -- finding relevant
sites quickly. Whatever your preference, it pays to familiarize yourself
with all the (advanced) features of the search engine you use, to watch
for improvements and additions, and to consider parallel searches with
another engine if the results are less than fully satisfactory.
Once you have found a useful site, there may be a need for monitoring
this site over a period of time, as the contents of many web pages change
frequently. A (currently free) service called Mind-it
keeps an eye on the web sites you are interested in and alerts you via e-mail
if a change has occurred.
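The basic idea behind such monitoring can be sketched in a few lines: store a fingerprint of the page on one visit and compare it with the fingerprint on the next. This is only an illustration of the general technique, not a description of how Mind-it works internally; the function names and the sample pages are assumptions.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a SHA-256 digest of the raw page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(previous_fingerprint, page_bytes):
    """True if the page content differs from the stored fingerprint."""
    return fingerprint(page_bytes) != previous_fingerprint


if __name__ == "__main__":
    # Stand-in content; in practice the bytes would be fetched on each
    # scheduled check, e.g. with urllib.request.urlopen(url).read().
    first_visit = b"<html><body>Final returns: pending</body></html>"
    later_visit = b"<html><body>Final returns: published</body></html>"

    stored = fingerprint(first_visit)
    print(has_changed(stored, first_visit))   # False
    print(has_changed(stored, later_visit))   # True
```

A comparison of this kind flags every change, including trivial ones such as an access counter or a date stamp, so a practical monitor would compare only the relevant part of the page.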
-
Validation
-
Evaluation of the source
-
Integrity of the document (in particular, reposts on non-originator sites)
Help/Advice:
Jan Alexander and Marsha Tate at Widener University provide excellent
advice including a number of checklists. However, as always, such
checklists are best used with some discretion, as a set of suggestions, not as
a list of instructions to be followed schematically.
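For the integrity problem (a document reposted on a site other than the originator's), a mechanical comparison with the original can supplement such checklists. The following is a minimal sketch using Python's standard difflib module; the function names and the two sample texts are invented for illustration.

```python
import difflib


def normalize(text):
    """Collapse whitespace so that mere layout differences are ignored."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def compare_documents(original, repost):
    """Return a similarity ratio (0 to 1) and the lines that differ."""
    a, b = normalize(original), normalize(repost)
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    # ndiff marks lines only in the original with "- " and lines only
    # in the repost with "+ "; unchanged and hint lines are dropped.
    diff = [line for line in difflib.ndiff(a, b) if line.startswith(("- ", "+ "))]
    return ratio, diff


if __name__ == "__main__":
    original = "Section 1\nThe law takes effect in 1999.\n"
    repost = "Section 1\nThe law takes  effect in 2001.\n"
    ratio, diff = compare_documents(original, repost)
    print(ratio)
    for line in diff:
        print(line)
```

A ratio of 1.0 with an empty difference list means the repost matches the original line by line; anything else points to the passages that need to be checked by hand.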
-
Citation and Documentation
-
Changing site structures and URLs
-
Changing contents (under constant URL)
-
Temporary pages (including active server pages -- .asp)
-
Lack of specificity (no page numbers)
Help/Advice:
These are particularly nagging problems, as many sites undergo frequent
restructuring and once-working "URLs" become invalid. In addition, a number
of large (governmental) sites require the use of internal searches that
lead to "temporary" URLs (that will stop working after a short period
of time; sometimes a few hours, sometimes a day or two). Some of these
sites (e.g., Thomas, Government Printing Office) provide detailed
instructions on how to create permanent ("persistent") URLs, but an extra
step is required.
Another problem is the difficulty of pointing to a specific passage within
a longer document. Some solutions have been developed, but there is no
generally agreed upon standard yet. The approach used by the White
House (go to section G on the White House web page) involves PDIs ("persistent
document identifiers"), which make it possible to establish permanent and
quotation-specific references (URLs) to all documents on the White House site.
Many sites offer documents in pdf format, which provides a solution to
making specific references. Unlike ordinary web documents (in html format),
where the page layout is contingent upon a user's screen and/or printer
settings, pdf documents have a fixed page layout and allow meaningful page
references, just as old-fashioned hard copy does.
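One practical response to these problems is to record, for every Web source, the date of access and (for fixed-layout files) the page number along with the URL. The sketch below shows one way to keep such a record. The class, its fields, and the output layout are assumptions made for illustration and do not follow any particular citation standard; the sample source is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WebCitation:
    author: str
    title: str
    url: str
    accessed: date               # the day the page was actually read
    page: Optional[int] = None   # only meaningful for fixed-layout (pdf) files

    def format(self):
        """Render the record as a single reference line."""
        parts = [f'{self.author}. "{self.title}."']
        if self.page is not None:
            parts.append(f"p. {self.page}.")
        parts.append(f"{self.url} (accessed {self.accessed.isoformat()}).")
        return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical source used only to show the output layout.
    ref = WebCitation(
        author="Example Agency",
        title="Annual Report",
        url="http://www.example.org/report.pdf",
        accessed=date(2000, 4, 15),
        page=12,
    )
    print(ref.format())
```

Recording the access date does not keep a URL alive, but it documents which version of a changing page was used.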
-
Access
-
Server outages and slow Internet traffic
-
Fees
Help/Advice:
Apart from using adequate equipment on your end (see below), there
is not much you can do with respect to the first problem. However, you
may want to pinpoint the location of the bottleneck. With Win9x, type "tracert
(host name or IP address of the remote site)" at a DOS prompt, e.g.:
tracert www.science.widener.edu. This will show the delay at each hop along
the route and thus where the bottleneck is. Maybe you can get to sites on
the East coast without delay, but traffic to the West is clogged up.
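Reading a long tracert listing by eye can be tedious. As a minimal sketch, the following code scans such a listing and reports the hop at which the delay increases the most. The sample listing is invented (its addresses come from ranges reserved for documentation), and the parsing assumes the usual table layout of hop number, three timings in ms, and host; other versions of the tool may format their output differently.

```python
import re

_HOP = re.compile(r"^\s*(\d+)\s+(.*)$")   # hop number, then the rest of the row
_MS = re.compile(r"<?(\d+)\s*ms")         # timings such as "12 ms" or "<1 ms"


def parse_hops(tracert_output):
    """Return (hop_number, average_ms) pairs for hops that answered."""
    hops = []
    for line in tracert_output.splitlines():
        match = _HOP.match(line)
        if not match:
            continue
        times = [int(t) for t in _MS.findall(match.group(2))]
        if times:  # hops that timed out ("*") report no timings
            hops.append((int(match.group(1)), sum(times) / len(times)))
    return hops


def largest_jump(hops):
    """Return (hop_number, increase_ms) where the delay grows the most."""
    best_hop, best_increase, previous = None, 0.0, 0.0
    for number, average in hops:
        if average - previous > best_increase:
            best_hop, best_increase = number, average - previous
        previous = average
    return best_hop, best_increase


# Invented sample listing, for illustration only.
SAMPLE = """
Tracing route to www.example.org [192.0.2.10]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  198.51.100.1
  2    12 ms    11 ms    13 ms  198.51.100.254
  3   210 ms   190 ms   200 ms  203.0.113.7
  4     *        *        *     Request timed out.
  5   205 ms   198 ms   203 ms  192.0.2.10

Trace complete.
"""

if __name__ == "__main__":
    print(largest_jump(parse_hops(SAMPLE)))   # (3, 188.0)
```

In the invented sample, the delay jumps at the third hop, which suggests that the bottleneck lies between the second and third hosts along the route rather than at the remote site itself.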
As to fees, make use of the free access to licensed
resources offered by CUNY and Hunter. E.g., you can get an article
from the NYT for free via Lexis-Nexis instead of paying for it at the NYT
site (where only same-day articles and a specific subset of other articles
are free).
Technical prerequisites
While it is possible to run Win3.x and a web browser like Netscape on an
old 386 machine with as little as 8 MB memory (it is even possible to browse
the Web on a 286 under DOS using a browser like lynx), any serious use
of the Web for research requires a bit more in hardware and some effort
in installing efficient browsing software. The following describes an efficient
setup -- though not everything is absolutely necessary and, of course,
it is possible to use a Mac instead of a system running under MS
Windows.
-
Pentium processor with 32-64 MB memory (the more memory the better; this
is more important than 'clock speed')
-
Video card with 4-8 MB memory
-
Sound card and speakers or headphones
-
Win9x
-
56K modem connection (for off campus; advice
on selecting an ISP)
-
Current version of Netscape or MS Internet Explorer (at least version 4.0)
and proper customization including
-
sufficiently sized memory and disk caches
-
enabled support for java and javascript
-
Installation of readers and plugins for commonly used formats on the Web
(can all be downloaded for free) including
-
Acrobat Reader for pdf (= Portable Document Format) documents
-
Microsoft Windows Media Player (audio and video files)
-
RealPlayer (streaming audio and video)
Web sites requiring special readers/plugins typically include a link to
a download site and instructions. Additional readers/plugins should be
installed as needed. However, it is important to use current versions for
optimal performance.
Possible additions:
Conclusion
I see three major general effects of using the Web for research:
-
increase in solid, empirically grounded research
-
increase in research combining qualitative and quantitative data
-
increase in opportunities for underfunded researchers including graduate
students
As a consequence, we may experience a shift in emphasis on specific
tools. In particular, methods to manage and analyze large quantities
of qualitative (textual) data -- computer-aided content analysis -- may
see a revival and/or a renewed push for further development. A comprehensive
guide to software and publications related to content analysis is maintained
at Georgia State University. There is also an e-mail list, called CONTENT,
for the discussion of topics related to content analysis.
There are similar sites in Great Britain and Germany: Computer Assisted
Qualitative Data Analysis Software (CAQDAS) and a page maintained by
Harald Klein that includes a very useful software overview, now also
available in an English version.