Internet Search and Analysis in Intelligence and Investigations
Tuesday, January 11, 2011, 7:30 AM – 8:45 AM
Ed Appel, Proprietor, iNameCheck
Presentation
• Quick Internet Overview
• Online Sources & Methods
• Legal, Policy & Privacy Guidelines
• Policy & Regulatory Issues
The Internet Is Essential for Investigations and Intelligence
• Accessible data
• Who’s online: 80%
• 30%+ power users
• Crime & misbehavior
• Due diligence
• Intelligence
• Vetting
• Investigations
Pew found all age groups online in significant percentages.
Internet Growth
[Chart: Internet users, in millions. Source: InternetWorldStats]
[Chart: IP traffic, in petabytes/month. Source: Cisco]
The numbers, however precise, show that Internet growth is rapid.
US: #2 in the world (after China), with 239,893,600 users out of a population of 310.2M (77.3%), per InternetWorldStats.com
The Internet Universe
An increasingly complex, interconnected galaxy of nodes is portrayed in these Internet maps by leading technologists.
[Image: Map of the Internet, November 3, 2003]
[Image: MIT Internet Map, 2007]
[Image: San Diego Supercomputer Center I-Map, 2008]
San Diego Supercomputer Study of Internet Links
A billion or more people use the Internet daily, according to recent SDSC research.
The map of the Internet, as built and described in a Nature Communications paper, shows the locations of Internet systems on the hyperbolic plane. Image courtesy of Dmitri Krioukov, SDSC/CAIDA
What’s on the Internet?
• Social Networking
• News & Blogs
• Maps & Locations
• Games & Hobbies
• Photos
• Video, Film, Music
• Libraries
• E-Commerce
• Advertising
• Private Websites
• Porn, Exploitation
• Illegal Sites
• Illegal Activities
• Illicit Activities
• Forbidden Activities
• Fantasy
• Humor
• Juvenile Delinquency
Wireless: Major Growth Area
What’s on the Internet?
• Public Records (Real Estate, Courts, Licenses, Businesses, Arrests, Liens, etc.)
• Residences, Building Occupants
• Telephones, Email, Mailing Addresses
• Genealogy, Births, Deaths
• Educational Institutions & Alumni
• Business & Executive Profiles
• Associations & Volunteer Organizations
• Private data vendors (Accurint, IRB, TLO)
Self-Descriptions in Online Profiles
• Yedo Da Meth Lover, 26, Colbert, Washington (MySpace)
• Lowlife, 26, Brownsville/Austin, TX, “Death to the New World Order” (MySpace)
• Crack Monkey, 21, Somerset, NJ, Rider grad (MySpace)
• Hacker Club (Facebook)
• Lynn, N. Seattle ecstasy dealer (MySpace)
• Angela, meth addict (MySpace)
Illicit Behavior Online: People We Trusted
• A Florida Assistant US Attorney was arrested in 2007 as he arrived in Detroit with a doll, earrings, and Vaseline, after trying in Internet chats to arrange sex with a 5-year-old. He committed suicide in his cell in 2007.
• A DHS press spokesman, caught trying to induce a “14-year-old girl” (an undercover detective) to have sex, pled no contest in 2006.
• An Army Chief Warrant Officer, Director of the Army School of Information Technology, was arrested in 2010 for collecting and sharing child pornography over the Internet.
• A US military contractor in Baghdad hacked girls’ computers, extorted them for nude photos & sex tapes, tried to meet some for sex while on leave, and had over 4,000 victims when arrested. Serving a 30-year sentence, 2010.
Case Examples
A computer forensic analyst – part of the IT security department of a Fortune 500 firm – was found publicizing himself online as a profane, offensive “leader” of 5,000 players in a worldwide, popular massively multiplayer online fantasy sci-fi game – which led to discovery of his game playing all day, during both work and off-hours.
A new chief of research was found to have been disciplined by the FDA – 3 years prohibition from government contracting – for admitted scientific misconduct. While the FDA database did not show the 10-year-old sanctions, three FDA newsletters online reported them.
One lesson: What you don’t know about what’s online can hurt you.
Case Examples
~1,000 US Navy personnel using their Navy.mil email addresses as their MySpace user names. Many postings contain unsuitable material, including operational security issues.
A computer security man who pled guilty to operating a massive botnet that stole IDs and money was hired by a Santa Monica Internet search firm while he was awaiting sentencing. The firm failed to Google the convict.
Spc. Bradley Manning Accused in Wikileaks Case
“Wikileaks” chief suspect Spc. Bradley Manning, 22, of Potomac, MD, was arrested in Kuwait and incarcerated at Quantico Marine Base. He was charged in July 2010 with leaking classified videos of US air strikes in Iraq to the Wikileaks website in April 2010. An online chat acquaintance, Adrian Lamo (previously convicted of computer hacking), told authorities and the press that Manning provided thousands of classified documents to Wikileaks. Julian Assange, Wikileaks’ founder, claimed the leaker exposed US military misdeeds. US government leaders voiced fear that US troops and informants would be killed based on the secrets leaked, and defended the actions depicted. The classified documents posted by Wikileaks totaled 75 MB and numbered in the thousands.
Bradley Manning was reportedly despondent over losing a lover and had been disciplined for striking a soldier.
Leaked videos included US air strikes that killed civilians, including a Reuters reporter & driver.
Manning’s charges include illegally transferring classified data to his PC, placing unauthorized software on military computers and delivering national defense info to an unauthorized party.
[Photos: Bradley Manning; Julian Assange, Wikileaks; Adrian Lamo, ~2001]
Internet Searching is Useful For:
• Cyber vetting – virtual neighborhoods
• Criminal & corporate investigations
• IP & asset protection (insider threat)
• Compliance
• Competitive intelligence
• Legal support
• Research (any topic)
Likely Findings
• History of malicious online activities: ~3-6%
• Derogatory information, e.g. past bad acts
  – Arrests, convictions, lawsuits, bankruptcies, firing
• Misuse of “anonymous” virtual identity online
• Most likely: Verification of qualifications and eligibility for the position sought in vetting
Sources & Methods for Internet Searching
• Systems & Tools
• Search Engines & Metasearch
• Websites with Databases: “Dark Web”
• Automated Searching
Analysis is critical for the information to have value
Systems
• Search on the right computer
  – Use a separate system for searching (malware risk)
  – Keep anti-virus, firewall, anti-malware up to date
• Protect your anonymity – you can be detected
• Protect the subject – don’t leave a trail
• Use fast systems, applications, enough memory
Applications
• Browser: Internet Explorer, Firefox, Chrome, Safari, Opera
• Browser settings, search engine integration
• PDF printer (e.g. Adobe Acrobat)
• Database or folders – retrievable files
• Search tools (internal, Internet)
Manual Searches
• Big 5 Search Engines – Live & Cached Results
  – Google (YouTube) – PageRank: 100 factors
  – Yahoo! – 4B pages
  – Microsoft (Bing)
  – Ask (MyWebSearch) – 3% of searches
  – AOL (MapQuest)
• Popular (Social & Sales) websites
  – eBay, Facebook, MySpace, Craigslist, Amazon
Other Search Engines
• All the Web - "live search" looks for terms as you type them
• AltaVista - A Yahoo property that's not what it used to be
• Exalead - Search engine from France
• FreeSearch - U.K. search engine
• Gigablast - Looks similar to Google, smaller database
• IceRocket
• Lycos
• Mamma (really a metasearch engine)
• Openfind - Emphasizes Chinese-language results
• WiseNut - Includes "Wise Guides" (topic groups)
Contemporary (“Web 2.0”) Search Tools
• Twitter.com, Trackle.com, Monitter.com and Friendfeed.com – help find people & provide “right now” results
Specialized Searching (Examples)
• Blogs: blogsearch.google.com, icerocket.com, sphere.com, technorati.com, blogdigger.com
• IP addresses: SamSpade.org, whois.com, networksolutions.com, domaintools.com
• Reverse phone/address: Whitepages.com, anywho.com, verizon.com
• Public records: brbpub.com (county)
• Government: usa.gov
More Searches
• Advanced search (Boolean logic)
• Special features: images, videos, maps, news, blogs
• Country-based searching
• Translations (rough)
• Tracking: Google.com/alerts (emails)
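A Boolean query can be assembled programmatically so the same logic is reused across subjects. The sketch below is illustrative only: the function name `build_query` and its parameters are hypothetical, and it assumes the widely used operator style (quoted phrases, `OR`, a leading `-` for exclusion, `site:`), which may differ from one search engine to another.

```python
def build_query(all_of=(), any_of=(), none_of=(), site=None):
    """Compose a Boolean-style search string from groups of terms.

    Assumes a common operator syntax; check each engine's own rules.
    """
    def quote(term):
        # Quote multi-word terms so they are treated as exact phrases.
        return f'"{term}"' if " " in term else term

    # Terms that must all appear (engines usually AND these implicitly).
    parts = [quote(t) for t in all_of]
    if any_of:
        # At least one of these alternatives should appear.
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    # Terms to exclude from the results.
    parts.extend("-" + quote(t) for t in none_of)
    if site:
        # Restrict the search to a single website.
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query(
    all_of=["Jack Doe"],
    any_of=["Nevada", "Purdue"],
    none_of=["obituary"],
)
print(query)  # "Jack Doe" (Nevada OR Purdue) -obituary
```

Keeping the term groups separate makes it easy to log exactly which query produced which result.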
Tracking
• Google and other tools (Trackle.com) allow one to track:
  – Changes in websites
  – Appearance of terms on indexed pages
  – Appearance of terms in Twitter & other places
  – Blogs & news references to a term
• Tracking is important in protection of assets and following activities of rivals & adversaries
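One simple way to notice changes in a website is to store a fingerprint of each page and compare it on the next visit. This is a minimal sketch, not a description of how Google Alerts or Trackle work; the function names are hypothetical, and fetching the page text is left out (the text is assumed to have been retrieved already).

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace changes."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous: dict, current_pages: dict) -> list:
    """Return URLs that are new or whose content differs from the stored digest.

    `previous` maps URL -> digest and is updated in place.
    `current_pages` maps URL -> page text collected on this run.
    """
    changed = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if previous.get(url) != digest:
            changed.append(url)
        previous[url] = digest
    return changed


seen = {}
first = detect_changes(seen, {"http://example.com/profile": "Jack Doe, Reno"})
second = detect_changes(seen, {"http://example.com/profile": "Jack  Doe,  Reno"})
third = detect_changes(seen, {"http://example.com/profile": "Jack Doe, Las Vegas"})
print(first, second, third)
```

On the first run every page counts as changed because nothing is stored yet; whitespace-only differences are deliberately ignored.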
Leveraging Search Engine Findings
• Identify websites that may hold more on topic
  – Colleges, associations, groups, social sites
  – Local press, hobbies, sports, high schools
• Identify subject’s activities that may lead to further searching
• Identify subject’s family and closest friends, who may post about the subject
Metasearch Engines
Engine         URL                                Sources searched
Dogpile        http://www.dogpile.com/            Google, Yahoo, Bing, Ask
ixquick        http://www.ixquick.com/            11 sites
Metasearch     http://www.metasearchengine.com/   27 sites
Excite         http://www.excite.com/             Google, Yahoo, Bing, Ask
Infospace      http://www.infospace.com/          Google, Yahoo, Bing, Ask, Twitter
Addictomatic   http://addictomatic.com/           Metasearch engine (23 sites)
Metacrawler    http://www.metacrawler.com/        9 or more sites
Search3        http://www.search3.com/            Google, Twitter, Bing, in columns
Notice that results differ in order & number
Cached Web Pages
• Archive.org (Wayback Machine): Website content no longer online
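Because each engine returns results in a different order and number, a metasearch step has to merge and de-duplicate them. The sketch below shows one simple approach, assumed here for illustration and not taken from any of the engines listed: each URL is scored by the sum of 1/rank across the engines that returned it. The engine names and URLs are placeholders.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one de-duplicated list.

    A URL ranked highly by several engines ends up near the top.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


sample = {
    "engine_a": ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    "engine_b": ["http://example.com/2", "http://example.com/1"],
    "engine_c": ["http://example.com/2", "http://example.com/4"],
}
merged = merge_results(sample)
for url in merged:
    print(url)
```

Other weighting schemes are equally plausible; the point is only that a single merged list hides the fact that the underlying engines disagreed.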
Invisible Web
Internet
Many online databases are not accessible to Google
Variations in Name Searches: Examples
• Use different versions of a name:
  – “John J. Doe” (full name in quotes)
  – “Jack Doe” (nickname in quotes)
  – “Jack Doe” Nevada (name in quotes + geographic location)
  – “Jack Doe” IBM (name in quotes + job/industry/hobby)
  – “Jack Doe” Purdue (name in quotes + school)
• Address
  – reverse address
  – J. Doe may work better than John Doe
• Phone Numbers
• Email Addresses
  – [email protected]
  – doe
  – jjdoe@
  – @jacksbar (used with smaller companies)
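The name variations above can be generated systematically so that none is skipped. This is a minimal sketch; `name_queries` and its parameters are hypothetical names, and straight double quotes are assumed as the exact-phrase operator.

```python
def name_queries(first, last, middle_initial=None, nicknames=(), contexts=()):
    """Build quoted name queries, alone and combined with context terms.

    `contexts` holds locations, employers, schools, hobbies and similar terms.
    """
    names = []
    if middle_initial:
        names.append(f"{first} {middle_initial}. {last}")  # full name
    names.append(f"{first} {last}")
    names.extend(f"{nick} {last}" for nick in nicknames)   # nickname forms
    names.append(f"{first[0]}. {last}")                    # initial + surname

    queries = []
    for name in names:
        quoted = f'"{name}"'
        queries.append(quoted)
        queries.extend(f"{quoted} {context}" for context in contexts)
    return queries


queries = name_queries(
    "John", "Doe",
    middle_initial="J",
    nicknames=["Jack"],
    contexts=["Nevada", "IBM", "Purdue"],
)
for q in queries:
    print(q)
```

With four name forms and three context terms this yields 16 queries, which is a reminder of why automation and a search log are useful.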
Quick Anatomy of Google
• Google (YouTube) constantly spiders the Internet, hits pages about once every 30 days
• Caches & indexes about 10 billion pages, more than any other search engine
• Presents search results instantly, showing live and cached data links
• Presents results in “PageRank” order based on popularity (note: ads influence results)
The Internet:
• 506M websites
• 56B pages
Google has about 18% of pages indexed
[Diagram: Google’s index as a portion of the Web]
Searching Online Databases: Contents May Not Be Indexed by Search Engines
• PeopleFinders, zabasearch
• WhitePages.com, Anywho.com
• USA.gov
• USTaxCourt.gov
• BlogSearch.google, IceRocket, Sphere
• Yahoo message boards
• Whois, SamSpade.org
• Nsopr.gov
• SSNValidator.com
• USAF-locator.com
• Bop.gov/inmate
• AMA-assn.org, bms.org (MDs)
• RipoffReport.com
• RagingBull.com
Finding Search Tools
• Library of Congress: http://www.loc.gov/rr/ElectronicResources/subjects.php?subjectID=69
• List of Search Engines: http://www.pandia.com/powersearch
• Yahoo List: http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines_and_Directories/
Search Automation
• Metasearch
• Copernic: www.copernic.com
• Corporate datamining tools
• Proprietary Software
Better COTS products are needed
Boolean Logic, Search Techniques Optimize Queries
Step-by-Step Approach
1. Search engines
   – Individual (e.g. Google, Yahoo)
   – Meta (DogPile, Metasearchengine)
2. Social Networks/Blog sites
3. Copernic
4. Automated searches
5. Follow-up searches
Keeping Up With The Internet
• Keep a spreadsheet with links to best sources
• Don’t rely on search engines alone
• Find new sites & drop those no longer useful
• Research what works best
• Use experts in Internet searching - outsource
• Train & equip internal Internet searchers
Procedures
• Plan – include subject-specific sites & terms
• Capture content, print into PDFs
• Include details (URLs, dates, specifics)
• Provide source for each item reported
• Log the process, if evidence results
• Do not include inappropriate data (Title VII)
• Include caveats about reliability in reports
Controversial Methods
• “Friending” subjects – in real or false identity
• Social engineering to elicit info about subject
• Emailing subject under a false identity
• “Pretexting” as the subject to elicit data from a company or someone who knows subject
• Identifying an anonymous emailer using hidden code
• “Lurking” in chat rooms
Large Scale Internet Intelligence
• Use automated search tools
• Capture & store on-line activities for reference
• Filter and scan results to find relevant data
• Analyze and report results along with other investigative sources
• Identify users: link real names to online IDs
• Be careful in using Internet data to ensure accuracy and fairness
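The filtering step can be sketched as a first-pass screen that keeps only results mentioning the subject's name together with at least one known identifier (city, employer, school). The function name, the record layout (`url`, `text`) and the sample data are assumptions made for illustration. A match here is only a lead: attribution and verification still have to be done by an analyst.

```python
def filter_results(results, subject_name, identifiers):
    """Keep results whose text contains the name and at least one identifier.

    `results` is a list of dicts with "url" and "text" keys (assumed layout).
    Matching is a plain case-insensitive substring test.
    """
    relevant = []
    name = subject_name.lower()
    for item in results:
        text = item["text"].lower()
        if name not in text:
            continue  # the name is absent, so the page is not about this subject
        matched = [term for term in identifiers if term.lower() in text]
        if matched:
            relevant.append({"url": item["url"], "matched": matched})
    return relevant


collected = [
    {"url": "http://example.com/a", "text": "Jack Doe of Reno, Nevada won the race."},
    {"url": "http://example.com/b", "text": "Jack Doe, a chef in Maine, opened a cafe."},
    {"url": "http://example.com/c", "text": "Purdue alumni news: no mention of him."},
]
hits = filter_results(collected, "Jack Doe", ["Nevada", "Purdue", "IBM"])
print(hits)
```

The second sample record is dropped because it shares only the name, which is the common case of a different person with the same name.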
Analyzing Search Results
• Attribution: Who uses a virtual identity, posts
• Verification: Proving or confirming online data
  – Ultimate confirmation: admission of subject
• Filter non-identifiable, irrelevant references
• Evaluating the seriousness of findings
• How much searching is enough?
Preserving Online Evidence
If you are not using computer forensic tools…
• Print relevant web pages (PDF files)
• Maintain securely (encryption, digital signatures)
• Keep long enough to meet legal obligations (then delete completely)
If the content can become evidence, keep a log and notes to support testimony about collection.
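A collection log can be kept as a simple CSV file that records, for each saved page, when it was captured, where it came from, and a hash of the saved file so later alteration can be detected. This is a minimal sketch under stated assumptions: the function name, the column names and the CSV layout are hypothetical, and a hash is not a substitute for forensic tools or a digital signature.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Column names are illustrative, not a required standard.
LOG_FIELDS = ["collected_at_utc", "url", "saved_file", "sha256", "note"]


def log_capture(log_path, url, saved_file, note=""):
    """Append one entry describing a saved page (e.g. a PDF printout) to the log."""
    saved = Path(saved_file)
    # Hash the saved file so its integrity can be checked later.
    digest = hashlib.sha256(saved.read_bytes()).hexdigest()
    entry = {
        "collected_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "url": url,
        "saved_file": str(saved),
        "sha256": digest,
        "note": note,
    }
    log = Path(log_path)
    is_new = not log.exists()
    with log.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow(entry)
    return entry
```

A call such as `log_capture("collection_log.csv", "http://example.com/profile", "profile.pdf", note="subject profile page")` would add one row; the file names in this call are placeholders.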
Using Search Results
• Integrate into other reporting – with clear indication of source
• Remember: subject may not have posted item
• Fairness may demand verification of the data by the subject
• In vetting, it’s best to interview the subject about any questionable postings
Is Internet Vetting Legal? Is Internet Information “Private?”
• Internet data is public, not private: plain view, published information
• No restriction on using published information
• Must abide by all legal requirements for other types of investigative information
• No current legal requirements for
  – Advising the subject
  – Using Internet searching, if not outsourced
Caveat: This does not constitute legal advice
Legal & Privacy Gold Standard
• Notice, consent: add to current forms
• Attribution, verification, subject interview, redress
• Assessing results as intelligence:
  – Virtual ID might be used by someone else
  – Online data may be fabricated, fantasy, altered
  – Basis for subject interview, adjudication
• Meets FCRA & other legal requirements
Cyber Vetting Guidelines
• IACP-PERSEREC Project: Guidelines
  – Cyber Vetting for Law Enforcement
  – Cyber Vetting for National Security
  – Cyber Posting for both above
• Nationwide series of focus groups, research
• Baseline considerations for establishing enterprise policies and procedures
PERSEREC: Defense Personnel Security Research Center, Monterey, CA
IACP: International Association of Chiefs of Police
IACP Cyber Vetting Guidelines
Developing a Cybervetting Strategy for Law Enforcement, December 2010, IACP [Companion study for national security]
http://www.iacpsocialmedia.org/Portals/1/documents/CybervettingReport.pdf
Key Policy Issues
• Trained Internet investigators
• Outsourced (can address EEO issue)
• Internet search policies & procedures
  – Liability if Internet searching is done improperly
• Defining sufficiency - completeness
• Utilizing results of searching
Issues with Private Investigators
• Licensing of cyber investigators
  – Training
• Legal and ethical guidelines for cyber vetting
• Watching the watchers: regulators online
• Keeping up with the Internet
Forthcoming Book:
Internet Searches for Vetting, Investigations and Open-Source Intelligence
By Edward J. Appel
Taylor & Francis
http://www.taylorandfrancis.com/books/details/9781439827512/
Scheduled publication January 14, 2011
…contains more details on topics discussed here, e.g. how to do cybervetting and investigations ethically & legally
Questions?
Contact Information: Ed Appel, Proprietor, iNameCheck
(301) 524-8074
[email protected]
www.inamecheck.com