Takin' a break from my thesis. Who wants to talk AI?

Fredda Weinberg Donating Member (1000+ posts) Sun Mar-12-06 11:54 PM
Original message
Takin' a break from my thesis. Who wants to talk AI?
I've been looking at too many academic papers tonight and I'm tired of writing chapters. Ideas are best shared, so even if nobody here is up for it, I'm going to lay out where I'm going, and by the end I should have a clearer overview of the subject.

I started this exercise about six months ago, and at the time it seemed a good idea to give a computer insight into its own processes ... so I proposed a script called "meta-measure" that would write itself depending on the configuration of the machine. It would run two versions of the same database query - one with a SQL object called a cursor and one without - and, for different-sized recordsets, conclude that one or the other was superior. I'd seen an article which claimed cursors were obsolete, but for smaller datasets that wasn't my experience, so I thought it would make more sense to let a computer explore the boundaries, since it was, after all, a mechanical task.
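In rough outline, the harness amounts to something like this (a minimal sketch in Python for illustration only - the real thing is T-SQL plus shell scripts, and the DSN, file, and table names here are all invented):

    import time
    import pyodbc

    # Everything named here is hypothetical: the DSN, the work/source tables,
    # and the two variant scripts stand in for the actual meta-measure pieces.
    conn = pyodbc.connect("DSN=warehouse;Trusted_Connection=yes")

    CURSOR_SQL = open("variant_cursor.sql").read()  # loops with a T-SQL cursor
    SET_SQL = open("variant_setbased.sql").read()   # one set-based statement

    def timed_run(sql, rows):
        """Load a working table of the given size, then time one variant."""
        cur = conn.cursor()  # the DB-API handle, not the T-SQL cursor under test
        cur.execute("DELETE FROM work_table")
        cur.execute("INSERT INTO work_table SELECT TOP (?) * FROM source_table", rows)
        conn.commit()
        start = time.time()
        cur.execute(sql)
        conn.commit()
        return time.time() - start

    for rows in (1000, 10000, 100000, 1000000):
        t_cur, t_set = timed_run(CURSOR_SQL, rows), timed_run(SET_SQL, rows)
        print(rows, t_cur, t_set, "cursor wins" if t_cur < t_set else "set-based wins")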

The graduate committee (this is for my master's in information science) rejected my first proposal - one comment was that I was simply offering to take "good notes", so I reworked it to incorporate what I thought would be useful features in AI - I was about to take classes in AI and Expert Systems and it seemed reasonable that something I would study would be helpful.

I was disappointed. Maybe it's our organic bigotry, but some of the objections to "artificial" intelligence apply just as well to humans; Searle's Chinese Room, for example, where an imagined operator responds to written questions by matching pictographs to English phrases - does the operator "understand" Chinese?

And Simulated Annealing, which uses random numbers to find an optimal solution - turns out it's still less efficient than an informed heuristic and remains vulnerable to local minima ... settling on a good solution when a better one remains to be discovered. Damn!
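For anyone who hasn't met it, the whole algorithm fits in a few lines. A toy sketch (Python - nothing to do with my thesis code, just the flavor of it):

    import math
    import random

    def anneal(f, x0, steps=10000, t0=1.0):
        """Minimize f by hopping to random neighbors, accepting even worse
        ones with probability exp(-delta/T); T cools as the run proceeds.
        With an unlucky seed it settles into a local minimum - exactly the
        weakness described above."""
        x, fx = x0, f(x0)
        for i in range(1, steps + 1):
            t = t0 / i                       # simple cooling schedule
            cand = x + random.gauss(0, 0.5)  # random neighbor
            fc = f(cand)
            if fc < fx or random.random() < math.exp((fx - fc) / t):
                x, fx = cand, fc
        return x, fx

    # A bumpy curve: global minimum at x = 0, local minima near multiples of pi.
    bumpy = lambda x: x * x + 10 * math.sin(x) ** 2
    print(anneal(bumpy, x0=8.0))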

So meta-measure (M&M for short) comes back to its original form, with a couple of ironic twists ... I'll be saving the script fragments in SQL Server itself, and since most of the privileged commands I'll be executing can only be run from text files, M&M will be a kind of genetic algorithm, forcing the end of runaway processes by rebooting after a predetermined limit to my patience and taking up where it left off by parsing the files it left behind.
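The resume logic is the heart of it. As a skeleton (Python standing in for the shell script; the state file and trial names are invented for the sketch):

    import os
    import subprocess
    import sys

    PATIENCE = 600          # seconds before I give up on a run and force a restart
    STATE = "mm_state.txt"  # one completed trial per line: name,result

    def finished_trials():
        """Figure out where the previous incarnation left off."""
        if not os.path.exists(STATE):
            return set()
        return {line.split(",")[0] for line in open(STATE)}

    done = finished_trials()
    for trial in ("cursor_30k", "set_30k", "cursor_3m", "set_3m"):  # hypothetical
        if trial in done:
            continue  # a previous incarnation already measured this one
        try:
            subprocess.run(["cmd", "/c", "run_trial.cmd", trial],
                           timeout=PATIENCE, check=True)
            open(STATE, "a").write(trial + ",OK\n")
        except subprocess.TimeoutExpired:
            # Runaway query: note it, reboot, and pick up here on the next boot.
            open(STATE, "a").write(trial + ",TIMEOUT\n")
            subprocess.run(["shutdown", "/r", "/t", "0"])
            sys.exit()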

Now, is it just me, or does this sound simple, yet profound?
TheBaldyMan Donating Member (1000+ posts) Mon Mar-13-06 12:05 AM
Response to Original message
1. in my experience 'Artificial Stupidity' is a far better description
do I understand you: the text fragments are the feedback mechanism and you construct a new script from this?
 
Hissyspit Donating Member (1000+ posts) Mon Mar-13-06 12:14 AM
Response to Reply #1
2. Sounds like deconstructionism!
Which is both simple and profound and not-so-profound.

Actually everything is simple and profound at the same time.
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 12:32 AM
Response to Reply #2
4. I like that - a philosophy that defies definition
That's what I was thinking as I was reviewing some essays from Hofstadter's latest, Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. The idea that attracted me was David Chalmers' essay (http://consc.net/papers/highlevel.pdf), where he argues that models of reality have to be flexible to be useful - and that AI's emphasis on static representations was fatally flawed.
 
Hissyspit Donating Member (1000+ posts) Mon Mar-13-06 12:56 AM
Response to Reply #4
7. I am glad you posted that. I've ordered a copy already. n/t
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 01:02 AM
Response to Reply #7
9. It's not Gödel, Escher, Bach
I bought it as a birthday present to myself when I first went back - I knew I'd be taking AI at some point and wanted a head start. It was very helpful and I'm sure contributed to the good grade I got. But it's pretty deep - it took me months to absorb the main ideas and I really didn't gestalt it until tonight.
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 12:24 AM
Response to Reply #1
3. Yup and nope
The script fragments will be stored in SQL Server - turns out you need different OS calls for different versions. When I wrote the proposal, I assumed the first thing M&M would have to do is discover the OS version and sure enough ... even the command to reboot depends on it.

M&M will "remember" what it's done in previous incarnations by referencing text files it generates as it queries - not just the OS version, but some performance indicators that it can analyze later to see where it hits the wall, so to speak.
 
aePrime Donating Member (676 posts) Mon Mar-13-06 12:48 AM
Response to Original message
5. Simulated annealing is too simple a method
If you're looking for minima or maxima, simulated annealing is too simple a process to be good for much. As you've pointed out, it has a tendency to get stuck in local extrema, both because it converges too quickly and because it doesn't get the opportunity to explore the search space.

A local beam search may be better suited, but I think a good genetic algorithm would be your best option. Genetic algorithms can also get stuck in local minima or maxima, but they fare much better than SA, and there are many methods in GAs that allow a population to not converge on one extreme, aside from the inherent mutation; fitness sharing and a distributed island model are simple and come readily to mind.
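Fitness sharing is less exotic than it sounds. A minimal sketch of the classic Goldberg-style version (Python, with made-up parameters - just the idea, not production code):

    import random

    SIGMA = 0.5  # niche radius: individuals closer than this share fitness

    def shared_fitness(pop, raw, dist):
        """Divide each individual's raw fitness by its niche count, so a
        crowded peak stops soaking up the whole population."""
        shared = []
        for i, ind in enumerate(pop):
            niche = sum(max(0.0, 1.0 - dist(ind, other) / SIGMA) for other in pop)
            shared.append(raw[i] / niche)  # niche >= 1, since dist(ind, ind) == 0
        return shared

    # Toy usage: 20 real-valued individuals, fitness peaked at 0.
    pop = [random.uniform(-5, 5) for _ in range(20)]
    raw = [1.0 / (1.0 + x * x) for x in pop]
    print(shared_fitness(pop, raw, dist=lambda a, b: abs(a - b)))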
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 12:59 AM
Response to Reply #5
8. My AI professor suggested GA, but
I'm already incorporating a Monte Carlo method to select the magnitude of the recordset and the incremental increases/decreases - what solution string would mutate or swap segments?

Distributed island model? Fitness sharing? Hmmm ... even if I don't implement these suggestions, they're worth mentioning. Thanx!
 
aePrime Donating Member (676 posts) Mon Mar-13-06 02:23 AM
Response to Reply #8
11. It helps if you have a Beowulf cluster.
If I read your original post correctly, it seems that the SQL queries aren't changing, just the size of the data set. If that's the case, I have a hard time conceptualizing the GA and what the operations (mutation and crossover) would be. I'm used to using GAs that specify a solution, not a test. I'd have to think about that more.

The island model is actually only marginal for exploring the fitness landscape -- its real benefit is that it can be parallelized very easily. The problem is that if you have a strong individual on one island, it will quickly dominate on another island when transplanted. Fitness sharing works well.

If you're working with a real-valued fitness solution landscape (but it doesn't sound like you are), particle swarm theory will work even better than a GA. I've even seen combinations of the two.

Restricted mating can also help with diversity.

Your Monte Carlo method should give you a nice overview of the solution landscape, if it's simple enough. Perhaps there's just a coefficient at which the two methods cross in their time to execute.
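For what it's worth, the bare mechanics of a particle swarm fit in a screenful. A toy sketch of the standard inertia-weight form (Python; the parameters are conventional defaults, not tuned for anything):

    import random

    def pso(f, dim=2, n=20, iters=200, w=0.7, c1=1.5, c2=1.5):
        """Bare-bones particle swarm minimization: each particle is pulled
        toward its own best point and toward the swarm's best point."""
        xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
        vs = [[0.0] * dim for _ in range(n)]
        pbest, pval = [x[:] for x in xs], [f(x) for x in xs]
        g = min(range(n), key=lambda i: pval[i])
        gbest, gval = pbest[g][:], pval[g]
        for _ in range(iters):
            for i in range(n):
                for d in range(dim):
                    r1, r2 = random.random(), random.random()
                    vs[i][d] = (w * vs[i][d]
                                + c1 * r1 * (pbest[i][d] - xs[i][d])
                                + c2 * r2 * (gbest[d] - xs[i][d]))
                    xs[i][d] += vs[i][d]
                v = f(xs[i])
                if v < pval[i]:
                    pbest[i], pval[i] = xs[i][:], v
                    if v < gval:
                        gbest, gval = xs[i][:], v
        return gbest, gval

    print(pso(lambda x: sum(c * c for c in x)))  # sphere function: minimum at origin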
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 02:42 AM
Response to Reply #11
13. Excellent and many thanks
This has been just the exchange I was hoping for.
 
Benfea Donating Member (1000+ posts) Mon Mar-13-06 12:49 AM
Response to Original message
6. I don't get it.
If the dataset is small, the cursor works better and you use it. Otherwise, don't. Granted, I don't know what a cursor is, but it sounds to me like something no more profound than writing a script to check what kind of browser is loading your page, and adjusting the HTML accordingly.

Maybe I'm just missing something here. It sounds perfectly sensible, but not terribly profound.
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 01:19 AM
Response to Reply #6
10. Your intuition informs your understanding - but prove it
My experience surprised me - it didn't matter whether the script had cursors or not ... when the client upgraded from 1 GB to 2 the data warehouse import routine went from 30 hours to 30 minutes. I find that kind of non-linear response fascinating and I wanted to explore the boundaries, but writing the routine that would be capable of terminating itself and, as one reviewer noticed, "keeping good notes" seemed more suitable to a machine than my limited patience.

It's the self-reference that keeps popping up that I find fascinating. Turns out Microsoft's development team kept their performance measuring scripts in database format - and the OS version is one of the indexed fields. They also offer a stress test application whose first task is the production of an entire set of text files, suitable for sending back to the development team for analysis. It does this by dynamically producing a SQL script that outputs the environmental and performance values.

Think about it for a moment ... all the effort that's gone into making machines act like humans, when a more constructive use of our time would be to "teach" them to improve themselves. Yes, this is a trivial application of the concept, but variations for network traffic direction and data mining are practical.
 
Benfea Donating Member (1000+ posts) Mon Mar-13-06 02:34 AM
Response to Reply #10
12. Uh...
> My experience surprised me - it didn't matter whether the script had cursors or not ... when the client upgraded from 1 GB to 2 the data warehouse import routine went from 30 hours to 30 minutes. I find that kind of non-linear response fascinating…

Are you certain this isn't just a performance issue? I've never studied programming formally (certainly not AI), but it sounds to me like the client could just be hitting the page file a lot at 1GB (I assume you're talking about system RAM there, right?*) and not at 2GB. I have no clue what platform you're on, but is the cache dynamic, and is it expanding while your import routine is running? That kind of thing doesn't fascinate me; it makes me angry until I find out why something like that is happening (which probably explains why I studied physics instead of comp. sci.).

* If you're talking about the size of the data, then your results are extremely counterintuitive, and I would need to know more about what it is you're doing - but any discussion about database calls makes my eyes glaze over.

> …and I wanted to explore the boundaries, but writing the routine that would be capable of terminating itself and, as one reviewer noticed, "keeping good notes" seemed more suitable to a machine than my limited patience.

What's the big deal about self-terminating routines? All routines terminate themselves eventually. ;) I have a feeling I'd be more impressed if I understood what it is you're trying to do with the data, but please see the above footnote about databases. My database experience is limited to smallish corporate environments where things are much simpler.

> It's the self-reference that keeps popping up that I find fascinating. Turns out Microsoft's development team kept their performance measuring scripts in database format - and the OS version is one of the indexed fields. They also offer a stress test application whose first task is the production of an entire set of text files, suitable for sending back to the development team for analysis. It does this by dynamically producing a SQL script that outputs the environmental and performance values.

About the only "self-reference" I've dealt with is recursive function calls (fun stuff… assuming you did it on purpose ;) ), which sounds only vaguely similar to what that constructed SQL script is doing.

> Think about it for a moment ... all the effort that's gone into making machines act like humans, when a more constructive use of our time would be to "teach" them to improve themselves. Yes, this is a trivial application of the concept, but variations for network traffic direction and data mining are practical.

Now you're just getting philosophical. Getting a chunk of software to act more like humans is irrelevant (or, if you'd like to compare it to the Grand Unified Field Theory, pointless unless you accept that wandering down the path is what matters even if the destination is unreachable or meaningless); making a chunk of software perform a new task isn't.
 
Mr Rabble Donating Member (1000+ posts) Mon Mar-13-06 02:50 AM
Response to Reply #12
14. Cache is dynamic with MS.
At least in the standard configuration. All servers and desktops will show a dynamic page/swap. The 1GB dataset should be much easier to handle. That is odd indeed.

I understood the script termination to include shutting down the service and/or a reboot.

Pick another thesis./2 cents/
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 03:16 AM
Response to Reply #14
16. You've never seen thrashing?
Once you page out of RAM, fuggetaboutit - and it's a simple fact of life that we develop on sample sizes ... the production people didn't even have an extract for me to work on until my scripts were completed. You'd think three million records wouldn't be a big deal ... but when you're populating a data warehouse there's a lot of number crunching for each dimension and level in the hierarchy.

Another thesis topic? No way! It's been an extremely productive use of my time and I'm going to get this bad boy submitted way ahead of the deadline. Tonight's discourse has shown me that the topic can be explained to non-professionals, which was my greatest concern.

The hardest part is writing everything without resorting to personal pronouns - or humor. Tonight's exercise has been a healthy venting after a day of dry academic prose. Thanks for playing!
 
Benfea Donating Member (1000+ posts) Mon Mar-13-06 09:55 AM
Response to Reply #16
18. If it's a performance issue, it's not counterintuitive at all.
The access times for hard drives are orders of magnitude larger than for RAM, so if it's a performance issue related to available RAM and virtual memory, then it makes perfect sense.
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 01:25 PM
Response to Reply #18
19. The literature suggests that non-cursors would be faster
But for all dataset sizes the times were the same - that's why my paper's title is "Are cursors obsolete?"

Yes, it's a performance issue, but predicting the crash point is currently a black art - my intuition told me what to recommend to my client, based on the results I saw with any of the recommended techniques.

Having an automated system should offer a clearer picture of what's going on - and if the answer is "no" then I'll have something worth publishing.
 
TheBaldyMan Donating Member (1000+ posts) Mon Mar-13-06 03:28 PM
Response to Reply #19
21. I gave this some more thought through the day ...
it isn't unusual to have an algorithm that is far more efficient on huge datasets, yet it gets trounced by another less sophisticated algorithm when the dataset size is much smaller. I can think of several sorting algorithms that display this disparity.

You are examining relational database queries, so the question I'd like to ask is this: are the algorithms from different eras, with the later one tailored for the huge datasets that interest people developing data-mining techniques on truly massive collections? I'm thinking of web services and intelligence specifically.
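The sorting version of that disparity is easy to demonstrate. A quick sketch (Python, related to the SQL question only by analogy):

    import random
    import timeit

    def insertion_sort(a):
        """O(n^2), but tiny constant factors: wins on small inputs."""
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a

    def merge_sort(a):
        """O(n log n), but recursion and list-building overhead hurt when n is small."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    for n in (8, 64, 512, 4096):
        data = [random.random() for _ in range(n)]
        t_ins = timeit.timeit(lambda: insertion_sort(data[:]), number=200)
        t_mrg = timeit.timeit(lambda: merge_sort(data[:]), number=200)
        print(n, "insertion wins" if t_ins < t_mrg else "merge wins")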
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 05:32 PM
Response to Reply #21
22. I'll report back when I have more objective numbers
But when I did the original crunching, all the methods were equal for both the 30k and 3 million record datasets.

To me, it is a puzzle - and I like solving mysteries.
 
Fredda Weinberg Donating Member (1000+ posts) Mon Mar-13-06 03:03 AM
Response to Reply #12
15. It's not just counterintuitive - it flies in the face
of everything I've seen published.

Yes, it's a performance issue, but what struck me was the consensus in the literature that cursors are obsolete - and I was ticked off that, after taking the time to recode the manipulations, the data transformation package was still taking more than a day to conclude. The client wanted to analyze the previous day's intake (New York City's Department of Sanitation, no less) and I wasn't going to let them down. But it was an educated guess, not a quantifiable analysis - the vendor who sold the city the data warehousing package wanted to charge extra for that.

As for self-termination, I wasn't going to explore boundary conditions manually - and SQL queries don't normally quit because they've run too long. But as it turns out, Microsoft has incorporated this capability into their stress tester, so all I have to do is invoke it from my shell script. Of course, I didn't know this puppy existed until I started researching the topic.

You could argue with Alan Turing whether getting computers to mimic humans is relevant or not ... but after studying the subject, I agree with you.
 
wildhorses Donating Member (1000+ posts) Mon Mar-13-06 03:55 AM
Response to Original message
17. simple, yet profound
paradoxical...:shrug:
 
entanglement Donating Member (1000+ posts) Mon Mar-13-06 01:41 PM
Response to Original message
20. Not a computer scientist, but hasn't funding for AI research
almost dried up at the university level because it promised too much and delivered too little?
 