Anyone use machine learning?

ptpatil

New member
Jun 23, 2010
Recently I went through the Stanford machine learning lectures that they make available online, and I've been playing around with some test data sets and algorithms I could use.

Having been an aff marketer in the past (not anymore, but I come back here from time to time), I was wondering if any of you guys have tried using machine learning to suss out useful factors for improving profitability in any kind of campaign.

The most basic form I can think of is to collect as many data attributes as you can about your visitors, their visit path, and the particular media/method that brought them in, along with its attributes. If you have a large enough dataset you could use a log-likelihood test to determine which factors are related non-coincidentally, cluster your visitors according to profitability and the attributes mentioned before, etc.
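
To make the log-likelihood idea concrete, here is a minimal sketch of the log-likelihood ratio (G) statistic on a contingency table of counts. The ad-variant counts are made up for illustration, and comparing against 3.84 (the 5% chi-square cutoff for one degree of freedom) is just one common convention, not something taken from the lectures.

```python
import math

def g_statistic(table):
    """Log-likelihood ratio (G) statistic for a 2D contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # a zero cell contributes nothing to the sum
            expected = row_totals[i] * col_totals[j] / total
            g += observed * math.log(observed / expected)
    return 2.0 * g

# Hypothetical counts: rows = ad variant A / B, columns = converted / did not convert
related = [[90, 910], [30, 970]]
unrelated = [[50, 950], [50, 950]]

print(g_statistic(related))    # large value: variant and conversion look related
print(g_statistic(unrelated))  # zero: observed counts match the independent expectation
```

A large G suggests the two attributes are not independent; it says nothing about which way the effect runs or whether it is worth money.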

I was wondering if anyone else has used their data in this way and, if so, which ML methods and insights you found particularly useful.
 


Also, if anyone has an aff marketing campaign data set they can provide me, that would be awesome. Obviously, you want to protect your campaigns, but I can use old data from campaigns that you guys no longer run. I just want stuff to experiment with. If you send me data I'll send back whatever insights/relations I find.

I'd probably say a Google Analytics export along with PPC data would be ideal.
 
Don't neural networks come under machine learning? They can be used for image recognition (breaking CAPTCHAs) and hence more efficient spamming. ::emp:: has recently posted good neural networking video tutorials.
 


Yeah, NNs are good for black box learning. Training one basically means iteratively adjusting a linear or non-linear function so that it maps your predefined inputs to the outputs observed in a training data set (so for CAPTCHAs, you'd need a database of tons of solved CAPTCHAs, each image labeled with its solution).
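
A toy sketch of that "iteratively adjust a function" loop, assuming a single logistic unit trained by plain gradient descent. The AND-gate training set, learning rate, and epoch count are arbitrary choices for illustration; a real CAPTCHA solver would need far more than this.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=5000, lr=0.5):
    """Fit one logistic unit to (inputs, target) pairs with batch gradient descent."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for inputs, target in samples:
            pred = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = pred - target  # gradient of the log loss w.r.t. the pre-activation
            for k, x in enumerate(inputs):
                grad_w[k] += error * x
            grad_b += error
        # Nudge the function a little towards the observed outputs
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) >= 0.5

# Labeled training set (inputs -> observed output); here just an AND gate
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(training_set)
print([predict(weights, bias, x) for x, _ in training_set])
```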

This is not what I am talking about, however; it is a narrow application to one particular obstacle in marketing. I am talking about modeling a wide range of data (data warehousing, essentially) and using clustering/classification algorithms to surface hidden but important attributes.

Here is an example of what I am talking about. You take the following data:

Visitor geographic location
Visit time
Words in title and description of PPC ad visitor originated from (tokenized and indexed)
Visit landing page
Cost of Visit click
Visit 24 hr unique? (binary attribute)
Exit page/event
Bounce? (binary)
Visited pages on site if not bounce
Visit action leading to revenue (i.e. click on order page?)
Visit revenue (if they bought, signed up etc.)
Visit profit (revenue - cost)

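You can feed all of these attributes into, say, a clustering algorithm such as k-means to automatically find clusters of attributes that go with higher visit profit (or any other target attribute, such as visit action leading to revenue). Strictly speaking, k-Nearest Neighbor is a classifier rather than a clustering algorithm, but it could be used to predict the target attribute for new visits.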

For example, you could discover a cluster indicating that people from the Northeastern United States who visited between 9:00 PM and 12:00 AM via a PPC ad containing the word attributes "Obama t-shirts" yielded higher than average values in the "visit profit" attribute.
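
A rough sketch of that clustering idea, using a bare-bones k-means (my choice for the illustration) on two of the attributes and then comparing average visit profit per cluster. The visits, the scaling of the hour to 0-1, and the naive "first k points" initialization are all assumptions made for the example.

```python
def kmeans(points, k, iterations=20):
    """Tiny k-means: returns (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        for idx, p in enumerate(points):
            assignments[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

def mean_profit_by_cluster(assignments, profits):
    totals = {}
    for a, profit in zip(assignments, profits):
        s, n = totals.get(a, (0.0, 0))
        totals[a] = (s + profit, n + 1)
    return {a: s / n for a, (s, n) in totals.items()}

# Hypothetical visits: (visit hour, from the Northeast? 1/0, visit profit)
visits = [(22, 1, 1.40), (10, 0, -0.20), (23, 1, 1.10),
          (11, 0, 0.05), (21, 1, 0.90), (9, 0, -0.10)]
points = [(hour / 24, northeast) for hour, northeast, _ in visits]
profits = [profit for _, _, profit in visits]

assignments, centroids = kmeans(points, k=2)
print(mean_profit_by_cluster(assignments, profits))
```

The cluster with the higher average profit is the one worth inspecting; note that the clustering itself never sees the profit column.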
 
For the example above, a J48 decision tree would also be useful, since it chooses its splits based on information entropy.
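
Here is a minimal sketch of what "partitioning on information entropy" means: compute how much each attribute reduces the entropy of the target, and split on the best one. The rows and attribute names are made up, and C4.5/J48 actually normalizes this information gain into a gain ratio, which the sketch leaves out.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical visits: region decides profitability, the 24 hr unique flag does not
rows = [
    {"region": "NE", "unique": True, "profitable": True},
    {"region": "NE", "unique": False, "profitable": True},
    {"region": "SW", "unique": True, "profitable": False},
    {"region": "SW", "unique": False, "profitable": False},
]

for attribute in ("region", "unique"):
    print(attribute, information_gain(rows, attribute, "profitable"))
```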
 
You may be surprised that something like a C4.5 decision tree classifier, evaluated with 10-fold cross-validation, can give better results than neural nets. kNN or k-means clustering may give good results for that data, but you may need to work with the data some to learn which attributes out of the list you have are the most important. Machine learning algorithms will tell you that.

This has implementations of a lot of the common algorithms, and you can also add your own:
http://www.cs.waikato.ac.nz/ml/weka/downloading.html (C4.5 in weka is called J48)
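
For anyone new to 10-fold cross-validation: it is a way of estimating how well a classifier generalizes, not a way of training a better one. A minimal sketch, assuming a trivial threshold "stump" as a stand-in classifier (it is not C4.5) and made-up click-cost data:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold i holds out every k-th sample starting at i."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx

def cross_validate(features, labels, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)

# Stand-in classifier: threshold halfway between the two class means
def train_stump(costs, labels):
    pos = [c for c, y in zip(costs, labels) if y]
    neg = [c for c, y in zip(costs, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_stump(threshold, cost):
    return cost < threshold

# Hypothetical data: click cost in cents, profitable when the click was cheap
click_costs = list(range(20))
profitable = [c < 10 for c in click_costs]

accuracy = cross_validate(click_costs, profitable, train_stump, predict_stump, k=10)
print(accuracy)
```

In Weka the same evaluation is a built-in test option, so you would not write this by hand there.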
 

Awesome! I see we have other ML enthusiasts at WF. I wouldn't be surprised at all if C4.5 gave better results than a NN for the above example, especially since some of the features are binary/non-continuous.

kNN would also have some problems with the binary attributes (the distance in such a dimension can only be 0 or 1, which makes it trivial/meaningless as a distance measure).
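
A quick sketch of how I read that problem, with made-up visits: in raw units the binary flag is drowned out by the visit hour, and after min-max scaling it swings the other way and decides the nearest neighbor on its own. The scaling choice is just for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale every column to the 0-1 range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical visits: (visit hour, bounced? 1/0)
visits = [(21, 0), (22, 1), (9, 0)]

# Raw units: the hour dominates, the bounce flag adds at most 1 to the squared distance
raw_to_evening = euclidean(visits[0], visits[1])
raw_to_morning = euclidean(visits[0], visits[2])

# Scaled: the flag's 0-vs-1 jump is now as big as the whole range of hours
scaled = min_max_scale(visits)
scaled_to_evening = euclidean(scaled[0], scaled[1])
scaled_to_morning = euclidean(scaled[0], scaled[2])

print(raw_to_evening < raw_to_morning)        # nearest neighbor is the 22:00 visit
print(scaled_to_evening < scaled_to_morning)  # nearest neighbor flips to the 09:00 visit
```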

From my recent messing around with Mahout and scikit-learn, I find log-likelihood is excellent for feature selection when used the right way. I also find that classifier chains, as used in stuff like Mulan (a multi-label extension of Weka), are very useful for the data I crunch.
 
Don't neural networks come under machine learning? They can be used for image recognition (breaking CAPTCHAs) and hence more efficient spamming. ::emp:: has recently posted good neural networking video tutorials.

Did he post his thread? Not sure how I missed it. Do you have a link?


edit: hurr durrr found it
 
Anyways, if anyone would like to give me some kind of PPC/analytics data, just PM me. I'm sure we can work out an agreement so you get something out of it too.
 

Any success, ptpatil? I have a background in machine learning as well, and I'm pretty curious how it went.
 
This is what the display network I work at does. It examines the browsing habits of people who visit a brand's website, uses machine learning to break those habits down, then looks at everyone else surfing the web and scores them against each brand's customers based on browsing habits and pages visited. Each person is placed in a scored and ranked bucket based on their likelihood of converting on a given client's ads, and different segments are built using different scoring algorithms. Each segment and rank gets a different bid price per impression. Rank A is the smallest in volume of possible impressions but the most likely to convert; as you go down the alphabet, each letter is a bigger pool but less likely to convert than the previous one. Everyone in a bucket gets reanalyzed and rescored every couple of days.
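
A minimal sketch of the bucketing step described above, assuming the conversion scores already exist. The bucket sizes (doubling per letter), the halving bid schedule, and the user ids are hypothetical; the actual network's scoring and pricing are not described in that detail.

```python
import string

def assign_buckets(scores, first_bucket_size=1, growth=2):
    """Rank users by score; bucket A is smallest, each later letter is a bigger pool."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    buckets = {}
    start, size, letter_idx = 0, first_bucket_size, 0
    while start < len(ranked):
        # Anything past Z stays in Z
        letter = string.ascii_uppercase[min(letter_idx, 25)]
        for user in ranked[start:start + size]:
            buckets[user] = letter
        start += size
        size *= growth
        letter_idx += 1
    return buckets

def bid_for(letter, top_bid=2.00, decay=0.5):
    """Hypothetical bid schedule: each letter down the alphabet bids a fraction of the last."""
    return top_bid * decay ** (ord(letter) - ord("A"))

# Hypothetical conversion-likelihood scores per user
scores = {"u1": 0.91, "u2": 0.80, "u3": 0.77, "u4": 0.40,
          "u5": 0.35, "u6": 0.22, "u7": 0.10}

buckets = assign_buckets(scores)
for user, letter in buckets.items():
    print(user, letter, bid_for(letter))
```

Rescoring every couple of days would just mean rebuilding `scores` and calling `assign_buckets` again.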
 
Please tell us more.

Nothing significant, mostly in the area of fraud detection, so nothing directly related to AM.

I have wondered for some time, however, whether machine learning or broader AI techniques could help here. There are clearly some inefficiencies around, but it's equally possible that the scale most affiliate marketers operate on does not make it worth the effort.

For example, setting the visitors aside, campaigns are still managed by gut feeling. Nothing wrong with that, but there's room for improvement by betting on keywords or landing pages with a bot that uses sound money management and stats when making decisions.

That's just an idea, and I don't have enough experience to tell how viable it is.
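
One way such a bot could use stats instead of gut feeling, purely as a sketch: only keep a keyword if a pessimistic estimate of its conversion rate (here the Wilson score lower bound, my choice for the example) still pays for the click. The keywords, counts, cost per click, and payout are all made up, and money management proper (how much to bet) is not covered.

```python
import math

def wilson_lower_bound(conversions, clicks, z=1.96):
    """Lower end of the Wilson score interval for a conversion rate (z=1.96 is about 95%)."""
    if clicks == 0:
        return 0.0
    p = conversions / clicks
    denom = 1 + z * z / clicks
    centre = p + z * z / (2 * clicks)
    margin = z * math.sqrt(p * (1 - p) / clicks + z * z / (4 * clicks * clicks))
    return (centre - margin) / denom

def keywords_to_keep(stats, cost_per_click, payout):
    """Keep keywords whose pessimistic conversion rate still covers the click cost."""
    keep = []
    for keyword, (conversions, clicks) in stats.items():
        if wilson_lower_bound(conversions, clicks) * payout > cost_per_click:
            keep.append(keyword)
    return keep

# Hypothetical campaign stats: keyword -> (conversions, clicks)
stats = {
    "obama t-shirts": (40, 1000),  # 4% over a lot of clicks
    "funny shirts": (1, 10),       # 10% but on almost no data
    "cheap tees": (5, 1000),       # 0.5% over a lot of clicks
}

print(keywords_to_keep(stats, cost_per_click=0.25, payout=10.0))
```

The keyword with the best raw rate gets dropped here because ten clicks are not enough evidence, which is exactly the kind of call gut feeling tends to get wrong.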
 


Short answer - if you can define the problem in such a way that you can feed a set of data into a function, and it can non-trivially return some predefined classification based on that data, and it can be done in a reasonable amount of time, then you can apply machine learning techniques.

Long answer:
http://www.mpi-inf.mpg.de/~mehlhorn/SeminarEvolvability/ValiantLearnable.pdf
 
 
cardine is a member here who does that. He posted the Stanford class when it first started.
 
I'm also very much into ML, but more as a hobby, as it's what my formal education was in. Are people liking the Stanford courses? Any follow-up reviews?
 

Shit, all this time I thought your formal education was a focus in kicking major ass online.